[v4,00/23] drm/msm: de-struct_mutex-ification

Message ID	20201023165136.561680-1-robdclark@gmail.com

Headers

From: Rob Clark <robdclark@gmail.com>
To: dri-devel@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org, freedreno@lists.freedesktop.org,
	Rob Clark <robdclark@chromium.org>,
	Abhinav Kumar <abhinavk@codeaurora.org>,
	Akhil P Oommen <akhilpo@codeaurora.org>,
	AngeloGioacchino Del Regno <kholk11@gmail.com>,
	Brian Masney <masneyb@onstation.org>,
	Christophe JAILLET <christophe.jaillet@wanadoo.fr>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Emil Velikov <emil.velikov@collabora.com>, Eric Anholt <eric@anholt.net>,
	Harigovindan P <harigovi@codeaurora.org>,
	Jeffrey Hugo <jeffrey.l.hugo@gmail.com>,
	Jonathan Marek <jonathan@marek.ca>,
	Jordan Crouse <jcrouse@codeaurora.org>,
	"Kristian H. Kristensen" <hoegsberg@google.com>,
	linaro-mm-sig@lists.linaro.org (moderated list:DMA BUFFER SHARING
	FRAMEWORK), linux-kernel@vger.kernel.org (open list),
	linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK), 
	Matthias Kaehlcke <mka@chromium.org>, Maxime Ripard <mripard@kernel.org>,
	Rajendra Nayak <rnayak@codeaurora.org>, Sam Ravnborg <sam@ravnborg.org>,
	Sharat Masetty <smasetty@codeaurora.org>,
	Thomas Zimmermann <tzimmermann@suse.de>
Subject: [PATCH v4 00/23] drm/msm: de-struct_mutex-ification
Date: Fri, 23 Oct 2020 09:51:01 -0700
Message-Id: <20201023165136.561680-1-robdclark@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

drm/msm: de-struct_mutex-ification

Message

Rob Clark Oct. 23, 2020, 4:51 p.m. UTC

From: Rob Clark <robdclark@chromium.org>

This doesn't remove *all* of the struct_mutex usage, but it covers the
worst of it, i.e. shrinker/madvise/free/retire.  The submit path still
uses struct_mutex, but it still needs *something* to serialize a portion
of the submit path, and lock_stat mostly just shows the lock contention
there being with other submits.  There are also a few other bits of
struct_mutex usage in less critical paths (debugfs, etc).  But this
seems like a reasonable step in the right direction.

v2: teach lockdep about shrinker locking patterns (danvet) and
    convert to obj->resv locking (danvet)
v3: fix get_vaddr locking for legacy userspace (relocs), devcoredump,
    and rd/hangrd
v4: address a couple of minor review comments (krh), fix deadlock with
    imported dma-bufs (i.e. from v4l2, etc)

Rob Clark (23):
  drm/msm: Fix a couple incorrect usages of get_vaddr_active()
  drm/msm/gem: Add obj->lock wrappers
  drm/msm/gem: Rename internal get_iova_locked helper
  drm/msm/gem: Move prototypes to msm_gem.h
  drm/msm/gem: Add some _locked() helpers
  drm/msm/gem: Move locking in shrinker path
  drm/msm/submit: Move copy_from_user ahead of locking bos
  drm/msm: Do rpm get sooner in the submit path
  drm/msm/gem: Switch over to obj->resv for locking
  drm/msm: Use correct drm_gem_object_put() in fail case
  drm/msm: Drop chatty trace
  drm/msm: Move update_fences()
  drm/msm: Add priv->mm_lock to protect active/inactive lists
  drm/msm: Document and rename preempt_lock
  drm/msm: Protect ring->submits with it's own lock
  drm/msm: Refcount submits
  drm/msm: Remove obj->gpu
  drm/msm: Drop struct_mutex from the retire path
  drm/msm: Drop struct_mutex in free_object() path
  drm/msm: Remove msm_gem_free_work
  drm/msm: Drop struct_mutex in madvise path
  drm/msm: Drop struct_mutex in shrinker path
  drm/msm: Don't implicit-sync if only a single ring

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |   6 +-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  12 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     |   6 +-
 drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c |   1 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c |   1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c        |   1 +
 drivers/gpu/drm/msm/msm_debugfs.c         |   7 +
 drivers/gpu/drm/msm/msm_drv.c             |  21 +-
 drivers/gpu/drm/msm/msm_drv.h             |  73 +-----
 drivers/gpu/drm/msm/msm_fbdev.c           |   1 +
 drivers/gpu/drm/msm/msm_gem.c             | 271 +++++++++++-----------
 drivers/gpu/drm/msm/msm_gem.h             | 133 +++++++++--
 drivers/gpu/drm/msm/msm_gem_shrinker.c    |  81 ++-----
 drivers/gpu/drm/msm/msm_gem_submit.c      | 164 ++++++++-----
 drivers/gpu/drm/msm/msm_gpu.c             | 110 +++++----
 drivers/gpu/drm/msm/msm_gpu.h             |   5 +-
 drivers/gpu/drm/msm/msm_rd.c              |   2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c      |   3 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  13 +-
 19 files changed, 506 insertions(+), 405 deletions(-)
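
To illustrate the general direction described in the cover letter (moving paths such as madvise off the device-wide struct_mutex and onto a per-object lock), here is a minimal userspace sketch. It is not the driver code: pthread mutexes stand in for the kernel locks, and `model_device`, `model_obj`, `model_madvise_big_lock()` and `model_madvise_obj_lock()` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model: a pthread mutex stands in for each kernel lock. */
struct model_device {
	pthread_mutex_t struct_mutex;	/* device-wide lock shared by all paths */
};

struct model_obj {
	pthread_mutex_t lock;	/* per-object lock, standing in for obj->resv */
	int madv;		/* hypothetical per-object state */
};

/* Before: updating any object serializes on the device-wide lock. */
static int model_madvise_big_lock(struct model_device *dev,
				  struct model_obj *obj, int madv)
{
	int old;

	assert(dev != NULL && obj != NULL);
	pthread_mutex_lock(&dev->struct_mutex);
	old = obj->madv;
	obj->madv = madv;
	pthread_mutex_unlock(&dev->struct_mutex);
	return old;
}

/* After: only callers touching the same object contend with each other. */
static int model_madvise_obj_lock(struct model_obj *obj, int madv)
{
	int old;

	assert(obj != NULL);
	pthread_mutex_lock(&obj->lock);
	old = obj->madv;
	obj->madv = madv;
	pthread_mutex_unlock(&obj->lock);
	return old;
}
```

In the second variant, two threads updating different objects never wait on each other, which is the kind of contention the series aims to remove from the shrinker/madvise/free/retire paths.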

Comments

Rob Clark Oct. 24, 2020, 3:49 a.m. UTC | #1

On Fri, Oct 23, 2020 at 11:20 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>
> On Fr, 2020-10-23 at 09:51 -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > If there is only a single ring (no-preemption), everything is FIFO order
> > and there is no need to implicit-sync.
> >
> > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > is undefined when fences are not used to synchronize buffer usage across
> > contexts (which is the only case where multiple different priority rings
> > could come into play).
>
> Really, doesn't this break cross-device implicit sync? Okay, you may
> not have many peripherals that rely on implicit sync on devices where
> Adreno is usually found, but it seems rather heavy-handed.
>
> Wouldn't it be better to only ignore fences from your own ring context
> in the implicit sync, like we do in the common DRM scheduler
> (drm_sched_dependency_optimized)?

We already do this, as was discussed on an earlier iteration of this
patchset.

But I'm not aware of any other non-gpu related implicit sync use-case
(even on imx devices where display is decoupled from gpu).  I'll
revert the patch if someone comes up with one, but otherwise let's let
the implicit sync baggage die.

BR,
-R



>
> Regards,
> Lucas
>
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
> > ---
> >  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index d04c349d8112..b6babc7f9bb8 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -283,7 +283,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> >       return ret;
> >  }
> >
> > -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
> >  {
> >       int i, ret = 0;
> >
> > @@ -303,7 +303,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> >                               return ret;
> >               }
> >
> > -             if (no_implicit)
> > +             if (!implicit_sync)
> >                       continue;
> >
> >               ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> > @@ -774,7 +774,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >       if (ret)
> >               goto out;
> >
> > -     ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > +     ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> > +                     !(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> >       if (ret)
> >               goto out;
> >
>

Daniel Vetter Oct. 29, 2020, 4:14 p.m. UTC | #2

On Mon, Oct 26, 2020 at 10:34 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Fri, Oct 23, 2020 at 08:49:14PM -0700, Rob Clark wrote:
> > On Fri, Oct 23, 2020 at 11:20 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > >
> > > On Fr, 2020-10-23 at 09:51 -0700, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > If there is only a single ring (no-preemption), everything is FIFO order
> > > > and there is no need to implicit-sync.
> > > >
> > > > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > > > is undefined when fences are not used to synchronize buffer usage across
> > > > contexts (which is the only case where multiple different priority rings
> > > > could come into play).
> > >
> > > Really, doesn't this break cross-device implicit sync? Okay, you may
> > > not have many peripherals that rely on implicit sync on devices where
> > > Adreno is usually found, but it seems rather heavy-handed.
> > >
> > > Wouldn't it be better to only ignore fences from your own ring context
> > > in the implicit sync, like we do in the common DRM scheduler
> > > (drm_sched_dependency_optimized)?
> >
> > we already do this.. as was discussed on an earlier iteration of this patchset
> >
> > But I'm not aware of any other non-gpu related implicit sync use-case
> > (even on imx devices where display is decoupled from gpu).. I'll
> > revert the patch if someone comes up with one, but otherwise lets let
> > the implicit sync baggage die
>
> The thing is, dma_resv won't die, even if implicit sync is dead. We're
> using internally for activity tracking and memory management. If you don't
> set these, then we can't share generic code with msm, and I think everyone
> inventing their own memory management is a bit a mistake.
>
> Now you only kill the implicit write sync stuff here, but I'm not sure
> that's worth much since you still install all the read fences for
> consistency. And if userspace doesn't want to be synced, they can set the
> flag and do this on their own: I think you should be able to achieve
> exactly the same thing in mesa.
>
> Aside: If you're worried about overhead, you can do O(1) submit if you
> manage your ppgtt like amdgpu does.

So I just remembered a use-case which is maybe a bit yucky, but it is
actually possible to implement race-free. If you have implicit sync.

There are screen-capture tools in mplayer and obs which capture your
compositor by running getfb2 in a loop. It works, and after some
initial screaming I realized it does actually work race-free. If you
have implicit sync.

I really don't think you can sunset this, as much as you want to. And
sunsetting it inconsistently is probably the worst.
-Daniel


Rob Clark Oct. 29, 2020, 4:59 p.m. UTC | #3

On Thu, Oct 29, 2020 at 9:14 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Oct 26, 2020 at 10:34 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Fri, Oct 23, 2020 at 08:49:14PM -0700, Rob Clark wrote:
> > > On Fri, Oct 23, 2020 at 11:20 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > > >
> > > > On Fr, 2020-10-23 at 09:51 -0700, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > If there is only a single ring (no-preemption), everything is FIFO order
> > > > > and there is no need to implicit-sync.
> > > > >
> > > > > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > > > > is undefined when fences are not used to synchronize buffer usage across
> > > > > contexts (which is the only case where multiple different priority rings
> > > > > could come into play).
> > > >
> > > > Really, doesn't this break cross-device implicit sync? Okay, you may
> > > > not have many peripherals that rely on implicit sync on devices where
> > > > Adreno is usually found, but it seems rather heavy-handed.
> > > >
> > > > Wouldn't it be better to only ignore fences from your own ring context
> > > > in the implicit sync, like we do in the common DRM scheduler
> > > > (drm_sched_dependency_optimized)?
> > >
> > > we already do this.. as was discussed on an earlier iteration of this patchset
> > >
> > > But I'm not aware of any other non-gpu related implicit sync use-case
> > > (even on imx devices where display is decoupled from gpu).. I'll
> > > revert the patch if someone comes up with one, but otherwise lets let
> > > the implicit sync baggage die
> >
> > The thing is, dma_resv won't die, even if implicit sync is dead. We're
> > using internally for activity tracking and memory management. If you don't
> > set these, then we can't share generic code with msm, and I think everyone
> > inventing their own memory management is a bit a mistake.
> >
> > Now you only kill the implicit write sync stuff here, but I'm not sure
> > that's worth much since you still install all the read fences for
> > consistency. And if userspace doesn't want to be synced, they can set the
> > flag and do this on their own: I think you should be able to achieve
> > exactly the same thing in mesa.
> >
> > Aside: If you're worried about overhead, you can do O(1) submit if you
> > manage your ppgtt like amdgpu does.
>
> So just remember a use-case which is maybe a bit yucky, but it is
> actually possible to implement race-free. If you have implicit sync.
>
> There's screen-capture tool in mplayer and obs which capture your
> compositor by running getfb2 in a loop. It works, and after some
> initial screaming I realized it does actually work race-free. If you
> have implicit sync.
>
> I really don't think you can sunset this, as much as you want to. And
> sunsetting it inconsistently is probably the worst.

For the case where you only have a single ring, as long as it is
importing the fb into EGL to read it (which it would need to do to
get a linear view), this would still all work.

(But I may drop this patch, because it is just a micro-optimization and
seems to cause more confusion.)

BR,
-R

