diff mbox series

drm/msm: Check for the GPU IOMMU during bind

Message ID 20230309222049.4180579-1-jorcrous@amazon.com
State New
Headers show
Series drm/msm: Check for the GPU IOMMU during bind | expand

Commit Message

Jordan Crouse March 9, 2023, 10:20 p.m. UTC
While booting with amd,imageon on a headless target the GPU probe was
failing with -ENOSPC in get_pages() from msm_gem.c.

Investigation showed that the driver was using the default 16MB VRAM
carveout because msm_use_mmu() was returning false since headless devices
use a dummy parent device. Avoid this by extending the existing is_a2xx
priv member to check the GPU IOMMU state on all platforms and use that
check in msm_use_mmu().

This works for memory allocations but it doesn't prevent the VRAM carveout
from being created because that happens before we have a chance to check
the GPU IOMMU state in adreno_bind.

There are a number of possible options to resolve this but none of them are
very clean. The easiest way is to likely specify vram=0 as module parameter
on headless devices so that the memory doesn't get wasted.

Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
---

 drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
 drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
 drivers/gpu/drm/msm/msm_drv.h              | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

Comments

Dmitry Baryshkov March 9, 2023, 11:05 p.m. UTC | #1
On 10/03/2023 00:20, Jordan Crouse wrote:
> While booting with amd,imageon on a headless target the GPU probe was
> failing with -ENOSPC in get_pages() from msm_gem.c.
> 
> Investigation showed that the driver was using the default 16MB VRAM
> carveout because msm_use_mmu() was returning false since headless devices
> use a dummy parent device. Avoid this by extending the existing is_a2xx
> priv member to check the GPU IOMMU state on all platforms and use that
> check in msm_use_mmu().

I wonder if we can fix this by setting 'dummy_dev'->of_node to adreno's 
of_node. Did you check that possibility?

> 
> This works for memory allocations but it doesn't prevent the VRAM carveout
> from being created because that happens before we have a chance to check
> the GPU IOMMU state in adreno_bind.
> 
> There are a number of possible options to resolve this but none of them are
> very clean. The easiest way is to likely specify vram=0 as module parameter
> on headless devices so that the memory doesn't get wasted.
> 
> Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
> ---
> 
>   drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
>   drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
>   drivers/gpu/drm/msm/msm_drv.h              | 2 +-
>   3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 36f062c7582f..4f19da28f80f 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
>   	DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
>   		config.rev.minor, config.rev.patchid);
>   
> -	priv->is_a2xx = config.rev.core == 2;
> +	/*
> +	 * A2xx has a built in IOMMU and all other IOMMU enabled targets will
> +	 * have an ARM IOMMU attached
> +	 */
> +	priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
>   	priv->has_cached_coherent = config.rev.core >= 6;
>   
>   	gpu = info->init(drm);
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index aca48c868c14..a125a351ec90 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
>   	struct msm_drm_private *priv = dev->dev_private;
>   
>   	/*
> -	 * a2xx comes with its own MMU
> -	 * On other platforms IOMMU can be declared specified either for the
> -	 * MDP/DPU device or for its parent, MDSS device.
> +	 * Return true if the GPU or the MDP/DPU or parent MDSS device has an
> +	 * IOMMU
>   	 */
> -	return priv->is_a2xx ||
> +	return priv->has_gpu_iommu ||
>   		device_iommu_mapped(dev->dev) ||
>   		device_iommu_mapped(dev->dev->parent);

It is not a problem of you patch, of course, but this check now looks 
strange to me. We mix the GPU check and MDP/DPU checks. Consider msm8x60 
(a220, mdp4) and, for example, no system level MMU.

>   }
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index 9f0c184b02a0..f33f94acd1b9 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -126,7 +126,7 @@ struct msm_drm_private {
>   	struct msm_gpu *gpu;
>   
>   	/* gpu is only set on open(), but we need this info earlier */
> -	bool is_a2xx;
> +	bool has_gpu_iommu;
>   	bool has_cached_coherent;
>   
>   	struct drm_fb_helper *fbdev;
Jordan Crouse April 5, 2023, 3:37 p.m. UTC | #2
On Fri, Mar 10, 2023 at 01:05:36AM +0200, Dmitry Baryshkov wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 10/03/2023 00:20, Jordan Crouse wrote:
> > While booting with amd,imageon on a headless target the GPU probe was
> > failing with -ENOSPC in get_pages() from msm_gem.c.
> > 
> > Investigation showed that the driver was using the default 16MB VRAM
> > carveout because msm_use_mmu() was returning false since headless devices
> > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > priv member to check the GPU IOMMU state on all platforms and use that
> > check in msm_use_mmu().
> 
> I wonder if we can fix this by setting 'dummy_dev'->of_node to adreno's
> of_node. Did you check that possibility?

I said I would check and then never looped back around. This will stick
on my todo list for now and I'll check on the next cycle. If anybody
else wants to jump in the meantime then please go for it.

Jordan
Dmitry Baryshkov July 6, 2023, 6:55 p.m. UTC | #3
On 10/03/2023 00:20, Jordan Crouse wrote:
> While booting with amd,imageon on a headless target the GPU probe was
> failing with -ENOSPC in get_pages() from msm_gem.c.
> 
> Investigation showed that the driver was using the default 16MB VRAM
> carveout because msm_use_mmu() was returning false since headless devices
> use a dummy parent device. Avoid this by extending the existing is_a2xx
> priv member to check the GPU IOMMU state on all platforms and use that
> check in msm_use_mmu().
> 
> This works for memory allocations but it doesn't prevent the VRAM carveout
> from being created because that happens before we have a chance to check
> the GPU IOMMU state in adreno_bind.
> 
> There are a number of possible options to resolve this but none of them are
> very clean. The easiest way is to likely specify vram=0 as module parameter
> on headless devices so that the memory doesn't get wasted.

This patch was on my plate for quite a while, please excuse me for 
taking it so long.

I see the following problem with the current code. We have two different 
instances than can access memory: MDP/DPU and GPU. And each of them can 
either have or miss the MMU.

For some time I toyed with the idea of determining whether the allocated 
BO is going to be used by display or by GPU, but then I abandoned it. We 
can have display BOs being filled by GPU, so handling it this way would 
complicate things a lot.

This actually rings a tiny bell in my head with the idea of splitting 
the display and GPU parts to two different drivers, but I'm not sure 
what would be the overall impact.

More on the msm_use_mmu() below.

> 
> Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
> ---
> 
>   drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
>   drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
>   drivers/gpu/drm/msm/msm_drv.h              | 2 +-
>   3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 36f062c7582f..4f19da28f80f 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
>   	DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
>   		config.rev.minor, config.rev.patchid);
>   
> -	priv->is_a2xx = config.rev.core == 2;
> +	/*
> +	 * A2xx has a built in IOMMU and all other IOMMU enabled targets will
> +	 * have an ARM IOMMU attached
> +	 */
> +	priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
>   	priv->has_cached_coherent = config.rev.core >= 6;
>   
>   	gpu = info->init(drm);
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index aca48c868c14..a125a351ec90 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
>   	struct msm_drm_private *priv = dev->dev_private;
>   
>   	/*
> -	 * a2xx comes with its own MMU
> -	 * On other platforms IOMMU can be declared specified either for the
> -	 * MDP/DPU device or for its parent, MDSS device.
> +	 * Return true if the GPU or the MDP/DPU or parent MDSS device has an
> +	 * IOMMU
>   	 */
> -	return priv->is_a2xx ||
> +	return priv->has_gpu_iommu ||
>   		device_iommu_mapped(dev->dev) ||
>   		device_iommu_mapped(dev->dev->parent);

I have a generic feeling that both old an new code is not fully correct. 
Please correct me if I'm wrong:

We should be using VRAM, if either of the blocks can not use remapped 
memory. So this should have been:

bool msm_use_mmu()
{
  if (!gpu_has_iommu)
    return false;

  if (have_display_part && !display_has_mmu())
    return false;

  return true;
}

What do you think.

>   }
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index 9f0c184b02a0..f33f94acd1b9 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -126,7 +126,7 @@ struct msm_drm_private {
>   	struct msm_gpu *gpu;
>   
>   	/* gpu is only set on open(), but we need this info earlier */
> -	bool is_a2xx;
> +	bool has_gpu_iommu;
>   	bool has_cached_coherent;
>   
>   	struct drm_fb_helper *fbdev;
Jordan Crouse July 7, 2023, 3:03 p.m. UTC | #4
On Thu, Jul 06, 2023 at 09:55:13PM +0300, Dmitry Baryshkov wrote:
> 
> On 10/03/2023 00:20, Jordan Crouse wrote:
> > While booting with amd,imageon on a headless target the GPU probe was
> > failing with -ENOSPC in get_pages() from msm_gem.c.
> > 
> > Investigation showed that the driver was using the default 16MB VRAM
> > carveout because msm_use_mmu() was returning false since headless devices
> > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > priv member to check the GPU IOMMU state on all platforms and use that
> > check in msm_use_mmu().
> > 
> > This works for memory allocations but it doesn't prevent the VRAM carveout
> > from being created because that happens before we have a chance to check
> > the GPU IOMMU state in adreno_bind.
> > 
> > There are a number of possible options to resolve this but none of them are
> > very clean. The easiest way is to likely specify vram=0 as module parameter
> > on headless devices so that the memory doesn't get wasted.
> 
> This patch was on my plate for quite a while, please excuse me for
> taking it so long.

No worries. I'm also chasing a bunch of other stuff too.

> I see the following problem with the current code. We have two different
> instances than can access memory: MDP/DPU and GPU. And each of them can
> either have or miss the MMU.
> 
> For some time I toyed with the idea of determining whether the allocated
> BO is going to be used by display or by GPU, but then I abandoned it. We
> can have display BOs being filled by GPU, so handling it this way would
> complicate things a lot.
> 
> This actually rings a tiny bell in my head with the idea of splitting
> the display and GPU parts to two different drivers, but I'm not sure
> what would be the overall impact.

As I now exclusively work on headless devices I would be 100% for this,
but I'm sure that our laptop friends might not agree :)

> More on the msm_use_mmu() below.
> 
> > 
> > Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
> > ---
> > 
> >   drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
> >   drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
> >   drivers/gpu/drm/msm/msm_drv.h              | 2 +-
> >   3 files changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index 36f062c7582f..4f19da28f80f 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
> >       DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
> >               config.rev.minor, config.rev.patchid);
> > 
> > -     priv->is_a2xx = config.rev.core == 2;
> > +     /*
> > +      * A2xx has a built in IOMMU and all other IOMMU enabled targets will
> > +      * have an ARM IOMMU attached
> > +      */
> > +     priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
> >       priv->has_cached_coherent = config.rev.core >= 6;
> > 
> >       gpu = info->init(drm);
> > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > index aca48c868c14..a125a351ec90 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.c
> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
> >       struct msm_drm_private *priv = dev->dev_private;
> > 
> >       /*
> > -      * a2xx comes with its own MMU
> > -      * On other platforms IOMMU can be declared specified either for the
> > -      * MDP/DPU device or for its parent, MDSS device.
> > +      * Return true if the GPU or the MDP/DPU or parent MDSS device has an
> > +      * IOMMU
> >        */
> > -     return priv->is_a2xx ||
> > +     return priv->has_gpu_iommu ||
> >               device_iommu_mapped(dev->dev) ||
> >               device_iommu_mapped(dev->dev->parent);
> 
> I have a generic feeling that both old an new code is not fully correct.
> Please correct me if I'm wrong:
> 
> We should be using VRAM, if either of the blocks can not use remapped
> memory. So this should have been:
> 
> bool msm_use_mmu()
> {
>  if (!gpu_has_iommu)
>    return false;
> 
>  if (have_display_part && !display_has_mmu())
>    return false;
> 
>  return true;
> }
> 
> What do you think.

I would have to see (and try) the real code but that seems like it might
be reasonable. I would like to hear from some of the a2xx users too
because this mostly affects them. On 3xx and newer you've always had the
option of not having a MMU for GPU or display but I can't think of any
use cases where you wouldn't want it.

> --
> With best wishes
> Dmitry
> 

Jordan
Dmitry Baryshkov July 7, 2023, 5:27 p.m. UTC | #5
On 07/07/2023 18:03, Jordan Crouse wrote:
> On Thu, Jul 06, 2023 at 09:55:13PM +0300, Dmitry Baryshkov wrote:
>>
>> On 10/03/2023 00:20, Jordan Crouse wrote:
>>> While booting with amd,imageon on a headless target the GPU probe was
>>> failing with -ENOSPC in get_pages() from msm_gem.c.
>>>
>>> Investigation showed that the driver was using the default 16MB VRAM
>>> carveout because msm_use_mmu() was returning false since headless devices
>>> use a dummy parent device. Avoid this by extending the existing is_a2xx
>>> priv member to check the GPU IOMMU state on all platforms and use that
>>> check in msm_use_mmu().
>>>
>>> This works for memory allocations but it doesn't prevent the VRAM carveout
>>> from being created because that happens before we have a chance to check
>>> the GPU IOMMU state in adreno_bind.
>>>
>>> There are a number of possible options to resolve this but none of them are
>>> very clean. The easiest way is to likely specify vram=0 as module parameter
>>> on headless devices so that the memory doesn't get wasted.
>>
>> This patch was on my plate for quite a while, please excuse me for
>> taking it so long.
> 
> No worries. I'm also chasing a bunch of other stuff too.
> 
>> I see the following problem with the current code. We have two different
>> instances than can access memory: MDP/DPU and GPU. And each of them can
>> either have or miss the MMU.
>>
>> For some time I toyed with the idea of determining whether the allocated
>> BO is going to be used by display or by GPU, but then I abandoned it. We
>> can have display BOs being filled by GPU, so handling it this way would
>> complicate things a lot.
>>
>> This actually rings a tiny bell in my head with the idea of splitting
>> the display and GPU parts to two different drivers, but I'm not sure
>> what would be the overall impact.
> 
> As I now exclusively work on headless devices I would be 100% for this,
> but I'm sure that our laptop friends might not agree :)

I do not know here. This is probably a question to Rob, as he better 
understands the interaction between GPU and display parts of the userspace.

> 
>> More on the msm_use_mmu() below.
>>
>>>
>>> Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
>>> ---
>>>
>>>    drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
>>>    drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
>>>    drivers/gpu/drm/msm/msm_drv.h              | 2 +-
>>>    3 files changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> index 36f062c7582f..4f19da28f80f 100644
>>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
>>>        DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
>>>                config.rev.minor, config.rev.patchid);
>>>
>>> -     priv->is_a2xx = config.rev.core == 2;
>>> +     /*
>>> +      * A2xx has a built in IOMMU and all other IOMMU enabled targets will
>>> +      * have an ARM IOMMU attached
>>> +      */
>>> +     priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
>>>        priv->has_cached_coherent = config.rev.core >= 6;
>>>
>>>        gpu = info->init(drm);
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>>> index aca48c868c14..a125a351ec90 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>> @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
>>>        struct msm_drm_private *priv = dev->dev_private;
>>>
>>>        /*
>>> -      * a2xx comes with its own MMU
>>> -      * On other platforms IOMMU can be declared specified either for the
>>> -      * MDP/DPU device or for its parent, MDSS device.
>>> +      * Return true if the GPU or the MDP/DPU or parent MDSS device has an
>>> +      * IOMMU
>>>         */
>>> -     return priv->is_a2xx ||
>>> +     return priv->has_gpu_iommu ||
>>>                device_iommu_mapped(dev->dev) ||
>>>                device_iommu_mapped(dev->dev->parent);
>>
>> I have a generic feeling that both old an new code is not fully correct.
>> Please correct me if I'm wrong:
>>
>> We should be using VRAM, if either of the blocks can not use remapped
>> memory. So this should have been:
>>
>> bool msm_use_mmu()
>> {
>>   if (!gpu_has_iommu)
>>     return false;
>>
>>   if (have_display_part && !display_has_mmu())
>>     return false;
>>
>>   return true;
>> }
>>
>> What do you think.
> 
> I would have to see (and try) the real code but that seems like it might
> be reasonable.

Sure, let me craft it then.

> I would like to hear from some of the a2xx users too
> because this mostly affects them. On 3xx and newer you've always had the
> option of not having a MMU for GPU or display but I can't think of any
> use cases where you wouldn't want it.

msm8974 doesn't have (working) IOMMU driver. I also think there was an 
issue somewhere around sdm630/660. And the wonderful msm8992/4, IIRC, 
also don't have one.

Also the headless mode was quite useful for bringing up platforms, as it 
allowed us to test GPU separately (and ofc. in some cases even w/o MMU).

I have both a2xx (only iMX for now) and a3xx for the tests here, on my 
desk.
Akhil P Oommen July 9, 2023, 9:50 p.m. UTC | #6
On Fri, Jul 07, 2023 at 08:27:18PM +0300, Dmitry Baryshkov wrote:
> 
> On 07/07/2023 18:03, Jordan Crouse wrote:
> > On Thu, Jul 06, 2023 at 09:55:13PM +0300, Dmitry Baryshkov wrote:
> > > 
> > > On 10/03/2023 00:20, Jordan Crouse wrote:
> > > > While booting with amd,imageon on a headless target the GPU probe was
> > > > failing with -ENOSPC in get_pages() from msm_gem.c.
> > > > 
> > > > Investigation showed that the driver was using the default 16MB VRAM
> > > > carveout because msm_use_mmu() was returning false since headless devices
> > > > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > > > priv member to check the GPU IOMMU state on all platforms and use that
> > > > check in msm_use_mmu().
> > > > 
> > > > This works for memory allocations but it doesn't prevent the VRAM carveout
> > > > from being created because that happens before we have a chance to check
> > > > the GPU IOMMU state in adreno_bind.
> > > > 
> > > > There are a number of possible options to resolve this but none of them are
> > > > very clean. The easiest way is to likely specify vram=0 as module parameter
> > > > on headless devices so that the memory doesn't get wasted.
> > > 
> > > This patch was on my plate for quite a while, please excuse me for
> > > taking it so long.
> > 
> > No worries. I'm also chasing a bunch of other stuff too.
> > 
> > > I see the following problem with the current code. We have two different
> > > instances than can access memory: MDP/DPU and GPU. And each of them can
> > > either have or miss the MMU.
> > > 
> > > For some time I toyed with the idea of determining whether the allocated
> > > BO is going to be used by display or by GPU, but then I abandoned it. We
> > > can have display BOs being filled by GPU, so handling it this way would
> > > complicate things a lot.
> > > 
> > > This actually rings a tiny bell in my head with the idea of splitting
> > > the display and GPU parts to two different drivers, but I'm not sure
> > > what would be the overall impact.
> > 
> > As I now exclusively work on headless devices I would be 100% for this,
> > but I'm sure that our laptop friends might not agree :)
> 
> I do not know here. This is probably a question to Rob, as he better
> understands the interaction between GPU and display parts of the userspace.

I fully support this if it is feasible.

In our architecture, display and GPU are completely independent subsystems.
Like Jordan mentioned, there are IOT products without display. And I wouldn't
be surprised if there is a product with just display and uses software rendering.

-Akhil

> 
> > 
> > > More on the msm_use_mmu() below.
> > > 
> > > > 
> > > > Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
> > > > ---
> > > > 
> > > >    drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
> > > >    drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
> > > >    drivers/gpu/drm/msm/msm_drv.h              | 2 +-
> > > >    3 files changed, 9 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > > index 36f062c7582f..4f19da28f80f 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > > > @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
> > > >        DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
> > > >                config.rev.minor, config.rev.patchid);
> > > > 
> > > > -     priv->is_a2xx = config.rev.core == 2;
> > > > +     /*
> > > > +      * A2xx has a built in IOMMU and all other IOMMU enabled targets will
> > > > +      * have an ARM IOMMU attached
> > > > +      */
> > > > +     priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
> > > >        priv->has_cached_coherent = config.rev.core >= 6;
> > > > 
> > > >        gpu = info->init(drm);
> > > > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > > > index aca48c868c14..a125a351ec90 100644
> > > > --- a/drivers/gpu/drm/msm/msm_drv.c
> > > > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > > > @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
> > > >        struct msm_drm_private *priv = dev->dev_private;
> > > > 
> > > >        /*
> > > > -      * a2xx comes with its own MMU
> > > > -      * On other platforms IOMMU can be declared specified either for the
> > > > -      * MDP/DPU device or for its parent, MDSS device.
> > > > +      * Return true if the GPU or the MDP/DPU or parent MDSS device has an
> > > > +      * IOMMU
> > > >         */
> > > > -     return priv->is_a2xx ||
> > > > +     return priv->has_gpu_iommu ||
> > > >                device_iommu_mapped(dev->dev) ||
> > > >                device_iommu_mapped(dev->dev->parent);
> > > 
> > > I have a generic feeling that both old an new code is not fully correct.
> > > Please correct me if I'm wrong:
> > > 
> > > We should be using VRAM, if either of the blocks can not use remapped
> > > memory. So this should have been:
> > > 
> > > bool msm_use_mmu()
> > > {
> > >   if (!gpu_has_iommu)
> > >     return false;
> > > 
> > >   if (have_display_part && !display_has_mmu())
> > >     return false;
> > > 
> > >   return true;
> > > }
> > > 
> > > What do you think.
> > 
> > I would have to see (and try) the real code but that seems like it might
> > be reasonable.
> 
> Sure, let me craft it then.
> 
> > I would like to hear from some of the a2xx users too
> > because this mostly affects them. On 3xx and newer you've always had the
> > option of not having a MMU for GPU or display but I can't think of any
> > use cases where you wouldn't want it.
> 
> msm8974 doesn't have (working) IOMMU driver. I also think there was an issue
> somewhere around sdm630/660. And the wonderful msm8992/4, IIRC, also don't
> have one.
> 
> Also the headless mode was quite useful for bringing up platforms, as it
> allowed us to test GPU separately (and ofc. in some cases even w/o MMU).
> 
> I have both a2xx (only iMX for now) and a3xx for the tests here, on my desk.
> 
> -- 
> With best wishes
> Dmitry
>
Rob Clark July 20, 2023, 3:52 p.m. UTC | #7
On Thu, Jul 6, 2023 at 11:55 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On 10/03/2023 00:20, Jordan Crouse wrote:
> > While booting with amd,imageon on a headless target the GPU probe was
> > failing with -ENOSPC in get_pages() from msm_gem.c.
> >
> > Investigation showed that the driver was using the default 16MB VRAM
> > carveout because msm_use_mmu() was returning false since headless devices
> > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > priv member to check the GPU IOMMU state on all platforms and use that
> > check in msm_use_mmu().
> >
> > This works for memory allocations but it doesn't prevent the VRAM carveout
> > from being created because that happens before we have a chance to check
> > the GPU IOMMU state in adreno_bind.
> >
> > There are a number of possible options to resolve this but none of them are
> > very clean. The easiest way is to likely specify vram=0 as module parameter
> > on headless devices so that the memory doesn't get wasted.
>
> This patch was on my plate for quite a while, please excuse me for
> taking it so long.
>
> I see the following problem with the current code. We have two different
> instances than can access memory: MDP/DPU and GPU. And each of them can
> either have or miss the MMU.
>
> For some time I toyed with the idea of determining whether the allocated
> BO is going to be used by display or by GPU, but then I abandoned it. We
> can have display BOs being filled by GPU, so handling it this way would
> complicate things a lot.

There is MSM_BO_SCANOUT .. but it wouldn't completely surprise me if
it isn't used in some place where it should somewhere or other.  But
that is the hint that contiguous allocation should be used if the
display doesn't support some sort of iommu.  (Using a GPU without some
sort of mmu/iommu isn't something sane to do.. the only reason the
support for that exists at all is to aid bringup.  I wouldn't call
that a "supported" configuration.)

> This actually rings a tiny bell in my head with the idea of splitting
> the display and GPU parts to two different drivers, but I'm not sure
> what would be the overall impact.

Userspace does have better support for split display/gpu these days
than it did when drm/msm was first merged.  It _might_ just work if
one device only advertised DRIVER_RENDER and the other
MODESET/ATOMIC.. but I'd be a bit concerned about breaking things.  I
guess you could try some sort of kconfig knob to have two "msm"
devices and see what breaks, but I'm a bit skeptical that we could
make this the default anytime soon.

For now, just addressing the only-display and only-gpu cases
(continuing with the single device arrangement when you have both
display and gpu), maybe split up drm_dev_alloc() and drm_dev_init() so
that we could use drm_device::driver_features to mask out
DRIVER_RENDER if needed.

BR,
-R

> More on the msm_use_mmu() below.
>
> >
> > Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
> > ---
> >
> >   drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
> >   drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
> >   drivers/gpu/drm/msm/msm_drv.h              | 2 +-
> >   3 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index 36f062c7582f..4f19da28f80f 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
> >       DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
> >               config.rev.minor, config.rev.patchid);
> >
> > -     priv->is_a2xx = config.rev.core == 2;
> > +     /*
> > +      * A2xx has a built in IOMMU and all other IOMMU enabled targets will
> > +      * have an ARM IOMMU attached
> > +      */
> > +     priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
> >       priv->has_cached_coherent = config.rev.core >= 6;
> >
> >       gpu = info->init(drm);
> > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > index aca48c868c14..a125a351ec90 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.c
> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
> >       struct msm_drm_private *priv = dev->dev_private;
> >
> >       /*
> > -      * a2xx comes with its own MMU
> > -      * On other platforms IOMMU can be declared specified either for the
> > -      * MDP/DPU device or for its parent, MDSS device.
> > +      * Return true if the GPU or the MDP/DPU or parent MDSS device has an
> > +      * IOMMU
> >        */
> > -     return priv->is_a2xx ||
> > +     return priv->has_gpu_iommu ||
> >               device_iommu_mapped(dev->dev) ||
> >               device_iommu_mapped(dev->dev->parent);
>
> I have a generic feeling that both old an new code is not fully correct.
> Please correct me if I'm wrong:
>
> We should be using VRAM, if either of the blocks can not use remapped
> memory. So this should have been:
>
> bool msm_use_mmu()
> {
>   if (!gpu_has_iommu)
>     return false;
>
>   if (have_display_part && !display_has_mmu())
>     return false;
>
>   return true;
> }
>
> What do you think.
>
> >   }
> > diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> > index 9f0c184b02a0..f33f94acd1b9 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.h
> > +++ b/drivers/gpu/drm/msm/msm_drv.h
> > @@ -126,7 +126,7 @@ struct msm_drm_private {
> >       struct msm_gpu *gpu;
> >
> >       /* gpu is only set on open(), but we need this info earlier */
> > -     bool is_a2xx;
> > +     bool has_gpu_iommu;
> >       bool has_cached_coherent;
> >
> >       struct drm_fb_helper *fbdev;
>
> --
> With best wishes
> Dmitry
>
Dmitry Baryshkov July 20, 2023, 5:11 p.m. UTC | #8
On 20/07/2023 18:52, Rob Clark wrote:
> On Thu, Jul 6, 2023 at 11:55 AM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> On 10/03/2023 00:20, Jordan Crouse wrote:
>>> While booting with amd,imageon on a headless target the GPU probe was
>>> failing with -ENOSPC in get_pages() from msm_gem.c.
>>>
>>> Investigation showed that the driver was using the default 16MB VRAM
>>> carveout because msm_use_mmu() was returning false since headless devices
>>> use a dummy parent device. Avoid this by extending the existing is_a2xx
>>> priv member to check the GPU IOMMU state on all platforms and use that
>>> check in msm_use_mmu().
>>>
>>> This works for memory allocations but it doesn't prevent the VRAM carveout
>>> from being created because that happens before we have a chance to check
>>> the GPU IOMMU state in adreno_bind.
>>>
>>> There are a number of possible options to resolve this but none of them are
>>> very clean. The easiest way is to likely specify vram=0 as module parameter
>>> on headless devices so that the memory doesn't get wasted.
>>
>> This patch was on my plate for quite a while, please excuse me for
>> taking it so long.
>>
>> I see the following problem with the current code. We have two different
>> instances than can access memory: MDP/DPU and GPU. And each of them can
>> either have or miss the MMU.
>>
>> For some time I toyed with the idea of determining whether the allocated
>> BO is going to be used by display or by GPU, but then I abandoned it. We
>> can have display BOs being filled by GPU, so handling it this way would
>> complicate things a lot.
> 
> There is MSM_BO_SCANOUT .. but it wouldn't completely surprise me if
> it isn't used in some place where it should somewhere or other.  But
> that is the hint that contiguous allocation should be used if the
> display doesn't support some sort of iommu.  (Using a GPU without some
> sort of mmu/iommu isn't something sane to do.. the only reason the
> support for that exists at all is to aid bringup.  I wouldn't call
> that a "supported" configuration.)
> 
>> This actually rings a tiny bell in my head with the idea of splitting
>> the display and GPU parts to two different drivers, but I'm not sure
>> what would be the overall impact.
> 
> Userspace does have better support for split display/gpu these days
> than it did when drm/msm was first merged.  It _might_ just work if
> one device only advertised DRIVER_RENDER and the other
> MODESET/ATOMIC.. but I'd be a bit concerned about breaking things.  I
> guess you could try some sort of kconfig knob to have two "msm"
> devices and see what breaks, but I'm a bit skeptical that we could
> make this the default anytime soon.

Thanks. Yes, breaking userspace would be a bad thing. I do not know if 
we should consider a single GPU+KMS driver to be an ABI and thus set in 
stone.

> 
> For now, just addressing the only-display and only-gpu cases
> (continuing with the single device arrangement when you have both
> display and gpu), maybe split up drm_dev_alloc() and drm_dev_init() so
> that we could use drm_device::driver_features to mask out
> DRIVER_RENDER if needed.

Yep. I'll continue following that path.

> 
> BR,
> -R
> 
>> More on the msm_use_mmu() below.
>>
>>>
>>> Signed-off-by: Jordan Crouse <jorcrous@amazon.com>
>>> ---
>>>
>>>    drivers/gpu/drm/msm/adreno/adreno_device.c | 6 +++++-
>>>    drivers/gpu/drm/msm/msm_drv.c              | 7 +++----
>>>    drivers/gpu/drm/msm/msm_drv.h              | 2 +-
>>>    3 files changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> index 36f062c7582f..4f19da28f80f 100644
>>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
>>> @@ -539,7 +539,11 @@ static int adreno_bind(struct device *dev, struct device *master, void *data)
>>>        DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
>>>                config.rev.minor, config.rev.patchid);
>>>
>>> -     priv->is_a2xx = config.rev.core == 2;
>>> +     /*
>>> +      * A2xx has a built in IOMMU and all other IOMMU enabled targets will
>>> +      * have an ARM IOMMU attached
>>> +      */
>>> +     priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
>>>        priv->has_cached_coherent = config.rev.core >= 6;
>>>
>>>        gpu = info->init(drm);
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>>> index aca48c868c14..a125a351ec90 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>> @@ -318,11 +318,10 @@ bool msm_use_mmu(struct drm_device *dev)
>>>        struct msm_drm_private *priv = dev->dev_private;
>>>
>>>        /*
>>> -      * a2xx comes with its own MMU
>>> -      * On other platforms IOMMU can be declared specified either for the
>>> -      * MDP/DPU device or for its parent, MDSS device.
>>> +      * Return true if the GPU or the MDP/DPU or parent MDSS device has an
>>> +      * IOMMU
>>>         */
>>> -     return priv->is_a2xx ||
>>> +     return priv->has_gpu_iommu ||
>>>                device_iommu_mapped(dev->dev) ||
>>>                device_iommu_mapped(dev->dev->parent);
>>
>> I have a generic feeling that both old an new code is not fully correct.
>> Please correct me if I'm wrong:
>>
>> We should be using VRAM, if either of the blocks can not use remapped
>> memory. So this should have been:
>>
>> bool msm_use_mmu()
>> {
>>    if (!gpu_has_iommu)
>>      return false;
>>
>>    if (have_display_part && !display_has_mmu())
>>      return false;
>>
>>    return true;
>> }
>>
>> What do you think.
>>
>>>    }
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
>>> index 9f0c184b02a0..f33f94acd1b9 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.h
>>> +++ b/drivers/gpu/drm/msm/msm_drv.h
>>> @@ -126,7 +126,7 @@ struct msm_drm_private {
>>>        struct msm_gpu *gpu;
>>>
>>>        /* gpu is only set on open(), but we need this info earlier */
>>> -     bool is_a2xx;
>>> +     bool has_gpu_iommu;
>>>        bool has_cached_coherent;
>>>
>>>        struct drm_fb_helper *fbdev;
>>
>> --
>> With best wishes
>> Dmitry
>>
Bjorn Andersson July 20, 2023, 6:31 p.m. UTC | #9
On Mon, Jul 10, 2023 at 03:20:44AM +0530, Akhil P Oommen wrote:
> On Fri, Jul 07, 2023 at 08:27:18PM +0300, Dmitry Baryshkov wrote:
> > 
> > On 07/07/2023 18:03, Jordan Crouse wrote:
> > > On Thu, Jul 06, 2023 at 09:55:13PM +0300, Dmitry Baryshkov wrote:
> > > > 
> > > > On 10/03/2023 00:20, Jordan Crouse wrote:
> > > > > While booting with amd,imageon on a headless target the GPU probe was
> > > > > failing with -ENOSPC in get_pages() from msm_gem.c.
> > > > > 
> > > > > Investigation showed that the driver was using the default 16MB VRAM
> > > > > carveout because msm_use_mmu() was returning false since headless devices
> > > > > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > > > > priv member to check the GPU IOMMU state on all platforms and use that
> > > > > check in msm_use_mmu().
> > > > > 
> > > > > This works for memory allocations but it doesn't prevent the VRAM carveout
> > > > > from being created because that happens before we have a chance to check
> > > > > the GPU IOMMU state in adreno_bind.
> > > > > 
> > > > > There are a number of possible options to resolve this but none of them are
> > > > > very clean. The easiest way is to likely specify vram=0 as module parameter
> > > > > on headless devices so that the memory doesn't get wasted.
> > > > 
> > > > This patch was on my plate for quite a while, please excuse me for
> > > > taking it so long.
> > > 
> > > No worries. I'm also chasing a bunch of other stuff too.
> > > 
> > > > I see the following problem with the current code. We have two different
> > > > instances than can access memory: MDP/DPU and GPU. And each of them can
> > > > either have or miss the MMU.
> > > > 
> > > > For some time I toyed with the idea of determining whether the allocated
> > > > BO is going to be used by display or by GPU, but then I abandoned it. We
> > > > can have display BOs being filled by GPU, so handling it this way would
> > > > complicate things a lot.
> > > > 
> > > > This actually rings a tiny bell in my head with the idea of splitting
> > > > the display and GPU parts to two different drivers, but I'm not sure
> > > > what would be the overall impact.
> > > 
> > > As I now exclusively work on headless devices I would be 100% for this,
> > > but I'm sure that our laptop friends might not agree :)
> > 
> > I do not know here. This is probably a question to Rob, as he better
> > understands the interaction between GPU and display parts of the userspace.
> 
> I fully support this if it is feasible.
> 

I second this.

> In our architecture, display and GPU are completely independent subsystems.
> Like Jordan mentioned, there are IOT products without display. And I wouldn't
> be surprised if there is a product with just display and uses software rendering.
> 

And we have SA8295P/SA8540P with two MDSS instances and one GPU.

Regards,
Bjorn
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..4f19da28f80f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -539,7 +539,11 @@  static int adreno_bind(struct device *dev, struct device *master, void *data)
 	DBG("Found GPU: %u.%u.%u.%u", config.rev.core, config.rev.major,
 		config.rev.minor, config.rev.patchid);
 
-	priv->is_a2xx = config.rev.core == 2;
+	/*
+	 * A2xx has a built in IOMMU and all other IOMMU enabled targets will
+	 * have an ARM IOMMU attached
+	 */
+	priv->has_gpu_iommu = config.rev.core == 2 || device_iommu_mapped(dev);
 	priv->has_cached_coherent = config.rev.core >= 6;
 
 	gpu = info->init(drm);
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index aca48c868c14..a125a351ec90 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -318,11 +318,10 @@  bool msm_use_mmu(struct drm_device *dev)
 	struct msm_drm_private *priv = dev->dev_private;
 
 	/*
-	 * a2xx comes with its own MMU
-	 * On other platforms IOMMU can be declared specified either for the
-	 * MDP/DPU device or for its parent, MDSS device.
+	 * Return true if the GPU or the MDP/DPU or parent MDSS device has an
+	 * IOMMU
 	 */
-	return priv->is_a2xx ||
+	return priv->has_gpu_iommu ||
 		device_iommu_mapped(dev->dev) ||
 		device_iommu_mapped(dev->dev->parent);
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 9f0c184b02a0..f33f94acd1b9 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -126,7 +126,7 @@  struct msm_drm_private {
 	struct msm_gpu *gpu;
 
 	/* gpu is only set on open(), but we need this info earlier */
-	bool is_a2xx;
+	bool has_gpu_iommu;
 	bool has_cached_coherent;
 
 	struct drm_fb_helper *fbdev;