drm/i915/gt: Limit VFE threads based on GT

Message ID	20201016175411.30406-1-chris@chris-wilson.co.uk
State	Superseded
Headers	show Return-Path: <SRS0=3jhr=DX=vger.kernel.org=stable-owner@kernel.org> From: Chris Wilson <chris@chris-wilson.co.uk> To: intel-gfx@lists.freedesktop.org Cc: Chris Wilson <chris@chris-wilson.co.uk>, Mika Kuoppala <mika.kuoppala@linux.intel.com>, Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>, Akeem G Abodunrin <akeem.g.abodunrin@intel.com>, Balestrieri Francesco <francesco.balestrieri@intel.com>, Bloomfield Jon <jon.bloomfield@intel.com>, stable@vger.kernel.org Subject: [PATCH] drm/i915/gt: Limit VFE threads based on GT Date: Fri, 16 Oct 2020 18:54:11 +0100 Message-Id: <20201016175411.30406-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	drm/i915/gt: Limit VFE threads based on GT \| expand drm/i915/gt: Limit VFE threads based on GT

Message ID

20201016175411.30406-1-chris@chris-wilson.co.uk

State

Superseded

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>,
	Mika Kuoppala <mika.kuoppala@linux.intel.com>,
	Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>,
	Akeem G Abodunrin <akeem.g.abodunrin@intel.com>,
	Balestrieri Francesco <francesco.balestrieri@intel.com>,
	Bloomfield Jon <jon.bloomfield@intel.com>, stable@vger.kernel.org
Subject: [PATCH] drm/i915/gt: Limit VFE threads based on GT
Date: Fri, 16 Oct 2020 18:54:11 +0100
Message-Id: <20201016175411.30406-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

drm/i915/gt: Limit VFE threads based on GT | expand

Commit Message

Chris Wilson Oct. 16, 2020, 5:54 p.m. UTC

MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the
range [0, n-1] where n is #EU * (#threads/EU) with the number of threads
based on plaform and the number of EU based on the number of slices and
subslices. This is a fixed number per platform/gt, so appropriately
limit the number of threads we spawn to match the device.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024
Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>
Cc: Bloomfield Jon <jon.bloomfield@intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
---
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++-------
 1 file changed, 24 insertions(+), 11 deletions(-)

Comments

Randy Wright Nov. 10, 2020, 12:32 a.m. UTC | #1

On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:

> MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the

> range [0, n-1] where n is #EU * (#threads/EU) with the number of threads

> based on plaform and the number of EU based on the number of slices and

> subslices. This is a fixed number per platform/gt, so appropriately

> limit the number of threads we spawn to match the device.

> 

> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024

> Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")

> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>

> Cc: Prathap Kumar Valsan <prathap.kumar.valsan at intel.com>

> Cc: Akeem G Abodunrin <akeem.g.abodunrin at intel.com>

> Cc: Balestrieri Francesco <francesco.balestrieri at intel.com>

> Cc: Bloomfield Jon <jon.bloomfield at intel.com>

> Cc: <stable at vger.kernel.org> # v5.7+

> ---

> ...

I tested this patch and found that it prevents the GPU hang I had
reported on the HP Pavilion Mini 300-020 in
https://gitlab.freedesktop.org/drm/intel/-/issues/2413.

In more detail: I built linux-next at tag next-20201106 without
the patch, and booted the result on an Ubuntu 20.04 base system.  As
expected, I observed the hang that I had previously reported as soon as
Cinnnamon started when I entered graphical.target.

I then applied this patch - that being the only change to my kernel -
and I was able to boot to graphical.target 5 times consecutively without
any GPU hang.

You may add my endorsements:

Tested-by: Randy Wright <rwright@hpe.com>

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2413
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2106

-- 
Randy Wright            Hewlett Packard Enterprise
Phone: (970) 898-0998   Mail: rwright@hpe.com

Rodrigo Vivi Jan. 7, 2021, 7:50 p.m. UTC | #2

On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:
> MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the

> range [0, n-1] where n is #EU * (#threads/EU) with the number of threads

> based on plaform and the number of EU based on the number of slices and

> subslices. This is a fixed number per platform/gt, so appropriately

> limit the number of threads we spawn to match the device.

> 

> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024


we need to get this closed...

> Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>

> Cc: Bloomfield Jon <jon.bloomfield@intel.com>

> Cc: <stable@vger.kernel.org> # v5.7+

> ---

>  drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++-------

>  1 file changed, 24 insertions(+), 11 deletions(-)

> 

> diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c

> index d93d85cd3027..f3b8fea6226e 100644

> --- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c

> +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c

> @@ -7,8 +7,6 @@

>  #include "i915_drv.h"

>  #include "intel_gpu_commands.h"

>  

> -#define MAX_URB_ENTRIES 64

> -#define STATE_SIZE (4 * 1024)

>  #define GT3_INLINE_DATA_DELAYS 0x1E00

>  #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS))

>  

> @@ -34,8 +32,7 @@ struct batch_chunk {

>  };

>  

>  struct batch_vals {

> -	u32 max_primitives;

> -	u32 max_urb_entries;

> +	u32 max_primitives; /* == number of VFE threads */

>  	u32 cmd_size;

>  	u32 state_size;

>  	u32 state_start;

> @@ -50,18 +47,35 @@ static void

>  batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv)

>  {

>  	if (IS_HASWELL(i915)) {

> -		bv->max_primitives = 280;

> -		bv->max_urb_entries = MAX_URB_ENTRIES;

> +		switch (INTEL_INFO(i915)->gt) {

> +		default:

> +		case 1:

> +			bv->max_primitives = 70;

> +			break;

> +		case 2:

> +			bv->max_primitives = 140;

> +			break;

> +		case 3:

> +			bv->max_primitives = 280;

> +			break;

> +		}

>  		bv->surface_height = 16 * 16;

>  		bv->surface_width = 32 * 2 * 16;

>  	} else {

> -		bv->max_primitives = 128;

> -		bv->max_urb_entries = MAX_URB_ENTRIES / 2;

> +		switch (INTEL_INFO(i915)->gt) {

> +		default:

> +		case 1: /* including vlv */

> +			bv->max_primitives = 36;

> +			break;

> +		case 2:

> +			bv->max_primitives = 128;

> +			break;

> +		}

>  		bv->surface_height = 16 * 8;

>  		bv->surface_width = 32 * 16;

>  	}

>  	bv->cmd_size = bv->max_primitives * 4096;

> -	bv->state_size = STATE_SIZE;

> +	bv->state_size = SZ_4K;

>  	bv->state_start = bv->cmd_size;

>  	bv->batch_size = bv->cmd_size + bv->state_size;

>  	bv->scratch_size = bv->surface_height * bv->surface_width;

> @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch,

>  		    u32 urb_size, u32 curbe_size,

>  		    u32 mode)

>  {

> -	u32 urb_entries = bv->max_urb_entries;

>  	u32 threads = bv->max_primitives - 1;

>  	u32 *cs = batch_alloc_items(batch, 32, 8);

>  

> @@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch,

>  	*cs++ = 0;

>  

>  	/* number of threads & urb entries for GPGPU vs Media Mode */

> -	*cs++ = threads << 16 | urb_entries << 8 | mode << 2;

> +	*cs++ = threads << 16 | 1 << 8 | mode << 2;


why urb_entries = 1 ?

the range is 0,64 and 0,128 depending on the sku.

in general there's a min of 32 URBs

>  

>  	*cs++ = 0;

>  

> -- 

> 2.20.1

> 

> _______________________________________________

> Intel-gfx mailing list

> Intel-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson Jan. 7, 2021, 10:04 p.m. UTC | #3

Quoting Rodrigo Vivi (2021-01-07 19:50:37)
> On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:

> > MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the

> > range [0, n-1] where n is #EU * (#threads/EU) with the number of threads

> > based on plaform and the number of EU based on the number of slices and

> > subslices. This is a fixed number per platform/gt, so appropriately

> > limit the number of threads we spawn to match the device.

> > 

> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024

> 

> we need to get this closed...


Unfortunately this failed the validation test. And as that test is still
not in CI, I cannot say why. My vote would be to remove the
clear_residuals until it works on all target platforms. Plus we clearly
need a hsw-gt1 in CI.
 
> >       bv->scratch_size = bv->surface_height * bv->surface_width;

> > @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch,

> >                   u32 urb_size, u32 curbe_size,

> >                   u32 mode)

> >  {

> > -     u32 urb_entries = bv->max_urb_entries;

> >       u32 threads = bv->max_primitives - 1;

> >       u32 *cs = batch_alloc_items(batch, 32, 8);

> >  

> > @@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch,

> >       *cs++ = 0;

> >  

> >       /* number of threads & urb entries for GPGPU vs Media Mode */

> > -     *cs++ = threads << 16 | urb_entries << 8 | mode << 2;

> > +     *cs++ = threads << 16 | 1 << 8 | mode << 2;

> 

> why urb_entries = 1 ?


We only used a single entry. There was no measurable benefit from
assigning more entries, and the importance of any side effects from doing
so unknown.

> the range is 0,64 and 0,128 depending on the sku.

> 

> in general there's a min of 32 URBs


Don't forget num_entries * entry_size must fit within the URB
allocation/allotment.
-Chris

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index d93d85cd3027..f3b8fea6226e 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -7,8 +7,6 @@ 
 #include "i915_drv.h"
 #include "intel_gpu_commands.h"
 
-#define MAX_URB_ENTRIES 64
-#define STATE_SIZE (4 * 1024)
 #define GT3_INLINE_DATA_DELAYS 0x1E00
 #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS))
 
@@ -34,8 +32,7 @@  struct batch_chunk {
 };
 
 struct batch_vals {
-	u32 max_primitives;
-	u32 max_urb_entries;
+	u32 max_primitives; /* == number of VFE threads */
 	u32 cmd_size;
 	u32 state_size;
 	u32 state_start;
@@ -50,18 +47,35 @@  static void
 batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv)
 {
 	if (IS_HASWELL(i915)) {
-		bv->max_primitives = 280;
-		bv->max_urb_entries = MAX_URB_ENTRIES;
+		switch (INTEL_INFO(i915)->gt) {
+		default:
+		case 1:
+			bv->max_primitives = 70;
+			break;
+		case 2:
+			bv->max_primitives = 140;
+			break;
+		case 3:
+			bv->max_primitives = 280;
+			break;
+		}
 		bv->surface_height = 16 * 16;
 		bv->surface_width = 32 * 2 * 16;
 	} else {
-		bv->max_primitives = 128;
-		bv->max_urb_entries = MAX_URB_ENTRIES / 2;
+		switch (INTEL_INFO(i915)->gt) {
+		default:
+		case 1: /* including vlv */
+			bv->max_primitives = 36;
+			break;
+		case 2:
+			bv->max_primitives = 128;
+			break;
+		}
 		bv->surface_height = 16 * 8;
 		bv->surface_width = 32 * 16;
 	}
 	bv->cmd_size = bv->max_primitives * 4096;
-	bv->state_size = STATE_SIZE;
+	bv->state_size = SZ_4K;
 	bv->state_start = bv->cmd_size;
 	bv->batch_size = bv->cmd_size + bv->state_size;
 	bv->scratch_size = bv->surface_height * bv->surface_width;
@@ -244,7 +258,6 @@  gen7_emit_vfe_state(struct batch_chunk *batch,
 		    u32 urb_size, u32 curbe_size,
 		    u32 mode)
 {
-	u32 urb_entries = bv->max_urb_entries;
 	u32 threads = bv->max_primitives - 1;
 	u32 *cs = batch_alloc_items(batch, 32, 8);
 
@@ -254,7 +267,7 @@  gen7_emit_vfe_state(struct batch_chunk *batch,
 	*cs++ = 0;
 
 	/* number of threads & urb entries for GPGPU vs Media Mode */
-	*cs++ = threads << 16 | urb_entries << 8 | mode << 2;
+	*cs++ = threads << 16 | 1 << 8 | mode << 2;
 
 	*cs++ = 0;

drm/i915/gt: Limit VFE threads based on GT

Commit Message

Comments

Patch