Message ID | 20201016175411.30406-1-chris@chris-wilson.co.uk |
---|---|
State | Superseded |
Headers | show |
Series | drm/i915/gt: Limit VFE threads based on GT | expand |
On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote: > MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the > range [0, n-1] where n is #EU * (#threads/EU) with the number of threads > based on plaform and the number of EU based on the number of slices and > subslices. This is a fixed number per platform/gt, so appropriately > limit the number of threads we spawn to match the device. > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com> > Cc: Prathap Kumar Valsan <prathap.kumar.valsan at intel.com> > Cc: Akeem G Abodunrin <akeem.g.abodunrin at intel.com> > Cc: Balestrieri Francesco <francesco.balestrieri at intel.com> > Cc: Bloomfield Jon <jon.bloomfield at intel.com> > Cc: <stable at vger.kernel.org> # v5.7+ > --- > ... I tested this patch and found that it prevents the GPU hang I had reported on the HP Pavilion Mini 300-020 in https://gitlab.freedesktop.org/drm/intel/-/issues/2413. In more detail: I built linux-next at tag next-20201106 without the patch, and booted the result on an Ubuntu 20.04 base system. As expected, I observed the hang that I had previously reported as soon as Cinnnamon started when I entered graphical.target. I then applied this patch - that being the only change to my kernel - and I was able to boot to graphical.target 5 times consecutively without any GPU hang. You may add my endorsements: Tested-by: Randy Wright <rwright@hpe.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2413 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2106 -- Randy Wright Hewlett Packard Enterprise Phone: (970) 898-0998 Mail: rwright@hpe.com
On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote: > MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the > range [0, n-1] where n is #EU * (#threads/EU) with the number of threads > based on plaform and the number of EU based on the number of slices and > subslices. This is a fixed number per platform/gt, so appropriately > limit the number of threads we spawn to match the device. > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 we need to get this closed... > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> > Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> > Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> > Cc: Bloomfield Jon <jon.bloomfield@intel.com> > Cc: <stable@vger.kernel.org> # v5.7+ > --- > drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++------- > 1 file changed, 24 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c > index d93d85cd3027..f3b8fea6226e 100644 > --- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c > +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c > @@ -7,8 +7,6 @@ > #include "i915_drv.h" > #include "intel_gpu_commands.h" > > -#define MAX_URB_ENTRIES 64 > -#define STATE_SIZE (4 * 1024) > #define GT3_INLINE_DATA_DELAYS 0x1E00 > #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS)) > > @@ -34,8 +32,7 @@ struct batch_chunk { > }; > > struct batch_vals { > - u32 max_primitives; > - u32 max_urb_entries; > + u32 max_primitives; /* == number of VFE threads */ > u32 cmd_size; > u32 state_size; > u32 state_start; > @@ -50,18 +47,35 @@ static void > batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv) > { > if (IS_HASWELL(i915)) { > - bv->max_primitives = 280; > - bv->max_urb_entries = MAX_URB_ENTRIES; > + switch (INTEL_INFO(i915)->gt) { > + default: > + case 1: > + bv->max_primitives = 70; > + break; > + case 2: > + bv->max_primitives = 140; > + break; > + case 3: > + bv->max_primitives = 280; > + break; > + } > bv->surface_height = 16 * 16; > bv->surface_width = 32 * 2 * 16; > } else { > - bv->max_primitives = 128; > - bv->max_urb_entries = MAX_URB_ENTRIES / 2; > + switch (INTEL_INFO(i915)->gt) { > + default: > + case 1: /* including vlv */ > + bv->max_primitives = 36; > + break; > + case 2: > + bv->max_primitives = 128; > + break; > + } > bv->surface_height = 16 * 8; > bv->surface_width = 32 * 16; > } > bv->cmd_size = bv->max_primitives * 4096; > - bv->state_size = STATE_SIZE; > + bv->state_size = SZ_4K; > bv->state_start = bv->cmd_size; > bv->batch_size = bv->cmd_size + bv->state_size; > bv->scratch_size = bv->surface_height * bv->surface_width; > @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, > u32 urb_size, u32 curbe_size, > u32 mode) > { > - u32 urb_entries = bv->max_urb_entries; > u32 threads = bv->max_primitives - 1; > u32 *cs = batch_alloc_items(batch, 32, 8); > > @@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, > *cs++ = 0; > > /* number of threads & urb entries for GPGPU vs Media Mode */ > - *cs++ = threads << 16 | urb_entries << 8 | mode << 2; > + *cs++ = threads << 16 | 1 << 8 | mode << 2; why urb_entries = 1 ? the range is 0,64 and 0,128 depending on the sku. in general there's a min of 32 URBs > > *cs++ = 0; > > -- > 2.20.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Rodrigo Vivi (2021-01-07 19:50:37) > On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote: > > MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the > > range [0, n-1] where n is #EU * (#threads/EU) with the number of threads > > based on plaform and the number of EU based on the number of slices and > > subslices. This is a fixed number per platform/gt, so appropriately > > limit the number of threads we spawn to match the device. > > > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 > > we need to get this closed... Unfortunately this failed the validation test. And as that test is still not in CI, I cannot say why. My vote would be to remove the clear_residuals until it works on all target platforms. Plus we clearly need a hsw-gt1 in CI. > > bv->scratch_size = bv->surface_height * bv->surface_width; > > @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, > > u32 urb_size, u32 curbe_size, > > u32 mode) > > { > > - u32 urb_entries = bv->max_urb_entries; > > u32 threads = bv->max_primitives - 1; > > u32 *cs = batch_alloc_items(batch, 32, 8); > > > > @@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, > > *cs++ = 0; > > > > /* number of threads & urb entries for GPGPU vs Media Mode */ > > - *cs++ = threads << 16 | urb_entries << 8 | mode << 2; > > + *cs++ = threads << 16 | 1 << 8 | mode << 2; > > why urb_entries = 1 ? We only used a single entry. There was no measurable benefit from assigning more entries, and the importance of any side effects from doing so unknown. > the range is 0,64 and 0,128 depending on the sku. > > in general there's a min of 32 URBs Don't forget num_entries * entry_size must fit within the URB allocation/allotment. -Chris
diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c index d93d85cd3027..f3b8fea6226e 100644 --- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c @@ -7,8 +7,6 @@ #include "i915_drv.h" #include "intel_gpu_commands.h" -#define MAX_URB_ENTRIES 64 -#define STATE_SIZE (4 * 1024) #define GT3_INLINE_DATA_DELAYS 0x1E00 #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS)) @@ -34,8 +32,7 @@ struct batch_chunk { }; struct batch_vals { - u32 max_primitives; - u32 max_urb_entries; + u32 max_primitives; /* == number of VFE threads */ u32 cmd_size; u32 state_size; u32 state_start; @@ -50,18 +47,35 @@ static void batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv) { if (IS_HASWELL(i915)) { - bv->max_primitives = 280; - bv->max_urb_entries = MAX_URB_ENTRIES; + switch (INTEL_INFO(i915)->gt) { + default: + case 1: + bv->max_primitives = 70; + break; + case 2: + bv->max_primitives = 140; + break; + case 3: + bv->max_primitives = 280; + break; + } bv->surface_height = 16 * 16; bv->surface_width = 32 * 2 * 16; } else { - bv->max_primitives = 128; - bv->max_urb_entries = MAX_URB_ENTRIES / 2; + switch (INTEL_INFO(i915)->gt) { + default: + case 1: /* including vlv */ + bv->max_primitives = 36; + break; + case 2: + bv->max_primitives = 128; + break; + } bv->surface_height = 16 * 8; bv->surface_width = 32 * 16; } bv->cmd_size = bv->max_primitives * 4096; - bv->state_size = STATE_SIZE; + bv->state_size = SZ_4K; bv->state_start = bv->cmd_size; bv->batch_size = bv->cmd_size + bv->state_size; bv->scratch_size = bv->surface_height * bv->surface_width; @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, u32 urb_size, u32 curbe_size, u32 mode) { - u32 urb_entries = bv->max_urb_entries; u32 threads = bv->max_primitives - 1; u32 *cs = batch_alloc_items(batch, 32, 8); @@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, *cs++ = 0; /* number of threads & urb entries for GPGPU vs Media Mode */ - *cs++ = threads << 16 | urb_entries << 8 | mode << 2; + *cs++ = threads << 16 | 1 << 8 | mode << 2; *cs++ = 0;
MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on plaform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> Cc: Bloomfield Jon <jon.bloomfield@intel.com> Cc: <stable@vger.kernel.org> # v5.7+ --- drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++------- 1 file changed, 24 insertions(+), 11 deletions(-)