diff mbox series

drm/i915: check to see if SIMD registers are available before using SIMD

Message ID 20200430221016.3866-1-Jason@zx2c4.com
State New
Headers show
Series drm/i915: check to see if SIMD registers are available before using SIMD | expand

Commit Message

Jason A. Donenfeld April 30, 2020, 10:10 p.m. UTC
Sometimes it's not okay to use SIMD registers, the conditions for which
have changed subtly from kernel release to kernel release. Usually the
pattern is to check for may_use_simd() and then fallback to using
something slower in the unlikely case SIMD registers aren't available.
So, this patch fixes up i915's accelerated memcpy routines to fallback
to boring memcpy if may_use_simd() is false.

Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

---
 drivers/gpu/drm/i915/i915_memcpy.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

-- 
2.26.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Comments

Sebastian Andrzej Siewior May 1, 2020, 10:42 a.m. UTC | #1
On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:
> Sometimes it's not okay to use SIMD registers, the conditions for which

> have changed subtly from kernel release to kernel release. Usually the

> pattern is to check for may_use_simd() and then fallback to using

> something slower in the unlikely case SIMD registers aren't available.

> So, this patch fixes up i915's accelerated memcpy routines to fallback

> to boring memcpy if may_use_simd() is false.


That would indicate that these functions are used from IRQ/softirq which
break otherwise if the kernel is also using the registers. The crypto
code uses it for that purpose.

So
   Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>


May I ask how large the memcpy can be? I'm asking in case it is large
and an explicit rescheduling point might be needed.

> Cc: stable@vger.kernel.org

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


Sebastian
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
David Laight May 1, 2020, 11:34 a.m. UTC | #2
From: Sebastian Andrzej Siewior

> Sent: 01 May 2020 11:42

> On 2020-04-30 16:10:16 [-0600], Jason A. Donenfeld wrote:

> > Sometimes it's not okay to use SIMD registers, the conditions for which

> > have changed subtly from kernel release to kernel release. Usually the

> > pattern is to check for may_use_simd() and then fallback to using

> > something slower in the unlikely case SIMD registers aren't available.

> > So, this patch fixes up i915's accelerated memcpy routines to fallback

> > to boring memcpy if may_use_simd() is false.

> 

> That would indicate that these functions are used from IRQ/softirq which

> break otherwise if the kernel is also using the registers. The crypto

> code uses it for that purpose.

> 

> So

>    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> 

> May I ask how large the memcpy can be? I'm asking in case it is large

> and an explicit rescheduling point might be needed.


It is also quite likely that a 'rep movs' copy will be at least just as
fast on modern hardware.

Clearly if you are copying to/from PCIe memory you need the largest
resisters possible - but I think the graphics buffers are mapped cached?
(Otherwise I wouldn't see 3ms 'spins' while it invalidates the
entire screen buffer cache.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Jason A. Donenfeld May 1, 2020, 9:54 p.m. UTC | #3
On Fri, May 1, 2020 at 4:42 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>


Thanks.

>

> May I ask how large the memcpy can be? I'm asking in case it is large

> and an explicit rescheduling point might be needed.


Yea I was worried about that too. I'm not an i915 developer, but so
far as I can tell:

- The path from intel_engine_cmd_parser is  <= 256 KiB for "known
users", so that's rather large.
- The path from perf_memcpy is either 4k, 64k, or 4M, depending on the
type of object, so that seems gigantic, but I think that might be
selftest code.
- The path from compress_page appears to be PAGE_SIZE, so 4k, which
meshes with the limits we set agreed on few weeks ago for the crypto
stuff.
- The path from guc_read_update_log_buffer appears to be 8k, 32k, 2M,
or 8M, depending on the type of object, so that seems absurdly huge
and doesn't appear to be selftest code either like the other case.

I have no doubt the i915 developers will jump in here waving their
arms, but either way, it sure seems to me like you might have a point.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Jason A. Donenfeld May 1, 2020, 9:55 p.m. UTC | #4
On Fri, May 1, 2020 at 12:07 PM Christoph Hellwig <hch@infradead.org> wrote:
>

> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:

> > Sometimes it's not okay to use SIMD registers, the conditions for which

> > have changed subtly from kernel release to kernel release. Usually the

> > pattern is to check for may_use_simd() and then fallback to using

> > something slower in the unlikely case SIMD registers aren't available.

> > So, this patch fixes up i915's accelerated memcpy routines to fallback

> > to boring memcpy if may_use_simd() is false.

>

> Err, why does i915 implements its own uncached memcpy instead of relying

> on core functionality to start with?


I was wondering the same. It sure does seem like this ought to be more
generalized functionality, with a name that represents the type of
transfer it's optimized for (wc or similar).
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chris Wilson May 3, 2020, 8:20 p.m. UTC | #5
Quoting Christoph Hellwig (2020-05-01 19:07:31)
> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:

> > Sometimes it's not okay to use SIMD registers, the conditions for which

> > have changed subtly from kernel release to kernel release. Usually the

> > pattern is to check for may_use_simd() and then fallback to using

> > something slower in the unlikely case SIMD registers aren't available.

> > So, this patch fixes up i915's accelerated memcpy routines to fallback

> > to boring memcpy if may_use_simd() is false.

> 

> Err, why does i915 implements its own uncached memcpy instead of relying

> on core functionality to start with?


What is this core functionality that provides movntqda?
-Chris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chris Wilson May 3, 2020, 8:30 p.m. UTC | #6
Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
> Sometimes it's not okay to use SIMD registers, the conditions for which

> have changed subtly from kernel release to kernel release. Usually the

> pattern is to check for may_use_simd() and then fallback to using

> something slower in the unlikely case SIMD registers aren't available.

> So, this patch fixes up i915's accelerated memcpy routines to fallback

> to boring memcpy if may_use_simd() is false.

> 

> Cc: stable@vger.kernel.org


The same argument as on the previous submission is that we return to the
caller if we can't use movntqda as their fallback path should be faster
than uncached memcpy.
-Chris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Jason A. Donenfeld May 3, 2020, 8:35 p.m. UTC | #7
On Sun, May 3, 2020 at 2:30 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>

> Quoting Jason A. Donenfeld (2020-04-30 23:10:16)

> > Sometimes it's not okay to use SIMD registers, the conditions for which

> > have changed subtly from kernel release to kernel release. Usually the

> > pattern is to check for may_use_simd() and then fallback to using

> > something slower in the unlikely case SIMD registers aren't available.

> > So, this patch fixes up i915's accelerated memcpy routines to fallback

> > to boring memcpy if may_use_simd() is false.

> >

> > Cc: stable@vger.kernel.org

>

> The same argument as on the previous submission is that we return to the

> caller if we can't use movntqda as their fallback path should be faster

> than uncached memcpy.


Oh, THAT's what you meant before. Okay, will follow up.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
David Laight May 4, 2020, 4:15 p.m. UTC | #8
From: Christoph Hellwig

> Sent: 04 May 2020 17:03

> 

> On Sun, May 03, 2020 at 09:20:19PM +0100, Chris Wilson wrote:

> > > Err, why does i915 implements its own uncached memcpy instead of relying

> > > on core functionality to start with?

> >

> > What is this core functionality that provides movntqda?

> 

> A sensible name might be memcpy_uncached or mempcy_nontemporal.

> But the important point is that this should be arch code with a common

> fallback rather than hacking it up in drivers.


More the point, you are trying to do a copy where:
1) The kernel isn't expected to read the data - so can bypass the cache.
and maybe:
2) The data needs flushing from the cache to actual memory.
and maybe:
3) The cache lines need invalidating.

The fallbacks depend on the required behaviour.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c
index fdd550405fd3..7c0e022586bc 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -24,6 +24,7 @@ 
 
 #include <linux/kernel.h>
 #include <asm/fpu/api.h>
+#include <asm/simd.h>
 
 #include "i915_memcpy.h"
 
@@ -38,6 +39,12 @@  static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 #ifdef CONFIG_AS_MOVNTDQA
 static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 {
+	if (unlikely(!may_use_simd())) {
+		memcpy(dst, src, len);
+		return;
+	}
+
+
 	kernel_fpu_begin();
 
 	while (len >= 4) {
@@ -67,6 +74,11 @@  static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 
 static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
 {
+	if (unlikely(!may_use_simd())) {
+		memcpy(dst, src, len);
+		return;
+	}
+
 	kernel_fpu_begin();
 
 	while (len >= 4) {