mbox series

[0/9] Input: Fix insufficent DMA alignment.

Message ID 20221127144116.1418083-1-jic23@kernel.org
Headers show
Series Input: Fix insufficent DMA alignment. | expand

Message

Jonathan Cameron Nov. 27, 2022, 2:41 p.m. UTC
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

This problem was discovered in IIO as a side effect of the discussions about
relaxing kmalloc alignment on arm64 and resulted in a series of large
patch sets.

https://lore.kernel.org/linux-iio/20220508175712.647246-1-jic23@kernel.org/

Unsurprisingly there are cases of it in other subsystems.

The short version of this is that there are a few known arm64 chips where
___cacheline_aligned enforces 64 byte alignment which is what we typically
want for performance optimization as the size of the L1 cache lines.
However, further out in the cache hierarchy we have caches with 128 byte
lines.  Those are the ones that matter for DMA safety.
So we need the larger alignment guarantees of ARCH_KMALLOC_MINALIGN which
in this case is 128 bytes.

There is one other use of ____cacheline_aligned in input:
joystick/iforce/iforce-usb.c

Whilst suspicious I'm not sure enough of the requirements of USB to
know if they are there for DMA safety or some other constraint.

Jonathan

Cc: Daniel Mack <daniel@zonque.org>
Cc: Michael Hennerich <michael.hennerich@analog.com>
Cc: Tomohiro Yoshidomi <sylph23k@gmail.com>
Cc: Javier Martinez Canillas <javier@dowhile0.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Lauri Kasanen <cand@gmx.com>
Cc: Daniel Hung-yu Wu <hywu@google.com>

Jonathan Cameron (9):
  Input: psxpad - Fix padding for DMA safe buffers.
  Input: ad714x - Fix padding for DMA safe buffers.
  Input: ad7887 - Fix padding for DMA safe buffers.
  Input: ads7846 - Fix padding for DMA safe buffers.
  Input: cyttsp - Fix padding for DMA safe buffers.
  Input: surface3 - Fix padding for DMA safe buffers.
  Input: n64joy - Fix DMA buffer alignment.
  Input: atmel_captouch - Avoid suspect DMA buffer alignment.
  Input: elants - Fix suspect DMA buffer alignment

 drivers/input/joystick/n64joy.c          | 6 +++---
 drivers/input/joystick/psxpad-spi.c      | 4 ++--
 drivers/input/misc/ad714x.h              | 2 +-
 drivers/input/misc/atmel_captouch.c      | 2 +-
 drivers/input/touchscreen/ad7877.c       | 4 ++--
 drivers/input/touchscreen/ads7846.c      | 4 ++--
 drivers/input/touchscreen/cyttsp_core.h  | 2 +-
 drivers/input/touchscreen/elants_i2c.c   | 2 +-
 drivers/input/touchscreen/surface3_spi.c | 2 +-
 9 files changed, 14 insertions(+), 14 deletions(-)

Comments

Jonathan Cameron Nov. 27, 2022, 4:26 p.m. UTC | #1
On Sun, 27 Nov 2022 14:41:13 +0000
Jonathan Cameron <jic23@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> On some architectures (e.g. arm64), ____cachline_aligned only aligns
> to the cacheline size of the L1 cache size. L1_CACHE_BYTES in
> arch64/include/asm/cache.h  Unfortunately DMA safety on these
> architectures requires the buffer no share a last level cache cacheline
> given by ARCH_DMA_MINALIGN which has a greater granularity.
> ARCH_DMA_MINALIGN is not defined for all architectures, but when it is
> defined it is used to set the size of ARCH_KMALLOC_MINALIGN
> to allow DMA safe buffer allocations.
> 
> As such the correct alignment requirement is
> __aligned(ARCH_KMALLOC_MINALIGN).
> This has recently been fixed in other subsystems such as IIO.
> 
> Presumably this part is little used on boards where this could actually
> matter so this is mostly about removing code that might be coppied
> elsewhere.
> 
> Fixes: 4feacbc24eea ("Input: add new driver for the Surface 3")
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>

Sigh. Sunday afternoon incompetence on my part...

Dmitry, if everything else is fine with this series I can resend this
or if you are feeling generous feel free to fix it up whilst applying ;)

Jonathan

> ---
>  drivers/input/touchscreen/surface3_spi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/touchscreen/surface3_spi.c b/drivers/input/touchscreen/surface3_spi.c
> index 1da23e5585a0..6c884fc2b332 100644
> --- a/drivers/input/touchscreen/surface3_spi.c
> +++ b/drivers/input/touchscreen/surface3_spi.c
> @@ -32,7 +32,7 @@ struct surface3_ts_data {
>  	struct input_dev *pen_input_dev;
>  	int pen_tool;
>  
> -	u8 rd_buf[SURFACE3_PACKET_SIZE]		____cacheline_aligned;
> +	u8 rd_buf[SURFACE3_PACKET_SIZE] __aligned(ARCH_KMALLOC_MINALIGN;

Missing bracket.

>  };
>  
>  struct surface3_ts_data_finger {
Hennerich, Michael Nov. 28, 2022, 7:14 a.m. UTC | #2
> -----Original Message-----
> From: Jonathan Cameron <jic23@kernel.org>
> Sent: Sonntag, 27. November 2022 15:41
> To: linux-input@vger.kernel.org; Dmitry Torokhov
> <dmitry.torokhov@gmail.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>; Hennerich, Michael
> <Michael.Hennerich@analog.com>
> Subject: [PATCH 2/9] Input: ad714x - Fix padding for DMA safe buffers.
> 
> 
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> On some architectures (e.g. arm64), ____cachline_aligned only aligns to the
> cacheline size of the L1 cache size. L1_CACHE_BYTES in
> arch64/include/asm/cache.h  Unfortunately DMA safety on these architectures
> requires the buffer no share a last level cache cacheline given by
> ARCH_DMA_MINALIGN which has a greater granularity.
> ARCH_DMA_MINALIGN is not defined for all architectures, but when it is
> defined it is used to set the size of ARCH_KMALLOC_MINALIGN to allow DMA
> safe buffer allocations.
> 
> As such the correct alignment requirement is
> __aligned(ARCH_KMALLOC_MINALIGN).
> This has recently been fixed in other subsystems such as IIO.
> 
> Fixes tag is inprecise because there may not have been any architectures
> where the two values were different at the time of the earlier fix.
> 
> Fixes: c0409feb8689 ("Input: ad714x - use DMA-safe buffers for spi_write()")
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Michael Hennerich <michael.hennerich@analog.com>

Acked-by: Michael Hennerich <michael.hennerich@analog.com>

> ---
>  drivers/input/misc/ad714x.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/misc/ad714x.h b/drivers/input/misc/ad714x.h index
> af847b5f0d0e..2b8b901183be 100644
> --- a/drivers/input/misc/ad714x.h
> +++ b/drivers/input/misc/ad714x.h
> @@ -41,7 +41,7 @@ struct ad714x_chip {
>  	unsigned product;
>  	unsigned version;
> 
> -	__be16 xfer_buf[16] ____cacheline_aligned;
> +	__be16 xfer_buf[16] __aligned(ARCH_KMALLOC_MINALIGN);
> 
>  };
> 
> --
> 2.38.1
Dmitry Torokhov Nov. 28, 2022, 6:16 p.m. UTC | #3
Hi Jonathan,

On Sun, Nov 27, 2022 at 02:41:07PM +0000, Jonathan Cameron wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> This problem was discovered in IIO as a side effect of the discussions about
> relaxing kmalloc alignment on arm64 and resulted in a series of large
> patch sets.
> 
> https://lore.kernel.org/linux-iio/20220508175712.647246-1-jic23@kernel.org/
> 
> Unsurprisingly there are cases of it in other subsystems.
> 
> The short version of this is that there are a few known arm64 chips where
> ___cacheline_aligned enforces 64 byte alignment which is what we typically
> want for performance optimization as the size of the L1 cache lines.
> However, further out in the cache hierarchy we have caches with 128 byte
> lines.  Those are the ones that matter for DMA safety.
> So we need the larger alignment guarantees of ARCH_KMALLOC_MINALIGN which
> in this case is 128 bytes.

I wonder if we could have something like ____dmasafe_aligned instead of
sprinkling ARCH_KMALLOC_MINALIGN around?

> 
> There is one other use of ____cacheline_aligned in input:
> joystick/iforce/iforce-usb.c
> 
> Whilst suspicious I'm not sure enough of the requirements of USB to
> know if they are there for DMA safety or some other constraint.

Yes, USB has requirements similar to SPI.

Thanks.
Jonathan Cameron Nov. 29, 2022, 9:18 a.m. UTC | #4
On Mon, 28 Nov 2022 10:16:31 -0800
Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:

> Hi Jonathan,
> 
> On Sun, Nov 27, 2022 at 02:41:07PM +0000, Jonathan Cameron wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > This problem was discovered in IIO as a side effect of the discussions about
> > relaxing kmalloc alignment on arm64 and resulted in a series of large
> > patch sets.
> > 
> > https://lore.kernel.org/linux-iio/20220508175712.647246-1-jic23@kernel.org/
> > 
> > Unsurprisingly there are cases of it in other subsystems.
> > 
> > The short version of this is that there are a few known arm64 chips where
> > ___cacheline_aligned enforces 64 byte alignment which is what we typically
> > want for performance optimization as the size of the L1 cache lines.
> > However, further out in the cache hierarchy we have caches with 128 byte
> > lines.  Those are the ones that matter for DMA safety.
> > So we need the larger alignment guarantees of ARCH_KMALLOC_MINALIGN which
> > in this case is 128 bytes.  
> 
> I wonder if we could have something like ____dmasafe_aligned instead of
> sprinkling ARCH_KMALLOC_MINALIGN around?

I agree in principle and eventually that will be ARCH_DMA_MINALIGN.
But it isn't useable yet for backports.
Catalin has worked on several approaches to reducing the alignment of kmalloc.
I may well be lagging on the current plan...
https://lore.kernel.org/linux-arm-kernel/20221106220143.2129263-1-catalin.marinas@arm.com/#r

The way crypto and IIO handled this was to add a local define
IIO_DMA_MINALIGN, CRYPTO_MINALIGN

Catalin, if you do forwards with making ARCH_DMA_MINALIGN global available, make sure
to include IIO_DMA_MINALIGN.  I'm fine with the approach you used for crypto of changing
the local define.  Note that in IIO this typically only bloats a single structure
per device instance, so it's not worth the complexity of dynamic alignment.
As we are marking structure elements with it, in all cases they are multiples of
IIO_DMA_MINALIGN in size, so everything is easy.

On the other side of things, we might be able to relax most of these alignment tricks
in IIO (and input) if swiotlb is the approach eventually used.  I'm personally not that
keen on that transition but meh, this stuff usually involves a serial bus, so the extra
bounce isnt' too painful. Note that we regularly do fun things like setting tx
buffer to be ARCH_KMALLOC_MINALIGN and the rx buffer to be that +n bytes on the basis
that a given serial bus master won't trash itself and hence this is safe. I think your
approach will bounce all of those which is somewhat unfortunate.  Other fun things
include DMA mapping a series of small buffers in one allocation for a series of serial
messages.  Again, that is guaranteed to be safe.

Jonathan






> 
> > 
> > There is one other use of ____cacheline_aligned in input:
> > joystick/iforce/iforce-usb.c
> > 
> > Whilst suspicious I'm not sure enough of the requirements of USB to
> > know if they are there for DMA safety or some other constraint.  
> 
> Yes, USB has requirements similar to SPI.
> 
> Thanks.
>
Dmitry Torokhov Nov. 30, 2022, 1:23 a.m. UTC | #5
On Tue, Nov 29, 2022 at 09:18:28AM +0000, Jonathan Cameron wrote:
> On Mon, 28 Nov 2022 10:16:31 -0800
> Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:
> 
> > Hi Jonathan,
> > 
> > On Sun, Nov 27, 2022 at 02:41:07PM +0000, Jonathan Cameron wrote:
> > > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > 
> > > This problem was discovered in IIO as a side effect of the discussions about
> > > relaxing kmalloc alignment on arm64 and resulted in a series of large
> > > patch sets.
> > > 
> > > https://lore.kernel.org/linux-iio/20220508175712.647246-1-jic23@kernel.org/
> > > 
> > > Unsurprisingly there are cases of it in other subsystems.
> > > 
> > > The short version of this is that there are a few known arm64 chips where
> > > ___cacheline_aligned enforces 64 byte alignment which is what we typically
> > > want for performance optimization as the size of the L1 cache lines.
> > > However, further out in the cache hierarchy we have caches with 128 byte
> > > lines.  Those are the ones that matter for DMA safety.
> > > So we need the larger alignment guarantees of ARCH_KMALLOC_MINALIGN which
> > > in this case is 128 bytes.  
> > 
> > I wonder if we could have something like ____dmasafe_aligned instead of
> > sprinkling ARCH_KMALLOC_MINALIGN around?
> 
> I agree in principle and eventually that will be ARCH_DMA_MINALIGN.
> But it isn't useable yet for backports.

Sorry, I do not follow. Are talking about backports because the code is
broken in the mainline right now, or it will become broken when
Catalin's changes land? And even if it is broken right now why can't we
add

#define ____dmasafe_aligned __attribute__((__aligned__(ARCH_KMALLOC_MINALIGN)))

somewhere in include/linux/cache.h? Then you can tweak it as needed
independently of kmalloc alignment and without need to touch drivers
again and it should be easy to backport still.

Thanks.