diff mbox series

swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

Message ID YL7XXNOnbaDgmTB9@atmark-techno.com
State New
Headers show
Series swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12) | expand

Commit Message

Dominique Martinet June 8, 2021, 2:35 a.m. UTC
I'm not able to find any individual mails for Christoph's patches so
replying to the PR.

In particular, this commit:
Konrad Rzeszutek Wilk wrote on Fri, Feb 26, 2021 at 11:00:08AM -0500:
> Christoph Hellwig (8):
>       swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single

merged as 16fc3cef33a0, breaks caam as used today (thanks Lukas for
bisecting it!)


More precisely, drivers/crypto/caam/caamalg.c does a lot of mappings
that aren't aligned:

dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
                           desc_bytes(desc), ctx->dir);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
                           desc_bytes(desc), ctx->dir);


which can be caught by crypto tests with this caam enabled, for example
adding a warning when an unaligned mapping happens I get this trace:
--------
[ 1628.670226]  swiotlb_tbl_sync_single+0x74/0xa0
[ 1628.674677]  dma_sync_single_for_device+0xe4/0x110
[ 1628.679472]  skcipher_setkey+0xd0/0xf0
[ 1628.683224]  des3_skcipher_setkey+0x74/0xac
[ 1628.687416]  crypto_skcipher_setkey+0x54/0x110
[ 1628.691866]  crypto_authenc_setkey+0x94/0xd0
[ 1628.696138]  crypto_aead_setkey+0x34/0x10c
[ 1628.700236]  test_aead_vec_cfg+0x3a0/0x770
[ 1628.704338]  test_aead+0xac/0x130
[ 1628.707656]  alg_test_aead+0xa8/0x190
[ 1628.711324]  alg_test.part.0+0xf4/0x41c
[ 1628.715161]  alg_test+0x1c/0x60
[ 1628.718307]  do_test+0x37ec/0x4c50
[ 1628.721709]  do_test+0x4bec/0x4c50
[ 1628.725114]  tcrypt_mod_init+0x54/0xac
[ 1628.728864]  do_one_initcall+0x4c/0x1b0
[ 1628.732701]  kernel_init_freeable+0x1d0/0x234
[ 1628.737060]  kernel_init+0x10/0x114
[ 1628.740550]  ret_from_fork+0x10/0x24
-----

and the tests themselves also fail (all or at least most of them) with
e.g.
------
[    8.454233] caam_jr 30901000.jr: 40001713: DECO: desc idx 23: Header Error. Invalid length or parity, or certain other problems.
[    8.465820] alg: ahash: hmac-sha256-caam final() failed with err -22 on test vector 0, cfg="init+update+final aligned buffer"
[    8.477149] ------------[ cut here ]------------
[    8.481781] alg: self-tests for hmac-sha256-caam (hmac(sha256)) failed (rc=-22)
[    8.481818] WARNING: CPU: 2 PID: 295 at crypto/testmgr.c:5645 alg_test.part.0+0x128/0x41c
[    8.497307] Modules linked in:
[    8.500365] CPU: 2 PID: 295 Comm: cryptomgr_test Tainted: G        W         5.13.0-rc5-00002-gc98cdee6172e #23
[    8.510455] Hardware name: NXP i.MX8MPlus EVK board (DT)
[    8.515767] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[    8.521778] pc : alg_test.part.0+0x128/0x41c
[    8.526050] lr : alg_test.part.0+0x128/0x41c
[    8.530324] sp : ffff80001371bd10
[    8.533637] x29: ffff80001371bd10 x28: 000000000000008f x27: 000000000000008f
[    8.540785] x26: 000000000000008f x25: 0000000000000400 x24: ffff8000111658c8
[    8.547930] x23: ffff0000c02aaa80 x22: 000000000001008f x21: ffff0000c02aaa00
[    8.555075] x20: 0000000000000085 x19: 00000000ffffffea x18: 00000000fffffffc
[    8.562221] x17: 0000000000000001 x16: 0000000000000003 x15: 0000000000000020
[    8.569365] x14: ffffffffffffffff x13: 00000000000003e7 x12: ffff80001371b9e0
[    8.576511] x11: ffff80001188c940 x10: ffff800011844300 x9 : ffff800011886b98
[    8.583658] x8 : ffff80001182eb98 x7 : ffff800011886b98 x6 : ffffffffffff0888
[    8.590801] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
[    8.597945] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c1684e00
[    8.605093] Call trace:
[    8.607540]  alg_test.part.0+0x128/0x41c
[    8.611467]  alg_test+0x1c/0x60
[    8.614608]  cryptomgr_test+0x28/0x50
[    8.618275]  kthread+0x154/0x160
[    8.621511]  ret_from_fork+0x10/0x24
[    8.625088] ---[ end trace 2d195377ee3c219e ]---
------



Looking at it a bit further it seems to me that swiotlb_bounce() should
either keep the offset (re-adding the line that was removed except it
would go back in swiotlb_bounce, diff at end of mail), or the size
should be adjusted to cover from the start of the page up until the
original offset + size which would also probably work (not tested)

That, or make unaligned mappings forbidden and warn when we see one, but
I have no idea what other component could be doing some -- I'm not sure
if what the caam code does it legitimate (e.g. would it be possible to
do the mappings once at init and use them?), but the swiotlb code
doesn't look quite right.


For now I'll just revert this locally but there must have been a reason
the adjustment got removed in the first place, what's the best way
forward?





--- (for 5.13-rc1+)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..3acdd77edfed 100644
---

Thanks,

Comments

Horia Geanta June 10, 2021, 2:52 p.m. UTC | #1
On 6/8/2021 5:35 AM, Dominique MARTINET wrote:
> I'm not able to find any individual mails for Christoph's patches so

> replying to the PR.

> 

The patch set is here:
https://lore.kernel.org/linux-iommu/20210207160327.2955490-1-hch@lst.de

> In particular, this commit:

> Konrad Rzeszutek Wilk wrote on Fri, Feb 26, 2021 at 11:00:08AM -0500:

>> Christoph Hellwig (8):

>>       swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single

> 

> merged as 16fc3cef33a0, breaks caam as used today (thanks Lukas for

> bisecting it!)

> 

Thanks.

I've noticed the failure also in v5.10 and v5.11 stable kernels,
since the patch set has been backported.

> 

> More precisely, drivers/crypto/caam/caamalg.c does a lot of mappings

> that aren't aligned:

> 

> dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,

>                            desc_bytes(desc), ctx->dir);

> dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,

>                            desc_bytes(desc), ctx->dir);

> 

Right. These dma sync ops are in caaamalg.c and should be fixed.

OTOH there are other dma sync ops in caam driver - e.g. caamhash.c:
	dma_sync_single_for_device(jrdev, ctx->sh_desc_update_dma,
				   desc_bytes(desc), ctx->dir);
where the mappings are aligned (see struct caam_hash_ctx),
but even in this case the crypto algorithms are failing.

> 

> which can be caught by crypto tests with this caam enabled, for example

> adding a warning when an unaligned mapping happens I get this trace:

> --------

> [ 1628.670226]  swiotlb_tbl_sync_single+0x74/0xa0

> [ 1628.674677]  dma_sync_single_for_device+0xe4/0x110

> [ 1628.679472]  skcipher_setkey+0xd0/0xf0

> [ 1628.683224]  des3_skcipher_setkey+0x74/0xac

> [ 1628.687416]  crypto_skcipher_setkey+0x54/0x110

> [ 1628.691866]  crypto_authenc_setkey+0x94/0xd0

> [ 1628.696138]  crypto_aead_setkey+0x34/0x10c

> [ 1628.700236]  test_aead_vec_cfg+0x3a0/0x770

> [ 1628.704338]  test_aead+0xac/0x130

> [ 1628.707656]  alg_test_aead+0xa8/0x190

> [ 1628.711324]  alg_test.part.0+0xf4/0x41c

> [ 1628.715161]  alg_test+0x1c/0x60

> [ 1628.718307]  do_test+0x37ec/0x4c50

> [ 1628.721709]  do_test+0x4bec/0x4c50

> [ 1628.725114]  tcrypt_mod_init+0x54/0xac

> [ 1628.728864]  do_one_initcall+0x4c/0x1b0

> [ 1628.732701]  kernel_init_freeable+0x1d0/0x234

> [ 1628.737060]  kernel_init+0x10/0x114

> [ 1628.740550]  ret_from_fork+0x10/0x24

> -----

> 

> and the tests themselves also fail (all or at least most of them) with

> e.g.

> ------

> [    8.454233] caam_jr 30901000.jr: 40001713: DECO: desc idx 23: Header Error. Invalid length or parity, or certain other problems.

> [    8.465820] alg: ahash: hmac-sha256-caam final() failed with err -22 on test vector 0, cfg="init+update+final aligned buffer"

> [    8.477149] ------------[ cut here ]------------

> [    8.481781] alg: self-tests for hmac-sha256-caam (hmac(sha256)) failed (rc=-22)

> [    8.481818] WARNING: CPU: 2 PID: 295 at crypto/testmgr.c:5645 alg_test.part.0+0x128/0x41c

> [    8.497307] Modules linked in:

> [    8.500365] CPU: 2 PID: 295 Comm: cryptomgr_test Tainted: G        W         5.13.0-rc5-00002-gc98cdee6172e #23

> [    8.510455] Hardware name: NXP i.MX8MPlus EVK board (DT)

> [    8.515767] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)

> [    8.521778] pc : alg_test.part.0+0x128/0x41c

> [    8.526050] lr : alg_test.part.0+0x128/0x41c

> [    8.530324] sp : ffff80001371bd10

> [    8.533637] x29: ffff80001371bd10 x28: 000000000000008f x27: 000000000000008f

> [    8.540785] x26: 000000000000008f x25: 0000000000000400 x24: ffff8000111658c8

> [    8.547930] x23: ffff0000c02aaa80 x22: 000000000001008f x21: ffff0000c02aaa00

> [    8.555075] x20: 0000000000000085 x19: 00000000ffffffea x18: 00000000fffffffc

> [    8.562221] x17: 0000000000000001 x16: 0000000000000003 x15: 0000000000000020

> [    8.569365] x14: ffffffffffffffff x13: 00000000000003e7 x12: ffff80001371b9e0

> [    8.576511] x11: ffff80001188c940 x10: ffff800011844300 x9 : ffff800011886b98

> [    8.583658] x8 : ffff80001182eb98 x7 : ffff800011886b98 x6 : ffffffffffff0888

> [    8.590801] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff

> [    8.597945] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c1684e00

> [    8.605093] Call trace:

> [    8.607540]  alg_test.part.0+0x128/0x41c

> [    8.611467]  alg_test+0x1c/0x60

> [    8.614608]  cryptomgr_test+0x28/0x50

> [    8.618275]  kthread+0x154/0x160

> [    8.621511]  ret_from_fork+0x10/0x24

> [    8.625088] ---[ end trace 2d195377ee3c219e ]---

> ------

> 

> 

> 

> Looking at it a bit further it seems to me that swiotlb_bounce() should

> either keep the offset (re-adding the line that was removed except it

> would go back in swiotlb_bounce, diff at end of mail), or the size

> should be adjusted to cover from the start of the page up until the

> original offset + size which would also probably work (not tested)

> 

> That, or make unaligned mappings forbidden and warn when we see one, but

> I have no idea what other component could be doing some -- I'm not sure

> if what the caam code does it legitimate (e.g. would it be possible to

> do the mappings once at init and use them?), but the swiotlb code

> doesn't look quite right.

> 

Well, it's not only about unaligned accesses.

It's also about partial syncs, e.g.
	dma_handle = dma_map_single(dev, cpu_addr, size, DMA_BIDIRECTIONAL);
	[...]
	dma_sync_single_for_device(dev, dma_handle + offset, size - offset,
				   DMA_BIDIRECTIONAL);
(where dma_handle + offset should be cacheline-aligned).

Documentation/core-api/dma-api.rst explicitly allows for partial syncs:
Synchronise a single contiguous or scatter/gather mapping for the CPU
and device. With the sync_sg API, all the parameters must be the same
as those passed into the single mapping API. With the sync_single API,
you can use dma_handle and size parameters that aren't identical to
those passed into the single mapping API to do a partial sync.

AFAICS commit 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
is breaking this functionality.

Thanks,
Horia
Linus Torvalds June 10, 2021, 7:41 p.m. UTC | #2
On Thu, Jun 10, 2021 at 7:52 AM Horia Geantă <horia.geanta@nxp.com> wrote:
>

> Documentation/core-api/dma-api.rst explicitly allows for partial syncs:

> Synchronise a single contiguous or scatter/gather mapping for the CPU

> and device. With the sync_sg API, all the parameters must be the same

> as those passed into the single mapping API. With the sync_single API,

> you can use dma_handle and size parameters that aren't identical to

> those passed into the single mapping API to do a partial sync.

>

> AFAICS commit 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")

> is breaking this functionality.


How about a patch like the attached? Does that fix things for you.

Christoph? Comments - that commit removed the offset calculation
entirely, because the old

        (unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

        (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.

I also made it then take the offset into account for the alloc_size checks.

Does this UNTESTED patch perhaps do the right thing?

                    Linus
kernel/dma/swiotlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..f63d15e94d35 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -342,6 +342,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+	unsigned long offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
@@ -350,6 +351,14 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
 
+	if (offset > alloc_size) {
+		dev_WARN_ONCE(dev, 1,
+			"Buffer overflow detected. Offset: %lu. Mapping size: %zu.\n",
+			offset, size);
+		return;
+	}
+	alloc_size -= offset;
+	orig_addr += offset;
 	if (size > alloc_size) {
 		dev_WARN_ONCE(dev, 1,
 			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu.\n",
Horia Geanta June 10, 2021, 11:20 p.m. UTC | #3
On 6/10/2021 10:41 PM, Linus Torvalds wrote:
> 

> How about a patch like the attached? Does that fix things for you.

> 

Yes, it fixes the caam driver regression.

Tested-by: Horia Geantă <horia.geanta@nxp.com>


on top of next-20210610.

Thank you,
Horia
Christoph Hellwig June 11, 2021, 6:21 a.m. UTC | #4
On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> I've noticed the failure also in v5.10 and v5.11 stable kernels,

> since the patch set has been backported.


FYI, there has been a patch on the list that should have fixed this
for about a month:

https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

but it seems like it never got picked up.
Konrad Rzeszutek Wilk June 11, 2021, 10:34 a.m. UTC | #5
On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:

> > I've noticed the failure also in v5.10 and v5.11 stable kernels,

> > since the patch set has been backported.

> 

> FYI, there has been a patch on the list that should have fixed this

> for about a month:

> 

> https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

> 

> but it seems like it never got picked up.


Yikes!

Dominique,

Would you be up to testing the attached (and inline) patch please?

Linus,

Would you be terribly offended if I took your code (s/unsigned
long/unsigned int), and used Chanho's description of the problem (see below)?

Christoph,
I took the liberty of putting your Reviewed-by on the patch, you OK with
that?

Jianxiong,
Would you be up for testing this patch on your NVMe rig please? I don't
forsee a problem.. but just in case

From f06da55596675383fbf2563fe2919b2a8f68901b Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>

Date: Thu, 10 Jun 2021 12:41:26 -0700
Subject: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

in case of driver wants to sync part of ranges with offset,
swiotlb_tbl_sync_single() copies from orig_addr base to tlb_addr with
offset and ends up with data mismatch.

It was removed from
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.

[From Linus's email:
That commit which the removed the offset calculation entirely, because the old

        (unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

        (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.]

The use-case that drivers are hitting is as follow:

1. Get dma_addr_t from dma_map_single()

dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

    |<---------------vsize------------->|
    +-----------------------------------+
    |                                   | original buffer
    +-----------------------------------+
  vaddr

 swiotlb_align_offset
     |<----->|<---------------vsize------------->|
     +-------+-----------------------------------+
     |       |                                   | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)

dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

  Error case.
    Copy data to original buffer but it is from base addr (instead of
  base addr + offset) in original buffer:

 swiotlb_align_offset
     |<----->|<- offset ->|<- size ->|
     +-------+-----------------------------------+
     |       |            |##########|           | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

    |<- size ->|
    +-----------------------------------+
    |##########|                        | original buffer
    +-----------------------------------+
  vaddr

The fix is to copy the data to the original buffer and take into
account the offset, like so:

 swiotlb_align_offset
     |<----->|<- offset ->|<- size ->|
     +-------+-----------------------------------+
     |       |            |##########|           | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

    |<- offset ->|<- size ->|
    +-----------------------------------+
    |            |##########|           | original buffer
    +-----------------------------------+
  vaddr

[This patch text is from Bumyong's email; and his solution was very close
to Linus's, so incorporating his text]

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee <bumyong.lee@samsung.com>

Signed-off-by: Chanho Park <chanho61.park@samsung.com>

Reviewed-by: Christoph Hellwig <hch@lst.de>

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Reported-by: Horia Geantă <horia.geanta@nxp.com>
Tested-by: Horia Geantă <horia.geanta@nxp.com>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad@darnok.org>

---
 kernel/dma/swiotlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d50..dc438b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -342,6 +342,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+	unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
@@ -350,6 +351,14 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
 
+	if (offset > alloc_size) {
+		dev_WARN_ONCE(dev, 1,
+			"Buffer overflow detected. Offset: %lu. Mapping size: %zu.\n",
+			offset, size);
+		return;
+	}
+	alloc_size -= offset;
+	orig_addr += offset;
 	if (size > alloc_size) {
 		dev_WARN_ONCE(dev, 1,
 			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu.\n",
-- 
1.8.3.1
From f06da55596675383fbf2563fe2919b2a8f68901b Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>

Date: Thu, 10 Jun 2021 12:41:26 -0700
Subject: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

in case of driver wants to sync part of ranges with offset,
swiotlb_tbl_sync_single() copies from orig_addr base to tlb_addr with
offset and ends up with data mismatch.

It was removed from
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.

[From Linus's email:
That commit which the removed the offset calculation entirely, because the old

        (unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

        (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.]

The use-case that drivers are hitting is as follow:

1. Get dma_addr_t from dma_map_single()

dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

    |<---------------vsize------------->|
    +-----------------------------------+
    |                                   | original buffer
    +-----------------------------------+
  vaddr

 swiotlb_align_offset
     |<----->|<---------------vsize------------->|
     +-------+-----------------------------------+
     |       |                                   | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)

dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

  Error case.
    Copy data to original buffer but it is from base addr (instead of
  base addr + offset) in original buffer:

 swiotlb_align_offset
     |<----->|<- offset ->|<- size ->|
     +-------+-----------------------------------+
     |       |            |##########|           | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

    |<- size ->|
    +-----------------------------------+
    |##########|                        | original buffer
    +-----------------------------------+
  vaddr

The fix is to copy the data to the original buffer and take into
account the offset, like so:

 swiotlb_align_offset
     |<----->|<- offset ->|<- size ->|
     +-------+-----------------------------------+
     |       |            |##########|           | swiotlb buffer
     +-------+-----------------------------------+
          tlb_addr

    |<- offset ->|<- size ->|
    +-----------------------------------+
    |            |##########|           | original buffer
    +-----------------------------------+
  vaddr

[This patch text is from Bumyong's email; and his solution was very close
to Linus's, so incorporating his text]

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee <bumyong.lee@samsung.com>

Signed-off-by: Chanho Park <chanho61.park@samsung.com>

Reviewed-by: Christoph Hellwig <hch@lst.de>

Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com>
Reported-by: Horia Geantă <horia.geanta@nxp.com>
Tested-by: Horia Geantă <horia.geanta@nxp.com>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad@darnok.org>

---
 kernel/dma/swiotlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d50..dc438b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -342,6 +342,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 {
 	struct io_tlb_mem *mem = io_tlb_default_mem;
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+	unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
@@ -350,6 +351,14 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
 
+	if (offset > alloc_size) {
+		dev_WARN_ONCE(dev, 1,
+			"Buffer overflow detected. Offset: %lu. Mapping size: %zu.\n",
+			offset, size);
+		return;
+	}
+	alloc_size -= offset;
+	orig_addr += offset;
 	if (size > alloc_size) {
 		dev_WARN_ONCE(dev, 1,
 			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu.\n",
-- 
1.8.3.1
Horia Geanta June 11, 2021, 10:59 a.m. UTC | #6
On 6/11/2021 1:35 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:

>> On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:

>>> I've noticed the failure also in v5.10 and v5.11 stable kernels,

>>> since the patch set has been backported.

>>

>> FYI, there has been a patch on the list that should have fixed this

>> for about a month:

>>

>> https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

>>

>> but it seems like it never got picked up.

> 

> Yikes!

> 

> Dominique,

> 

> Would you be up to testing the attached (and inline) patch please?

> 

> Linus,

> 

> Would you be terribly offended if I took your code (s/unsigned

> long/unsigned int), and used Chanho's description of the problem (see below)?

> 

Both patches work for my case.

However, there's yet another, possibly significant, difference b/w the two:
	offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
vs.
	offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
		 swiotlb_align_offset(dev, orig_addr);

I think accounting for the alignment offset (swiotlb_align_offset())
has to be kept.

Horia
Linus Torvalds June 11, 2021, 4:01 p.m. UTC | #7
On Thu, Jun 10, 2021 at 11:21 PM Christoph Hellwig <hch@lst.de> wrote:
>

> FYI, there has been a patch on the list that should have fixed this

> for about a month:

>

> https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f


Honestly, that patch is all kinds of strange.

This expression:

    tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
        swiotlb_align_offset(dev, orig_addr);

makes no sense to me. Maybe it happens to work, but I think it does so
by mistake rather than by design.

What my patch used was:

    unsigned long offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

which actually pairs with - and makes sense with - the index calculation:

    int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;

so that offset truly is the offset within that index. Look at how that
'index' calculation calculates the high bits of the difference, and
the 'offset' calculation now literally is the low bits of the same
thing that got dropped on the floor by the 'index' calculation?

So those two calculations actually make sense. The
swiotlb_align_offset() one doesn't.

It's possible that that swiotlb_align_offset() function ends up giving
the right answer just almost by mistake (because of how tlb_addr and
orig_addr end up being related - the swiotlb_align_offset() expression
might just end up being the same thing - I didn't look deeper), but
even if the result is the same, it's not _sensible_ code,

It's also possible that the swiotlb_align_offset() function ends up
giving the right answer very much by design and because of how
orig_addr works - because maybe the remapping is doing odd things and
using that swiotlb_align_offset() function in ways that make the
*obvious* and natural offset calculation not actually work.

So it's at least in theory possible that my "natural offset"
calculation that matches how the index is calculated doesn't actually
work. But that means that the swiotlb remapping is doing some really
odd things, and then I think the patch would need a lot more
commentary on exactly what those very odd things are.

            Linus
Linus Torvalds June 11, 2021, 4:21 p.m. UTC | #8
On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>

> Linus,

>

> Would you be terribly offended if I took your code (s/unsigned

> long/unsigned int), and used Chanho's description of the problem (see below)?


No offense to that at all - that looks like the right solution. See my
answer to Christoph: I do think my patch does the right one, but I
can't test it and my knowledge of the swiotlb code is not complete
enough to really do anything else than "this looks right".

And adding my sign-off to the patch is fine, but I don't necessarily
need the authorship credit - mine was a throw-away patch just looking
at what the bisection report said. All the real effort was by the
reporters (and for the commit message, Bumyong Lee & co).

Finally - looking at the two places that do have that
swiotlb_align_offset(), my reaction is "I don't understand what that
code is doing".

The index does that

        index = find_slots(dev, orig_addr, alloc_size + offset);

so that offset does seem to depend on how the find_slots code works.
Which in turn does use the same dma_get_min_align_mask() thing that
swiotlb_align_offset() uses.  So the offsets do seem to match, but
find_slots(dev() does a lot of stuff that I don't know. so...

IOW, it does reinforce my "I don't know this code AT ALL". Which just
makes me more convinced that I shouldn't get authorship of the patch
because if something goes wrong with it, I can't help.

So at most maybe a "Suggested-by:".

My patch really was based on very little context and "this is the
calculation that makes sense given the other calculations in the
function".

              Linus
Jianxiong Gao June 16, 2021, 8:49 p.m. UTC | #9
On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>

> On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:

> > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:

> > > I've noticed the failure also in v5.10 and v5.11 stable kernels,

> > > since the patch set has been backported.

> >

> > FYI, there has been a patch on the list that should have fixed this

> > for about a month:

> >

> > https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

> >

> > but it seems like it never got picked up.

>

> Jianxiong,

> Would you be up for testing this patch on your NVMe rig please? I don't

> forsee a problem.. but just in case

>

I have tested the attached patch and it generates an error when
formatting a disk to xfs format in Rhel 8 environment:

sudo mkfs.xfs -f /dev/nvme0n2
meta-data=/dev/nvme0n2           isize=512    agcount=4, agsize=32768000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=131072000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=64000, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
bad magic number
bad magic number
Metadata corruption detected at 0x56211de4c0c8, xfs_sb block 0x0/0x200
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x200
releasing dirty buffer (bulk) to free list!

I applied the patch on commit 06af8679449d.
Konrad Rzeszutek Wilk June 17, 2021, 12:27 a.m. UTC | #10
On Wed, Jun 16, 2021 at 01:49:54PM -0700, Jianxiong Gao wrote:
> On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk

> <konrad.wilk@oracle.com> wrote:

> >

> > On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:

> > > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:

> > > > I've noticed the failure also in v5.10 and v5.11 stable kernels,

> > > > since the patch set has been backported.

> > >

> > > FYI, there has been a patch on the list that should have fixed this

> > > for about a month:

> > >

> > > https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

> > >

> > > but it seems like it never got picked up.

> >

> > Jianxiong,

> > Would you be up for testing this patch on your NVMe rig please? I don't

> > forsee a problem.. but just in case

> >

> I have tested the attached patch and it generates an error when

> formatting a disk to xfs format in Rhel 8 environment:


Thank you for testing that - and this is a bummer indeed.

Jianxiong,
How unique is this NVMe? Should I be able to reproduce this with any
type or is it specific to Google Cloud?

Dominique, Horia,

Are those crypto devices somehow easily available to test out the
patches?

P.S.
Most unfortunate timing - I am out in rural areas in US with not great
Internet, so won't be able to get fully down to this until Monday.
Dominique Martinet June 17, 2021, 12:39 a.m. UTC | #11
Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> Thank you for testing that - and this is a bummer indeed.


Hm, actually not that surprising if it was working without the offset
adjustments and doing non-aligned mappings -- perhaps the nvme code just
needs to round the offsets down instead of expecting swiotlb to do it?

Note I didn't look at that part of the code at all, so I might be
stating the obvious in a way that's difficult to adjust...


> Dominique, Horia,

> 

> Are those crypto devices somehow easily available to test out the

> patches?


The one I have is included in the iMX8MP and iMX8MQ socs, the later is
included in the mnt reform and librem 5 and both have evaluation
toolkits but I wouldn't quite say they are easy to get...

I'm happy to test different patch variants if Horia doesn't beat me to
it though, it's not as practical as having the device but don't hesitate
to ask if I can run with extra debugs or something.

-- 
Dominique
Christoph Hellwig June 17, 2021, 5:12 a.m. UTC | #12
On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:
> Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:

> > Thank you for testing that - and this is a bummer indeed.

> 

> Hm, actually not that surprising if it was working without the offset

> adjustments and doing non-aligned mappings -- perhaps the nvme code just

> needs to round the offsets down instead of expecting swiotlb to do it?


It can't.  The whole point of the series was to keep the original offsets.
Dominique Martinet June 17, 2021, 5:36 a.m. UTC | #13
Christoph Hellwig wrote on Thu, Jun 17, 2021 at 07:12:32AM +0200:
> On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:

> > Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:

> > > Thank you for testing that - and this is a bummer indeed.

> > 

> > Hm, actually not that surprising if it was working without the offset

> > adjustments and doing non-aligned mappings -- perhaps the nvme code just

> > needs to round the offsets down instead of expecting swiotlb to do it?

> 

> It can't.  The whole point of the series was to keep the original offsets.


Right, now I'm reading this again there are two kind of offsets (quoting
code from today's master)
---
static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
                           enum dma_data_direction dir)
{
        struct io_tlb_mem *mem = io_tlb_default_mem;
        int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
        phys_addr_t orig_addr = mem->slots[index].orig_addr;
---

There is:
 - (tlb_addr - mem->start) alignment that Linus added up
 - mem->slots[index].orig_addr alignment (within IO_TLB_SIZE blocks)


I would assume that series made it possible to preserve offsets within a
block for orig_addr, but in the process broke the offsets of a bounce
within an memory slot (the first one) ; I assume we want to restore here
the offset within the IO_TLB_SIZE block in orig_addr so it needs another
offseting of that orig_addr offset e.g. taking a block and offsets
within blocks, we have at the start of function:

 |-----------------|-------------------|--------------------------|
 ^                 ^                   ^
 block start       slot orig addr      tlb_addr

and want the orig_addr variable to align with tlb_addr.


So I was a bit hasty in saying nvme needs to remove offsets, it's more
that current code only has the second one working while the quick fix
breaks the second one in the process of fixing the first...



Jianxiong Gao, before spending more time on this, could you also try
Chanho Park's patch?
https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

I frankly don't understand many details of that code at this point,
in particular I have no idea why or if the patch needs another offset
with mem->start or where the dma_get_min_align_mask(dev) comes from,
but it'll be interesting to test.


Thanks,
-- 
Dominique
Christoph Hellwig June 17, 2021, 11:33 a.m. UTC | #14
On Wed, Jun 16, 2021 at 08:27:39PM -0400, Konrad Rzeszutek Wilk wrote:
> How unique is this NVMe? Should I be able to reproduce this with any

> type or is it specific to Google Cloud?


With swiotlb=force this should be reproducable everywhere.
Jianxiong Gao June 18, 2021, 6:01 p.m. UTC | #15
> Jianxiong Gao, before spending more time on this, could you also try

> Chanho Park's patch?

> https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

>

I have tested Chanho Parks's patch and it works for us.
The NVMe driver performs correctly with the patch.

I have teste the patch on 06af8679449d

-- 
Jianxiong Gao
Dominique Martinet June 21, 2021, 2:03 a.m. UTC | #16
Jianxiong Gao wrote on Fri, Jun 18, 2021 at 11:01:59AM -0700:
> > Jianxiong Gao, before spending more time on this, could you also try

> > Chanho Park's patch?

> > https://lore.kernel.org/linux-iommu/20210510091816.GA2084@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

> >

> I have tested Chanho Parks's patch and it works for us.

> The NVMe driver performs correctly with the patch.

> 

> I have teste the patch on 06af8679449d


Thanks!
(a bit late, but added Chanho Park in Cc...)

I can confirm it also works for our caam problem, as Horia said.

I've also come to term with the use of swiotlb_align_offset() through
testing, or rather many devices seem to have a 0 mask so it will almost
always be cancelled out, so if it works for Jianxiong then it's probably
good enough and I'll just assume that's how the orig_addr has been
designed...

I think it's missing a couple of checks like the one Linus had in his
patch, and would be comfortable with something like the attached patch
(in practice for me exactly the same as the original patch, except I've
added two checks: offsets smaller than orig addr offset are refused as
well as offsets bigger than the mapping size)

I'm sorry Jianxiong but would you be willing to take the time to test
again just to make sure there were no such offsets in your case?


If we're good with that I'll send it as an official v2 keeping Chanho's
from, unless he wants to.


Thanks everyone,
-- 
Dominique
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..23f8d0b168c5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,14 @@ void __init swiotlb_exit(void)
 	io_tlb_default_mem = NULL;
 }
 
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+	return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
+}
+
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -346,10 +354,31 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
 	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	unsigned int tlb_offset, orig_addr_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
 
+	tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
+	orig_addr_offset = swiotlb_align_offset(dev, orig_addr);
+	if (tlb_offset < orig_addr_offset) {
+		dev_WARN_ONCE(dev, 1,
+			"Access before mapping start detected. orig offset %u, requested offset %u.\n",
+			orig_addr_offset, tlb_offset);
+		return;
+	}
+
+	tlb_offset -= orig_addr_offset;
+	if (tlb_offset > alloc_size) {
+		dev_WARN_ONCE(dev, 1,
+			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
+			alloc_size, size, tlb_offset);
+		return;
+	}
+
+	orig_addr += tlb_offset;
+	alloc_size -= tlb_offset;
+
 	if (size > alloc_size) {
 		dev_WARN_ONCE(dev, 1,
 			"Buffer overflow detected. Allocation size: %zu. Mapping size: %zu.\n",
@@ -390,14 +419,6 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 
 #define slot_addr(start, idx)	((start) + ((idx) << IO_TLB_SHIFT))
 
-/*
- * Return the offset into a iotlb slot required to keep the device happy.
- */
-static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
-{
-	return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
-}
-
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
  */
Chanho Park June 21, 2021, 2:55 a.m. UTC | #17
+ Bumyong who is the original author of the patch. 

Hi Dominique,

> Thanks!

> (a bit late, but added Chanho Park in Cc...)

> 

> I can confirm it also works for our caam problem, as Horia said.

> 

> I've also come to term with the use of swiotlb_align_offset() through

> testing, or rather many devices seem to have a 0 mask so it will almost

> always be cancelled out, so if it works for Jianxiong then it's probably

> good enough and I'll just assume that's how the orig_addr has been

> designed...

> 

> I think it's missing a couple of checks like the one Linus had in his

> patch, and would be comfortable with something like the attached patch (in

> practice for me exactly the same as the original patch, except I've added

> two checks: offsets smaller than orig addr offset are refused as well as

> offsets bigger than the mapping size)

> 

> I'm sorry Jianxiong but would you be willing to take the time to test

> again just to make sure there were no such offsets in your case?

> 

> 

> If we're good with that I'll send it as an official v2 keeping Chanho's

> from, unless he wants to.

> 


Sure. No problem. But, the patch was already stacked on Konrad's tree
and linux-next as well.

https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6 

Best Regards, 
Chanho Park
Dominique Martinet June 21, 2021, 4:14 a.m. UTC | #18
Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900:
> Sure. No problem. But, the patch was already stacked on Konrad's tree

> and linux-next as well.

> 

> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6 


That patch is slightly different, it's a rewrite Konrad did that mixes
in Linus' suggestion[1], which breaks things for the NVMe usecase
Jianxiong Gao has.

[1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1)


Konrad is aware so I think it shouldn't be submitted :)

-- 
Dominique
Konrad Rzeszutek Wilk June 21, 2021, 1:16 p.m. UTC | #19
On Mon, Jun 21, 2021 at 01:14:48PM +0900, 'Dominique MARTINET' wrote:
> Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900:

> > Sure. No problem. But, the patch was already stacked on Konrad's tree

> > and linux-next as well.

> > 

> > https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6 

> 

> That patch is slightly different, it's a rewrite Konrad did that mixes

> in Linus' suggestion[1], which breaks things for the NVMe usecase

> Jianxiong Gao has.

> 

> [1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1)

> 

> 

> Konrad is aware so I think it shouldn't be submitted :)


The beaty of 'devel' and 'linux-next' is that they can be reshuffled and
mangled. I pushed them original patch from Bumyong there and will let
it sit for a day and then create a stable branch and give it to Linus.

Then I need to expand the test-regression bucket so that this does not
happen again. Dominique, how easy would it be to purchase one of those
devices?

I was originally thinking to create a crypto device in QEMU to simulate
this but that may take longer to write than just getting the real thing.

Or I could create some fake devices with weird offsets and write a driver
for it to exercise this.. like this one I had done some time ago that
needs some brushing off.
Dominique Martinet June 22, 2021, 7:48 a.m. UTC | #20
Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> The beaty of 'devel' and 'linux-next' is that they can be reshuffled and

> mangled. I pushed them original patch from Bumyong there and will let

> it sit for a day and then create a stable branch and give it to Linus.


Thanks, that should be good.

Do you want me to send a follow-up patch with the two extra checks
(tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
tlb_offset < alloc_size

or are we certain this can't ever happen?
(I didn't see any hit in dmesg when I ran with these, but my opinion is
better safe than sorry...)


> Then I need to expand the test-regression bucket so that this does not

> happen again. Dominique, how easy would it be to purchase one of those

> devices?


My company is making such a device, but it's not on the market yet
(was planned for august, with some delay in approvisionning it'll
probably be a bit late), and would mean buying from Japan so I'm not
sure how convenient that would be...

These are originally NXP devices so I assume Horia would have better
suggestions, if you would?


> I was originally thinking to create a crypto device in QEMU to simulate

> this but that may take longer to write than just getting the real thing.

> 

> Or I could create some fake devices with weird offsets and write a driver

> for it to exercise this.. like this one I had done some time ago that

> needs some brushing off.


Just a fake device with fake offsets as a test is probably good enough,
ideally would need to exerce both failures we've seen (offset in
dma_sync_single_for_device like caam does and in the original mapping (I
assume?) like the NVMe driver does), but that sounds possible :)


Thanks again!
-- 
Dominique
Konrad Rzeszutek Wilk June 22, 2021, 9:58 p.m. UTC | #21
On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:

> > The beaty of 'devel' and 'linux-next' is that they can be reshuffled and

> > mangled. I pushed them original patch from Bumyong there and will let

> > it sit for a day and then create a stable branch and give it to Linus.

> 

> Thanks, that should be good.

> 

> Do you want me to send a follow-up patch with the two extra checks

> (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)

> tlb_offset < alloc_size

> 

> or are we certain this can't ever happen?


I would love more patches and I saw the previous one you posted.

But we only got two (or one) weeks before the next merge window opens
so I am sending to Linus the one that was tested with NVMe and crypto
(see above).

That is the
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

And then after Linus releases the 5.14 - I would love to take your
cleanup on top of that and test it?

> (I didn't see any hit in dmesg when I ran with these, but my opinion is

> better safe than sorry...)

> 

> 

> > Then I need to expand the test-regression bucket so that this does not

> > happen again. Dominique, how easy would it be to purchase one of those

> > devices?

> 

> My company is making such a device, but it's not on the market yet

> (was planned for august, with some delay in approvisionning it'll

> probably be a bit late), and would mean buying from Japan so I'm not

> sure how convenient that would be...

> 

> These are originally NXP devices so I assume Horia would have better

> suggestions, if you would?

> 

> 

> > I was originally thinking to create a crypto device in QEMU to simulate

> > this but that may take longer to write than just getting the real thing.

> > 

> > Or I could create some fake devices with weird offsets and write a driver

> > for it to exercise this.. like this one I had done some time ago that

> > needs some brushing off.

> 

> Just a fake device with fake offsets as a test is probably good enough,

> ideally would need to exerce both failures we've seen (offset in

> dma_sync_single_for_device like caam does and in the original mapping (I

> assume?) like the NVMe driver does), but that sounds possible :)


Yup. Working on that now.
> 

> 

> Thanks again!

> -- 

> Dominique
Dominique Martinet June 22, 2021, 11:04 p.m. UTC | #22
Konrad Rzeszutek Wilk wrote on Tue, Jun 22, 2021 at 05:58:14PM -0400:
> On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:

> > Thanks, that should be good.

> > 

> > Do you want me to send a follow-up patch with the two extra checks

> > (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)

> > tlb_offset < alloc_size

> > 

> > or are we certain this can't ever happen?

> 

> I would love more patches and I saw the previous one you posted.

> 

> But we only got two (or one) weeks before the next merge window opens

> so I am sending to Linus the one that was tested with NVMe and crypto

> (see above).

> 

> That is the

> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

> 

> And then after Linus releases the 5.14 - I would love to take your

> cleanup on top of that and test it?


That sounds good to me, will send with proper formatting after release.

-- 
Dominique
diff mbox series

Patch

--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -349,6 +349,7 @@  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
+	orig_addr += (unsigned long)tlb_addr & (IO_TLB_SIZE - 1);
 
 	if (size > alloc_size) {
 		dev_WARN_ONCE(dev, 1,