mbox series

[v5.10,0/8] preserve DMA offsets when using swiotlb

Message ID 20210405210230.1707074-1-jxgao@google.com
Headers show
Series preserve DMA offsets when using swiotlb | expand

Message

Jianxiong Gao April 5, 2021, 9:02 p.m. UTC
Hi all,

This series of backports fixes the SWIOTLB library to maintain the
page offset when mapping a DMA address. The bug that motivated this
patch series manifested when running a 5.4 kernel as a SEV guest with
an NVMe device. However, any device that infers information from the
page offset and is accessed through the SWIOTLB will benefit from this
bug fix.

Jianxiong Gao (8):
  driver core: add a min_align_mask field to struct 
    device_dma_parameters
  swiotlb: factor out an io_tlb_offset helper
  swiotlb: factor out a nr_slots helper
  swiotlb: clean up swiotlb_tbl_unmap_single
  swiotlb: refactor swiotlb_tbl_map_single
  swiotlb: don't modify orig_addr in  swiotlb_tbl_sync_single
  swiotlb: respect min_align_mask
  nvme-pci: set min_align_mask

 drivers/nvme/host/pci.c     |   1 +
 include/linux/device.h      |   1 +
 include/linux/dma-mapping.h |  16 +++
 include/linux/swiotlb.h     |   1 +
 kernel/dma/swiotlb.c        | 256 ++++++++++++++++++++----------------
 5 files changed, 160 insertions(+), 115 deletions(-)

Comments

Greg KH April 7, 2021, 12:50 p.m. UTC | #1
On Mon, Apr 05, 2021 at 09:02:23PM +0000, Jianxiong Gao wrote:
> 'commit 36950f2da1ea ("driver core: add a min_align_mask field to struct device_dma_parameters")'


Odd first line, can you at least try to use the normal format we use for
stable kernels?

> Some devices rely on the address offset in a page to function

> correctly (NVMe driver as an example). These devices may use

> a different page size than the Linux kernel. The address offset

> has to be preserved upon mapping, and in order to do so, we

> need to record the page_offset_mask first.

> 

> Signed-off-by: Jianxiong Gao <jxgao@google.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>


What happened to all the other signed-off-by lines?

And yours should go last, right?

This series needs work :(

thanks,

greg k-h
Greg KH April 7, 2021, 12:51 p.m. UTC | #2
On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:
> Hi all,

> 

> This series of backports fixes the SWIOTLB library to maintain the

> page offset when mapping a DMA address. The bug that motivated this

> patch series manifested when running a 5.4 kernel as a SEV guest with

> an NVMe device. However, any device that infers information from the

> page offset and is accessed through the SWIOTLB will benefit from this

> bug fix.


But this is 5.10, not 5.4, why mention 5.4 here?

And you are backporting a 5.12-rc feature to 5.10, what happened to
5.11?

Why not just use 5.12 to get this new feature instead of using an older
kernel?  It's not like this has ever worked before, right?

thanks,

greg k-h
Jianxiong Gao April 20, 2021, 11:38 p.m. UTC | #3
On Wed, Apr 7, 2021 at 5:51 AM Greg KH <greg@kroah.com> wrote:
>

> On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:

> > Hi all,

> >

> > This series of backports fixes the SWIOTLB library to maintain the

> > page offset when mapping a DMA address. The bug that motivated this

> > patch series manifested when running a 5.4 kernel as a SEV guest with

> > an NVMe device. However, any device that infers information from the

> > page offset and is accessed through the SWIOTLB will benefit from this

> > bug fix.

>

> But this is 5.10, not 5.4, why mention 5.4 here?

Oops. The cover letter shouldn't mention the kernel version. The bug
is present in both 5.4 and 5.10. Sorry for the confusion.>
> And you are backporting a 5.12-rc feature to 5.10, what happened to

> 5.11?

No. The goal is to backport a bug fix to the LTS releases.
> Why not just use 5.12 to get this new feature instead of using an older

> kernel?  It's not like this has ever worked before, right?

>

It's true, that a new feature (SEV virtualization) is what motivated
the bug fix. However, I still think this makes sense to backport to
the LTS releases because it does fix a pre-existing bug that may be
impacting pre-existing setups.

In particular, while working on these patches, I got the following feedback:
"There are plenty of other hardware designs that rely on dma mapping
not adding offsets that did not exist, e.g. ahci and various RDMA
NICs."

[1] https://lkml.org/lkml/2020/11/24/520
> thanks,

>

> greg k-h




-- 
Jianxiong Gao
Greg KH April 23, 2021, 3:14 p.m. UTC | #4
On Tue, Apr 20, 2021 at 04:38:13PM -0700, Jianxiong Gao wrote:
> On Wed, Apr 7, 2021 at 5:51 AM Greg KH <greg@kroah.com> wrote:

> >

> > On Mon, Apr 05, 2021 at 09:02:22PM +0000, Jianxiong Gao wrote:

> > > Hi all,

> > >

> > > This series of backports fixes the SWIOTLB library to maintain the

> > > page offset when mapping a DMA address. The bug that motivated this

> > > patch series manifested when running a 5.4 kernel as a SEV guest with

> > > an NVMe device. However, any device that infers information from the

> > > page offset and is accessed through the SWIOTLB will benefit from this

> > > bug fix.

> >

> > But this is 5.10, not 5.4, why mention 5.4 here?

> Oops. The cover letter shouldn't mention the kernel version. The bug

> is present in both 5.4 and 5.10. Sorry for the confusion.>

> > And you are backporting a 5.12-rc feature to 5.10, what happened to

> > 5.11?

> No. The goal is to backport a bug fix to the LTS releases.

> > Why not just use 5.12 to get this new feature instead of using an older

> > kernel?  It's not like this has ever worked before, right?

> >

> It's true, that a new feature (SEV virtualization) is what motivated

> the bug fix. However, I still think this makes sense to backport to

> the LTS releases because it does fix a pre-existing bug that may be

> impacting pre-existing setups.


How?  Anything that installed 5.10 when it was released never had this
working, they had to move to 5.12 to get that to work.

> In particular, while working on these patches, I got the following feedback:

> "There are plenty of other hardware designs that rely on dma mapping

> not adding offsets that did not exist, e.g. ahci and various RDMA

> NICs."


I do not understand that statement, how does that pertain to this patch
set?

confused,

greg k-h
Jianxiong Gao April 23, 2021, 5:28 p.m. UTC | #5
> How?  Anything that installed 5.10 when it was released never had this

> working, they had to move to 5.12 to get that to work.


I wasn't clear. The bug is not specific to SEV virtualization. We
simply encountered it while working on SEV virtualization. This is a
pre-existing bug.

Briefly, the NVMe spec expects a page offset to be retained from the
memory address space to the IO address space.

Before these patches, the SWIOTLB truncates any page offset.

Thus, all NVMe + SWIOTLB systems are broken due to this bug without
these patches.

I searched online and found what appeared to be a very similar bug
from a few years ago [1]. Ultimately, it was fixed in the device
firmware. However, it began with NVMe + SWIOTLB resulting in similar
issues to what we observed without these patches.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1402533
-- 
Jianxiong Gao
Jianxiong Gao April 23, 2021, 10:10 p.m. UTC | #6
+Marc, who can help filling the gaps.
-- 
Jianxiong Gao
Marc Orr April 23, 2021, 10:33 p.m. UTC | #7
On Fri, Apr 23, 2021 at 3:11 PM Jianxiong Gao <jxgao@google.com> wrote:
>

> +Marc, who can help filling the gaps.

> --

> Jianxiong Gao


Oops. Gao over-trimmed. From lkml, here's Gao's last reply.

>> How?  Anything that installed 5.10 when it was released never had this

>> working, they had to move to 5.12 to get that to work.


> I wasn't clear. The bug is not specific to SEV virtualization. We

> simply encountered it while working on SEV virtualization. This is a

> pre-existing bug.

>

> Briefly, the NVMe spec expects a page offset to be retained from the

> memory address space to the IO address space.

>

> Before these patches, the SWIOTLB truncates any page offset.

>

> Thus, all NVMe + SWIOTLB systems are broken due to this bug without

> these patches.

>

> I searched online and found what appeared to be a very similar bug

> from a few years ago [1]. Ultimately, it was fixed in the device

> firmware. However, it began with NVMe + SWIOTLB resulting in similar

> issues to what we observed without these patches.

>

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1402533


The bug is not specific to SEV virtualization.  We've repro'd the bug
on vanilla NVMe + SWIOTLB kernels, and confirmed that these patches
fix the issue. We simply first encountered it while working on SEV
virtualization.

The bug itself is that on an NVMe + SWIOTLB setup, `mkfs.xfs -f
/dev/nvme2n1` triggers the following error "mkfs.xfs: pwrite failed:
Input/output error". We observed this on a RHEL system.

An example system where a user might encounter this bug is the
following. On a system with NVMe + older 32-bit devices that has been
configured with the `swiotlb=force` kernel command line flag to ensure
that the 32-bit devices work properly.
Greg KH April 24, 2021, 2:43 p.m. UTC | #8
On Fri, Apr 23, 2021 at 10:28:32AM -0700, Jianxiong Gao wrote:
> > How?  Anything that installed 5.10 when it was released never had this

> > working, they had to move to 5.12 to get that to work.

> 

> I wasn't clear. The bug is not specific to SEV virtualization. We

> simply encountered it while working on SEV virtualization. This is a

> pre-existing bug.

> 

> Briefly, the NVMe spec expects a page offset to be retained from the

> memory address space to the IO address space.

> 

> Before these patches, the SWIOTLB truncates any page offset.

> 

> Thus, all NVMe + SWIOTLB systems are broken due to this bug without

> these patches.


Ok, and what prevents you from adding this new feature do your "custom"
kernel, or to use 5.12 instead?

This is a new feature that Linux has never supported, and these patches
are not "trivial" at all.  I also do not see the maintainer of the
subsystem agreeing that these are needed to be backported, which is not
a good sign.

So I recommend just using a newer kernel version, that way all will be
good and no need to backport anything.  What is preventing you from
doing that today?

thanks,

greg k-h
Jianxiong Gao April 26, 2021, 6 p.m. UTC | #9
On Sat, Apr 24, 2021 at 7:43 AM Greg KH <greg@kroah.com> wrote:
> Ok, and what prevents you from adding this new feature do your "custom"

> kernel, or to use 5.12 instead?

>

> This is a new feature that Linux has never supported, and these patches

> are not "trivial" at all.  I also do not see the maintainer of the

> subsystem agreeing that these are needed to be backported, which is not

> a good sign.

>

So this is not about a new feature. This is about an existing bug that
we stumbled onto while using SEV virtualization. However SEV is not
needed to trigger the bug. We have reproduced the bug with just
NVMe + SWIOTLB=force option in Rhel 8 environment. Please note
that NVMe and SWIOTLB=force are both existing feature and without
the patch they don't work together. This is why we are proposing to merge
the patches into the LTS kernels.
> So I recommend just using a newer kernel version, that way all will be

> good and no need to backport anything.  What is preventing you from

> doing that today?

>

> thanks,

>

> greg k-h




-- 
Jianxiong Gao
Sasha Levin April 28, 2021, 4:18 p.m. UTC | #10
On Mon, Apr 26, 2021 at 11:00:56AM -0700, Jianxiong Gao wrote:
>On Sat, Apr 24, 2021 at 7:43 AM Greg KH <greg@kroah.com> wrote:

>> Ok, and what prevents you from adding this new feature do your "custom"

>> kernel, or to use 5.12 instead?

>>

>> This is a new feature that Linux has never supported, and these patches

>> are not "trivial" at all.  I also do not see the maintainer of the

>> subsystem agreeing that these are needed to be backported, which is not

>> a good sign.

>>

>So this is not about a new feature. This is about an existing bug that

>we stumbled onto while using SEV virtualization. However SEV is not

>needed to trigger the bug. We have reproduced the bug with just

>NVMe + SWIOTLB=force option in Rhel 8 environment. Please note

>that NVMe and SWIOTLB=force are both existing feature and without

>the patch they don't work together. This is why we are proposing to merge

>the patches into the LTS kernels.


Could you re-spin this series with the comments around patch formatting
addressed, and explicitly cc hch on this to get his ack?

-- 
Thanks,
Sasha