mbox series

[5.4,v2,0/9] preserve DMA offsets when using swiotlb

Message ID 20210518221818.2963918-1-jxgao@google.com
Headers show
Series preserve DMA offsets when using swiotlb | expand

Message

Jianxiong Gao May 18, 2021, 10:18 p.m. UTC
We observed several NVMe failures when running with SWIOTLB. The root
cause of the issue is that when data is mapped via SWIOTLB, the address
offset is not preserved. Several device drivers including the NVMe
driver relies on this offset to function correctly.

Even though we discovered the error when running using AMD SEV, we have
reproduced the same error in Rhel 8 without SEV. By adding swiotlb=force
option to the boot command line parameter, NVMe funcionality is
impacted. For example formatting a disk into xfs format returns an
error.

----
Changes in v2:
Rebased patches to 5.4.115
Updated patch description to correct format.

Jianxiong Gao (9):
  driver core: add a min_align_mask field to struct
    device_dma_parameters
  swiotlb: add a IO_TLB_SIZE define
  swiotlb: factor out an io_tlb_offset helper
  swiotlb: factor out a nr_slots helper
  swiotlb: clean up swiotlb_tbl_unmap_single
  swiotlb: refactor swiotlb_tbl_map_single
  swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
  swiotlb: respect min_align_mask
  nvme-pci: set min_align_mask

 drivers/nvme/host/pci.c     |   1 +
 include/linux/device.h      |   1 +
 include/linux/dma-mapping.h |  16 +++
 include/linux/swiotlb.h     |   1 +
 kernel/dma/swiotlb.c        | 269 ++++++++++++++++++++----------------
 5 files changed, 169 insertions(+), 119 deletions(-)

Comments

Greg KH May 19, 2021, 8:11 a.m. UTC | #1
On Tue, May 18, 2021 at 10:18:09PM +0000, Jianxiong Gao wrote:
> We observed several NVMe failures when running with SWIOTLB. The root

> cause of the issue is that when data is mapped via SWIOTLB, the address

> offset is not preserved. Several device drivers including the NVMe

> driver relies on this offset to function correctly.

> 

> Even though we discovered the error when running using AMD SEV, we have

> reproduced the same error in Rhel 8 without SEV. By adding swiotlb=force

> option to the boot command line parameter, NVMe funcionality is

> impacted. For example formatting a disk into xfs format returns an

> error.


I still fail to understand why you can not just use the 5.10.y kernel or
newer.  What is preventing you from doing this if you wish to use this
type of hardware?  This is not a "regression" in that the 5.4.y kernel
has never worked with this hardware before, it feels like a new feature.

Please, just use 5.10.y or newer, your life will be so much easier in
the longrun.

thanks,

greg k-h
Jianxiong Gao May 19, 2021, 4:42 p.m. UTC | #2
On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>

> I still fail to understand why you can not just use the 5.10.y kernel or

> newer.  What is preventing you from doing this if you wish to use this

> type of hardware?  This is not a "regression" in that the 5.4.y kernel

> has never worked with this hardware before, it feels like a new feature.

>

NVMe + SWIOTLB is not a new feature. From my understanding it should
be supported by 5.4.y kernel correctly. Currently without the patch, any
NVMe device (along with some other devices that relies on offset to
work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.

Neither NVMe driver nor the SWIOTLB is new on 5.4.y kernel.

If by new feature you mean SEV, then yes SEV relies on NVMe and SWIOTLB
to work correctly. But we can, and have reproduced the same bug without
SEV. So I believe this patch is necessary even without considering SEV.

Thanks
> Please, just use 5.10.y or newer, your life will be so much easier in

> the longrun.

>

> thanks,

>

> greg k-h




-- 
Jianxiong Gao
Greg KH May 19, 2021, 5:03 p.m. UTC | #3
On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:
> On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> >

> > I still fail to understand why you can not just use the 5.10.y kernel or

> > newer.  What is preventing you from doing this if you wish to use this

> > type of hardware?  This is not a "regression" in that the 5.4.y kernel

> > has never worked with this hardware before, it feels like a new feature.

> >

> NVMe + SWIOTLB is not a new feature. From my understanding it should

> be supported by 5.4.y kernel correctly. Currently without the patch, any

> NVMe device (along with some other devices that relies on offset to

> work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.


Then do not do that, as obviously it never worked without your fixes, so
this isn't a "regression".

And again, why can you not just use 5.10.y?

thanks,

greg k-h
Marc Orr May 19, 2021, 5:18 p.m. UTC | #4
On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>

> On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:

> > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > >

> > > I still fail to understand why you can not just use the 5.10.y kernel or

> > > newer.  What is preventing you from doing this if you wish to use this

> > > type of hardware?  This is not a "regression" in that the 5.4.y kernel

> > > has never worked with this hardware before, it feels like a new feature.

> > >

> > NVMe + SWIOTLB is not a new feature. From my understanding it should

> > be supported by 5.4.y kernel correctly. Currently without the patch, any

> > NVMe device (along with some other devices that relies on offset to

> > work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.

>

> Then do not do that, as obviously it never worked without your fixes, so

> this isn't a "regression".


NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a
guest kernel using this configuration boots and runs stably for weeks
and months under general-purpose usage. The bug that this patch set
fixes was only encountered when a user tried to format an xfs
filesystem under a RHEL guest kernel.

> And again, why can you not just use 5.10.y?


For our use case, this fixes the guest kernel, not the host kernel.
The guest distros that we support use 5.4 kernels. We do not control
the kernel that the distros deploy for usage as a guest OS on cloud.
We only control the host kernel.

Thanks,
Marc
Greg KH May 19, 2021, 5:25 p.m. UTC | #5
On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:
> On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> >

> > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:

> > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > > >

> > > > I still fail to understand why you can not just use the 5.10.y kernel or

> > > > newer.  What is preventing you from doing this if you wish to use this

> > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel

> > > > has never worked with this hardware before, it feels like a new feature.

> > > >

> > > NVMe + SWIOTLB is not a new feature. From my understanding it should

> > > be supported by 5.4.y kernel correctly. Currently without the patch, any

> > > NVMe device (along with some other devices that relies on offset to

> > > work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.

> >

> > Then do not do that, as obviously it never worked without your fixes, so

> > this isn't a "regression".

> 

> NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a

> guest kernel using this configuration boots and runs stably for weeks

> and months under general-purpose usage. The bug that this patch set

> fixes was only encountered when a user tried to format an xfs

> filesystem under a RHEL guest kernel.

> 

> > And again, why can you not just use 5.10.y?

> 

> For our use case, this fixes the guest kernel, not the host kernel.

> The guest distros that we support use 5.4 kernels. We do not control

> the kernel that the distros deploy for usage as a guest OS on cloud.

> We only control the host kernel.


And how are you going to get your guest kernels to update to these
patches?  What specific ones are you concerned about?

RHEL ignores stable kernel updates, so if you are worried about them,
please just work with that company directly.

Good luck!

greg k-h
Marc Orr May 19, 2021, 8:01 p.m. UTC | #6
On Wed, May 19, 2021 at 10:25 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>

> On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:

> > On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > >

> > > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:

> > > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > > > >

> > > > > I still fail to understand why you can not just use the 5.10.y kernel or

> > > > > newer.  What is preventing you from doing this if you wish to use this

> > > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel

> > > > > has never worked with this hardware before, it feels like a new feature.

> > > > >

> > > > NVMe + SWIOTLB is not a new feature. From my understanding it should

> > > > be supported by 5.4.y kernel correctly. Currently without the patch, any

> > > > NVMe device (along with some other devices that relies on offset to

> > > > work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.

> > >

> > > Then do not do that, as obviously it never worked without your fixes, so

> > > this isn't a "regression".

> >

> > NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a

> > guest kernel using this configuration boots and runs stably for weeks

> > and months under general-purpose usage. The bug that this patch set

> > fixes was only encountered when a user tried to format an xfs

> > filesystem under a RHEL guest kernel.

> >

> > > And again, why can you not just use 5.10.y?

> >

> > For our use case, this fixes the guest kernel, not the host kernel.

> > The guest distros that we support use 5.4 kernels. We do not control

> > the kernel that the distros deploy for usage as a guest OS on cloud.

> > We only control the host kernel.

>

> And how are you going to get your guest kernels to update to these

> patches?  What specific ones are you concerned about?

>

> RHEL ignores stable kernel updates, so if you are worried about them,

> please just work with that company directly.


We support COS as a guest [1], which does base their kernel on 5.4
LTS. If these patches were accepted into 5.4 LTS, they would
automatically get picked up by COS.

[1] https://cloud.google.com/container-optimized-os

Thanks,
Marc
Greg KH May 20, 2021, 8:32 a.m. UTC | #7
On Wed, May 19, 2021 at 01:01:25PM -0700, Marc Orr wrote:
> On Wed, May 19, 2021 at 10:25 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> >

> > On Wed, May 19, 2021 at 10:18:38AM -0700, Marc Orr wrote:

> > > On Wed, May 19, 2021 at 10:03 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > > >

> > > > On Wed, May 19, 2021 at 09:42:42AM -0700, Jianxiong Gao wrote:

> > > > > On Wed, May 19, 2021 at 1:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> > > > > >

> > > > > > I still fail to understand why you can not just use the 5.10.y kernel or

> > > > > > newer.  What is preventing you from doing this if you wish to use this

> > > > > > type of hardware?  This is not a "regression" in that the 5.4.y kernel

> > > > > > has never worked with this hardware before, it feels like a new feature.

> > > > > >

> > > > > NVMe + SWIOTLB is not a new feature. From my understanding it should

> > > > > be supported by 5.4.y kernel correctly. Currently without the patch, any

> > > > > NVMe device (along with some other devices that relies on offset to

> > > > > work correctly), could be broken if the SWIOTLB is used on a 5.4.y kernel.

> > > >

> > > > Then do not do that, as obviously it never worked without your fixes, so

> > > > this isn't a "regression".

> > >

> > > NVMe + SWIOTLB works fine without this bug fix. By fine I mean that a

> > > guest kernel using this configuration boots and runs stably for weeks

> > > and months under general-purpose usage. The bug that this patch set

> > > fixes was only encountered when a user tried to format an xfs

> > > filesystem under a RHEL guest kernel.

> > >

> > > > And again, why can you not just use 5.10.y?

> > >

> > > For our use case, this fixes the guest kernel, not the host kernel.

> > > The guest distros that we support use 5.4 kernels. We do not control

> > > the kernel that the distros deploy for usage as a guest OS on cloud.

> > > We only control the host kernel.

> >

> > And how are you going to get your guest kernels to update to these

> > patches?  What specific ones are you concerned about?

> >

> > RHEL ignores stable kernel updates, so if you are worried about them,

> > please just work with that company directly.

> 

> We support COS as a guest [1], which does base their kernel on 5.4

> LTS. If these patches were accepted into 5.4 LTS, they would

> automatically get picked up by COS.

> 

> [1] https://cloud.google.com/container-optimized-os


Then go work with that group to add this "required" set of new features
for your cloud systems that require this as again, I fail to see how
this is a "regression" at all.

Maybe I should just go rip out these from 5.10.y as well as it feels
like a _very_ platform-specific issue that you all are having here.

thanks,

greg k-h