Message ID: 20200925161916.204667-1-pgonda@google.com
Series: Backport unencrypted non-blocking DMA allocations
On Fri, Sep 25, 2020 at 09:18:46AM -0700, Peter Gonda wrote:
> Currently SEV enabled guests hit might_sleep() warnings when a driver (nvme in
> this case) allocates through the DMA API in a non-blockable context. Making
> these unencrypted non-blocking DMA allocations come from the coherent pools
> prevents this BUG.
>
> BUG: sleeping function called from invalid context at mm/vmalloc.c:1710
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio
> 2 locks held by fio/3383:
>  #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0
>  #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0
> CPU: 0 PID: 3383 Comm: fio Tainted: G        W         5.5.10 #14
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  dump_stack+0x98/0xd5
>  ___might_sleep+0x175/0x260
>  __might_sleep+0x4a/0x80
>  _vm_unmap_aliases+0x45/0x250
>  vm_unmap_aliases+0x19/0x20
>  __set_memory_enc_dec+0xa4/0x130
>  set_memory_decrypted+0x10/0x20
>  dma_direct_alloc_pages+0x148/0x150
>  dma_direct_alloc+0xe/0x10
>  dma_alloc_attrs+0x86/0xc0
>  dma_pool_alloc+0x16f/0x2b0
>  nvme_queue_rq+0x878/0xc30 [nvme]
>  __blk_mq_try_issue_directly+0x135/0x200
>  blk_mq_request_issue_directly+0x4f/0x80
>  blk_mq_try_issue_list_directly+0x46/0xb0
>  blk_mq_sched_insert_requests+0x19b/0x2b0
>  blk_mq_flush_plug_list+0x22f/0x3b0
>  blk_flush_plug_list+0xd1/0x100
>  blk_finish_plug+0x2c/0x40
>  iomap_dio_rw+0x427/0x490
>  ext4_file_write_iter+0x181/0x5d0
>  aio_write+0x109/0x1b0
>  io_submit_one+0x7d0/0xfa0
>  __x64_sys_io_submit+0xa2/0x280
>  do_syscall_64+0x5f/0x250
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Christoph Hellwig (9):
>   dma-direct: remove __dma_direct_free_pages
>   dma-direct: remove the dma_handle argument to __dma_direct_alloc_pages
>   dma-direct: provide mmap and get_sgtable method overrides
>   dma-mapping: merge the generic remapping helpers into dma-direct
>   dma-direct: consolidate the error handling in dma_direct_alloc_pages
>   xtensa: use the generic uncached segment support
>   dma-direct: make uncached_kernel_address more general
>   dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
>   dma-pool: fix coherent pool allocations for IOMMU mappings
>
> David Rientjes (13):
>   dma-remap: separate DMA atomic pools from direct remap code
>   dma-pool: add additional coherent pools to map to gfp mask
>   dma-pool: dynamically expanding atomic pools
>   dma-direct: atomic allocations must come from atomic coherent pools
>   dma-pool: add pool sizes to debugfs
>   x86/mm: unencrypted non-blocking DMA allocations use coherent pools
>   dma-pool: scale the default DMA coherent pool size with memory
>     capacity
>   dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
>   dma-direct: always align allocation size in dma_direct_alloc_pages()
>   dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
>   dma-direct: check return value when encrypting or decrypting memory
>   dma-direct: add missing set_memory_decrypted() for coherent mapping
>   dma-mapping: warn when coherent pool is depleted
>
> Geert Uytterhoeven (1):
>   dma-pool: fix too large DMA pools on medium memory size systems
>
> Huang Shijie (1):
>   lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr
>
> Nicolas Saenz Julienne (6):
>   dma-direct: provide function to check physical memory area validity
>   dma-pool: get rid of dma_in_atomic_pool()
>   dma-pool: introduce dma_guess_pool()
>   dma-pool: make sure atomic pool suits device
>   dma-pool: do not allocate pool memory from CMA
>   dma/direct: turn ARCH_ZONE_DMA_BITS into a variable
>
>  Documentation/core-api/genalloc.rst | 2 +-
>  arch/Kconfig | 8 +-
>  arch/arc/Kconfig | 1 -
>  arch/arm/Kconfig | 1 -
>  arch/arm/mm/dma-mapping.c | 8 +-
>  arch/arm64/Kconfig | 1 -
>  arch/arm64/mm/init.c | 9 +-
>  arch/ia64/Kconfig | 2 +-
>  arch/ia64/kernel/dma-mapping.c | 6 -
>  arch/microblaze/Kconfig | 3 +-
>  arch/microblaze/mm/consistent.c | 2 +-
>  arch/mips/Kconfig | 7 +-
>  arch/mips/mm/dma-noncoherent.c | 8 +-
>  arch/nios2/Kconfig | 3 +-
>  arch/nios2/mm/dma-mapping.c | 2 +-
>  arch/powerpc/include/asm/page.h | 9 -
>  arch/powerpc/mm/mem.c | 20 +-
>  arch/powerpc/platforms/Kconfig.cputype | 1 -
>  arch/s390/include/asm/page.h | 2 -
>  arch/s390/mm/init.c | 1 +
>  arch/x86/Kconfig | 1 +
>  arch/xtensa/Kconfig | 6 +-
>  arch/xtensa/include/asm/platform.h | 27 ---
>  arch/xtensa/kernel/Makefile | 3 +-
>  arch/xtensa/kernel/pci-dma.c | 121 ++---------
>  drivers/iommu/dma-iommu.c | 5 +-
>  drivers/misc/sram-exec.c | 2 +-
>  include/linux/dma-direct.h | 12 +-
>  include/linux/dma-mapping.h | 7 +-
>  include/linux/dma-noncoherent.h | 4 +-
>  include/linux/genalloc.h | 2 +-
>  kernel/dma/Kconfig | 20 +-
>  kernel/dma/Makefile | 1 +
>  kernel/dma/direct.c | 224 ++++++++++++++++----
>  kernel/dma/mapping.c | 45 +----
>  kernel/dma/pool.c | 270 +++++++++++++++++++++++++
>  kernel/dma/remap.c | 176 +---------------
>  lib/genalloc.c | 5 +-
>  38 files changed, 564 insertions(+), 463 deletions(-)
>  create mode 100644 kernel/dma/pool.c

This is a pretty big set of changes for a stable tree.  Can I get some
acks/reviews/whatever by the developers involved here that this really
is a good idea to take into the 5.4.y tree?

thanks,

greg k-h
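[For readers skimming the trace: the driver allocates DMA memory from
atomic context, and on an SEV guest the dma-direct path has to mark that
memory unencrypted before the host can see it, which can sleep.  A minimal
illustrative sketch of the pattern follows; this is not code from the
series, and the function name is made up.]

#include <linux/dma-mapping.h>
#include <linux/dmapool.h>
#include <linux/gfp.h>

/*
 * Illustrative sketch only: roughly the shape of the path in the trace
 * above.  nvme_queue_rq() runs in atomic context (under hctx_lock/RCU),
 * so any allocation it does through the DMA API must not sleep.
 */
static int hypothetical_queue_rq(struct dma_pool *pool)
{
        dma_addr_t dma_handle;
        void *prp_list;

        /*
         * GFP_ATOMIC forbids sleeping, but on an SEV guest dma_pool_alloc()
         * may still have to grow the pool via dma_alloc_attrs(), and the
         * dma-direct path then calls set_memory_decrypted() on the new
         * pages so the host can access them.  That path goes through
         * vm_unmap_aliases(), which can sleep, hence the might_sleep()
         * splat above.
         */
        prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &dma_handle);
        if (!prp_list)
                return -ENOMEM;

        /* ... hand dma_handle to the device, later dma_pool_free() ... */
        return 0;
}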
We realize this is a large set of changes but it's the only way for us
to remove that customer-facing BUG for SEV guests.  When David asked
back in May, a full series was preferred; is there another way forward?
I've CCd the authors of the changes.

On Mon, Oct 5, 2020 at 7:06 AM Greg KH <greg@kroah.com> wrote:
>
> On Fri, Sep 25, 2020 at 09:18:46AM -0700, Peter Gonda wrote:
> > Currently SEV enabled guests hit might_sleep() warnings when a driver (nvme in
> > this case) allocates through the DMA API in a non-blockable context. Making
> > these unencrypted non-blocking DMA allocations come from the coherent pools
> > prevents this BUG.
> >
> > [...]
>
> This is a pretty big set of changes for a stable tree.  Can I get some
> acks/reviews/whatever by the developers involved here that this really
> is a good idea to take into the 5.4.y tree?
>
> thanks,
>
> greg k-h
Thanks Peter.

The series of commits certainly expanded from my initial set that I asked
about in a thread with the subject "DMA API stable backports for AMD SEV"
on May 19.  Turns out that switching how DMA memory is allocated based on
various characteristics of the allocation and device is trickier than
originally thought :)  There were a number of fixes that were needed for
subtleties and corner cases that folks ran into, but those were addressed
and have been merged by Linus.  I believe it's stable upstream and that
we've been thorough in compiling the full set of changes that are required
for 5.4.

Note that without this series, all SEV-enabled guests will run into the
"sleeping function called from invalid context" issue in the vmalloc layer
that Peter cites when using certain drivers.  For such configurations,
there is no way to avoid the "BUG" messages in the guest kernel when using
AMD SEV unless this series is merged into an LTS kernel that the distros
will then pick up.

For my 13 patches in the 30 patch series, I fully stand by Peter's
backports and rationale for merge into 5.4 LTS.

On Tue, 6 Oct 2020, Peter Gonda wrote:

> We realize this is a large set of changes but it's the only way for us
> to remove that customer-facing BUG for SEV guests.  When David asked
> back in May, a full series was preferred; is there another way forward?
> I've CCd the authors of the changes.
>
> On Mon, Oct 5, 2020 at 7:06 AM Greg KH <greg@kroah.com> wrote:
> >
> > [...]
On Tue, Oct 06, 2020 at 11:10:41AM -0700, David Rientjes wrote:
> Thanks Peter.
>
> The series of commits certainly expanded from my initial set that I asked
> about in a thread with the subject "DMA API stable backports for AMD SEV"
> on May 19.
>
> [...]
>
> For my 13 patches in the 30 patch series, I fully stand by Peter's
> backports and rationale for merge into 5.4 LTS.

Given that this "feature" has never worked in the 5.4 or older kernels,
why should this be backported there?  This isn't a bugfix from what I
can tell, is it?  And if so, what kernel version did work properly?

And if someone really wants this new feature, why can't they just use a
newer kernel release?

thanks,

greg k-h
On Mon, 4 Jan 2021, Greg KH wrote:
> > The series of commits certainly expanded from my initial set that I asked
> > about in a thread with the subject "DMA API stable backports for AMD SEV"
> > on May 19.
> >
> > [...]
> >
> > For my 13 patches in the 30 patch series, I fully stand by Peter's
> > backports and rationale for merge into 5.4 LTS.
>
> Given that this "feature" has never worked in the 5.4 or older kernels,
> why should this be backported there?  This isn't a bugfix from what I
> can tell, is it?  And if so, what kernel version did work properly?

I think it can be considered a bug fix.

Today, if you boot an SEV encrypted guest running 5.4 and it requires
atomic DMA allocations, you'll get the "sleeping function called from
invalid context" bugs.  We see this in our Cloud because there is a
reliance on atomic allocations through the DMA API by the NVMe driver.
Likely nobody else has triggered this because they don't have such driver
dependencies.

No previous kernel version worked properly since SEV guest support was
introduced in 4.14.

> And if someone really wants this new feature, why can't they just use a
> newer kernel release?

This is more of a product question that I'll defer to Peter and he can
loop the necessary people in if required.

Since the SEV feature provides confidentiality for guest managed memory,
running an unmodified guest vs a guest modified to avoid these bugs by the
cloud provider is a very different experience from the perspective of the
customer trying to protect their data.

These customers are running standard distros that may be slow to upgrade
to new kernels released over the past few months.  We could certainly work
with the distros to backport this support directly to them on a
case-by-case basis, but the thought was to first attempt to fix this in
5.4 stable for everybody and allow them to receive the fixes necessary for
running a non-buggy SEV encrypted guest that way vs multiple distros doing
the backport so they can run with SEV.
On Mon, Jan 04, 2021 at 02:37:00PM -0800, David Rientjes wrote:
> On Mon, 4 Jan 2021, Greg KH wrote:
> > > The series of commits certainly expanded from my initial set that I asked
> > > about in a thread with the subject "DMA API stable backports for AMD SEV"
> > > on May 19.
> > >
> > > [...]
> >
> > Given that this "feature" has never worked in the 5.4 or older kernels,
> > why should this be backported there?  This isn't a bugfix from what I
> > can tell, is it?  And if so, what kernel version did work properly?
>
> I think it can be considered a bug fix.
>
> Today, if you boot an SEV encrypted guest running 5.4 and it requires
> atomic DMA allocations, you'll get the "sleeping function called from
> invalid context" bugs.  We see this in our Cloud because there is a
> reliance on atomic allocations through the DMA API by the NVMe driver.
> Likely nobody else has triggered this because they don't have such driver
> dependencies.
>
> No previous kernel version worked properly since SEV guest support was
> introduced in 4.14.

So since this has never worked, it is not a regression that is being
fixed, but rather, a "new feature".  And because of that, if you want it
to work properly, please use a new kernel that has all of these major
changes in it.

> > And if someone really wants this new feature, why can't they just use a
> > newer kernel release?
>
> This is more of a product question that I'll defer to Peter and he can
> loop the necessary people in if required.

If you want to make a "product" of a new feature, using an old kernel
base, then yes, you have to backport this and you are on your own here.
That's just totally normal for all "products" that do not want to use the
latest kernel release.

> Since the SEV feature provides confidentiality for guest managed memory,
> running an unmodified guest vs a guest modified to avoid these bugs by the
> cloud provider is a very different experience from the perspective of the
> customer trying to protect their data.
>
> These customers are running standard distros that may be slow to upgrade
> to new kernels released over the past few months.  We could certainly work
> with the distros to backport this support directly to them on a
> case-by-case basis, but the thought was to first attempt to fix this in
> 5.4 stable for everybody and allow them to receive the fixes necessary for
> running a non-buggy SEV encrypted guest that way vs multiple distros doing
> the backport so they can run with SEV.

What distro that is based on 5.4 and follows the upstream stable trees
has not already included these patches in their releases?  And what
prevents them from using a newer kernel release entirely for this new
feature their customers are requesting?

thanks,

greg k-h
On Tue, 5 Jan 2021, Greg KH wrote:
> > I think it can be considered a bug fix.
> >
> > [...]
> >
> > No previous kernel version worked properly since SEV guest support was
> > introduced in 4.14.
>
> So since this has never worked, it is not a regression that is being
> fixed, but rather, a "new feature".  And because of that, if you want it
> to work properly, please use a new kernel that has all of these major
> changes in it.

Hmm, maybe :)  AMD shipped guest support in 4.14 and host support in 4.16
for the SEV feature.  It turns out that a subset of drivers (for Google,
NVMe) would run into scheduling-while-atomic bugs because they do
GFP_ATOMIC allocations through the DMA API, and that uses
set_memory_decrypted() for SEV, which can block.  I'd argue that's a bug
in the SEV feature for a subset of configs.

So this never worked correctly for a subset of drivers until I added
atomic DMA pools in 5.7, which was the preferred way of fixing it.  But
SEV as a feature works for everybody not using this subset of drivers.
I wouldn't say that the fix is a "new feature" because it's the means by
which we provide unencrypted DMA memory for atomic allocators that can't
make the transition from encrypted to unencrypted during allocation
because of their context; it specifically addresses the bug.

> What distro that is based on 5.4 and follows the upstream stable trees
> has not already included these patches in their releases?  And what
> prevents them from using a newer kernel release entirely for this new
> feature their customers are requesting?

I'll defer this to Peter, who would have a far better understanding of
the base kernel versions that our customers use with SEV.

Thanks
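[To make the mechanism David describes concrete, here is a rough,
simplified sketch of the idea behind the series in the dma-direct path.
The two helper names are placeholders for illustration, not the upstream
function names; the real logic lives in kernel/dma/direct.c and
kernel/dma/pool.c and handles more cases (GFP zones, CMA, remapping).]

#include <linux/dma-direct.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Placeholder helpers, for illustration only -- not the upstream names. */
void *alloc_from_atomic_pool(struct device *dev, size_t size,
                             dma_addr_t *dma_handle, gfp_t gfp);
void *alloc_and_decrypt_pages(struct device *dev, size_t size,
                              dma_addr_t *dma_handle, gfp_t gfp);

static void *sketch_dma_direct_alloc(struct device *dev, size_t size,
                                     dma_addr_t *dma_handle, gfp_t gfp)
{
        /*
         * Non-blocking allocation that must be unencrypted (e.g. SEV):
         * hand out memory from a coherent pool that was already decrypted
         * with set_memory_decrypted() when the pool was created or
         * expanded in a sleepable context, so nothing on this path sleeps.
         */
        if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
            !gfpflags_allow_blocking(gfp) && force_dma_unencrypted(dev))
                return alloc_from_atomic_pool(dev, size, dma_handle, gfp);

        /*
         * Blocking callers keep the existing behaviour: allocate pages and
         * then decrypt them in place, which is allowed to sleep.
         */
        return alloc_and_decrypt_pages(dev, size, dma_handle, gfp);
}

[The pools themselves are populated and grown outside atomic context,
which is presumably why the series also carries the pool scaling,
expansion, and depletion-warning patches listed in the cover letter.]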
Reviving this thread.

This is reproducible on 5.4 LTS guests as long as CONFIG_DEBUG_ATOMIC_SLEEP
is set.  I am not sure if this can be classified as a regression or not,
since I think this issue would have shown up on all SEV guests using NVMe
boot disks since the start of SEV, but it seems like a regression to me.

Currently I know of Ubuntu 20.04.1 being based on 5.4.  I have also sent
this backport to 4.19, which is close to the 4.18 that RHEL 8 is based on.
I do not know what prevents the distros from using a more recent kernel,
but that seems like their choice.  I was just hoping to get these backports
submitted into LTS to show the issue and help any distro wanting to
backport these changes to prevent it.

Greg, is there more information I can help provide as justification?

On Mon, Jan 4, 2021 at 11:38 PM David Rientjes <rientjes@google.com> wrote:
>
> [...]