diff mbox series

selftests: KVM: avoid failures due to reserved HyperTransport region

Message ID 20210805105423.412878-1-pbonzini@redhat.com
State New
Headers show
Series selftests: KVM: avoid failures due to reserved HyperTransport region | expand

Commit Message

Paolo Bonzini Aug. 5, 2021, 10:54 a.m. UTC
Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
a failure on AMD processors because those addresses are reserved by
HyperTransport (this is not documented).  Avoid selftests failures
by reserving those guest physical addresses.

Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Joao Martins Aug. 6, 2021, 10:57 a.m. UTC | #1
On 8/5/21 11:54 AM, Paolo Bonzini wrote:
> Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
> a failure on AMD processors because those addresses are reserved by
> HyperTransport (this is not documented).  

Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do
mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually
marked as a reserved type in E820. At least on the machines I've seen.

See manual section '2.1.2 IOMMU Logical Topology':

"Special address controls in Table 3 are interpreted against untranslated guest physical
addressess (GPA) that lack a PASID TLP prefix."

 Base Address   Top Address   Use

  FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
  FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
  FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
  FD_F910_0000h FD_F91F_FFFFh System Management
  FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
  FD_FB00_0000h FD_FBFF_FFFFh Address Translation
  FD_FC00_0000h FD_FDFF_FFFFh I/O Space
  FD_FE00_0000h FD_FFFF_FFFFh Configuration
  FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
  FE_2000_0000h FF_FFFF_FFFFh Reserved

It covers the range starting that address you fixed up ... up to 1Tb, fwiw.

You mark it ~1010G as max gfn so shouldn't be a problem.

[0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf

> Avoid selftests failures
> by reserving those guest physical addresses.
> 
> Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
> Cc: stable@vger.kernel.org
> Cc: David Matlack <dmatlack@google.com>
> Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 10a8ed691c66..d995cc9836ee 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
>  	/* Limit physical addresses to PA-bits. */
>  	vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
>  
> +#ifdef __x86_64__
> +	/* Avoid reserved HyperTransport region on AMD processors.  */
> +	if (vm->pa_bits == 48)
> +		vm->max_gfn = 0xfffcfffff;
> +#endif
> +

Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:

#define HT_RANGE_START                (0xfd00000000ULL)
#define MAX_GFN			      (HT_RANGE_START - 1ULL)

#ifdef __x86_64__
	/* Avoid reserved HyperTransport region on AMD processors.  */
	if (vm->pa_bits == 48)
		vm->max_gfn = MAX_GFN;
#endif

It's a detail, but *perhaps* would help people grepping around it.

Also, not sure if checking against AMD cpuid vendor is worth, considering this is
a limitation only on AMD.


>  	/* Allocate and setup memory for guest. */
>  	vm->vpages_mapped = sparsebit_alloc();
>  	if (phy_pages != 0)
>
Maxim Levitsky Aug. 9, 2021, 7:45 a.m. UTC | #2
On Fri, 2021-08-06 at 11:57 +0100, Joao Martins wrote:
> 

> On 8/5/21 11:54 AM, Paolo Bonzini wrote:

> > Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes

> > a failure on AMD processors because those addresses are reserved by

> > HyperTransport (this is not documented).  

> 

> Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do

> mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually

> marked as a reserved type in E820. At least on the machines I've seen.

> 

> See manual section '2.1.2 IOMMU Logical Topology':

> 

> "Special address controls in Table 3 are interpreted against untranslated guest physical

> addressess (GPA) that lack a PASID TLP prefix."

> 

>  Base Address   Top Address   Use

> 

>   FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space

>   FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl

>   FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK

>   FD_F910_0000h FD_F91F_FFFFh System Management

>   FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables

>   FD_FB00_0000h FD_FBFF_FFFFh Address Translation

>   FD_FC00_0000h FD_FDFF_FFFFh I/O Space

>   FD_FE00_0000h FD_FFFF_FFFFh Configuration

>   FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages

>   FE_2000_0000h FF_FFFF_FFFFh Reserved

> 

> It covers the range starting that address you fixed up ... up to 1Tb, fwiw.

> 

> You mark it ~1010G as max gfn so shouldn't be a problem.

> 

> [0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf

> 

> > Avoid selftests failures

> > by reserving those guest physical addresses.

> > 

> > Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")

> > Cc: stable@vger.kernel.org

> > Cc: David Matlack <dmatlack@google.com>

> > Reported-by: Maxim Levitsky <mlevitsk@redhat.com>

> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> > ---

> >  tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++

> >  1 file changed, 6 insertions(+)

> > 

> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c

> > index 10a8ed691c66..d995cc9836ee 100644

> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c

> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c

> > @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)

> >  	/* Limit physical addresses to PA-bits. */

> >  	vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;

> >  

> > +#ifdef __x86_64__

> > +	/* Avoid reserved HyperTransport region on AMD processors.  */

> > +	if (vm->pa_bits == 48)

> > +		vm->max_gfn = 0xfffcfffff;

> > +#endif

> > +

> 

> Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:

> 

> #define HT_RANGE_START                (0xfd00000000ULL)

> #define MAX_GFN			      (HT_RANGE_START - 1ULL)

> 

> #ifdef __x86_64__

> 	/* Avoid reserved HyperTransport region on AMD processors.  */

> 	if (vm->pa_bits == 48)

> 		vm->max_gfn = MAX_GFN;

> #endif


I guess now that we know that it is documented, it is worth it,
to remove '== 48' check and add check for an AMD cpu, and add reference
to this manual.

I am mentioning the 48 bit check because I have seen that AMD just recently
posted 5 level NPT support, so I guess CPUs which > 48 bit max physical address
are also probably on horison.

And long term solution for this I guess is to add these areas to a blacklist
and avoid them.

Best regards,
	Maxim Levitsky

> 

> It's a detail, but *perhaps* would help people grepping around it.

> 

> Also, not sure if checking against AMD cpuid vendor is worth, considering this is

> a limitation only on AMD.

> 

> 

> >  	/* Allocate and setup memory for guest. */

> >  	vm->vpages_mapped = sparsebit_alloc();

> >  	if (phy_pages != 0)

> >
Paolo Bonzini Aug. 9, 2021, 9:03 a.m. UTC | #3
On 06/08/21 12:57, Joao Martins wrote:
>   Base Address   Top Address   Use

> 

>    FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space

>    FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl

>    FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK

>    FD_F910_0000h FD_F91F_FFFFh System Management

>    FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables

>    FD_FB00_0000h FD_FBFF_FFFFh Address Translation

>    FD_FC00_0000h FD_FDFF_FFFFh I/O Space

>    FD_FE00_0000h FD_FFFF_FFFFh Configuration

>    FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages

>    FE_2000_0000h FF_FFFF_FFFFh Reserved


The problems we're seeing are with FFFD_0000_0000h.  How does the IOMMU 
interpret bits 40-47 of the address?

Paolo
Joao Martins Aug. 9, 2021, 10 a.m. UTC | #4
On 8/9/21 10:03 AM, Paolo Bonzini wrote:
> On 06/08/21 12:57, Joao Martins wrote:

>>   Base Address   Top Address   Use

>>

>>    FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space

>>    FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl

>>    FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK

>>    FD_F910_0000h FD_F91F_FFFFh System Management

>>    FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables

>>    FD_FB00_0000h FD_FBFF_FFFFh Address Translation

>>    FD_FC00_0000h FD_FDFF_FFFFh I/O Space

>>    FD_FE00_0000h FD_FFFF_FFFFh Configuration

>>    FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages

>>    FE_2000_0000h FF_FFFF_FFFFh Reserved

> 

> The problems we're seeing are with FFFD_0000_0000h.  How does the IOMMU 

> interpret bits 40-47 of the address?


The IOMMU supposedly makes an invalid device request iommu fault event when touching the
reserved interrupt address space. But that presumes performing DMA from/to device with
said IOVA=fdXXXXXXXX (doesn't matter if it's guest or baremetal).

Oh wait. You asked fffd******** not 00fd******** which then the spec quote I gave doesn't
document what you're aiming for. Ugh, sorry if I mislead you and Maxim. My mind eliminated
the starting 0xff when reading the address given the similarity.

But I've seen that address range somewhere. This errata might help [0]? But anyways, it
doesn't explain why how this is interpreted, just that it is 'reserved'. It doesn't look
to be HyperTransport address range either as that would be the 1010..1023G range (close to
1T) as documented by the IOMMU manual, not the 256T boundary (fffd********). It's
interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.

[0] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf

1286 Spurious #GP May Occur When Hypervisor Running on
Another Hypervisor

Description

The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor
attempts to access the following secure memory areas:

• The reserved memory address region starting at FFFD_0000_0000h and extending up to
FFFF_FFFF_FFFFh.
• ASEG and TSEG memory regions for SMM (System Management Mode)
• MMIO APIC Space

Potential Effect on System

Software running a hypervisor on a hypervisor may encounter an unexpected #GP fault.

Suggested Workaround

If CPUID bit fn8000_000A EDX[28] = 0b, then:
• Hypervisor software can trap #GP faults that potentially have this issue and ignore #GP
faults that were
erroneously generated.
If CPUID bit fn8000_000A EDX[28] = 1b, then the issue has been fixed and no workaround is
necessary.
Fix Planned
No fix planned
Paolo Bonzini Aug. 9, 2021, 10:23 a.m. UTC | #5
On 09/08/21 12:00, Joao Martins wrote:
> [0]https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf

> 

> 1286 Spurious #GP May Occur When Hypervisor Running on

> Another Hypervisor

> 

> Description

> 

> The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor

> attempts to access the following secure memory areas:

> 

> • The reserved memory address region starting at FFFD_0000_0000h and extending up to

> FFFF_FFFF_FFFFh.

> • ASEG and TSEG memory regions for SMM (System Management Mode)

> • MMIO APIC Space


This errata took a few months to debug so we're quite familiar with it 
:) but I only knew about the ASEG/TSEG/APIC cases.

So this HyperTransport region is not related to this issue, but the 
errata does point out that FFFD_0000_0000h and upwards is special in guests.

The Xen folks also had to deal with it only a couple months ago 
(https://yhbt.net/lore/all/1eb16baa-6b1b-3b18-c712-4459bd83e1aa@citrix.com/):

   From "Open-Source Register Reference for AMD Family 17h Processors 
(PUB)":
   https://developer.amd.com/wp-content/resources/56255_3_03.PDF

   "The processor defines a reserved memory address region starting at
   FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."

   It's still doesn't say that it's at the top of physical address space
   although I understand that's how it's now implemented. The official
   document doesn't confirm it will move along with physical address space
   extension.

   [...]

   1) On parts with <40 bits, its fully hidden from software
   2) Before Fam17h, it was always 12G just below 1T, even if there was
   more RAM above this location
   3) On Fam17h and later, it is variable based on SME, and is either
   just below 2^48 (no encryption) or 2^43 (encryption)

> It's

> interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.


It's only been defined after AMD deemed that the errata was not fixable 
in current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.

I'll update the patch based on the findings from the Xen team.

Paolo
Sean Christopherson Dec. 8, 2021, 4:54 p.m. UTC | #6
On Mon, Aug 09, 2021, Paolo Bonzini wrote:
> So this HyperTransport region is not related to this issue, but the errata
> does point out that FFFD_0000_0000h and upwards is special in guests.
> 
> The Xen folks also had to deal with it only a couple months ago
> (https://yhbt.net/lore/all/1eb16baa-6b1b-3b18-c712-4459bd83e1aa@citrix.com/):
> 
>   From "Open-Source Register Reference for AMD Family 17h Processors (PUB)":
>   https://developer.amd.com/wp-content/resources/56255_3_03.PDF
> 
>   "The processor defines a reserved memory address region starting at
>   FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."
> 
>   It's still doesn't say that it's at the top of physical address space
>   although I understand that's how it's now implemented. The official
>   document doesn't confirm it will move along with physical address space
>   extension.
> 
>   [...]
> 
>   1) On parts with <40 bits, its fully hidden from software
>   2) Before Fam17h, it was always 12G just below 1T, even if there was
>   more RAM above this location
>   3) On Fam17h and later, it is variable based on SME, and is either
>   just below 2^48 (no encryption) or 2^43 (encryption)
> 
> > It's interesting that fn8000_000A EDX[28] is part of the reserved bits from
> > that CPUID leaf.
> 
> It's only been defined after AMD deemed that the errata was not fixable in
> current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.
> 
> I'll update the patch based on the findings from the Xen team.

So, about that update... :-)
diff mbox series

Patch

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 10a8ed691c66..d995cc9836ee 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -309,6 +309,12 @@  struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
 	/* Limit physical addresses to PA-bits. */
 	vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
 
+#ifdef __x86_64__
+	/* Avoid reserved HyperTransport region on AMD processors.  */
+	if (vm->pa_bits == 48)
+		vm->max_gfn = 0xfffcfffff;
+#endif
+
 	/* Allocate and setup memory for guest. */
 	vm->vpages_mapped = sparsebit_alloc();
 	if (phy_pages != 0)