Message ID | 20210820131229.3417770-1-ovidiu.panait@windriver.com
---|---
State | New
Series | [4.14] KVM: X86: MMU: Use the correct inherited permissions to get shadow page
Hi,

It seems that this backport missed the last 4.14 release, could it be
considered for the next one?

Thanks!
Ovidiu

On 8/20/21 4:12 PM, Ovidiu Panait wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
>
> commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.
>
> When computing the access permissions of a shadow page, use the effective
> permissions of the walk up to that point, i.e. the logical AND of its
> parents' permissions. Two guest PxE entries that point at the same table
> gfn need to be shadowed with different shadow pages if their parents'
> permissions are different. KVM currently uses the effective permissions
> of the last non-leaf entry for all non-leaf entries. Because all non-leaf
> SPTEs have full ("uwx") permissions, and the effective permissions are
> recorded only in role.access and merged into the leaves, this can lead
> to incorrect reuse of a shadow page and eventually to a missing guest
> protection page fault.
>
> For example, here is a shared pagetable:
>
>   pgd[]   pud[]        pmd[]      virtual address pointers
>                      /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
>         /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
>   pgd-|           (shared pmd[] as above)
>         \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
>                      \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
>
> pud1 and pud2 point to the same pmd table, so:
>  - ptr1 and ptr3 point to the same page.
>  - ptr2 and ptr4 point to the same page.
>
> (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd
> entries)
>
> - First, the guest reads from ptr1 and KVM prepares a shadow page
>   table with role.access=u--, from ptr1's pud1 and ptr1's pmd1. "u--"
>   comes from the effective permissions of pgd, pud1 and pmd1, which
>   are stored in pt->access. "u--" is also used to get the pagetable
>   for pud1, instead of "uw-".
>
> - Then the guest writes to ptr2 and KVM reuses pud1, which is present.
>   The hypervisor sets up a shadow page for ptr2 with pt->access "uw-"
>   even though the pud1 pmd (because of the incorrect argument to
>   kvm_mmu_get_page in the previous step) has role.access="u--".
>
> - Then the guest reads from ptr3. The hypervisor reuses pud1's shadow
>   pmd for pud2, because both use "u--" for their permissions. Thus the
>   shadow pmd already includes entries for both pmd1 and pmd2.
>
> - At last, the guest writes to ptr4. This causes no vmexit or page
>   fault, because pud1's shadow page structures included an "uw-" page
>   even though its role.access was "u--".
>
> Any kind of shared pagetable might have a similar problem when run in
> a virtual machine without TDP enabled, if the permissions differ
> between ancestors.
>
> In order to fix the problem, we change pt->access to be an array, and
> any access in it will not include permissions ANDed from child ptes.
>
> The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
> Remember to test it with TDP disabled.
>
> The problem had existed long before commit 41074d07c78b ("KVM: MMU:
> Fix inherited permissions for emulated guest pte updates"), and it is
> hard to find which commit is the culprit. So there is no fixes tag here.
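To make the "effective permissions" idea above concrete, here is a minimal standalone sketch in plain C. This is an editor's illustration, not kernel code: gpte_perm[], the bit values and LEVELS are all made up. It computes the effective guest permission at every level of a 4-level walk as the AND of all entries above it, which is what the fix stores per level in walker->pt_access[] instead of keeping only the deepest value:

/*
 * Sketch only -- illustrative names and bit layout, not KVM's.
 */
#include <stdio.h>

#define ACC_EXEC  0x1                    /* x */
#define ACC_WRITE 0x2                    /* w */
#define ACC_USER  0x4                    /* u */
#define ACC_ALL   (ACC_EXEC | ACC_WRITE | ACC_USER)
#define LEVELS    4

int main(void)
{
	/* One example walk from the figure above:
	 * pgd (uwx), pud1 (uw-), pmd1 (u--), pte1 (uw-).          */
	unsigned int gpte_perm[LEVELS] = {
		ACC_ALL,
		ACC_USER | ACC_WRITE,
		ACC_USER,
		ACC_USER | ACC_WRITE,
	};
	unsigned int pt_access[LEVELS];  /* one value per level,
					    as in the fix          */
	unsigned int acc = ACC_ALL;
	int level;

	for (level = 0; level < LEVELS; level++) {
		acc &= gpte_perm[level];  /* AND of the walk so far */
		pt_access[level] = acc;
	}

	/* The buggy code effectively kept only the deepest non-leaf
	 * value and used it for every non-leaf shadow page.       */
	for (level = 0; level < LEVELS; level++)
		printf("level %d effective access: %#x\n",
		       level, pt_access[level]);
	return 0;
}

Running it prints 0x7, 0x6, 0x4, 0x4: by the pmd level the walk via pud1/pmd1 is already down to "u--", matching ptr1 in the figure.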
>
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
> Cc: stable@vger.kernel.org
> Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
>      - apply documentation changes to Documentation/virtual/kvm/mmu.txt
>      - add vcpu parameter to gpte_access() call]
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
> Note: The backport was validated by running the kvm-unit-tests testcase [1]
> mentioned in the commit message (the testcase fails without the patch and
> passes with the patch applied).
>
> [1] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/commit/47fd6bc54674fb1d8a29c55305042689e8692522
>
>  Documentation/virtual/kvm/mmu.txt |  4 ++--
>  arch/x86/kvm/paging_tmpl.h        | 14 +++++++++-----
>  2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index e507a9e0421e..851a8abcadce 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -152,8 +152,8 @@ Shadow pages contain the following information:
>      shadow pages) so role.quadrant takes values in the range 0..3.  Each
>      quadrant maps 1GB virtual address space.
>    role.access:
> -    Inherited guest access permissions in the form uwx.  Note execute
> -    permission is positive, not negative.
> +    Inherited guest access permissions from the parent ptes in the form uwx.
> +    Note execute permission is positive, not negative.
>    role.invalid:
>      The page is invalid and should not be used.  It is a root page that is
>      currently pinned (by a cpu hardware register pointing to it); once it is
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 8cf7a09bdd73..677b195f7cf1 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -93,8 +93,8 @@ struct guest_walker {
>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
>  	bool pte_writable[PT_MAX_FULL_LEVELS];
> -	unsigned pt_access;
> -	unsigned pte_access;
> +	unsigned int pt_access[PT_MAX_FULL_LEVELS];
> +	unsigned int pte_access;
>  	gfn_t gfn;
>  	struct x86_exception fault;
>  };
> @@ -388,13 +388,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  		}
>
>  		walker->ptes[walker->level - 1] = pte;
> +
> +		/* Convert to ACC_*_MASK flags for struct guest_walker. */
> +		walker->pt_access[walker->level - 1] = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
>  	} while (!is_last_gpte(mmu, walker->level, pte));
>
>  	pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
>  	accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
>
>  	/* Convert to ACC_*_MASK flags for struct guest_walker. */
> -	walker->pt_access = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
>  	walker->pte_access = FNAME(gpte_access)(vcpu, pte_access ^ walk_nx_mask);
>  	errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
>  	if (unlikely(errcode))
> @@ -432,7 +434,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	}
>
>  	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
> -		 __func__, (u64)pte, walker->pte_access, walker->pt_access);
> +		 __func__, (u64)pte, walker->pte_access,
> +		 walker->pt_access[walker->level - 1]);
>  	return 1;
>
>  error:
> @@ -601,7 +604,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  {
>  	struct kvm_mmu_page *sp = NULL;
>  	struct kvm_shadow_walk_iterator it;
> -	unsigned direct_access, access = gw->pt_access;
> +	unsigned int direct_access, access;
>  	int top_level, ret;
>  	gfn_t gfn, base_gfn;
>
> @@ -633,6 +636,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  		sp = NULL;
>  		if (!is_shadow_present_pte(*it.sptep)) {
>  			table_gfn = gw->table_gfn[it.level - 2];
> +			access = gw->pt_access[it.level - 2];
>  			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
>  					false, access);
>  		}
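One detail worth noting in the hunks above is the repeated pt_access ^ walk_nx_mask: NX is a negative permission in guest PTEs, while (as the mmu.txt change says) role.access stores execute as a positive bit. XOR-ing each level with an NX mask flips it into a positive "exec" bit so it can be ANDed down the walk like u and w. A tiny sketch of the idea, with a made-up bit layout (editor's illustration, not KVM's actual masks):

#include <stdio.h>

#define PTE_U  0x1
#define PTE_W  0x2
#define PTE_NX 0x4   /* 1 = not executable (a negative permission) */

int main(void)
{
	unsigned int walk_nx_mask = PTE_NX;
	/* pud allows exec (NX clear), pmd forbids it (NX set):    */
	unsigned int pud = PTE_U | PTE_W;
	unsigned int pmd = PTE_U | PTE_W | PTE_NX;

	/* Flip NX into a positive X bit, then AND like u and w:   */
	unsigned int eff = (pud ^ walk_nx_mask) & (pmd ^ walk_nx_mask);

	printf("effective u=%u w=%u x=%u\n",
	       !!(eff & PTE_U), !!(eff & PTE_W), !!(eff & PTE_NX));
	/* Prints x=0: one NX anywhere along the walk removes
	 * execute, as intended.                                   */
	return 0;
}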
On Sun, Aug 29, 2021 at 06:03:00PM +0300, Ovidiu Panait wrote:
> Hi,
>
> It seems that this backport missed the last 4.14 release, could it be
> considered for the next one?

Now queued up, sorry for the delay.

greg k-h
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index e507a9e0421e..851a8abcadce 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -152,8 +152,8 @@ Shadow pages contain the following information:
     shadow pages) so role.quadrant takes values in the range 0..3.  Each
     quadrant maps 1GB virtual address space.
   role.access:
-    Inherited guest access permissions in the form uwx.  Note execute
-    permission is positive, not negative.
+    Inherited guest access permissions from the parent ptes in the form uwx.
+    Note execute permission is positive, not negative.
   role.invalid:
     The page is invalid and should not be used.  It is a root page that is
     currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8cf7a09bdd73..677b195f7cf1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -93,8 +93,8 @@ struct guest_walker {
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
 	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
 	bool pte_writable[PT_MAX_FULL_LEVELS];
-	unsigned pt_access;
-	unsigned pte_access;
+	unsigned int pt_access[PT_MAX_FULL_LEVELS];
+	unsigned int pte_access;
 	gfn_t gfn;
 	struct x86_exception fault;
 };
@@ -388,13 +388,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		}

 		walker->ptes[walker->level - 1] = pte;
+
+		/* Convert to ACC_*_MASK flags for struct guest_walker. */
+		walker->pt_access[walker->level - 1] = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
 	} while (!is_last_gpte(mmu, walker->level, pte));

 	pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
 	accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;

 	/* Convert to ACC_*_MASK flags for struct guest_walker. */
-	walker->pt_access = FNAME(gpte_access)(vcpu, pt_access ^ walk_nx_mask);
 	walker->pte_access = FNAME(gpte_access)(vcpu, pte_access ^ walk_nx_mask);
 	errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
 	if (unlikely(errcode))
@@ -432,7 +434,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	}

 	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
-		 __func__, (u64)pte, walker->pte_access, walker->pt_access);
+		 __func__, (u64)pte, walker->pte_access,
+		 walker->pt_access[walker->level - 1]);
 	return 1;

 error:
@@ -601,7 +604,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
-	unsigned direct_access, access = gw->pt_access;
+	unsigned int direct_access, access;
 	int top_level, ret;
 	gfn_t gfn, base_gfn;

@@ -633,6 +636,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = NULL;
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
+			access = gw->pt_access[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
 					false, access);
 		}
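The failure mode and the fix can also be replayed outside the kernel. The following self-contained C model is an editor's sketch, not KVM's implementation: get_shadow_page(), the cache, and the permission values are all hypothetical, and the pgd is assumed to be uwx so it does not change any of the ANDs. It caches "shadow pages" by (gfn, access), mirroring how kvm_mmu_get_page() keys pages by their role. With the pre-patch access value, the two walks to the shared pmd table from the commit message collide; with the post-patch per-parent value they do not:

#include <stdio.h>
#include <string.h>

#define ACC_USER  0x4
#define ACC_WRITE 0x2
#define NR_CACHE  16

struct shadow_page {
	unsigned long gfn;    /* guest frame of the shadowed table */
	unsigned int access;  /* role.access in the real code      */
	int used;
};
static struct shadow_page cache[NR_CACHE];

/* Return the cached page for (gfn, access), creating it if absent. */
static struct shadow_page *get_shadow_page(unsigned long gfn,
					   unsigned int access, int *reused)
{
	int i;

	for (i = 0; i < NR_CACHE; i++)
		if (cache[i].used && cache[i].gfn == gfn &&
		    cache[i].access == access) {
			*reused = 1;
			return &cache[i];
		}
	for (i = 0; i < NR_CACHE; i++)
		if (!cache[i].used) {
			cache[i].gfn = gfn;
			cache[i].access = access;
			cache[i].used = 1;
			*reused = 0;
			return &cache[i];
		}
	return NULL;
}

int main(void)
{
	unsigned long shared_pmd_gfn = 42;        /* table both puds share */
	unsigned int pud1 = ACC_USER | ACC_WRITE; /* uw- */
	unsigned int pud2 = ACC_USER;             /* u-- */
	unsigned int pmd1 = ACC_USER;             /* deepest non-leaf seen
						     on the first walk    */
	int reused;

	/* Pre-patch: the access used for *every* non-leaf shadow page
	 * includes the last non-leaf entry, so both walks compute u--. */
	get_shadow_page(shared_pmd_gfn, pud1 & pmd1, &reused);
	get_shadow_page(shared_pmd_gfn, pud2 & pmd1, &reused);
	printf("pre-patch:  second walk %s the shadow pmd\n",
	       reused ? "REUSES" : "does not reuse");

	/* Post-patch: the access is the parents' effective permission
	 * only (uw- via pud1, u-- via pud2), so the pages differ.     */
	memset(cache, 0, sizeof(cache));
	get_shadow_page(shared_pmd_gfn, pud1, &reused);
	get_shadow_page(shared_pmd_gfn, pud2, &reused);
	printf("post-patch: second walk %s the shadow pmd\n",
	       reused ? "REUSES" : "does not reuse");
	return 0;
}

The pre-patch run reuses the shared shadow pmd for both walks, which is exactly the reuse that lets ptr4's write through without a fault in the commit message's scenario; post-patch, the two walks get distinct shadow pages.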