Message ID | 20230421165305.804301-10-vipinsh@google.com |
---|---|
State | New |
Series | KVM: arm64: Use MMU read lock for clearing dirty logs |
On Fri, Apr 21, 2023 at 10:11 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 21 Apr 2023 17:53:05 +0100,
> Vipin Sharma <vipinsh@google.com> wrote:
> >
> > Take the MMU read lock for write-protecting PTEs and use the shared
> > page table walker for clearing dirty logs.
> >
> > Clearing dirty logs is currently performed under the MMU write lock.
> > This means vCPU write-protection faults, which take the MMU read
> > lock, are blocked during this operation. This causes guest
> > degradation that is especially noticeable on VMs with a lot of vCPUs.
> >
> > Taking the MMU read lock allows vCPUs to execute in parallel and
> > reduces the impact on vCPU performance.
>
> Sure. Taking no lock whatsoever would be even better.
>
> What I don't see is the detailed explanation that gives me the warm
> feeling that this is safe and correct. Such an explanation is the
> minimum condition for me to even read the patch.
>

Thanks for freaking me out. Your hunch about not getting a warm feeling
was right: the stage2_attr_walker() and stage2_update_leaf_attrs() combo
does not retry if cmpxchg fails for write protection. Write protection
callers don't check the return status of the API and just ignore a
cmpxchg failure. This means a vCPU (an MMU read lock user) can cause
cmpxchg to fail for a write protection operation (under the read lock,
which is what this patch takes), and the clear ioctl will happily return
as if everything is good.

I will update the series and also work on validating the correctness to
instill more confidence.

Thanks
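The failure mode described in the reply can be illustrated with a small user-space sketch. It uses C11 atomics rather than the kernel's cmpxchg helpers, and the bit layout (`PTE_WRITE`, `PTE_ACCESS`) and the helper names (`wp_once`, `wp_retry`, `demo_race`) are hypothetical, chosen only for illustration. It is not the real stage-2 descriptor format, and it is not necessarily how the updated series addresses the problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PTE_WRITE  (UINT64_C(1) << 0)
#define PTE_ACCESS (UINT64_C(1) << 1)

/*
 * Single attempt, based on a previously read snapshot. Fails if another
 * thread changed the PTE after the snapshot was taken.
 */
bool wp_once(_Atomic uint64_t *ptep, uint64_t snapshot)
{
	uint64_t desired = snapshot & ~PTE_WRITE;

	return atomic_compare_exchange_strong(ptep, &snapshot, desired);
}

/*
 * Retry until the write bit is clear. Returns the number of
 * compare-and-exchange attempts that were made.
 */
int wp_retry(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);
	int attempts = 0;

	while (old & PTE_WRITE) {
		uint64_t desired = old & ~PTE_WRITE;

		attempts++;
		/* On failure, `old` is refreshed with the current PTE value. */
		if (atomic_compare_exchange_strong(ptep, &old, desired))
			break;
	}
	return attempts;
}

/*
 * Deterministic replay of the race: a simulated vCPU updates the PTE
 * between the walker's read and its compare-and-exchange.
 */
void demo_race(void)
{
	_Atomic uint64_t pte = PTE_WRITE;
	uint64_t snapshot = atomic_load(&pte);   /* walker reads the PTE */

	atomic_fetch_or(&pte, PTE_ACCESS);       /* vCPU fault path changes it */

	assert(!wp_once(&pte, snapshot));        /* the single attempt fails */
	assert(atomic_load(&pte) & PTE_WRITE);   /* page is still writable */

	wp_retry(&pte);                          /* retrying closes the window */
	assert(!(atomic_load(&pte) & PTE_WRITE));
}
```

If the caller ignores the `false` returned by `wp_once()`, the page stays writable while the operation reports success. That corresponds to the silent failure described above.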
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e0189cdda43d..3f2117d93998 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -67,8 +67,12 @@ static int stage2_apply_range(struct kvm_s2_mmu *mmu, phys_addr_t addr,
 		if (ret)
 			break;
 
-		if (resched && next != end)
-			cond_resched_rwlock_write(&kvm->mmu_lock);
+		if (resched && next != end) {
+			if (flags & KVM_PGTABLE_WALK_SHARED)
+				cond_resched_rwlock_read(&kvm->mmu_lock);
+			else
+				cond_resched_rwlock_write(&kvm->mmu_lock);
+		}
 	} while (addr = next, addr != end);
 
 	return ret;
@@ -994,7 +998,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(&kvm->arch.mmu, start, end, 0);
+	stage2_wp_range(&kvm->arch.mmu, start, end, KVM_PGTABLE_WALK_SHARED);
 }
 
 /*
@@ -1008,9 +1012,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		gfn_t gfn_offset, unsigned long mask)
 {
-	write_lock(&kvm->mmu_lock);
+	read_lock(&kvm->mmu_lock);
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
-	write_unlock(&kvm->mmu_lock);
+	read_unlock(&kvm->mmu_lock);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
Take the MMU read lock for write-protecting PTEs and use the shared page
table walker for clearing dirty logs.

Clearing dirty logs is currently performed under the MMU write lock.
This means vCPU write-protection faults, which take the MMU read lock,
are blocked during this operation. This causes guest degradation that is
especially noticeable on VMs with a lot of vCPUs.

Taking the MMU read lock allows vCPUs to execute in parallel and reduces
the impact on vCPU performance.

Tested the improvement on an ARM Ampere Altra host (64 CPUs, 256 GB
memory, single NUMA node) via dirty_log_perf_test with 48 vCPUs, 96 GB
memory, an 8 GB clear chunk size, and a 1 second wait between
Clear-Dirty-Log calls.

Test command:
./dirty_log_perf_test -s anonymous_hugetlb_2mb -b 2G -v 48 -l 1 -k 8G -j -m 2

Before:
Total pages touched: 50331648 (Reads: 0, Writes: 50331648)

After:
Total pages touched: 125304832 (Reads: 0, Writes: 125304832)

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 arch/arm64/kvm/mmu.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)