| Message ID | 20190903160858.5296-23-richard.henderson@linaro.org |
|---|---|
| State | New |
| Series | tcg patch queue |
On Tue, 3 Sep 2019 at 17:09, Richard Henderson <richard.henderson@linaro.org> wrote:
>
> We had two different mechanisms to force a recheck of the tlb.
>
> Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit
> that would immediately set TLB_INVALID_MASK, which automatically
> means that a second check of the tlb entry fails.
>
> We can use the same mechanism to handle small pages.
> Conserve TLB_* bits by removing TLB_RECHECK.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

> @@ -1265,27 +1269,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>          if ((addr & (size - 1)) != 0) {
>              goto do_unaligned_access;
>          }
> -
> -        if (tlb_addr & TLB_RECHECK) {
> -            /*
> -             * This is a TLB_RECHECK access, where the MMU protection
> -             * covers a smaller range than a target page, and we must
> -             * repeat the MMU check here. This tlb_fill() call might
> -             * longjump out if this access should cause a guest exception.
> -             */
> -            tlb_fill(env_cpu(env), addr, size,
> -                     access_type, mmu_idx, retaddr);
> -            index = tlb_index(env, mmu_idx, addr);
> -            entry = tlb_entry(env, mmu_idx, addr);
> -
> -            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
> -            tlb_addr &= ~TLB_RECHECK;
> -            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
> -                /* RAM access */
> -                goto do_aligned_access;
> -            }
> -        }
> -
>          return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
>                          mmu_idx, addr, retaddr, access_type, op);
>      }

In the old version of this code, we do the "tlb fill if TLB_RECHECK
is set", and then we say "now we've done the refill have we actually
got RAM", and we avoid calling io_readx() if that is the case.
This is necessary because io_readx() will misbehave if you try to
call it on RAM (notably if what we have is notdirty-mem then we
need to do the read-from-actual-host-ram because the IO ops backing
notdirty-mem are intended for writes only).

With this patch applied, we seem to have lost the handling for
if the tlb_fill in a TLB_RECHECK case gives us back some real RAM.
(Similarly for store_helper().)

I think this is what's causing Mark Cave-Ayland's Solaris test
case to fail.

More generally, I don't really understand why this merging
is correct -- "TLB needs a recheck" is not the same thing as
"TLB is invalid" and I don't think we can merge the two
bits.

thanks
-- PMM
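[Editor's note] Peter's point hinges on the old `if (!(tlb_addr & ~TARGET_PAGE_MASK))` test: when no flag bits remain below the page mask, the entry refers to plain RAM and io_readx() must not be used. The following is a minimal, self-contained sketch of that test, not QEMU code; the page size, flag values, and example addresses are simplified assumptions.

```c
/*
 * Standalone illustration (assumed values, not QEMU's real definitions):
 * TLB flag bits live below TARGET_PAGE_MASK, so "tlb_addr & ~TARGET_PAGE_MASK"
 * is non-zero exactly when some slow-path flag is set.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK (~((uint64_t)(1 << TARGET_PAGE_BITS) - 1))
#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS - 1))
#define TLB_NOTDIRTY     (1 << (TARGET_PAGE_BITS - 2))
#define TLB_MMIO         (1 << (TARGET_PAGE_BITS - 3))

/* "Plain RAM": no flag bits remain once the page address is masked off. */
static bool is_plain_ram(uint64_t tlb_addr)
{
    return (tlb_addr & ~TARGET_PAGE_MASK) == 0;
}

int main(void)
{
    uint64_t ram_entry  = 0x40000000;            /* page-aligned, no flags */
    uint64_t mmio_entry = 0x10000000 | TLB_MMIO; /* must take the I/O slow path */

    printf("ram entry  -> plain RAM? %d\n", is_plain_ram(ram_entry));  /* 1 */
    printf("mmio entry -> plain RAM? %d\n", is_plain_ram(mmio_entry)); /* 0 */
    return 0;
}
```

With these assumed values the second entry fails the plain-RAM test, which in the old code is what steered execution away from the `goto do_aligned_access` path and into io_readx().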
On 9/6/19 7:02 AM, Peter Maydell wrote:
> On Tue, 3 Sep 2019 at 17:09, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We had two different mechanisms to force a recheck of the tlb.
>>
>> Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit
>> that would immediately set TLB_INVALID_MASK, which automatically
>> means that a second check of the tlb entry fails.
>>
>> We can use the same mechanism to handle small pages.
>> Conserve TLB_* bits by removing TLB_RECHECK.
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>
>> @@ -1265,27 +1269,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>>          if ((addr & (size - 1)) != 0) {
>>              goto do_unaligned_access;
>>          }
>> -
>> -        if (tlb_addr & TLB_RECHECK) {
>> -            /*
>> -             * This is a TLB_RECHECK access, where the MMU protection
>> -             * covers a smaller range than a target page, and we must
>> -             * repeat the MMU check here. This tlb_fill() call might
>> -             * longjump out if this access should cause a guest exception.
>> -             */
>> -            tlb_fill(env_cpu(env), addr, size,
>> -                     access_type, mmu_idx, retaddr);
>> -            index = tlb_index(env, mmu_idx, addr);
>> -            entry = tlb_entry(env, mmu_idx, addr);
>> -
>> -            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
>> -            tlb_addr &= ~TLB_RECHECK;
>> -            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
>> -                /* RAM access */
>> -                goto do_aligned_access;
>> -            }
>> -        }
>> -
>>          return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
>>                          mmu_idx, addr, retaddr, access_type, op);
>>      }
>
> In the old version of this code, we do the "tlb fill if TLB_RECHECK
> is set", and then we say "now we've done the refill have we actually
> got RAM", and we avoid calling io_readx() if that is the case.

I don't think that's the case, since the code now reads

        if (!victim_tlb_hit(env, mmu_idx, index, tlb_off,
                            addr & TARGET_PAGE_MASK)) {
            tlb_fill(env_cpu(env), addr, size,
                     access_type, mmu_idx, retaddr);
            index = tlb_index(env, mmu_idx, addr);
            entry = tlb_entry(env, mmu_idx, addr);
        }
        tlb_addr = code_read ? entry->addr_code : entry->addr_read;
        tlb_addr &= ~TLB_INVALID_MASK;
    }

and the last line here clears INVALID. The only bits that could remain
should be WATCHPOINT and MMIO. (NOTDIRTY can only be set for
entry->addr_write, not for addr_read/addr_code.)

And for that matter, once we've processed the watchpoint we remove
TLB_WATCHPOINT as well, so that we only enter io_readx() if MMIO is set.

> This is necessary because io_readx() will misbehave if you try to
> call it on RAM (notably if what we have is notdirty-mem then we
> need to do the read-from-actual-host-ram because the IO ops backing
> notdirty-mem are intended for writes only).
>
> With this patch applied, we seem to have lost the handling for
> if the tlb_fill in a TLB_RECHECK case gives us back some real RAM.
> (Similarly for store_helper().)

Again, I disagree; I think there must be some other explanation.

> More generally, I don't really understand why this merging
> is correct -- "TLB needs a recheck" is not the same thing as
> "TLB is invalid" and I don't think we can merge the two
> bits.

"TLB is invalid" means that we cannot use an existing tlb entry,
and therefore we must go back to tlb_fill.

"TLB needs a recheck" means we must go back to tlb_fill -- exactly
the same.

The only odd bit about "TLB is invalid" is that it applies to the
*next* lookup. If we have just returned from tlb_fill, then the tlb
entry *must* be valid; if it were not valid, tlb_fill would not
return at all.
So, on the paths that use tlb_fill, we clear TLB_INVALID_MASK,
indicating that the lookup has just been done.

Which, honestly, ought to have happened with TLB_RECHECK as well,
because it was not uncommon to perform two tlb_fill calls in a row --
the first because of a true tlb miss and the second because the entry
supplied by the fill had TLB_RECHECK set.


r~
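[Editor's note] The mechanism Richard describes can be modelled in a few lines. This is a minimal, self-contained sketch, not QEMU code; the entry layout, hit test, and fill stand-in are simplified assumptions. An entry for a small page keeps TLB_INVALID_MASK set, so every later access fails the hit test and goes back through the fill, while the helper clears the bit only in its local copy, so a just-filled entry is never filled twice within one access.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK (~((uint64_t)(1 << TARGET_PAGE_BITS) - 1))
#define TLB_INVALID_MASK (1 << (TARGET_PAGE_BITS - 1))

typedef struct {
    uint64_t addr_read;   /* page address plus flag bits */
} TLBEntry;

/* Keep TLB_INVALID_MASK in the comparison: a page-aligned lookup address
 * can never match an entry carrying that bit, so the entry reads as a miss. */
static bool hit(uint64_t tlb_addr, uint64_t page_addr)
{
    return page_addr == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK));
}

/* Stand-in for tlb_fill(): entries for small pages keep TLB_INVALID_MASK. */
static void fill(TLBEntry *e, uint64_t page_addr, bool small_page)
{
    e->addr_read = page_addr | (small_page ? TLB_INVALID_MASK : 0);
}

static void load(TLBEntry *e, uint64_t addr, bool small_page)
{
    uint64_t page = addr & TARGET_PAGE_MASK;
    uint64_t tlb_addr = e->addr_read;
    int fills = 0;

    if (!hit(tlb_addr, page)) {
        fill(e, page, small_page);      /* one fill, never two in a row */
        fills++;
        tlb_addr = e->addr_read;
        tlb_addr &= ~TLB_INVALID_MASK;  /* local copy only: just filled */
    }
    printf("access 0x%" PRIx64 ": fills=%d, flags left=0x%" PRIx64 "\n",
           addr, fills, tlb_addr & ~TARGET_PAGE_MASK);
}

int main(void)
{
    TLBEntry e = { .addr_read = UINT64_MAX };  /* empty slot: guaranteed miss */

    load(&e, 0x1234, true);   /* miss -> fill; INVALID cleared locally */
    load(&e, 0x1238, true);   /* entry still INVALID -> refilled again */
    return 0;
}
```

Running this prints two accesses that each perform exactly one fill: the small-page entry is refilled on every access, but never twice within a single access, which matches the behaviour described above.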
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 8323094648..8d07ae23a5 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -329,14 +329,11 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_NOTDIRTY    (1 << (TARGET_PAGE_BITS - 2))
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO        (1 << (TARGET_PAGE_BITS - 3))
-/* Set if TLB entry must have MMU lookup repeated for every access */
-#define TLB_RECHECK     (1 << (TARGET_PAGE_BITS - 4))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
-#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
-                         | TLB_RECHECK)
+#define TLB_FLAGS_MASK  (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index d9787cc893..c9576bebcf 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -732,11 +732,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     address = vaddr_page;
     if (size < TARGET_PAGE_SIZE) {
-        /*
-         * Slow-path the TLB entries; we will repeat the MMU check and TLB
-         * fill on every access.
-         */
-        address |= TLB_RECHECK;
+        /* Repeat the MMU check and TLB fill on every access.  */
+        address |= TLB_INVALID_MASK;
     }
     if (attrs.byte_swap) {
         /* Force the access through the I/O slow path.  */
@@ -1026,10 +1023,15 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
   victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
                  (ADDR) & TARGET_PAGE_MASK)
 
-/* NOTE: this function can trigger an exception */
-/* NOTE2: the returned address is not exactly the physical address: it
- * is actually a ram_addr_t (in system mode; the user mode emulation
- * version of this function returns a guest virtual address).
+/*
+ * Return a ram_addr_t for the virtual address for execution.
+ *
+ * Return -1 if we can't translate and execute from an entire page
+ * of RAM.  This will force us to execute by loading and translating
+ * one insn at a time, without caching.
+ *
+ * NOTE: This function will trigger an exception if the page is
+ * not executable.
  */
 tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 {
@@ -1043,19 +1045,20 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
             tlb_fill(env_cpu(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
             index = tlb_index(env, mmu_idx, addr);
             entry = tlb_entry(env, mmu_idx, addr);
+
+            if (unlikely(entry->addr_code & TLB_INVALID_MASK)) {
+                /*
+                 * The MMU protection covers a smaller range than a target
+                 * page, so we must redo the MMU check for every insn.
+                 */
+                return -1;
+            }
         }
         assert(tlb_hit(entry->addr_code, addr));
     }
 
-    if (unlikely(entry->addr_code & (TLB_RECHECK | TLB_MMIO))) {
-        /*
-         * Return -1 if we can't translate and execute from an entire
-         * page of RAM here, which will cause us to execute by loading
-         * and translating one insn at a time, without caching:
-         *  - TLB_RECHECK: means the MMU protection covers a smaller range
-         *    than a target page, so we must redo the MMU check every insn
-         *  - TLB_MMIO: region is not backed by RAM
-         */
+    if (unlikely(entry->addr_code & TLB_MMIO)) {
+        /* The region is not backed by RAM.  */
         return -1;
     }
 
@@ -1180,7 +1183,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
     }
 
     /* Notice an IO access or a needs-MMU-lookup access */
-    if (unlikely(tlb_addr & (TLB_MMIO | TLB_RECHECK))) {
+    if (unlikely(tlb_addr & TLB_MMIO)) {
         /* There's really nothing that can be done to support this
            apart from stop-the-world.  */
         goto stop_the_world;
@@ -1258,6 +1261,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             entry = tlb_entry(env, mmu_idx, addr);
         }
         tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+        tlb_addr &= ~TLB_INVALID_MASK;
     }
 
     /* Handle an IO access.  */
@@ -1265,27 +1269,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         if ((addr & (size - 1)) != 0) {
             goto do_unaligned_access;
         }
-
-        if (tlb_addr & TLB_RECHECK) {
-            /*
-             * This is a TLB_RECHECK access, where the MMU protection
-             * covers a smaller range than a target page, and we must
-             * repeat the MMU check here. This tlb_fill() call might
-             * longjump out if this access should cause a guest exception.
-             */
-            tlb_fill(env_cpu(env), addr, size,
-                     access_type, mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-
-            tlb_addr = code_read ? entry->addr_code : entry->addr_read;
-            tlb_addr &= ~TLB_RECHECK;
-            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
-                /* RAM access */
-                goto do_aligned_access;
-            }
-        }
-
         return io_readx(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
                         mmu_idx, addr, retaddr, access_type, op);
     }
@@ -1314,7 +1297,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         return res & MAKE_64BIT_MASK(0, size * 8);
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     switch (op) {
     case MO_UB:
@@ -1509,27 +1491,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         if ((addr & (size - 1)) != 0) {
             goto do_unaligned_access;
         }
-
-        if (tlb_addr & TLB_RECHECK) {
-            /*
-             * This is a TLB_RECHECK access, where the MMU protection
-             * covers a smaller range than a target page, and we must
-             * repeat the MMU check here. This tlb_fill() call might
-             * longjump out if this access should cause a guest exception.
-             */
-            tlb_fill(env_cpu(env), addr, size, MMU_DATA_STORE,
-                     mmu_idx, retaddr);
-            index = tlb_index(env, mmu_idx, addr);
-            entry = tlb_entry(env, mmu_idx, addr);
-
-            tlb_addr = tlb_addr_write(entry);
-            tlb_addr &= ~TLB_RECHECK;
-            if (!(tlb_addr & ~TARGET_PAGE_MASK)) {
-                /* RAM access */
-                goto do_aligned_access;
-            }
-        }
-
         io_writex(env, &env_tlb(env)->d[mmu_idx].iotlb[index],
                   mmu_idx, val, addr, retaddr, op);
         return;
@@ -1579,7 +1540,6 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         return;
     }
 
- do_aligned_access:
     haddr = (void *)((uintptr_t)addr + entry->addend);
     switch (op) {
     case MO_UB: