Message ID | 20201016133317.553068-1-f4bug@amsat.org |
---|---|
State | New |
Series | [RFC,v3] target/mips: Increase number of TLB entries on the 34Kf core (16 -> 64) |
On 10/16/20 6:33 AM, Philippe Mathieu-Daudé wrote:
> Per "MIPS32 34K Processor Core Family Software User's Manual,
> Revision 01.13" page 8 in "Joint TLB (JTLB)" section:
>
> "The JTLB is a fully associative TLB cache containing 16, 32,
> or 64-dual-entries mapping up to 128 virtual pages to their
> corresponding physical addresses."
>
> There is no particular reason to restrict the 34Kf core model to
> 16 TLB entries, so raise its config to 64.
>
> This is helpful for other projects, in particular the Yocto Project:
>
> Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
> MIPS CI loop. It was observed that in this case CI test execution
> time was almost twice longer than 64bit MIPS variant that runs
> under MIPS64R2-generic model. It was investigated and concluded
> that the difference in number of TLBs 16 in 34Kf case vs 64 in
> MIPS64R2-generic is responsible for most of CI real time execution
> difference. Because with 16 TLBs linux user-land trashes TLB more
> and it needs to execute more instructions in TLB refill handler
> calls, as result it runs much longer.
>
> (https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html)
>
> Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
> Reported-by: Victor Kamensky <kamensky@cisco.com>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> v3: KISS
> Supersedes: <20201015224746.540027-1-f4bug@amsat.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> target/mips/translate_init.c.inc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
On 10/16/20 7:28 PM, Richard Henderson wrote:
> On 10/16/20 6:33 AM, Philippe Mathieu-Daudé wrote:
>> Per "MIPS32 34K Processor Core Family Software User's Manual,
>> Revision 01.13" page 8 in "Joint TLB (JTLB)" section:
>>
>> "The JTLB is a fully associative TLB cache containing 16, 32,
>> or 64-dual-entries mapping up to 128 virtual pages to their
>> corresponding physical addresses."
>>
>> There is no particular reason to restrict the 34Kf core model to
>> 16 TLB entries, so raise its config to 64.
>>
>> This is helpful for other projects, in particular the Yocto Project:
>>
>> Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
>> MIPS CI loop. It was observed that in this case CI test execution
>> time was almost twice longer than 64bit MIPS variant that runs
>> under MIPS64R2-generic model. It was investigated and concluded
>> that the difference in number of TLBs 16 in 34Kf case vs 64 in
>> MIPS64R2-generic is responsible for most of CI real time execution
>> difference. Because with 16 TLBs linux user-land trashes TLB more
>> and it needs to execute more instructions in TLB refill handler
>> calls, as result it runs much longer.
>>
>> (https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html)
>>
>> Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
>> Reported-by: Victor Kamensky <kamensky@cisco.com>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>> v3: KISS
>> Supersedes: <20201015224746.540027-1-f4bug@amsat.org>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>> target/mips/translate_init.c.inc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Thanks, applied to mips-next.
diff --git a/target/mips/translate_init.c.inc b/target/mips/translate_init.c.inc
index 637caccd890..ad21756f4d9 100644
--- a/target/mips/translate_init.c.inc
+++ b/target/mips/translate_init.c.inc
@@ -254,7 +254,7 @@ const mips_def_t mips_defs[] =
         .CP0_PRid = 0x00019500,
         .CP0_Config0 = MIPS_CONFIG0 | (0x1 << CP0C0_AR) |
                        (MMU_TYPE_R4000 << CP0C0_MT),
-        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (15 << CP0C1_MMU) |
+        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
                        (0 << CP0C1_IS) | (3 << CP0C1_IL) | (1 << CP0C1_IA) |
                        (0 << CP0C1_DS) | (3 << CP0C1_DL) | (1 << CP0C1_DA) |
                        (1 << CP0C1_CA),
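
For readers wondering why the patch writes 63 rather than 64: the MMU size field of Config1 holds the number of TLB entries minus one, so 15 means 16 entries and 63 means 64. The sketch below is not part of the patch. It assumes the field is 6 bits wide starting at bit 25 (the value this example gives CP0C1_MMU), and the helper names tlb_entries() and mapped_pages() are made up for illustration.

```c
#include <stdint.h>

/* Assumption: Config1.MMUSize sits at bits 30:25. */
#define CP0C1_MMU 25

/* Hypothetical helper: decode the TLB entry count from a Config1 value.
 * The field stores (entries - 1), hence the "1 +". */
static inline unsigned tlb_entries(uint32_t config1)
{
    return 1 + ((config1 >> CP0C1_MMU) & 0x3f);
}

/* Hypothetical helper: each JTLB entry is a dual entry covering an
 * even/odd pair of virtual pages, so it maps two pages. */
static inline unsigned mapped_pages(uint32_t config1)
{
    return 2 * tlb_entries(config1);
}
```

With these definitions, tlb_entries(15u << CP0C1_MMU) gives 16 and tlb_entries(63u << CP0C1_MMU) gives 64. mapped_pages() for the latter gives 128, which matches the "up to 128 virtual pages" figure quoted from the manual.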