Message ID | 1308752192-17849-1-git-send-email-peter.maydell@linaro.org |
---|---|
State | Accepted |
Commit | 5b620fb698e69a5386f2f02c7c455bdbdd59a52b |
Ping?

On 22 June 2011 15:16, Peter Maydell <peter.maydell@linaro.org> wrote:
> The target-arm frontend's worst-case TCG ops per instr is 194 (and in
> general many of the "load multiple registers" ARM instructions generate
> more than 100 TCG ops). Raise MAX_OP_PER_INSTR accordingly to avoid
> possible buffer overruns.
>
> Since it doesn't make any sense for the "64 bit guest on 32 bit host"
> case to have a smaller limit than the normal case, we collapse the
> two cases back into each other again.
>
> (This increase costs us about 14K in extra static buffer space and
> 21K of extra margin at the end of a 32MB codegen buffer.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> You might recall the patchset which moves the Neon load/store multiple
> instructions to helper functions, and which turns out to slow them
> down rather a lot. This is the other approach, which is just to
> raise the limit so that the existing implementations don't risk
> buffer overruns. The extra memory costs are tiny IMHO.
> (The Neon instructions are the worst offenders but the VFP load/store
> multiple insns also breach the previous limit. I think we should
> consider an implementation of an instruction that's been basically
> the same since VFP support was added to QEMU in 2005 to be an
> acceptable one, and make sure our buffer sizes cope with it :-))
>
>  exec-all.h |    6 +-----
>  1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/exec-all.h b/exec-all.h
> index 2a13a95..ef5f5b6 100644
> --- a/exec-all.h
> +++ b/exec-all.h
> @@ -43,11 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
>  typedef struct TranslationBlock TranslationBlock;
>
>  /* XXX: make safe guess about sizes */
> -#if (HOST_LONG_BITS == 32) && (TARGET_LONG_BITS == 64)
> -#define MAX_OP_PER_INSTR 128
> -#else
> -#define MAX_OP_PER_INSTR 96
> -#endif
> +#define MAX_OP_PER_INSTR 208
>
>  #if HOST_LONG_BITS == 32
>  #define MAX_OPC_PARAM_PER_ARG 2
> --
> 1.7.1
Thanks, applied.

On Wed, Jul 6, 2011 at 2:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> Ping?
diff --git a/exec-all.h b/exec-all.h
index 2a13a95..ef5f5b6 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -43,11 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
 typedef struct TranslationBlock TranslationBlock;

 /* XXX: make safe guess about sizes */
-#if (HOST_LONG_BITS == 32) && (TARGET_LONG_BITS == 64)
-#define MAX_OP_PER_INSTR 128
-#else
-#define MAX_OP_PER_INSTR 96
-#endif
+#define MAX_OP_PER_INSTR 208

 #if HOST_LONG_BITS == 32
 #define MAX_OPC_PARAM_PER_ARG 2
The target-arm frontend's worst-case TCG ops per instr is 194 (and in
general many of the "load multiple registers" ARM instructions generate
more than 100 TCG ops). Raise MAX_OP_PER_INSTR accordingly to avoid
possible buffer overruns.

Since it doesn't make any sense for the "64 bit guest on 32 bit host"
case to have a smaller limit than the normal case, we collapse the
two cases back into each other again.

(This increase costs us about 14K in extra static buffer space and
21K of extra margin at the end of a 32MB codegen buffer.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
You might recall the patchset which moves the Neon load/store multiple
instructions to helper functions, and which turns out to slow them
down rather a lot. This is the other approach, which is just to
raise the limit so that the existing implementations don't risk
buffer overruns. The extra memory costs are tiny IMHO.
(The Neon instructions are the worst offenders but the VFP load/store
multiple insns also breach the previous limit. I think we should
consider an implementation of an instruction that's been basically
the same since VFP support was added to QEMU in 2005 to be an
acceptable one, and make sure our buffer sizes cope with it :-))

 exec-all.h |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)
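
For readers unfamiliar with why a per-instruction op limit matters, here is a rough sketch of the general idea and not QEMU's actual code. It assumes a translator that only checks for free space between guest instructions, so the reserved headroom is the only protection against one instruction writing past the end of the op buffer. The function name would_overrun, its parameters, and the 640-slot buffer size used in the note below are illustrative assumptions; only the values 96, 208 and 194 come from the patch description.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from QEMU.
 *
 * Models a translation loop over a fixed op buffer of buf_size slots.
 * Translation of a block stops once the number of used slots reaches
 * (buf_size - headroom). ops_per_instr[i] is the number of ops that
 * guest instruction i emits.
 *
 * Returns 1 if some instruction would write past the end of the buffer,
 * 0 otherwise.
 */
static int would_overrun(int buf_size, int headroom,
                         const int *ops_per_instr, size_t n_instr)
{
    assert(headroom >= 0 && headroom <= buf_size);

    int high_water = buf_size - headroom;
    int used = 0;

    for (size_t i = 0; i < n_instr; i++) {
        /* The space check happens only between instructions. */
        if (used >= high_water) {
            return 0;   /* block ends here with room to spare */
        }
        /* An instruction emitting more ops than remain overruns. */
        if (ops_per_instr[i] > buf_size - used) {
            return 1;
        }
        used += ops_per_instr[i];
    }
    return 0;
}
```

Under these assumptions, take a 640-slot buffer and a block of 543 single-op instructions followed by one 194-op instruction. With a headroom of 96 the last instruction starts at slot 543 and overruns; with a headroom of 208 translation stops at slot 432, so any instruction of up to 208 ops still fits.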