Message ID | 20190820210720.18976-1-richard.henderson@linaro.org |
---|---|
Headers | show |
Series | target/arm: Reduce overhead of cpu_get_tb_cpu_state | expand |
On 8/20/19 2:07 PM, Richard Henderson wrote: > Changes since v4: > * Split patch 1 into 15 smaller patches. > * Cache the new DEBUG_TARGET_EL field. > * Split out m-profile hflags separately from a-profile 32-bit. > * Move around non-cached tb flags as well, avoiding repetitive > checks for m-profile or other mutually exclusive conditions. Just after I posted this, I started rebasing my VHE patch set on top, and I found that the new DEBUG_TARGET_EL field has used The Last Bit, so that I could not add the one bit that I need for VHE. However, while working on this patch set, I noticed that we have a lot of unnecessary overlap between A- and M- profile in the TBFLAGs. Thus point 4 above and the completely separate rebuild_hflags_m32(). If we rearrange things like the appended, then we recover 4 bits. Thoughts? r~ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 91a54662c3..0c2803baa1 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3183,38 +3183,50 @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1) */ FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2) -/* Bit usage when in AArch32 state: */ -FIELD(TBFLAG_A32, THUMB, 0, 1) /* Not cached. */ -FIELD(TBFLAG_A32, VECLEN, 1, 3) /* Not cached. */ -FIELD(TBFLAG_A32, VECSTRIDE, 4, 2) /* Not cached. */ +/* + * Bit usage when in AArch32 state, both A- and M-profile. + */ +FIELD(TBFLAG_AM32, CONDEXEC, 0, 8) /* Not cached. */ + +/* + * Bit usage when in AArch32 state, for A-profile only. + */ +FIELD(TBFLAG_A32, THUMB, 8, 1) /* Not cached. */ +FIELD(TBFLAG_A32, VECLEN, 9, 3) /* Not cached. */ +FIELD(TBFLAG_A32, VECSTRIDE, 12, 2) /* Not cached. */ /* * We store the bottom two bits of the CPAR as TB flags and handle * checks on the other bits at runtime. This shares the same bits as * VECSTRIDE, which is OK as no XScale CPU has VFP. * Not cached, because VECLEN+VECSTRIDE are not cached. */ -FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2) +FIELD(TBFLAG_A32, XSCALE_CPAR, 12, 2) +FIELD(TBFLAG_A32, VFPEN, 14, 1) /* Partially cached, minus FPEXC. */ +FIELD(TBFLAG_A32, SCTLR_B, 15, 1) /* * Indicates whether cp register reads and writes by guest code should access * the secure or nonsecure bank of banked registers; note that this is not * the same thing as the current security state of the processor! */ -FIELD(TBFLAG_A32, NS, 6, 1) -FIELD(TBFLAG_A32, VFPEN, 7, 1) /* Partially cached, minus FPEXC. */ -FIELD(TBFLAG_A32, CONDEXEC, 8, 8) /* Not cached. */ -FIELD(TBFLAG_A32, SCTLR_B, 16, 1) -/* For M profile only, set if FPCCR.LSPACT is set */ -FIELD(TBFLAG_A32, LSPACT, 18, 1) /* Not cached. */ -/* For M profile only, set if we must create a new FP context */ -FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */ -/* For M profile only, set if FPCCR.S does not match current security state */ -FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */ -/* For M profile only, Handler (ie not Thread) mode */ -FIELD(TBFLAG_A32, HANDLER, 21, 1) -/* For M profile only, whether we should generate stack-limit checks */ -FIELD(TBFLAG_A32, STACKCHECK, 22, 1) +FIELD(TBFLAG_A32, NS, 16, 1) -/* Bit usage when in AArch64 state */ +/* + * Bit usage when in AArch32 state, for M-profile only. + */ +/* Set if FPCCR.LSPACT is set */ +FIELD(TBFLAG_M32, LSPACT, 8, 1) /* Not cached. */ +/* Set if we must create a new FP context */ +FIELD(TBFLAG_M32, NEW_FP_CTXT_NEEDED, 9, 1) /* Not cached. */ +/* Set if FPCCR.S does not match current security state */ +FIELD(TBFLAG_M32, FPCCR_S_WRONG, 10, 1) /* Not cached. */ +/* Handler (ie not Thread) mode */ +FIELD(TBFLAG_A32, HANDLER, 11, 1) +/* Whether we should generate stack-limit checks */ +FIELD(TBFLAG_A32, STACKCHECK, 12, 1) + +/* + * Bit usage when in AArch64 state + */ FIELD(TBFLAG_A64, TBII, 0, 2) FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2) FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
On Wed, 21 Aug 2019 at 00:54, Richard Henderson <richard.henderson@linaro.org> wrote: > However, while working on this patch set, I noticed that we have a lot of > unnecessary overlap between A- and M- profile in the TBFLAGs. Thus point 4 > above and the completely separate rebuild_hflags_m32(). > > If we rearrange things like the appended, then we recover 4 bits. You can't make the THUMB bit A-profile only: we need it in M-profile too, so we can correctly generate code that takes the InvalidState exception for attempts to execute with the Thumb bit not set. If you want to make VFPEN be A-profile only you need to do something so we don't look at it for M-profile: currently we set it always-1 for M-profile so we don't trip the code that causes us to take an exception if it's 0. Otherwise seems reasonable. My overall question is: how bad is it if we just start using bits in the cs_base word? If we try to get too tricky with using the same bits for different purposes it opens the door for accidentally writing code where we use a bit that isn't actually set correctly for all the situations where we're using it. thanks -- PMM
On 9/4/19 3:48 AM, Peter Maydell wrote: > On Wed, 21 Aug 2019 at 00:54, Richard Henderson > <richard.henderson@linaro.org> wrote: >> However, while working on this patch set, I noticed that we have a lot of >> unnecessary overlap between A- and M- profile in the TBFLAGs. Thus point 4 >> above and the completely separate rebuild_hflags_m32(). >> >> If we rearrange things like the appended, then we recover 4 bits. > > You can't make the THUMB bit A-profile only: we need it in > M-profile too, so we can correctly generate code that takes > the InvalidState exception for attempts to execute with the > Thumb bit not set. Yes, IIRC I found that when I went to make this work, rather than merely a sketch through the header file. > If you want to make VFPEN be A-profile only you need to > do something so we don't look at it for M-profile: currently > we set it always-1 for M-profile so we don't trip the code > that causes us to take an exception if it's 0. > > Otherwise seems reasonable. My overall question is: how bad > is it if we just start using bits in the cs_base word? *shrug* Even then we don't want to waste bits; there are only 32 of them remaining, and after that there's no room at all for expansion. > If we try to get too tricky with using the same bits for > different purposes it opens the door for accidentally writing > code where we use a bit that isn't actually set correctly > for all the situations where we're using it. Thankfully, we don't touch these bits in many places. We set them in the generator function, and we read them at the beginning of the translator. r~