Message ID | 20180131165334.23175-6-julien.grall@arm.com |
---|---|
State | New |
Headers | show |
Series | xen/arm32: Branch predictor hardening (XSA-254 variant 2) | expand |
Hi, On 31/01/18 16:53, Julien Grall wrote: > +GLOBAL(hyp_traps_vector_bp_inv) > + /* > + * We encode the exception entry in the bottom 3 bits of > + * SP, and we have to guarantee to be 8 bytes aligned. > + */ > + add sp, sp, #1 /* Reset 7 */ > + add sp, sp, #1 /* Undef 6 */ > + add sp, sp, #1 /* Hypervisor Call 5 */ > + add sp, sp, #1 /* Prefetch abort 4 */ > + add sp, sp, #1 /* Data abort 3 */ > + add sp, sp, #1 /* Hypervisor 2 */ > + add sp, sp, #1 /* IRQ 1 */ > + nop /* FIQ 0 */ > + > + mcr p15, 0, r0, c7, c5, 6 /* BPIALL */ > + isb > + > + /* > + * As we cannot use any temporary registers and cannot > + * clobber SP, we can decode the exception entry using > + * an unrolled binary search. > + */ > + tst sp, #4 > + bne 1f > + > + tst sp, #2 > + bne 3f > + > + tst sp, #1 > + bic sp, sp, #0x7 > + bne trap_irq > + b trap_fiq > + > +1: > + tst sp, #2 > + bne 2f > + > + tst sp, #1 > + bic sp, sp, #0x7 > + bne trap_hypervisor_call > + b trap_prefetch_abort > + > +2: > + tst sp, #1 > + bic sp, sp, #0x7 > + bne trap_reset > + b trap_undefined_instruction > + > +3: > + tst sp, #1 > + bic sp, sp, #0x7 > + bne trap_data_abort > + b trap_guest_sync Just after I hit the sent button Marc pointed me to a discussion on Linux ML [1]. They found a solution which would streamline the code, improve the readability and prioritize exception. I would be tempt to use a similar solution on Xen. Stefano, do you have any opinions? Shall I respin the series or do a follow-up? Cheers,
On Wed, 31 Jan 2018, Julien Grall wrote: > Hi, > > On 31/01/18 16:53, Julien Grall wrote: > > +GLOBAL(hyp_traps_vector_bp_inv) > > + /* > > + * We encode the exception entry in the bottom 3 bits of > > + * SP, and we have to guarantee to be 8 bytes aligned. > > + */ > > + add sp, sp, #1 /* Reset 7 */ > > + add sp, sp, #1 /* Undef 6 */ > > + add sp, sp, #1 /* Hypervisor Call 5 */ > > + add sp, sp, #1 /* Prefetch abort 4 */ > > + add sp, sp, #1 /* Data abort 3 */ > > + add sp, sp, #1 /* Hypervisor 2 */ > > + add sp, sp, #1 /* IRQ 1 */ > > + nop /* FIQ 0 */ > > + > > + mcr p15, 0, r0, c7, c5, 6 /* BPIALL */ > > + isb > > + > > + /* > > + * As we cannot use any temporary registers and cannot > > + * clobber SP, we can decode the exception entry using > > + * an unrolled binary search. > > + */ > > + tst sp, #4 > > + bne 1f > > + > > + tst sp, #2 > > + bne 3f > > + > > + tst sp, #1 > > + bic sp, sp, #0x7 > > + bne trap_irq > > + b trap_fiq > > + > > +1: > > + tst sp, #2 > > + bne 2f > > + > > + tst sp, #1 > > + bic sp, sp, #0x7 > > + bne trap_hypervisor_call > > + b trap_prefetch_abort > > + > > +2: > > + tst sp, #1 > > + bic sp, sp, #0x7 > > + bne trap_reset > > + b trap_undefined_instruction > > + > > +3: > > + tst sp, #1 > > + bic sp, sp, #0x7 > > + bne trap_data_abort > > + b trap_guest_sync > > Just after I hit the sent button Marc pointed me to a discussion on Linux ML > [1]. > > They found a solution which would streamline the code, improve the readability > and prioritize exception. > > I would be tempt to use a similar solution on Xen. Stefano, do you have any > opinions? Shall I respin the series or do a follow-up? You forgot the link :-) I prefer if you respin, because it is going to make the backports slightly easier.
On 31/01/18 17:39, Stefano Stabellini wrote: > On Wed, 31 Jan 2018, Julien Grall wrote: >> Hi, >> >> On 31/01/18 16:53, Julien Grall wrote: >>> +GLOBAL(hyp_traps_vector_bp_inv) >>> + /* >>> + * We encode the exception entry in the bottom 3 bits of >>> + * SP, and we have to guarantee to be 8 bytes aligned. >>> + */ >>> + add sp, sp, #1 /* Reset 7 */ >>> + add sp, sp, #1 /* Undef 6 */ >>> + add sp, sp, #1 /* Hypervisor Call 5 */ >>> + add sp, sp, #1 /* Prefetch abort 4 */ >>> + add sp, sp, #1 /* Data abort 3 */ >>> + add sp, sp, #1 /* Hypervisor 2 */ >>> + add sp, sp, #1 /* IRQ 1 */ >>> + nop /* FIQ 0 */ >>> + >>> + mcr p15, 0, r0, c7, c5, 6 /* BPIALL */ >>> + isb >>> + >>> + /* >>> + * As we cannot use any temporary registers and cannot >>> + * clobber SP, we can decode the exception entry using >>> + * an unrolled binary search. >>> + */ >>> + tst sp, #4 >>> + bne 1f >>> + >>> + tst sp, #2 >>> + bne 3f >>> + >>> + tst sp, #1 >>> + bic sp, sp, #0x7 >>> + bne trap_irq >>> + b trap_fiq >>> + >>> +1: >>> + tst sp, #2 >>> + bne 2f >>> + >>> + tst sp, #1 >>> + bic sp, sp, #0x7 >>> + bne trap_hypervisor_call >>> + b trap_prefetch_abort >>> + >>> +2: >>> + tst sp, #1 >>> + bic sp, sp, #0x7 >>> + bne trap_reset >>> + b trap_undefined_instruction >>> + >>> +3: >>> + tst sp, #1 >>> + bic sp, sp, #0x7 >>> + bne trap_data_abort >>> + b trap_guest_sync >> >> Just after I hit the sent button Marc pointed me to a discussion on Linux ML >> [1]. >> >> They found a solution which would streamline the code, improve the readability >> and prioritize exception. >> >> I would be tempt to use a similar solution on Xen. Stefano, do you have any >> opinions? Shall I respin the series or do a follow-up? > > You forgot the link :-) > I prefer if you respin, because it is going to make the backports > slightly easier. Here the link: https://www.spinics.net/lists/arm-kernel/msg631860.html Cheers,
diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S index 828e52c25c..a295f3ad67 100644 --- a/xen/arch/arm/arm32/entry.S +++ b/xen/arch/arm/arm32/entry.S @@ -160,6 +160,61 @@ GLOBAL(hyp_traps_vector) b trap_irq /* 0x18 - IRQ */ b trap_fiq /* 0x1c - FIQ */ + .align 5 +GLOBAL(hyp_traps_vector_bp_inv) + /* + * We encode the exception entry in the bottom 3 bits of + * SP, and we have to guarantee to be 8 bytes aligned. + */ + add sp, sp, #1 /* Reset 7 */ + add sp, sp, #1 /* Undef 6 */ + add sp, sp, #1 /* Hypervisor Call 5 */ + add sp, sp, #1 /* Prefetch abort 4 */ + add sp, sp, #1 /* Data abort 3 */ + add sp, sp, #1 /* Hypervisor 2 */ + add sp, sp, #1 /* IRQ 1 */ + nop /* FIQ 0 */ + + mcr p15, 0, r0, c7, c5, 6 /* BPIALL */ + isb + + /* + * As we cannot use any temporary registers and cannot + * clobber SP, we can decode the exception entry using + * an unrolled binary search. + */ + tst sp, #4 + bne 1f + + tst sp, #2 + bne 3f + + tst sp, #1 + bic sp, sp, #0x7 + bne trap_irq + b trap_fiq + +1: + tst sp, #2 + bne 2f + + tst sp, #1 + bic sp, sp, #0x7 + bne trap_hypervisor_call + b trap_prefetch_abort + +2: + tst sp, #1 + bic sp, sp, #0x7 + bne trap_reset + b trap_undefined_instruction + +3: + tst sp, #1 + bic sp, sp, #0x7 + bne trap_data_abort + b trap_guest_sync + DEFINE_TRAP_ENTRY(reset) DEFINE_TRAP_ENTRY(undefined_instruction) DEFINE_TRAP_ENTRY(hypervisor_call) diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c index 0a138fa735..c79e6d65d3 100644 --- a/xen/arch/arm/cpuerrata.c +++ b/xen/arch/arm/cpuerrata.c @@ -198,6 +198,13 @@ install_bp_hardening_vecs(const struct arm_cpu_capabilities *entry, this_cpu(bp_harden_vecs) = hyp_vecs; } +static int enable_bp_inv_hardening(void *data) +{ + install_bp_hardening_vecs(data, hyp_traps_vector_bp_inv, + "execute BPIALL"); + return 0; +} + #endif #define MIDR_RANGE(model, min, max) \ @@ -284,6 +291,18 @@ static const struct arm_cpu_capabilities arm_errata[] = { .enable = enable_psci_bp_hardening, }, #endif +#ifdef CONFIG_ARM32_HARDEN_BRANCH_PREDICTOR + { + .capability = ARM_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A12), + .enable = enable_bp_inv_hardening, + }, + { + .capability = ARM_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A17), + .enable = enable_bp_inv_hardening, + }, +#endif {}, };