[0/5] arm64: Add workaround for Cortex-A77 erratum 1542418

Message ID 20191114145918.235339-1-suzuki.poulose@arm.com

Message

Suzuki K Poulose Nov. 14, 2019, 2:59 p.m. UTC
This series adds a workaround for Arm erratum 1542418, which affects
Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
instructions from the L0 macro-op cache, violating the
prefetch-speculation-protection guaranteed by the architecture.
This happens when the branch predictor bases its predictions
for a branch at this address on stale history due to ASID or VMID
reuse.

The workaround is to invalidate the branch history before reusing
any ASID for a new address space. This is done by ensuring 60 ASIDs
are selected before any ASID is reused.
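
As a rough illustration of the rule above, the following user-space model
selects 60 other ASIDs before an ASID is reused. It is only a sketch:
struct asid_trace, select_asid() and reuse_asid_with_workaround() are
hypothetical names, the choice of scratch values is an assumption, and the
array stands in for whatever the real patch does on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SCRATCH_ASIDS 60	/* count taken from the cover letter */
#define TRACE_MAX 128

/* Hypothetical stand-in for the CPU: records every ASID that is selected. */
struct asid_trace {
	uint16_t selected[TRACE_MAX];
	size_t count;
};

static void select_asid(struct asid_trace *t, uint16_t asid)
{
	assert(t->count < TRACE_MAX);
	t->selected[t->count++] = asid;
}

/*
 * Select 60 other ASIDs, then the ASID that is being reused.
 * Assumption: any 60 values that differ from 'asid' will do in this model.
 */
static void reuse_asid_with_workaround(struct asid_trace *t, uint16_t asid)
{
	for (unsigned int i = 1; i <= NR_SCRATCH_ASIDS; i++)
		select_asid(t, (uint16_t)(asid + i));	/* wraps at 16 bits */

	select_asid(t, asid);
}
```

The trace makes the ordering checkable: the reused ASID only appears after
60 different selections.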


James Morse (5):
  arm64: Add MIDR encoding for Arm Cortex-A77
  arm64: mm: Workaround Cortex-A77 erratum 1542418 on ASID rollover
  arm64: Workaround Cortex-A77 erratum 1542418 on boot due to kexec
  KVM: arm64: Workaround Cortex-A77 erratum 1542418 on VMID rollover
  KVM: arm/arm64: Don't invoke defacto-CnP on first run

 Documentation/arm64/silicon-errata.rst |  2 +
 arch/arm/include/asm/kvm_mmu.h         |  5 ++
 arch/arm64/Kconfig                     | 16 ++++++
 arch/arm64/include/asm/cpucaps.h       |  3 +-
 arch/arm64/include/asm/cputype.h       |  2 +
 arch/arm64/include/asm/kvm_mmu.h       | 15 ++++++
 arch/arm64/include/asm/mmu_context.h   |  1 +
 arch/arm64/kernel/cpu_errata.c         | 21 ++++++++
 arch/arm64/mm/context.c                | 73 +++++++++++++++++++++++++-
 virt/kvm/arm/arm.c                     | 23 +++++---
 10 files changed, 151 insertions(+), 10 deletions(-)

-- 
2.23.0

Comments

Will Deacon Nov. 14, 2019, 4:39 p.m. UTC | #1
Hi Suzuki,

On Thu, Nov 14, 2019 at 02:59:13PM +0000, Suzuki K Poulose wrote:
> This series adds workaround for Arm erratum 1542418 which affects

Searching for that erratum number doesn't find me a description :(

> Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
> instructions from the L0 macro-op cache violating the
> prefetch-speculation-protection guaranteed by the architecture.
> This happens when the when the branch predictor bases its predictions
> on a branch at this address on the stale history due to ASID or VMID
> reuse.

Two immediate questions:

 1. Can we disable the L0 MOP cache?
 2. Can we invalidate the branch predictor? If Spectre-v2 taught us
    anything it's that removing those instructions was a mistake!

Moving on...

Have you reproduced this at top-level? If I recall the
prefetch-speculation-protection, it's designed to protect against the
case where you have a direct branch:

addr:	B	foo

and another CPU writes out a new function:

bar:
	insn0
	...
	insnN

before doing any necessary maintenance and then patches the original
branch to:

addr:	B	bar

The idea is that a concurrently executing CPU could mispredict the original
branch to point at 'bar', fetch the instructions before they've been written
out and then confirm the prediction by looking at the newly written branch
instruction. Even without the prefetch-speculation-protection, that's
fairly difficult to achieve in practice: you'd need to be doing something
like reusing memory to hold the instructions so that the initial
misprediction occurs.
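
The writer's side of this scenario can be modelled in user space as below.
This is a sketch under assumptions, not kernel code: struct patch_site and
publish_function() are hypothetical, the branch at 'addr' is modelled as an
atomic pointer rather than a patched instruction, and
__builtin___clear_cache() (a GCC/Clang builtin) stands in for "any necessary
maintenance".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "addr: B <target>": the branch operand is a pointer. */
struct patch_site {
	_Atomic(uint32_t *) target;
};

/*
 * Writer side, in the order described above:
 *   1. write out the new function,
 *   2. do the cache maintenance,
 *   3. patch the branch to point at it.
 */
static void publish_function(struct patch_site *site, uint32_t *dst,
			     const uint32_t *insns, size_t n)
{
	assert(n > 0);

	memcpy(dst, insns, n * sizeof(*insns));

	/* Compiler builtin: synchronises the I-cache with the written range
	 * on targets that need it, and is a no-op on targets that do not. */
	__builtin___clear_cache((char *)dst, (char *)(dst + n));

	/* Release ordering: the instructions are written before the patch. */
	atomic_store_explicit(&site->target, dst, memory_order_release);
}
```

A reader that loads 'target' with acquire ordering sees the instruction words
written in step 1. What the model cannot show is the instruction fetch side,
which is where the prefetch-speculation-protection matters.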

How does A77 stop this from occurring when the ASID is not reallocated (e.g.
the example above)? Is the MOP cache flushed somehow?

With this erratum, it sounds like you have to end up reusing an ASID from
a task that had a branch at 'addr' in its address space that branched to
the address of 'bar' (again, in its address space). Is that right? That
sounds super rare to me, particularly with ASLR: not only does the aliasing
branch need to exist, but it needs to be held in the branch predictor while
we cycle through 64k ASIDs *and* the race with the writer needs to happen
so that we get stale instructions from the MOP cache.

Is there something I'm missing that makes this remotely plausible?

Will

Suzuki K Poulose Nov. 15, 2019, 1:14 a.m. UTC | #2
Hi Will

On 11/14/2019 04:39 PM, Will Deacon wrote:
> Hi Suzuki,
>
> On Thu, Nov 14, 2019 at 02:59:13PM +0000, Suzuki K Poulose wrote:
>> This series adds workaround for Arm erratum 1542418 which affects
>
> Searching for that erratum number doesn't find me a description :(

I believe this was published in the Cortex-A77 SDEN v9.0. I will chase
it internally.

>
>> Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
>> instructions from the L0 macro-op cache violating the
>> prefetch-speculation-protection guaranteed by the architecture.
>> This happens when the when the branch predictor bases its predictions
>> on a branch at this address on the stale history due to ASID or VMID
>> reuse.
>
> Two immediate questions:
>
>   1. Can we disable the L0 MOP cache?

Yes, but it hurts performance.

>   2. Can we invalidate the branch predictor? If Spectre-v2 taught us
>      anything it's that removing those instructions was a mistake!

The suggested workaround does invalidate the branch history, but in a
costly way. I am unaware of any cheaper way to do it.

> Moving on...
>
> Have you reproduced this at top-level? If I recall the
> prefetch-speculation-protection, it's designed to protect against the
> case where you have a direct branch:

No, see below.

>
> addr:	B	foo
>
> and another CPU writes out a new function:
>
> bar:
> 	insn0
> 	...
> 	insnN
>
> before doing any necessary maintenance and then patches the original
> branch to:
>
> addr:	B	bar
>
> The idea is that a concurrently executing CPU could mispredict the original
> branch to point at 'bar', fetch the instructions before they've been written
> out and then confirm the prediction by looking at the newly written branch
> instruction. Even without the prefetch-speculation-protection, that's
> fairly difficult to achieve in practice: you'd need to be doing something
> like reusing memory to hold the instructions so that the initial
> misprediction occurs.
>
> How does A77 stop this from occurring when the ASID is not reallocated (e.g.
> the example above)? Is the MOP cache flushed somehow?

IIUC, the MOP cache is flushed on I-cache invalidate, so that case is fine.

>
> With this erratum, it sounds like you have to end up reusing an ASID from
> a task that had a branch at 'addr' in its address space that branched to
> the address of 'bar' (again. in its address space). Is that right? That
> sounds super rare to me, particularly with ASLR: not only does the aliasing

AFAICS, yes. On top of that, the fetch also has to miss "addr" in the MOP
cache and hit "bar" before the I-cache invalidate is received. This may
cause "bar" to be fetched from the MOP cache (the fetch is not cancelled
even though the I-cache invalidate triggers a MOP cache flush after the
hit), while "addr" has to miss in the I-cache, so that the updated
instruction is fetched.

Also, this means that the new context must not have executed "addr"
(which would give a hit in the MOP cache) while "bar" was fetched. So
this adds further constraints on actually hitting the erratum.

> branch need to exist, but it needs to be held in the branch predictor while
> we cycle through 64k ASIDs *and* the race with the writer needs to happen
> so that we get stale instructions from the MOP cache.
>
> Is there something I'm missing that makes this remotely plausible?

No :-)

Cheers
Suzuki

Will Deacon Nov. 20, 2019, 7:18 p.m. UTC | #3
On Fri, Nov 15, 2019 at 01:14:07AM +0000, Suzuki K Poulose wrote:
> On 11/14/2019 04:39 PM, Will Deacon wrote:
> > On Thu, Nov 14, 2019 at 02:59:13PM +0000, Suzuki K Poulose wrote:
> > > This series adds workaround for Arm erratum 1542418 which affects
> > > Cortex-A77 cores (r0p0 - r1p0). Affected cores may execute stale
> > > instructions from the L0 macro-op cache violating the
> > > prefetch-speculation-protection guaranteed by the architecture.
> > > This happens when the when the branch predictor bases its predictions
> > > on a branch at this address on the stale history due to ASID or VMID
> > > reuse.
> >
> > Two immediate questions:
> >
> >   1. Can we disable the L0 MOP cache?
> Yes, but it hurts performance.
>
> >   2. Can we invalidate the branch predictor? If Spectre-v2 taught us
> >      anything it's that removing those instructions was a mistake!
>
> The workaround suggested is actually invalidating the branch history
> but in a costly way. I am unaware of any.
> > Moving on...
> >
> > Have you reproduced this at top-level? If I recall the
> > prefetch-speculation-protection, it's designed to protect against the
> > case where you have a direct branch:
>
> No, see below.
>
> >
> > addr:	B	foo
> >
> > and another CPU writes out a new function:
> >
> > bar:
> > 	insn0
> > 	...
> > 	insnN
> >
> > before doing any necessary maintenance and then patches the original
> > branch to:
> >
> > addr:	B	bar
> >
> > The idea is that a concurrently executing CPU could mispredict the original
> > branch to point at 'bar', fetch the instructions before they've been written
> > out and then confirm the prediction by looking at the newly written branch
> > instruction. Even without the prefetch-speculation-protection, that's
> > fairly difficult to achieve in practice: you'd need to be doing something
> > like reusing memory to hold the instructions so that the initial
> > misprediction occurs.
> >
> > How does A77 stop this from occurring when the ASID is not reallocated (e.g.
> > the example above)? Is the MOP cache flushed somehow?
>
> IIUC, The MOP cache is flushed on I-cache invalidate, thus it is fine.

Hmm, so this is interesting. Does that mean we could do a local I-cache
invalidation in check_and_switch_context() at the same time as doing the
local TLBI after a rollover?
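
Read literally, that suggestion could be modelled as below. This is a toy
sketch, not the kernel's check_and_switch_context(): struct cpu_model,
switch_context_model() and the two helpers are hypothetical, and the counters
only stand in for the real TLB and I-cache maintenance. Whether this would be
sufficient for the erratum is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; the counters stand in for real maintenance. */
struct cpu_model {
	bool flush_pending;	/* assumed to be set on ASID rollover */
	unsigned int tlb_flushes;
	unsigned int icache_invalidates;
};

static void local_tlb_flush(struct cpu_model *cpu)
{
	cpu->tlb_flushes++;
}

static void local_icache_invalidate(struct cpu_model *cpu)
{
	cpu->icache_invalidates++;
}

/* On the first context switch after a rollover, do both operations once. */
static void switch_context_model(struct cpu_model *cpu)
{
	assert(cpu);

	if (!cpu->flush_pending)
		return;

	cpu->flush_pending = false;
	local_tlb_flush(cpu);
	local_icache_invalidate(cpu);	/* the proposed addition */
}
```

Tying the I-cache invalidation to the existing pending flag keeps its cost on
the rollover path only, which is the appeal of the idea.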

I still don't grok the failure case, though, because assuming A77 has IDC=0,
then won't you see the I-cache maintenance from userspace anyway?

Will