diff mbox

arm64: Add workaround for Cavium erratum 27456

Message ID 1455046156-10582-1-git-send-email-ddaney.cavm@gmail.com
State New
Headers show

Commit Message

David Daney Feb. 9, 2016, 7:29 p.m. UTC
From: Andrew Pinski <apinski@cavium.com>


On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI
instructions may cause the icache to become invalid if it contains
data for a non-current ASID.

This patch implements the workaround (which flushes the local icache
when switching the mm) by using code patching.

Signed-off-by: Andrew Pinski <apinski@cavium.com>

Signed-off-by: David Daney <david.daney@cavium.com>

---
 arch/arm64/Kconfig                  | 11 +++++++++++
 arch/arm64/include/asm/cpufeature.h |  3 ++-
 arch/arm64/kernel/cpu_errata.c      |  9 +++++++++
 arch/arm64/mm/proc.S                | 12 ++++++++++++
 4 files changed, 34 insertions(+), 1 deletion(-)

-- 
1.8.3.1

Comments

Will Deacon Feb. 10, 2016, 9:28 a.m. UTC | #1
On Tue, Feb 09, 2016 at 11:29:16AM -0800, David Daney wrote:
> From: Andrew Pinski <apinski@cavium.com>

> 

> On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI

> instructions may cause the icache to become invalid if it contains

> data for a non-current ASID.

> 

> This patch implements the workaround (which flushes the local icache

> when switching the mm) by using code patching.


So, to be clear, is this "just" a performance problem as opposed to a
correctness issue? If so, do you have any numbers with and without this
change?

Will
Will Deacon Feb. 10, 2016, 6:15 p.m. UTC | #2
On Wed, Feb 10, 2016 at 10:08:17AM -0800, David Daney wrote:
> On 02/10/2016 01:28 AM, Will Deacon wrote:

> >On Tue, Feb 09, 2016 at 11:29:16AM -0800, David Daney wrote:

> >>From: Andrew Pinski <apinski@cavium.com>

> >>

> >>On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI

> >>instructions may cause the icache to become invalid if it contains

> >>data for a non-current ASID.

> >>

> >>This patch implements the workaround (which flushes the local icache

> >>when switching the mm) by using code patching.

> >

> >So, to be clear, is this "just" a performance problem as opposed to a

> >correctness issue?

> 

> No.  It is a correctness issue.  Without this workaround in place, userspace

> programs end up executing the wrong instructions, which leads to

> unpredictable behavior and program crashes.


Ok, so I think the description in the commit log isn't quite right. An
"invalid" line in i-cache simply means that it needs to be refetched.
What you're talking about sounds like data corruption.

I also don't understand how the workaround fixes things like TLBIs due
to copy-on-write faults triggered by another core. Also, what's the
interaction with virtual machines, or is the VMID not affected in the
same way as the ASID?

Sorry to be a pain on this, but we need to understand the issue well
enough to maintain the workaround in the future!

> >If so, do you have any numbers with and without this

> >change?

> 

> We tried to measure it, but the impact is not measurable in the tests we

> have done.  Switching the mm is not often done so the extra ICache

> invalidation is rare.


Oh, sure. I was only interested in perf figures if this was a performance
problem rather than a functional one.

Will
Will Deacon Feb. 11, 2016, 1:07 p.m. UTC | #3
Hi David,

Thanks for the reply.

On Wed, Feb 10, 2016 at 10:42:18AM -0800, David Daney wrote:
> On 02/10/2016 10:15 AM, Will Deacon wrote:

> >On Wed, Feb 10, 2016 at 10:08:17AM -0800, David Daney wrote:

> >>On 02/10/2016 01:28 AM, Will Deacon wrote:

> >>>On Tue, Feb 09, 2016 at 11:29:16AM -0800, David Daney wrote:

> >>>>From: Andrew Pinski <apinski@cavium.com>

> >>>>

> >>>>On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI

> >>>>instructions may cause the icache to become invalid if it contains

> >>>>data for a non-current ASID.

> >>>>

> >>>>This patch implements the workaround (which flushes the local icache

> >>>>when switching the mm) by using code patching.

> >>>

> >>>So, to be clear, is this "just" a performance problem as opposed to a

> >>>correctness issue?

> >>

> >>No.  It is a correctness issue.  Without this workaround in place, userspace

> >>programs end up executing the wrong instructions, which leads to

> >>unpredictable behavior and program crashes.

> >

> >Ok, so I think the description in the commit log isn't quite right. An

> >"invalid" line in i-cache simply means that it needs to be refetched.

> >What you're talking about sounds like data corruption.

> 

> Yes.  I guess I will be sending v3 with an improved description.


Yes, please!

> >I also don't understand how the workaround fixes things like TLBIs due

> >to copy-on-write faults triggered by another core.

> 

> Caveat: I don't fully understand the internal ICache implementation details.

> But ...

> 

> External broadcast TLBIs arriving for the current ASID (as set in TTBR0_EL1)

> are handled properly.  The issue is that cached data for other ASIDs, under

> some circumstances, may be inadvertently "blessed" into the current ASID.

> If we take care that no data for "foreign" ASIDs is in the Icache, the

> problematical case can never occur.


Ok, that makes sense. Maybe include this in the description too.

> >Also, what's the

> >interaction with virtual machines, or is the VMID not affected in the

> >same way as the ASID?

> 

> Ah, the $10^6 question.  Current information on how this interacts with KVM

> is less well developed.  We think the workaround doesn't cause failures in

> virtual machines.

> 

> I realize that this is different than asserting that virtual machines are

> guaranteed to operate error free.


So, to confirm, we don't need to flush the I-cache on world-switch in
KVM?

Will
diff mbox

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..a969970 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -432,6 +432,17 @@  config CAVIUM_ERRATUM_23154
 
 	  If unsure, say Y.
 
+config CAVIUM_ERRATUM_27456
+	bool "Cavium erratum 27456: Broadcast TLBI instructions may cause the icache to become invalid"
+	default y
+	help
+	  On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI
+	  instructions may cause the icache to become invalid if it
+	  contains data for a non-current ASID.  The fix is to flush
+	  the icache when changing the mm context.
+
+	  If unsure, say Y.
+
 endmenu
 
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8f271b8..8136afc 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -30,8 +30,9 @@ 
 #define ARM64_HAS_LSE_ATOMICS			5
 #define ARM64_WORKAROUND_CAVIUM_23154		6
 #define ARM64_WORKAROUND_834220			7
+#define ARM64_WORKAROUND_CAVIUM_27456		8
 
-#define ARM64_NCAPS				8
+#define ARM64_NCAPS				9
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index feb6b4e..a3e846a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -100,6 +100,15 @@  const struct arm64_cpu_capabilities arm64_errata[] = {
 		MIDR_RANGE(MIDR_THUNDERX, 0x00, 0x01),
 	},
 #endif
+#ifdef CONFIG_CAVIUM_ERRATUM_27456
+	{
+	/* Cavium ThunderX, T88 pass 1.x - 2.1 */
+		.desc = "Cavium erratum 27456",
+		.capability = ARM64_WORKAROUND_CAVIUM_27456,
+		MIDR_RANGE(MIDR_THUNDERX, 0x00,
+			   (1 << MIDR_VARIANT_SHIFT) | 1),
+	},
+#endif
 	{
 	}
 };
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c164d2c..0f3be00 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -25,6 +25,8 @@ 
 #include <asm/hwcap.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/pgtable.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
 
 #include "proc-macros.S"
 
@@ -137,7 +139,17 @@  ENTRY(cpu_do_switch_mm)
 	bfi	x0, x1, #48, #16		// set the ASID
 	msr	ttbr0_el1, x0			// set TTBR0
 	isb
+alternative_if_not ARM64_WORKAROUND_CAVIUM_27456
 	ret
+	nop
+	nop
+	nop
+alternative_else
+	ic	iallu
+	dsb	nsh
+	isb
+	ret
+alternative_endif
 ENDPROC(cpu_do_switch_mm)
 
 /*