[v4] i2c: designware: Fix corrupted memory seen in the ISR

Message ID 20231109031927.1990570-1-janb@os.amperecomputing.com
State New

Commit Message

Jan Bottorff Nov. 9, 2023, 3:19 a.m. UTC
When running on a many-core ARM64 server, errors occurred
in the ISR that looked like corrupted memory. These
corruptions disappeared if small delays were inserted
in the ISR. Errors reported by the driver included "i2c_designware
APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
"i2c_designware APMC0D0F:00: controller timed out" during
in-band IPMI SSIF stress tests.

The problem was determined to be that memory writes in the driver were
not becoming visible to all cores when execution rapidly shifted between
cores, as when a register write immediately triggers an ISR.
Processors with weak memory ordering, like ARM64, make no
guarantees about the order in which normal memory writes become
globally visible unless barrier instructions are used to control ordering.

To solve this, the regmap accessor functions configured by this driver
were changed to use the non-relaxed forms of the low-level register
access functions, which include a barrier on platforms that require
it. This ensures that memory writes made before a controller register
access are visible to all cores. The community concluded that defaulting
to correct operation outweighed defaulting to the small performance gains
from using relaxed access functions. The controller being a low-speed
device added weight to this choice of default register access behavior.

Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>
---
ChangeLog
v3->v4: add missing changelog
v2->v3: regmap accessors use non-relaxed form instead of wmb barrier
v1->v2: Commit message improvements
v1: insert wmb barrier before enabling interrupts

 drivers/i2c/busses/i2c-designware-common.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
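
The ordering problem described in the commit message can be loosely
illustrated in user space with C11 atomics. This is only an analogy, not
kernel code: the struct and function names below (fake_xfer, fake_dev,
fake_start_xfer, fake_isr, ordering_selfcheck) are hypothetical, a release
store stands in for writel(), and an acquire load stands in for the ISR
observing the interrupt. A relaxed store in the same place would give no
guarantee that the plain writes before it are visible to another core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state kept in normal memory. */
struct fake_xfer {
	uint32_t target_addr;
	uint32_t msg_len;
};

/* Hypothetical device: plain state plus a flag playing the role of a register. */
struct fake_dev {
	struct fake_xfer xfer;
	atomic_uint intr_enable;
};

/*
 * Analogue of "write driver state, then writel() the register".
 * The release store orders the two plain writes before the flag write.
 * With memory_order_relaxed here (the writel_relaxed() analogue), an
 * observer on another core could see the flag set but stale xfer fields.
 */
static void fake_start_xfer(struct fake_dev *dev, uint32_t addr, uint32_t len)
{
	dev->xfer.target_addr = addr;
	dev->xfer.msg_len = len;
	atomic_store_explicit(&dev->intr_enable, 1, memory_order_release);
}

/*
 * Analogue of the ISR: the acquire load pairs with the release store, so
 * once the flag is seen as set, the xfer fields written before it are too.
 */
static bool fake_isr(struct fake_dev *dev, struct fake_xfer *out)
{
	if (!atomic_load_explicit(&dev->intr_enable, memory_order_acquire))
		return false;
	*out = dev->xfer;
	return true;
}

/* Single-threaded check of the happy path; it does not exercise the race. */
void ordering_selfcheck(void)
{
	struct fake_dev dev = { .xfer = { 0, 0 } };
	struct fake_xfer seen;

	atomic_init(&dev.intr_enable, 0);
	assert(!fake_isr(&dev, &seen));
	fake_start_xfer(&dev, 0x10, 4);
	assert(fake_isr(&dev, &seen));
	assert(seen.target_addr == 0x10 && seen.msg_len == 4);
}
```

In a real test, fake_isr() would run on a different thread from
fake_start_xfer(); the self-check above only confirms that the sketch is
internally consistent.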

Comments

Jan Bottorff Nov. 15, 2023, 6:25 p.m. UTC | #1
> On Mon, Nov 13, 2023 at 04:51:00AM -0500, Wolfram Sang wrote:
>>
>> Thanks to a restrictive hotel network, I haven't pushed out yet, and
>> could still add your tags. Thanks!

Hi Wolfram,

Any chance we could get the "Cc: stable@vger.kernel.org" tag added to 
this patch? More than one large cloud company would like to see this in 
the stable kernel, as it significantly improves the reliability of IPMI 
transactions on platforms that use i2c for this communication.

Sorry for not including this tag in the submission.

Jan

Wolfram Sang Nov. 15, 2023, 6:52 p.m. UTC | #2
> Any chance we could get the "Cc: stable@vger.kernel.org" tag added to this
> patch? More than one large cloud company would like to see this in the

Well, no, it is already pushed out...

> stable kernel, as it significantly improves the reliability of IPMI
> transactions on platforms that use i2c for this communication.

... but with that commit description it will surely be backported
nonetheless. If all else fails, it can be manually requested to go to
stable, but I am sure this will not be needed.

> Sorry for not including this tag in the submission.

No worries.

Yann Sionneau Nov. 16, 2023, 9:34 a.m. UTC | #3
On 11/13/23 02:54, Wolfram Sang wrote:
> On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote:
>> When running on a many core ARM64 server, errors were
>> happening in the ISR that looked like corrupted memory. These
>> corruptions would fix themselves if small delays were inserted
>> in the ISR. Errors reported by the driver included "i2c_designware
>> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
>> "i2c_designware APMC0D0F:00:controller timed out" during
>> in-band IPMI SSIF stress tests.
>>
>> The problem was determined to be memory writes in the driver were not
>> becoming visible to all cores when execution rapidly shifted between
>> cores, like when a register write immediately triggers an ISR.
>> Processors with weak memory ordering, like ARM64, make no
>> guarantees about the order normal memory writes become globally
>> visible, unless barrier instructions are used to control ordering.
>>
>> To solve this, regmap accessor functions configured by this driver
>> were changed to use non-relaxed forms of the low-level register
>> access functions, which include a barrier on platforms that require
>> it. This assures memory writes before a controller register access are
>> visible to all cores. The community concluded defaulting to correct
>> operation outweighed defaulting to the small performance gains from
>> using relaxed access functions. Being a low speed device added weight to
>> this choice of default register access behavior.
>>
>> Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>
> Applied to for-current, thanks!
>
A bit late but FYI:

Tested-by: Yann Sionneau <ysionneau@kalrayinc.com>

Patch

diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c
index affcfb243f0f..35f762872b8a 100644
--- a/drivers/i2c/busses/i2c-designware-common.c
+++ b/drivers/i2c/busses/i2c-designware-common.c
@@ -63,7 +63,7 @@  static int dw_reg_read(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = readl_relaxed(dev->base + reg);
+	*val = readl(dev->base + reg);
 
 	return 0;
 }
@@ -72,7 +72,7 @@  static int dw_reg_write(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writel_relaxed(val, dev->base + reg);
+	writel(val, dev->base + reg);
 
 	return 0;
 }
@@ -81,7 +81,7 @@  static int dw_reg_read_swab(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = swab32(readl_relaxed(dev->base + reg));
+	*val = swab32(readl(dev->base + reg));
 
 	return 0;
 }
@@ -90,7 +90,7 @@  static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writel_relaxed(swab32(val), dev->base + reg);
+	writel(swab32(val), dev->base + reg);
 
 	return 0;
 }
@@ -99,8 +99,8 @@  static int dw_reg_read_word(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = readw_relaxed(dev->base + reg) |
-		(readw_relaxed(dev->base + reg + 2) << 16);
+	*val = readw(dev->base + reg) |
+		(readw(dev->base + reg + 2) << 16);
 
 	return 0;
 }
@@ -109,8 +109,8 @@  static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writew_relaxed(val, dev->base + reg);
-	writew_relaxed(val >> 16, dev->base + reg + 2);
+	writew(val, dev->base + reg);
+	writew(val >> 16, dev->base + reg + 2);
 
 	return 0;
 }
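
The word and byte-swapping accessors touched by the hunks above can be
exercised in user space with a small sketch. It assumes a hypothetical
in-memory register file (fake_regs) in place of dev->base, and the helpers
fake_readw, fake_writew, read_word, write_word, swab32_sketch and
accessor_selfcheck are illustrative names, not driver or kernel APIs. It
shows only the arithmetic (low half at reg, high half at reg + 2, and the
32-bit byte swap), not the MMIO ordering that the patch changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: 16-bit halves standing in for MMIO space. */
static uint16_t fake_regs[4];

/* Stand-ins for readw()/writew(); off is a byte offset, as in the driver. */
static uint16_t fake_readw(unsigned int off)
{
	return fake_regs[off / 2];
}

static void fake_writew(uint16_t val, unsigned int off)
{
	fake_regs[off / 2] = val;
}

/* Same shape as dw_reg_read_word(): combine two 16-bit reads. */
static uint32_t read_word(unsigned int reg)
{
	/* Cast before shifting so the shift is done in unsigned 32-bit math. */
	return fake_readw(reg) | ((uint32_t)fake_readw(reg + 2) << 16);
}

/* Same shape as dw_reg_write_word(): split into two 16-bit writes. */
static void write_word(unsigned int reg, uint32_t val)
{
	fake_writew((uint16_t)val, reg);
	fake_writew((uint16_t)(val >> 16), reg + 2);
}

/* Portable byte swap with the same result as the kernel's swab32(). */
static uint32_t swab32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

void accessor_selfcheck(void)
{
	write_word(0, 0x12345678u);
	assert(fake_regs[0] == 0x5678 && fake_regs[1] == 0x1234);
	assert(read_word(0) == 0x12345678u);
	assert(swab32_sketch(0x12345678u) == 0x78563412u);
}
```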