Message ID | 20231109031927.1990570-1-janb@os.amperecomputing.com |
---|---|
State | New |
Headers | show |
Series | [v4] i2c: designware: Fix corrupted memory seen in the ISR | expand |
> On Mon, Nov 13, 2023 at 04:51:00AM -0500, Wolfram Sang wrote: >> >> Thanks to a restrictive hotel network, I haven't pushed out yet, and >> could still add your tags. Thanks! Hi Wolfram, Any chance we could get the "Cc: stable@vger.kernel.org" tag added to this patch? More than one large cloud company would like to see this in the stable kernel, as it significantly improves the reliability of IPMI transactions on platforms that use i2c for this communication. Sorry for not including this tag in the submission. Jan
> Any chance we could get the "Cc: stable@vger.kernel.org" tag added to this > patch? More than one large cloud company would like to see this in the Well, no, it is already pushed out... > stable kernel, as it significantly improves the reliability of IPMI > transactions on platforms that use i2c for this communication. ... but with that commit desc it will surely be backported nonetheless. If all fails, it can be manually requested to go to stable, but I am sure this will not be needed. > Sorry for not including this tag in the submission. No worries.
On 11/13/23 02:54, Wolfram Sang wrote: > On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote: >> When running on a many core ARM64 server, errors were >> happening in the ISR that looked like corrupted memory. These >> corruptions would fix themselves if small delays were inserted >> in the ISR. Errors reported by the driver included "i2c_designware >> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and >> "i2c_designware APMC0D0F:00:controller timed out" during >> in-band IPMI SSIF stress tests. >> >> The problem was determined to be memory writes in the driver were not >> becoming visible to all cores when execution rapidly shifted between >> cores, like when a register write immediately triggers an ISR. >> Processors with weak memory ordering, like ARM64, make no >> guarantees about the order normal memory writes become globally >> visible, unless barrier instructions are used to control ordering. >> >> To solve this, regmap accessor functions configured by this driver >> were changed to use non-relaxed forms of the low-level register >> access functions, which include a barrier on platforms that require >> it. This assures memory writes before a controller register access are >> visible to all cores. The community concluded defaulting to correct >> operation outweighed defaulting to the small performance gains from >> using relaxed access functions. Being a low speed device added weight to >> this choice of default register access behavior. >> >> Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com> > Applied to for-current, thanks! > A bit late but FYI: Tested-by: Yann Sionneau <ysionneau@kalrayinc.com>
diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c index affcfb243f0f..35f762872b8a 100644 --- a/drivers/i2c/busses/i2c-designware-common.c +++ b/drivers/i2c/busses/i2c-designware-common.c @@ -63,7 +63,7 @@ static int dw_reg_read(void *context, unsigned int reg, unsigned int *val) { struct dw_i2c_dev *dev = context; - *val = readl_relaxed(dev->base + reg); + *val = readl(dev->base + reg); return 0; } @@ -72,7 +72,7 @@ static int dw_reg_write(void *context, unsigned int reg, unsigned int val) { struct dw_i2c_dev *dev = context; - writel_relaxed(val, dev->base + reg); + writel(val, dev->base + reg); return 0; } @@ -81,7 +81,7 @@ static int dw_reg_read_swab(void *context, unsigned int reg, unsigned int *val) { struct dw_i2c_dev *dev = context; - *val = swab32(readl_relaxed(dev->base + reg)); + *val = swab32(readl(dev->base + reg)); return 0; } @@ -90,7 +90,7 @@ static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val) { struct dw_i2c_dev *dev = context; - writel_relaxed(swab32(val), dev->base + reg); + writel(swab32(val), dev->base + reg); return 0; } @@ -99,8 +99,8 @@ static int dw_reg_read_word(void *context, unsigned int reg, unsigned int *val) { struct dw_i2c_dev *dev = context; - *val = readw_relaxed(dev->base + reg) | - (readw_relaxed(dev->base + reg + 2) << 16); + *val = readw(dev->base + reg) | + (readw(dev->base + reg + 2) << 16); return 0; } @@ -109,8 +109,8 @@ static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val) { struct dw_i2c_dev *dev = context; - writew_relaxed(val, dev->base + reg); - writew_relaxed(val >> 16, dev->base + reg + 2); + writew(val, dev->base + reg); + writew(val >> 16, dev->base + reg + 2); return 0; }
When running on a many core ARM64 server, errors were happening in the ISR that looked like corrupted memory. These corruptions would fix themselves if small delays were inserted in the ISR. Errors reported by the driver included "i2c_designware APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and "i2c_designware APMC0D0F:00:controller timed out" during in-band IPMI SSIF stress tests. The problem was determined to be memory writes in the driver were not becoming visible to all cores when execution rapidly shifted between cores, like when a register write immediately triggers an ISR. Processors with weak memory ordering, like ARM64, make no guarantees about the order normal memory writes become globally visible, unless barrier instructions are used to control ordering. To solve this, regmap accessor functions configured by this driver were changed to use non-relaxed forms of the low-level register access functions, which include a barrier on platforms that require it. This assures memory writes before a controller register access are visible to all cores. The community concluded defaulting to correct operation outweighed defaulting to the small performance gains from using relaxed access functions. Being a low speed device added weight to this choice of default register access behavior. Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com> --- ChangeLog v3->v4: add missing changelog v2->v3: regmap accessors use non-relaxed form instead of wmb barrier v1->v2: Commit message improvements v1: insert wmb barrier before enabling interrupts drivers/i2c/busses/i2c-designware-common.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)