mbox series

[net,0/3] Add I2C bus lock for Wangxun

Message ID 20240823030242.3083528-1-jiawenwu@trustnetic.com
Headers show
Series Add I2C bus lock for Wangxun | expand

Message

Jiawen Wu Aug. 23, 2024, 3:02 a.m. UTC
Sometimes the driver can not get the SFP information because the I2C bus
is accessed by the firmware at the same time. So we need to add the lock
on the I2C bus access. The hardware semaphores perform this lock.

Jiawen Wu (3):
  net: txgbe: add IO address in I2C platform device data
  i2c: designware: add device private data passing to lock functions
  i2c: designware: support hardware lock for Wangxun 10Gb NIC

 drivers/i2c/busses/Makefile                   |  1 +
 drivers/i2c/busses/i2c-designware-amdpsp.c    |  4 +-
 drivers/i2c/busses/i2c-designware-baytrail.c  | 14 +++-
 drivers/i2c/busses/i2c-designware-common.c    |  4 +-
 drivers/i2c/busses/i2c-designware-core.h      |  6 +-
 drivers/i2c/busses/i2c-designware-platdrv.c   |  3 +
 drivers/i2c/busses/i2c-designware-wx.c        | 65 +++++++++++++++++++
 .../net/ethernet/wangxun/txgbe/txgbe_phy.c    |  5 ++
 include/linux/platform_data/i2c-wx.h          | 11 ++++
 9 files changed, 105 insertions(+), 8 deletions(-)
 create mode 100644 drivers/i2c/busses/i2c-designware-wx.c
 create mode 100644 include/linux/platform_data/i2c-wx.h

Comments

Andrew Lunn Aug. 26, 2024, 1:32 a.m. UTC | #1
On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote:
> Sometimes the driver can not get the SFP information because the I2C bus
> is accessed by the firmware at the same time.

Please could you explain this some more. What firmware?

There some registers which are clear on read. They don't work when you
have multiple entities reading them.

	Andrew
Jiawen Wu Aug. 26, 2024, 2:04 a.m. UTC | #2
On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote:
> On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote:
> > Sometimes the driver can not get the SFP information because the I2C bus
> > is accessed by the firmware at the same time.
> 
> Please could you explain this some more. What firmware?

It's the firmware of our ethernet devices.

> There some registers which are clear on read. They don't work when you
> have multiple entities reading them.

I'm not trying to multiple access the I2C registers, but these registers cannot
be accessed by other interfaces in the process of reading complete information
each time. So there is a semaphore needed that locks up the entire read process.
Andrew Lunn Aug. 26, 2024, 2:33 a.m. UTC | #3
On Mon, Aug 26, 2024 at 10:04:42AM +0800, Jiawen Wu wrote:
> On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote:
> > On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote:
> > > Sometimes the driver can not get the SFP information because the I2C bus
> > > is accessed by the firmware at the same time.
> > 
> > Please could you explain this some more. What firmware?
> 
> It's the firmware of our ethernet devices.
> 
> > There some registers which are clear on read. They don't work when you
> > have multiple entities reading them.
> 
> I'm not trying to multiple access the I2C registers, but these registers cannot
> be accessed by other interfaces in the process of reading complete information
> each time. So there is a semaphore needed that locks up the entire read process.

More details please.

Linux assume it is driving the hardware. Your firmware cannot be
touching any registers which will clear on read. QSFP states that
registers 3-31 of page 0 are all clear on read, for example. The
firmware should also not be setting any registers, otherwise you can
confuse Linux which assumes registers it set stay set, because it is
controlling the hardware.

Your firmware also needs to handle that Linux can change the page. If
the firmware changes the page, it must restore it back to whatever
page Linux selected, etc.

The fact you are submitting this for net suggests you have seen real
issues. Please describe what those issues are.

	Andrew
Jiawen Wu Aug. 27, 2024, 2:21 a.m. UTC | #4
On Mon, Aug 26, 2024 10:34 AM, Andrew Lunn wrote:
> On Mon, Aug 26, 2024 at 10:04:42AM +0800, Jiawen Wu wrote:
> > On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote:
> > > On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote:
> > > > Sometimes the driver can not get the SFP information because the I2C bus
> > > > is accessed by the firmware at the same time.
> > >
> > > Please could you explain this some more. What firmware?
> >
> > It's the firmware of our ethernet devices.
> >
> > > There some registers which are clear on read. They don't work when you
> > > have multiple entities reading them.
> >
> > I'm not trying to multiple access the I2C registers, but these registers cannot
> > be accessed by other interfaces in the process of reading complete information
> > each time. So there is a semaphore needed that locks up the entire read process.
> 
> More details please.
> 
> Linux assume it is driving the hardware. Your firmware cannot be
> touching any registers which will clear on read. QSFP states that
> registers 3-31 of page 0 are all clear on read, for example. The
> firmware should also not be setting any registers, otherwise you can
> confuse Linux which assumes registers it set stay set, because it is
> controlling the hardware.
> 
> Your firmware also needs to handle that Linux can change the page. If
> the firmware changes the page, it must restore it back to whatever
> page Linux selected, etc.
> 
> The fact you are submitting this for net suggests you have seen real
> issues. Please describe what those issues are.

The error log shows:

[257681.367827] sfp sfp.1025: Host maximum power 1.0W
[257681.370813] txgbe 0000:04:00.1: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:02:02.0 (capable
of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[257681.373364] txgbe 0000:04:00.1 enp4s0f1: renamed from eth0
[257681.434719] txgbe 0000:04:00.1 enp4s0f1: configuring for inband/10gbase-r link mode
[257681.676747] sfp sfp.1025: EEPROM base structure checksum failure: 0x63 != 0x1f
[257681.676755] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00  ............g...
[257681.676757] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 64  ....FiberStore d
[257681.676759] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53      ...!SFP-10GS
[257681.676760] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f  R-85    A   .R..
[257681.676762] sfp EE: 00000040: 00 81 cd 5b df 25 0a bd 40 f6 c6 ce 47 8e ff ff  ...[.%..@...G...
[257681.676763] sfp EE: 00000050: 10 d8 24 33 44 8e ff ff 10 41 b0 9a ff ff ff ff  ..$3D....A......
 
It looks like some fields are read incorrectly. For comparison, I printed the
 SFP info when it loaded correctly:

[260908.194533] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00  ............g...
[260908.194536] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 20  ....FiberStore
[260908.194538] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53      ...!SFP-10GS
[260908.194540] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f  R-85    A   .R..
[260908.194541] sfp EE: 00000040: 40 63 bd df 40 8e ff ff 10 41 b0 9a ff ff ff ff  @c..@....A......
[260908.194543] sfp EE: 00000050: 10 58 5b 29 41 8e ff ff 10 41 b0 9a ff ff ff ff  .X[)A....A......
[260908.198205] sfp sfp.1025: module FiberStore       SFP-10GSR-85     rev A    sn G1804125607      dc 180605

Since the read mechanism of I2C is to write the offset and read command
first, and then read the target address. I think it's possible that the different
offsets be written at the same time, from Linux and firmware.
Jiawen Wu Aug. 27, 2024, 2:26 a.m. UTC | #5
On Fri, Aug 23, 2024 7:05 PM, Jarkko Nikula wrote:
> Hi
> Hi
> 
> On 8/23/24 6:02 AM, Jiawen Wu wrote:
> > Sometimes the driver can not get the SFP information because the I2C bus
> > is accessed by the firmware at the same time. So we need to add the lock
> > on the I2C bus access. The hardware semaphores perform this lock.
> >
> > Jiawen Wu (3):
> >    net: txgbe: add IO address in I2C platform device data
> >    i2c: designware: add device private data passing to lock functions
> >    i2c: designware: support hardware lock for Wangxun 10Gb NIC
> >
> I was wondering the Fixes tag use in the series. Obviously patchset is
> not fixing a regression so question is what happens when issue occurs?
> 
> I don't think e.g. failing I2C transfer with an error code yet qualifies
> backporting into Linux stable?

Ah, I think this was a leftover bug from when I added Wangxun NIC support
in I2C designware, so I added a fix tag.
Andrew Lunn Aug. 27, 2024, 12:18 p.m. UTC | #6
> > More details please.
> > 
> > Linux assume it is driving the hardware. Your firmware cannot be
> > touching any registers which will clear on read. QSFP states that
> > registers 3-31 of page 0 are all clear on read, for example. The
> > firmware should also not be setting any registers, otherwise you can
> > confuse Linux which assumes registers it set stay set, because it is
> > controlling the hardware.
> > 
> > Your firmware also needs to handle that Linux can change the page. If
> > the firmware changes the page, it must restore it back to whatever
> > page Linux selected, etc.
> > 
> > The fact you are submitting this for net suggests you have seen real
> > issues. Please describe what those issues are.
> 
> The error log shows:
> 
> [257681.367827] sfp sfp.1025: Host maximum power 1.0W
> [257681.370813] txgbe 0000:04:00.1: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:02:02.0 (capable
> of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
> [257681.373364] txgbe 0000:04:00.1 enp4s0f1: renamed from eth0
> [257681.434719] txgbe 0000:04:00.1 enp4s0f1: configuring for inband/10gbase-r link mode
> [257681.676747] sfp sfp.1025: EEPROM base structure checksum failure: 0x63 != 0x1f
> [257681.676755] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00  ............g...
> [257681.676757] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 64  ....FiberStore d
> [257681.676759] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53      ...!SFP-10GS
> [257681.676760] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f  R-85    A   .R..
> [257681.676762] sfp EE: 00000040: 00 81 cd 5b df 25 0a bd 40 f6 c6 ce 47 8e ff ff  ...[.%..@...G...
> [257681.676763] sfp EE: 00000050: 10 d8 24 33 44 8e ff ff 10 41 b0 9a ff ff ff ff  ..$3D....A......
>  
> It looks like some fields are read incorrectly. For comparison, I printed the
>  SFP info when it loaded correctly:
> 
> [260908.194533] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00  ............g...
> [260908.194536] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 20  ....FiberStore
> [260908.194538] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53      ...!SFP-10GS
> [260908.194540] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f  R-85    A   .R..
> [260908.194541] sfp EE: 00000040: 40 63 bd df 40 8e ff ff 10 41 b0 9a ff ff ff ff  @c..@....A......
> [260908.194543] sfp EE: 00000050: 10 58 5b 29 41 8e ff ff 10 41 b0 9a ff ff ff ff  .X[)A....A......
> [260908.198205] sfp sfp.1025: module FiberStore       SFP-10GSR-85     rev A    sn G1804125607      dc 180605
> 
> Since the read mechanism of I2C is to write the offset and read command
> first, and then read the target address. I think it's possible that the different
> offsets be written at the same time, from Linux and firmware.
 
O.K, that is bad. The SFP is totally unreliable...

You however have still not answered my question. What is the firmware
accessing? How does it handle pages?

The hack you have put in place is per i2c transaction. But accessing
pages is likely to be multiple transactions. One to change the page,
followed by a few reads/writes in the new page, then maybe followed by
a transactions to return to page 0.

I think your best solution is to simply take the mutex and never
release it. Block your firmware from accessing the SFP.

	Andrew