Message ID | CY4PR1701MB1878B85B9E1C5B4FDCBA2860DF160@CY4PR1701MB1878.namprd17.prod.outlook.com |
---|---|
Headers | show |
Series | Restore and fix PHY reset for SMSC LAN8720 | expand |
On Tue, 27 Oct 2020 23:25:01 +0000 Badel, Laurent wrote: > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720 > > Description: > A recent patchset [1] added support in the SMSC PHY driver for managing > the ref clock and therefore removed the PHY_RST_AFTER_CLK_EN flag for the > LAN8720 chip. The ref clock is passed to the SMSC driver through a new > property "clocks" in the device tree. > > There appears to be two potential caveats: > (i) Building kernel 5.9 without updating the DT with the "clocks" > property for SMSC PHY, would break systems previously relying on the PHY > reset workaround (SMSC driver cannot grab the ref clock, so it is still > managed by FEC, but the PHY is not reset because PHY_RST_AFTER_CLK_EN is > not set). This may lead to occasional loss of ethernet connectivity in > these systems, that is difficult to debug. > > (ii) This defeats the purpose of a previous commit [2] that disabled the > ref clock for power saving reasons. If a ref clock for the PHY is > specified in DT, the SMSC driver will keep it always on (confirmed with > scope). While this removes the need for additional PHY resets (only a > single reset is needed after power up), this prevents the FEC from saving > power by disabling the refclk. Since there may be use cases where one is > interested in saving power, keep this option available when no ref clock > is specified for the PHY, by fixing issues with the PHY reset. > > Main changes proposed to address this: > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it if > the SMSC driver succeeds in retrieving the ref clock. > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by > re-configuring the PHY registers after reset. > > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces of > an iMX28-EVK-based board were tested, 3 of which were found to exhibit > issues when the "clocks" property was left unset. Issues were fixed by > the present patchset. > > References: > [1] commit d65af21842f8 ("net: phy: smsc: LAN8710/20: remove > PHY_RST_AFTER_CLK_EN flag") > commit bedd8d78aba3 ("net: phy: smsc: LAN8710/20: add phy refclk in > support") > [2] commit e8fcfcd5684a ("net: fec: optimize the clock management to save > power") Please resend with git send-email, if you can. All the patches have a "Subject: [PATCH" line in the message body, and Fixes tags are line-wrapped (they should be one line even if they are long). > Laurent Badel (5): > net:phy:smsc: enable PHY_RST_AFTER_CLK_EN if ref clock is not set > net:phy:smsc: expand documentation of clocks property > net:phy: add phy_device_reset_status() support > net:phy: fix phy_reset_after_clk_enable() > net:phy: add SMSC PHY reset on PM restore There are only 4 patches in the series.
Hi, thanks for your patches :) On 20-10-27 23:25, Badel, Laurent wrote: > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720 > > Description: > A recent patchset [1] added support in the SMSC PHY driver for managing > the ref clock and therefore removed the PHY_RST_AFTER_CLK_EN flag for the > LAN8720 chip. The ref clock is passed to the SMSC driver through a new > property "clocks" in the device tree. > > There appears to be two potential caveats: > (i) Building kernel 5.9 without updating the DT with the "clocks" > property for SMSC PHY, would break systems previously relying on the PHY > reset workaround (SMSC driver cannot grab the ref clock, so it is still > managed by FEC, but the PHY is not reset because PHY_RST_AFTER_CLK_EN is > not set). This may lead to occasional loss of ethernet connectivity in > these systems, that is difficult to debug. IMHO reyling on PHY_RST_AFTER_CLK_EN was broken since the day of adding this feature because: 1st) Each host driver needs to call the phy-reset logic. So this isn't a fix for all hosts using a LAN8720 phy. 2st) It interacts realy bad with the phy state machine. Only the state machine should be able to do this. Why can't you add the clock? > (ii) This defeats the purpose of a previous commit [2] that disabled the > ref clock for power saving reasons. If a ref clock for the PHY is > specified in DT, the SMSC driver will keep it always on (confirmed with > scope). NACK, the clock provider can be any clock. This has nothing to do with the FEC clocks. The FEC _can_ be used as clock provider. > While this removes the need for additional PHY resets (only a > single reset is needed after power up), this prevents the FEC from saving > power by disabling the refclk. Since there may be use cases where one is > interested in saving power, You can't just turn off the clock for the LAN8720 because of the phy internal state machine. The state machine gets confused if the clock is turned off/on randomly. > keep this option available when no ref clock > is specified for the PHY, by fixing issues with the PHY reset. IMHO pulling the reset line everytime has a few disadvantages: - You need to ensure that the strapping pins are correct and - You need to ensure that the reset logic including the reset delays are keeped. > Main changes proposed to address this: > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it if > the SMSC driver succeeds in retrieving the ref clock. IMHO NACK since this was the wrong approach. > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by > re-configuring the PHY registers after reset. > > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces of > an iMX28-EVK-based board were tested, 3 of which were found to exhibit > issues when the "clocks" property was left unset. Issues were fixed by > the present patchset. All iMX machines are now DT-based why can't you just add the correct clock provider? Regards, Marco
> ----------------------------- Eaton Industries Manufacturing GmbH ~ Registered place of business: Route de la Longeraie 7, 1110, Morges, Switzerland ----------------------------- -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, October 29, 2020 12:10 AM > To: Badel, Laurent <LaurentBadel@eaton.com> <snip> > Subject: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for SMSC > LAN8720 > > On Tue, 27 Oct 2020 23:25:01 +0000 Badel, Laurent wrote: > > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720 > > > > Description: > > A recent patchset [1] added support in the SMSC PHY driver for > > managing the ref clock and therefore removed the > PHY_RST_AFTER_CLK_EN > > flag for the > > LAN8720 chip. The ref clock is passed to the SMSC driver through a new > > property "clocks" in the device tree. > > > > There appears to be two potential caveats: > > (i) Building kernel 5.9 without updating the DT with the "clocks" > > property for SMSC PHY, would break systems previously relying on the > > PHY reset workaround (SMSC driver cannot grab the ref clock, so it is > > still managed by FEC, but the PHY is not reset because > > PHY_RST_AFTER_CLK_EN is not set). This may lead to occasional loss of > > ethernet connectivity in these systems, that is difficult to debug. > > > > (ii) This defeats the purpose of a previous commit [2] that disabled > > the ref clock for power saving reasons. If a ref clock for the PHY is > > specified in DT, the SMSC driver will keep it always on (confirmed > > with scope). While this removes the need for additional PHY resets > > (only a single reset is needed after power up), this prevents the FEC > > from saving power by disabling the refclk. Since there may be use > > cases where one is interested in saving power, keep this option > > available when no ref clock is specified for the PHY, by fixing issues with > the PHY reset. > > > > Main changes proposed to address this: > > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it > > if the SMSC driver succeeds in retrieving the ref clock. > > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by > > re-configuring the PHY registers after reset. > > > > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces > > of an iMX28-EVK-based board were tested, 3 of which were found to > > exhibit issues when the "clocks" property was left unset. Issues were > > fixed by the present patchset. > > > > References: > > [1] commit d65af21842f8 ("net: phy: smsc: LAN8710/20: remove > > PHY_RST_AFTER_CLK_EN flag") > > commit bedd8d78aba3 ("net: phy: smsc: LAN8710/20: add phy refclk in > > support") > > [2] commit e8fcfcd5684a ("net: fec: optimize the clock management to > save > > power") > > Please resend with git send-email, if you can. > My apologies. I will see if I manage to set up git to send emails with my account, but if not I will make sure to check the formatting more thoroughly. Thanks also for taking the time to detail the defects. Best regards. > All the patches have a "Subject: [PATCH" line in the message body, and Fixes > tags are line-wrapped (they should be one line even if they are long). > > > Laurent Badel (5): > > net:phy:smsc: enable PHY_RST_AFTER_CLK_EN if ref clock is not set > > net:phy:smsc: expand documentation of clocks property > > net:phy: add phy_device_reset_status() support > > net:phy: fix phy_reset_after_clk_enable() > > net:phy: add SMSC PHY reset on PM restore > > There are only 4 patches in the series.
Hi Marco, Thank you very much for your time reviewing and your helpful comments. Also sorry for the late reply. Please see my responses below. These are only my thoughts but I would be very interested to have your feedback if you don't mind and have time for this. I've been pulling my hair with this PHY for quite some time and am still pondering what would be the best solution to fix this once and for all. > ----------------------------- Eaton Industries Manufacturing GmbH ~ Registered place of business: Route de la Longeraie 7, 1110, Morges, Switzerland ----------------------------- -----Original Message----- > From: Marco Felsch <m.felsch@pengutronix.de> > Sent: Thursday, October 29, 2020 9:16 AM > To: Badel, Laurent <LaurentBadel@eaton.com> > Cc: davem@davemloft.net; fugang.duan@nxp.com; kuba@kernel.org; > andrew@lunn.ch; Heiner Kallweit <hkallweit1@gmail.com>; > linux@armlinux.org.uk; p.zabel@pengutronix.de; lgirdwood@gmail.com; > broonie@kernel.org; robh+dt@kernel.org; richard.leitner@skidata.com; > netdev@vger.kernel.org; devicetree@vger.kernel.org; f.fainelli@gmail.com; > Quette, Arnaud <ArnaudQuette@Eaton.com> > Subject: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for SMSC > LAN8720 > > Hi, > > thanks for your patches :) > > On 20-10-27 23:25, Badel, Laurent wrote: > > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720 > > > > Description: > > A recent patchset [1] added support in the SMSC PHY driver for > > managing the ref clock and therefore removed the > PHY_RST_AFTER_CLK_EN > > flag for the > > LAN8720 chip. The ref clock is passed to the SMSC driver through a new > > property "clocks" in the device tree. > > > > There appears to be two potential caveats: > > (i) Building kernel 5.9 without updating the DT with the "clocks" > > property for SMSC PHY, would break systems previously relying on the > > PHY reset workaround (SMSC driver cannot grab the ref clock, so it is > > still managed by FEC, but the PHY is not reset because > > PHY_RST_AFTER_CLK_EN is not set). This may lead to occasional loss of > > ethernet connectivity in these systems, that is difficult to debug. > > IMHO reyling on PHY_RST_AFTER_CLK_EN was broken since the day of > adding this feature because: > > 1st) Each host driver needs to call the phy-reset logic. So this isn't a > fix for all hosts using a LAN8720 phy. > 2st) It interacts realy bad with the phy state machine. Only the state > machine should be able to do this. > I absolutely agree with this, but on the other hand it seems to me that issues were actually reported only with the FEC driver, precisely because the FEC enables/disables the ref clock on opening/closing the interface. This means that there is a time between the initial probing of the driver, and the first time the interface is brought up, where the ref clk is turned off, but the PHY is not held in reset, and this is what causes most if not all of the issues in my opinion. If there is a way to keep the PHY in reset during that time, then the problem would be equally solved. I will look in direction for a possible resubmission. I do wonder if there may be issues with suspend-to-ram/hibernation. I am unsure as I can only test with pm_debug, but when doing so I find that the clock stays on during hibernation, so the system resumes without problem. Would that also be the case during a real hibernation? If not then that would be a case where an additional PHY reset would be needed. Any thoughts on this? > Why can't you add the clock? Agree this would be straightforward, my concern was that if for any reason one fails to add the clock to the DT, this results in a seemingly working system, but there might be issues in perhaps 1/100 cases which would be difficult to identify and/or diagnose (I work myself with a board that has these issues and it is really difficult to even reproduce the problem since it occurs in 1% or so of boards in production). > > > (ii) This defeats the purpose of a previous commit [2] that disabled > > the ref clock for power saving reasons. If a ref clock for the PHY is > > specified in DT, the SMSC driver will keep it always on (confirmed > > with scope). > > NACK, the clock provider can be any clock. This has nothing to do with the > FEC clocks. The FEC _can_ be used as clock provider. I'm sure you understand this much better than I do. What I can say is that I directly measured the ref clk and found that when I add the clock to the DT the clock stays on forever. Basically it seems like the FEC calls to clk_disable_unprepare() don't work in this case, though I'm not sure about the reason behind this. I will investigate further to make sure I understand what is really going on. > > > While this removes the need for additional PHY resets (only a single > > reset is needed after power up), this prevents the FEC from saving > > power by disabling the refclk. Since there may be use cases where one > > is interested in saving power, > > You can't just turn off the clock for the LAN8720 because of the phy internal > state machine. The state machine gets confused if the clock is turned off/on > randomly. My understanding was that it is the PHY hardware that gets confused, rather than the state machine. Indeed removing the clock doesn't seem to me like a good thing to do anyway, but I would guess that whoever does it, should also be responsible for dealing with the consequences (in this case, performing a reset when re-enabling the clock). Of course, keeping the clock on is also an option, but if we can save power by disabling it, then perhaps there is some added value in doing so? Otherwise, shouldn't the FEC be fixed directly if it is determined that there is no added value in turning off the clocks? > > > keep this option available when no ref clock is specified for the PHY, > > by fixing issues with the PHY reset. > > IMHO pulling the reset line everytime has a few disadvantages: > - You need to ensure that the strapping pins are correct and > - You need to ensure that the reset logic including the reset delays > are keeped. > I agree, but in my experience the LAN8720 absolutely needs one reset after power up (this is specified in a SMSC schematic checklist as "A hardware reset (nRST assertion) is required following power-up"). This works out well because during initial probing of the driver the reset is asserted, but if the reset logic is bad I think you might experience problems anyway. > > Main changes proposed to address this: > > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it > > if the SMSC driver succeeds in retrieving the ref clock. > > IMHO NACK since this was the wrong approach. Still I wonder if ensuring backwards compatibility wouldn't be a good thing? > > > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by > > re-configuring the PHY registers after reset. > > > > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces > > of an iMX28-EVK-based board were tested, 3 of which were found to > > exhibit issues when the "clocks" property was left unset. Issues were > > fixed by the present patchset. > > All iMX machines are now DT-based why can't you just add the correct clock > provider? I would probably, assuming I know about it. In my case I pulled the 5.9 sources and found out "accidentally" that the PHY reset behavior had changed. I then did some research and found your commits. Of course, I'm at fault because I didn't do my homework of going through the changelog, but I thought maybe I won't be the only one to make this mistake. Besides, having worked with the (flawed indeed) reset workaround, I did build a certain level of confidence that it works. Very honestly, if I had a choice I might hesitate to switch to the new clock management, as to be thorough I would have to re-test everything. > > Regards, > Marco > -----Original Message----- > From: Marco Felsch <m.felsch@pengutronix.de> > Sent: Thursday, October 29, 2020 9:16 AM > To: Badel, Laurent <LaurentBadel@eaton.com> > Cc: davem@davemloft.net; fugang.duan@nxp.com; kuba@kernel.org; > andrew@lunn.ch; Heiner Kallweit <hkallweit1@gmail.com>; > linux@armlinux.org.uk; p.zabel@pengutronix.de; lgirdwood@gmail.com; > broonie@kernel.org; robh+dt@kernel.org; richard.leitner@skidata.com; > netdev@vger.kernel.org; devicetree@vger.kernel.org; f.fainelli@gmail.com; > Quette, Arnaud <ArnaudQuette@Eaton.com> > Subject: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for SMSC > LAN8720 > > Hi, > > thanks for your patches :) > > On 20-10-27 23:25, Badel, Laurent wrote: > > Subject: [PATCH net 0/4] Restore and fix PHY reset for SMSC LAN8720 > > > > Description: > > A recent patchset [1] added support in the SMSC PHY driver for > > managing the ref clock and therefore removed the > PHY_RST_AFTER_CLK_EN > > flag for the > > LAN8720 chip. The ref clock is passed to the SMSC driver through a new > > property "clocks" in the device tree. > > > > There appears to be two potential caveats: > > (i) Building kernel 5.9 without updating the DT with the "clocks" > > property for SMSC PHY, would break systems previously relying on the > > PHY reset workaround (SMSC driver cannot grab the ref clock, so it is > > still managed by FEC, but the PHY is not reset because > > PHY_RST_AFTER_CLK_EN is not set). This may lead to occasional loss of > > ethernet connectivity in these systems, that is difficult to debug. > > IMHO reyling on PHY_RST_AFTER_CLK_EN was broken since the day of > adding this feature because: > > 1st) Each host driver needs to call the phy-reset logic. So this isn't a > fix for all hosts using a LAN8720 phy. > 2st) It interacts realy bad with the phy state machine. Only the state > machine should be able to do this. > > Why can't you add the clock? > > > (ii) This defeats the purpose of a previous commit [2] that disabled > > the ref clock for power saving reasons. If a ref clock for the PHY is > > specified in DT, the SMSC driver will keep it always on (confirmed > > with scope). > > NACK, the clock provider can be any clock. This has nothing to do with the > FEC clocks. The FEC _can_ be used as clock provider. > > > While this removes the need for additional PHY resets (only a single > > reset is needed after power up), this prevents the FEC from saving > > power by disabling the refclk. Since there may be use cases where one > > is interested in saving power, > > You can't just turn off the clock for the LAN8720 because of the phy internal > state machine. The state machine gets confused if the clock is turned off/on > randomly. > > > keep this option available when no ref clock is specified for the PHY, > > by fixing issues with the PHY reset. > > IMHO pulling the reset line everytime has a few disadvantages: > - You need to ensure that the strapping pins are correct and > - You need to ensure that the reset logic including the reset delays > are keeped. > > > Main changes proposed to address this: > > (a) Restore PHY_RST_AFTER_CLK_EN for LAN8720, but explicitly clear it > > if the SMSC driver succeeds in retrieving the ref clock. > > IMHO NACK since this was the wrong approach. > > > (b) Fix phy_reset_after_clk_enable() to work in interrupt mode, by > > re-configuring the PHY registers after reset. > > > > Tests: against net tree 5.9, including allyes/no/modconfig. 10 pieces > > of an iMX28-EVK-based board were tested, 3 of which were found to > > exhibit issues when the "clocks" property was left unset. Issues were > > fixed by the present patchset. > > All iMX machines are now DT-based why can't you just add the correct clock > provider? > > Regards, > Marco
> > > (ii) This defeats the purpose of a previous commit [2] that disabled > > > the ref clock for power saving reasons. If a ref clock for the PHY is > > > specified in DT, the SMSC driver will keep it always on (confirmed > > > with scope). > > > > NACK, the clock provider can be any clock. This has nothing to do with the > > FEC clocks. The FEC _can_ be used as clock provider. > > I'm sure you understand this much better than I do. What I can say is that I > directly measured the ref clk and found that when I add the clock to the DT > the clock stays on forever. Basically it seems like the FEC calls to > clk_disable_unprepare() don't work in this case, though I'm not sure about the > reason behind this. The reason is easy to explain. The clock API is reference counted. It counts how many times a clock is turned on and off. A clock has to be turned off as many times as it was turned on before the hardware actually turns off. So you have the FEC turning the clock on during probe, followed by the phy turning the clock on. Some time later the FEC turns the clock off for run time power saving, but there is still one reference to the clock held by the PHY, so the hardware is left ticking. Andrew
> ----------------------------- Eaton Industries Manufacturing GmbH ~ Registered place of business: Route de la Longeraie 7, 1110, Morges, Switzerland ----------------------------- -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: Wednesday, November 04, 2020 2:11 PM > To: Badel, Laurent <LaurentBadel@eaton.com> > Cc: Marco Felsch <m.felsch@pengutronix.de>; davem@davemloft.net; > fugang.duan@nxp.com; kuba@kernel.org; Heiner Kallweit > <hkallweit1@gmail.com>; linux@armlinux.org.uk; p.zabel@pengutronix.de; > lgirdwood@gmail.com; broonie@kernel.org; robh+dt@kernel.org; > richard.leitner@skidata.com; netdev@vger.kernel.org; > devicetree@vger.kernel.org; f.fainelli@gmail.com; Quette, Arnaud > <ArnaudQuette@Eaton.com> > Subject: Re: [EXTERNAL] Re: [PATCH net 0/4] Restore and fix PHY reset for > SMSC LAN8720 > > > > > (ii) This defeats the purpose of a previous commit [2] that > > > > disabled the ref clock for power saving reasons. If a ref clock > > > > for the PHY is specified in DT, the SMSC driver will keep it > > > > always on (confirmed with scope). > > > > > > NACK, the clock provider can be any clock. This has nothing to do > > > with the FEC clocks. The FEC _can_ be used as clock provider. > > > > I'm sure you understand this much better than I do. What I can say is > > that I directly measured the ref clk and found that when I add the > > clock to the DT the clock stays on forever. Basically it seems like > > the FEC calls to > > clk_disable_unprepare() don't work in this case, though I'm not sure > > about the reason behind this. > > The reason is easy to explain. The clock API is reference counted. It counts > how many times a clock is turned on and off. A clock has to be turned off as > many times as it was turned on before the hardware actually turns off. So > you have the FEC turning the clock on during probe, followed by the phy > turning the clock on. Some time later the FEC turns the clock off for run time > power saving, but there is still one reference to the clock held by the PHY, so > the hardware is left ticking. > > Andrew That makes a lot of sense, thanks very much for the explanation.