Message ID | 20210404100701.6366-1-qiangqing.zhang@nxp.com |
---|---|
State | New |
Headers | show |
Series | net: phy: fix PHY possibly unwork after MDIO bus resume back | expand |
On 4/5/21 12:48 AM, Heiner Kallweit wrote: > On 04.04.2021 16:09, Heiner Kallweit wrote: >> On 04.04.2021 12:07, Joakim Zhang wrote: >>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut >>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will >>> soft reset PHY if PHY driver implements soft_reset callback. >>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to >>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these >>> two patches, I found i.MX6UL 14x14 EVK which connected to KSZ8081RNB doesn't >>> work any more when system resume back, MAC driver is fec_main.c. >>> >>> It's obvious that initializing PHY hardware when MDIO bus resume back >>> would introduce some regression when PHY implements soft_reset. When I >> >> Why is this obvious? Please elaborate on why a soft reset should break >> something. >> >>> am debugging, I found PHY works fine if MAC doesn't support suspend/resume >>> or phy_stop()/phy_start() doesn't been called during suspend/resume. This >>> let me realize, PHY state machine phy_state_machine() could do something >>> breaks the PHY. >>> >>> As we known, MAC resume first and then MDIO bus resume when system >>> resume back from suspend. When MAC resume, usually it will invoke >>> phy_start() where to change PHY state to PHY_UP, then trigger the stat> machine to run now. In phy_state_machine(), it will start/config >>> auto-nego, then change PHY state to PHY_NOLINK, what to next is >>> periodically check PHY link status. When MDIO bus resume, it will >>> initialize PHY hardware, including soft_reset, what would soft_reset >>> affect seems various from different PHYs. For KSZ8081RNB, when it in >>> PHY_NOLINK state and then perform a soft reset, it will never complete >>> auto-nego. >> >> Why? That would need to be checked in detail. Maybe chip errata >> documentation provides a hint. >> > > The KSZ8081 spec says the following about bit BMCR_PDOWN: > > If software reset (Register 0.15) is > used to exit power-down mode > (Register 0.11 = 1), two software > reset writes (Register 0.15 = 1) are > required. The first write clears > power-down mode; the second > write resets the chip and re-latches > the pin strapping pin values. > > Maybe this causes the issue you see and genphy_soft_reset() isn't > appropriate for this PHY. Please re-test with the KSZ8081 soft reset > following the spec comment. > Interesting. Never expected that behavior. Thanks for catching it. Skimmed through the datasheets/erratas. This is what I found (micrel.c): 10/100: 8001 - Unaffected? 8021/8031 - Double reset after PDOWN. 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset solves the issue since errata says no error after reset but is also claiming that only toggling PDOWN (may) or power will help. 8051 - Double reset after PDOWN. 8061 - Double reset after PDOWN. 8081 - Double reset after PDOWN. 8091 - Double reset after PDOWN. 10/100/1000: Nothing in gigabit afaics. Switches: 8862 - Not affected? 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. 8864 - Not affected? 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. 9477 - Errata. PDOWN broken. Will randomly cause link failure on adjacent links. Do not use. This certainly explains a lot. >>> >>> This patch changes PHY state to PHY_UP when MDIO bus resume back, it >>> should be reasonable after PHY hardware re-initialized. Also give state >>> machine a chance to start/config auto-nego again. >>> >> >> If the MAC driver calls phy_stop() on suspend, then phydev->suspended >> is true and mdio_bus_phy_may_suspend() returns false. As a consequence >> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >> skips the PHY hw initialization. >> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() >> that sets the state to PHY_UP. >> > > Forgot that MDIO bus suspend is done before MAC driver suspend. > Therefore disregard this part for now. > >> Having said that the current argumentation isn't convincing. I'm not >> aware of such issues on other systems, therefore it's likely that >> something is system-dependent. >> >> Please check the exact call sequence on your system, maybe it >> provides a hint. >> >>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>> --- >>> drivers/net/phy/phy_device.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >>> index cc38e326405a..312a6f662481 100644 >>> --- a/drivers/net/phy/phy_device.c >>> +++ b/drivers/net/phy/phy_device.c >>> @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) >>> ret = phy_resume(phydev); >>> if (ret < 0) >>> return ret; >>> + >>> + /* PHY state could be changed to PHY_NOLINK from MAC controller resume >>> + * rounte with phy_start(), here change to PHY_UP after re-initializing >>> + * PHY hardware, let PHY state machine to start/config auto-nego again. >>> + */ >>> + phydev->state = PHY_UP; >>> + >>> no_resume: >>> if (phydev->attached_dev && phydev->adjust_link) >>> phy_start_machine(phydev); >>> >> >
On 05.04.2021 10:43, Christian Melki wrote: > On 4/5/21 12:48 AM, Heiner Kallweit wrote: >> On 04.04.2021 16:09, Heiner Kallweit wrote: >>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut >>>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will >>>> soft reset PHY if PHY driver implements soft_reset callback. >>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to >>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these >>>> two patches, I found i.MX6UL 14x14 EVK which connected to KSZ8081RNB doesn't >>>> work any more when system resume back, MAC driver is fec_main.c. >>>> >>>> It's obvious that initializing PHY hardware when MDIO bus resume back >>>> would introduce some regression when PHY implements soft_reset. When I >>> >>> Why is this obvious? Please elaborate on why a soft reset should break >>> something. >>> >>>> am debugging, I found PHY works fine if MAC doesn't support suspend/resume >>>> or phy_stop()/phy_start() doesn't been called during suspend/resume. This >>>> let me realize, PHY state machine phy_state_machine() could do something >>>> breaks the PHY. >>>> >>>> As we known, MAC resume first and then MDIO bus resume when system >>>> resume back from suspend. When MAC resume, usually it will invoke >>>> phy_start() where to change PHY state to PHY_UP, then trigger the stat> machine to run now. In phy_state_machine(), it will start/config >>>> auto-nego, then change PHY state to PHY_NOLINK, what to next is >>>> periodically check PHY link status. When MDIO bus resume, it will >>>> initialize PHY hardware, including soft_reset, what would soft_reset >>>> affect seems various from different PHYs. For KSZ8081RNB, when it in >>>> PHY_NOLINK state and then perform a soft reset, it will never complete >>>> auto-nego. >>> >>> Why? That would need to be checked in detail. Maybe chip errata >>> documentation provides a hint. >>> >> >> The KSZ8081 spec says the following about bit BMCR_PDOWN: >> >> If software reset (Register 0.15) is >> used to exit power-down mode >> (Register 0.11 = 1), two software >> reset writes (Register 0.15 = 1) are >> required. The first write clears >> power-down mode; the second >> write resets the chip and re-latches >> the pin strapping pin values. >> >> Maybe this causes the issue you see and genphy_soft_reset() isn't >> appropriate for this PHY. Please re-test with the KSZ8081 soft reset >> following the spec comment. >> > > Interesting. Never expected that behavior. > Thanks for catching it. Skimmed through the datasheets/erratas. > This is what I found (micrel.c): > > 10/100: > 8001 - Unaffected? > 8021/8031 - Double reset after PDOWN. > 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset > solves the issue since errata says no error after reset but is also > claiming that only toggling PDOWN (may) or power will help. > 8051 - Double reset after PDOWN. > 8061 - Double reset after PDOWN. > 8081 - Double reset after PDOWN. > 8091 - Double reset after PDOWN. > > 10/100/1000: > Nothing in gigabit afaics. > > Switches: > 8862 - Not affected? > 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. > 8864 - Not affected? > 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. > 9477 - Errata. PDOWN broken. Will randomly cause link failure on > adjacent links. Do not use. > > This certainly explains a lot. > >>>> >>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, it >>>> should be reasonable after PHY hardware re-initialized. Also give state >>>> machine a chance to start/config auto-nego again. >>>> >>> >>> If the MAC driver calls phy_stop() on suspend, then phydev->suspended >>> is true and mdio_bus_phy_may_suspend() returns false. As a consequence >>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >>> skips the PHY hw initialization. >>> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() >>> that sets the state to PHY_UP. >>> >> >> Forgot that MDIO bus suspend is done before MAC driver suspend. >> Therefore disregard this part for now. >> >>> Having said that the current argumentation isn't convincing. I'm not >>> aware of such issues on other systems, therefore it's likely that >>> something is system-dependent. >>> >>> Please check the exact call sequence on your system, maybe it >>> provides a hint. >>> >>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>> --- >>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>> 1 file changed, 7 insertions(+) >>>> >>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >>>> index cc38e326405a..312a6f662481 100644 >>>> --- a/drivers/net/phy/phy_device.c >>>> +++ b/drivers/net/phy/phy_device.c >>>> @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) >>>> ret = phy_resume(phydev); >>>> if (ret < 0) >>>> return ret; >>>> + >>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller resume >>>> + * rounte with phy_start(), here change to PHY_UP after re-initializing >>>> + * PHY hardware, let PHY state machine to start/config auto-nego again. >>>> + */ >>>> + phydev->state = PHY_UP; >>>> + >>>> no_resume: >>>> if (phydev->attached_dev && phydev->adjust_link) >>>> phy_start_machine(phydev); >>>> >>> >> > This is a quick draft of the modified soft reset for KSZ8081. Some tests would be appreciated. diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index a14a00328..4902235a8 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device *phydev, data[i] = kszphy_get_stat(phydev, i); } +int ksz8081_soft_reset(struct phy_device *phydev) +{ + int bmcr, ret, val; + + phy_lock_mdio_bus(phydev); + + bmcr = __phy_read(phydev, MII_BMCR); + if (bmcr < 0) + return bmcr; + + bmcr |= BMCR_RESET; + + if (bmcr & BMCR_PDOWN) + __phy_write(phydev, MII_BMCR, bmcr); + + if (phydev->autoneg == AUTONEG_ENABLE) + bmcr |= BMCR_ANRESTART; + + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE); + + phy_unlock_mdio_bus(phydev); + + phydev->suspended = 0; + + ret = phy_read_poll_timeout(phydev, MII_BMCR, val, !(val & BMCR_RESET), + 50000, 600000, true); + if (ret) + return ret; + + /* BMCR may be reset to defaults */ + if (phydev->autoneg == AUTONEG_DISABLE) + ret = genphy_setup_forced(phydev); + + return ret; +} + static int kszphy_suspend(struct phy_device *phydev) { /* Disable PHY Interrupts */ @@ -1303,7 +1339,7 @@ static struct phy_driver ksphy_driver[] = { .driver_data = &ksz8081_type, .probe = kszphy_probe, .config_init = ksz8081_config_init, - .soft_reset = genphy_soft_reset, + .soft_reset = ksz8081_soft_reset, .config_intr = kszphy_config_intr, .handle_interrupt = kszphy_handle_interrupt, .get_sset_count = kszphy_get_sset_count,
On 4/5/2021 7:58 AM, Heiner Kallweit wrote: > On 05.04.2021 15:53, Christian Melki wrote: >> On 4/5/21 2:09 PM, Heiner Kallweit wrote: >>> On 05.04.2021 10:43, Christian Melki wrote: >>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut >>>>>>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will >>>>>>> soft reset PHY if PHY driver implements soft_reset callback. >>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to >>>>>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these >>>>>>> two patches, I found i.MX6UL 14x14 EVK which connected to KSZ8081RNB doesn't >>>>>>> work any more when system resume back, MAC driver is fec_main.c. >>>>>>> >>>>>>> It's obvious that initializing PHY hardware when MDIO bus resume back >>>>>>> would introduce some regression when PHY implements soft_reset. When I >>>>>> >>>>>> Why is this obvious? Please elaborate on why a soft reset should break >>>>>> something. >>>>>> >>>>>>> am debugging, I found PHY works fine if MAC doesn't support suspend/resume >>>>>>> or phy_stop()/phy_start() doesn't been called during suspend/resume. This >>>>>>> let me realize, PHY state machine phy_state_machine() could do something >>>>>>> breaks the PHY. >>>>>>> >>>>>>> As we known, MAC resume first and then MDIO bus resume when system >>>>>>> resume back from suspend. When MAC resume, usually it will invoke >>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the stat> machine to run now. In phy_state_machine(), it will start/config >>>>>>> auto-nego, then change PHY state to PHY_NOLINK, what to next is >>>>>>> periodically check PHY link status. When MDIO bus resume, it will >>>>>>> initialize PHY hardware, including soft_reset, what would soft_reset >>>>>>> affect seems various from different PHYs. For KSZ8081RNB, when it in >>>>>>> PHY_NOLINK state and then perform a soft reset, it will never complete >>>>>>> auto-nego. >>>>>> >>>>>> Why? That would need to be checked in detail. Maybe chip errata >>>>>> documentation provides a hint. >>>>>> >>>>> >>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>>> >>>>> If software reset (Register 0.15) is >>>>> used to exit power-down mode >>>>> (Register 0.11 = 1), two software >>>>> reset writes (Register 0.15 = 1) are >>>>> required. The first write clears >>>>> power-down mode; the second >>>>> write resets the chip and re-latches >>>>> the pin strapping pin values. >>>>> >>>>> Maybe this causes the issue you see and genphy_soft_reset() isn't >>>>> appropriate for this PHY. Please re-test with the KSZ8081 soft reset >>>>> following the spec comment. >>>>> >>>> >>>> Interesting. Never expected that behavior. >>>> Thanks for catching it. Skimmed through the datasheets/erratas. >>>> This is what I found (micrel.c): >>>> >>>> 10/100: >>>> 8001 - Unaffected? >>>> 8021/8031 - Double reset after PDOWN. >>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset >>>> solves the issue since errata says no error after reset but is also >>>> claiming that only toggling PDOWN (may) or power will help. >>>> 8051 - Double reset after PDOWN. >>>> 8061 - Double reset after PDOWN. >>>> 8081 - Double reset after PDOWN. >>>> 8091 - Double reset after PDOWN. >>>> >>>> 10/100/1000: >>>> Nothing in gigabit afaics. >>>> >>>> Switches: >>>> 8862 - Not affected? >>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>> 8864 - Not affected? >>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on >>>> adjacent links. Do not use. >>>> >>>> This certainly explains a lot. >>>> >>>>>>> >>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, it >>>>>>> should be reasonable after PHY hardware re-initialized. Also give state >>>>>>> machine a chance to start/config auto-nego again. >>>>>>> >>>>>> >>>>>> If the MAC driver calls phy_stop() on suspend, then phydev->suspended >>>>>> is true and mdio_bus_phy_may_suspend() returns false. As a consequence >>>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >>>>>> skips the PHY hw initialization. >>>>>> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() >>>>>> that sets the state to PHY_UP. >>>>>> >>>>> >>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. >>>>> Therefore disregard this part for now. >>>>> >>>>>> Having said that the current argumentation isn't convincing. I'm not >>>>>> aware of such issues on other systems, therefore it's likely that >>>>>> something is system-dependent. >>>>>> >>>>>> Please check the exact call sequence on your system, maybe it >>>>>> provides a hint. >>>>>> >>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>>> --- >>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>>> 1 file changed, 7 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >>>>>>> index cc38e326405a..312a6f662481 100644 >>>>>>> --- a/drivers/net/phy/phy_device.c >>>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) >>>>>>> ret = phy_resume(phydev); >>>>>>> if (ret < 0) >>>>>>> return ret; >>>>>>> + >>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller resume >>>>>>> + * rounte with phy_start(), here change to PHY_UP after re-initializing >>>>>>> + * PHY hardware, let PHY state machine to start/config auto-nego again. >>>>>>> + */ >>>>>>> + phydev->state = PHY_UP; >>>>>>> + >>>>>>> no_resume: >>>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>>> phy_start_machine(phydev); >>>>>>> >>>>>> >>>>> >>>> >>> >>> This is a quick draft of the modified soft reset for KSZ8081. >>> Some tests would be appreciated. >>> >>> >>> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c >>> index a14a00328..4902235a8 100644 >>> --- a/drivers/net/phy/micrel.c >>> +++ b/drivers/net/phy/micrel.c >>> @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device *phydev, >>> data[i] = kszphy_get_stat(phydev, i); >>> } >>> >>> +int ksz8081_soft_reset(struct phy_device *phydev) >>> +{ >>> + int bmcr, ret, val; >>> + >>> + phy_lock_mdio_bus(phydev); >>> + >>> + bmcr = __phy_read(phydev, MII_BMCR); >>> + if (bmcr < 0) >>> + return bmcr; >>> + >>> + bmcr |= BMCR_RESET; >>> + >>> + if (bmcr & BMCR_PDOWN) >>> + __phy_write(phydev, MII_BMCR, bmcr); >>> + >>> + if (phydev->autoneg == AUTONEG_ENABLE) >>> + bmcr |= BMCR_ANRESTART; >>> + >>> + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE); >>> + >> >> Wouldn't this re-set BMCR_PDOWN? > > No. A soft reset is supposed to reset all bits to their defaults. FWIW, this behavior is not universally accepted by PHY vendors and some take liberties in how BMCR_RESET is implemented.
Hi Heiner, Thanks for your comments. > -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月4日 22:09 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; andrew@lunn.ch; > linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; christian.melki@t2data.com > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 04.04.2021 12:07, Joakim Zhang wrote: > > commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut off > > PHY power") invokes phy_init_hw() when MDIO bus resume, it will soft > > reset PHY if PHY driver implements soft_reset callback. > > commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to > > genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After > > these two patches, I found i.MX6UL 14x14 EVK which connected to > > KSZ8081RNB doesn't work any more when system resume back, MAC driver > is fec_main.c. > > > > It's obvious that initializing PHY hardware when MDIO bus resume back > > would introduce some regression when PHY implements soft_reset. When I > > Why is this obvious? Please elaborate on why a soft reset should break > something. This obvious since only above two fixes work together can trigger this issue at my side. > > am debugging, I found PHY works fine if MAC doesn't support > > suspend/resume or phy_stop()/phy_start() doesn't been called during > > suspend/resume. This let me realize, PHY state machine > > phy_state_machine() could do something breaks the PHY. > > > > As we known, MAC resume first and then MDIO bus resume when system > > resume back from suspend. When MAC resume, usually it will invoke > > phy_start() where to change PHY state to PHY_UP, then trigger the > > stat> machine to run now. In phy_state_machine(), it will start/config > > auto-nego, then change PHY state to PHY_NOLINK, what to next is > > periodically check PHY link status. When MDIO bus resume, it will > > initialize PHY hardware, including soft_reset, what would soft_reset > > affect seems various from different PHYs. For KSZ8081RNB, when it in > > PHY_NOLINK state and then perform a soft reset, it will never complete > auto-nego. > > Why? That would need to be checked in detail. Maybe chip errata > documentation provides a hint. > > > > > This patch changes PHY state to PHY_UP when MDIO bus resume back, it > > should be reasonable after PHY hardware re-initialized. Also give > > state machine a chance to start/config auto-nego again. > > > > If the MAC driver calls phy_stop() on suspend, then phydev->suspended is true > and mdio_bus_phy_may_suspend() returns false. As a consequence > phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() > skips the PHY hw initialization. Per my debugging, MDIO bus suspended before MAC, I think it is a common behavior, so PHY hw initialization is always called. Does this behavior be system-dependent? Best Regards, Joakim Zhang > Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() that > sets the state to PHY_UP. > > Having said that the current argumentation isn't convincing. I'm not aware of > such issues on other systems, therefore it's likely that something is > system-dependent. > > Please check the exact call sequence on your system, maybe it provides a hint. > > > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > > --- > > drivers/net/phy/phy_device.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/net/phy/phy_device.c > > b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 100644 > > --- a/drivers/net/phy/phy_device.c > > +++ b/drivers/net/phy/phy_device.c > > @@ -306,6 +306,13 @@ static __maybe_unused int > mdio_bus_phy_resume(struct device *dev) > > ret = phy_resume(phydev); > > if (ret < 0) > > return ret; > > + > > + /* PHY state could be changed to PHY_NOLINK from MAC controller > resume > > + * rounte with phy_start(), here change to PHY_UP after re-initializing > > + * PHY hardware, let PHY state machine to start/config auto-nego again. > > + */ > > + phydev->state = PHY_UP; > > + > > no_resume: > > if (phydev->attached_dev && phydev->adjust_link) > > phy_start_machine(phydev); > >
Hi Heiner, > -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月5日 6:49 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; andrew@lunn.ch; > linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; christian.melki@t2data.com > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 04.04.2021 16:09, Heiner Kallweit wrote: > > On 04.04.2021 12:07, Joakim Zhang wrote: > >> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut > >> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will > >> soft reset PHY if PHY driver implements soft_reset callback. > >> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to > >> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After > >> these two patches, I found i.MX6UL 14x14 EVK which connected to > >> KSZ8081RNB doesn't work any more when system resume back, MAC driver > is fec_main.c. > >> > >> It's obvious that initializing PHY hardware when MDIO bus resume back > >> would introduce some regression when PHY implements soft_reset. When > >> I > > > > Why is this obvious? Please elaborate on why a soft reset should break > > something. > > > >> am debugging, I found PHY works fine if MAC doesn't support > >> suspend/resume or phy_stop()/phy_start() doesn't been called during > >> suspend/resume. This let me realize, PHY state machine > >> phy_state_machine() could do something breaks the PHY. > >> > >> As we known, MAC resume first and then MDIO bus resume when system > >> resume back from suspend. When MAC resume, usually it will invoke > >> phy_start() where to change PHY state to PHY_UP, then trigger the > >> stat> machine to run now. In phy_state_machine(), it will > >> start/config auto-nego, then change PHY state to PHY_NOLINK, what to > >> next is periodically check PHY link status. When MDIO bus resume, it > >> will initialize PHY hardware, including soft_reset, what would > >> soft_reset affect seems various from different PHYs. For KSZ8081RNB, > >> when it in PHY_NOLINK state and then perform a soft reset, it will never > complete auto-nego. > > > > Why? That would need to be checked in detail. Maybe chip errata > > documentation provides a hint. > > > > The KSZ8081 spec says the following about bit BMCR_PDOWN: > > If software reset (Register 0.15) is > used to exit power-down mode > (Register 0.11 = 1), two software > reset writes (Register 0.15 = 1) are > required. The first write clears > power-down mode; the second > write resets the chip and re-latches > the pin strapping pin values. > > Maybe this causes the issue you see and genphy_soft_reset() isn't appropriate > for this PHY. Please re-test with the KSZ8081 soft reset following the spec > comment. Yes, I also noticed this note in spec, and tried to soft reset twice from PHY driver, but still can't work. What is strange is that, ifup/ifdown can work fine, which is almost the same route with suspend/resume, except suspend/resume has state machine running. > > >> > >> This patch changes PHY state to PHY_UP when MDIO bus resume back, it > >> should be reasonable after PHY hardware re-initialized. Also give > >> state machine a chance to start/config auto-nego again. > >> > > > > If the MAC driver calls phy_stop() on suspend, then phydev->suspended > > is true and mdio_bus_phy_may_suspend() returns false. As a consequence > > phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() > > skips the PHY hw initialization. > > Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() > > that sets the state to PHY_UP. > > > > Forgot that MDIO bus suspend is done before MAC driver suspend. > Therefore disregard this part for now. OK. Best Regards, Joakim Zhang > > Having said that the current argumentation isn't convincing. I'm not > > aware of such issues on other systems, therefore it's likely that > > something is system-dependent. > > > > Please check the exact call sequence on your system, maybe it provides > > a hint. > > > >> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >> --- > >> drivers/net/phy/phy_device.c | 7 +++++++ > >> 1 file changed, 7 insertions(+) > >> > >> diff --git a/drivers/net/phy/phy_device.c > >> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 > >> 100644 > >> --- a/drivers/net/phy/phy_device.c > >> +++ b/drivers/net/phy/phy_device.c > >> @@ -306,6 +306,13 @@ static __maybe_unused int > mdio_bus_phy_resume(struct device *dev) > >> ret = phy_resume(phydev); > >> if (ret < 0) > >> return ret; > >> + > >> + /* PHY state could be changed to PHY_NOLINK from MAC controller > resume > >> + * rounte with phy_start(), here change to PHY_UP after re-initializing > >> + * PHY hardware, let PHY state machine to start/config auto-nego again. > >> + */ > >> + phydev->state = PHY_UP; > >> + > >> no_resume: > >> if (phydev->attached_dev && phydev->adjust_link) > >> phy_start_machine(phydev); > >> > >
Hi Charistian, > -----Original Message----- > From: Christian Melki <christian.melki@t2data.com> > Sent: 2021年4月5日 16:44 > To: Heiner Kallweit <hkallweit1@gmail.com>; Joakim Zhang > <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; > davem@davemloft.net; kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 4/5/21 12:48 AM, Heiner Kallweit wrote: > > On 04.04.2021 16:09, Heiner Kallweit wrote: > >> On 04.04.2021 12:07, Joakim Zhang wrote: > >>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut > >>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will > >>> soft reset PHY if PHY driver implements soft_reset callback. > >>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to > >>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After > >>> these two patches, I found i.MX6UL 14x14 EVK which connected to > >>> KSZ8081RNB doesn't work any more when system resume back, MAC > driver is fec_main.c. > >>> > >>> It's obvious that initializing PHY hardware when MDIO bus resume > >>> back would introduce some regression when PHY implements soft_reset. > >>> When I > >> > >> Why is this obvious? Please elaborate on why a soft reset should > >> break something. > >> > >>> am debugging, I found PHY works fine if MAC doesn't support > >>> suspend/resume or phy_stop()/phy_start() doesn't been called during > >>> suspend/resume. This let me realize, PHY state machine > >>> phy_state_machine() could do something breaks the PHY. > >>> > >>> As we known, MAC resume first and then MDIO bus resume when system > >>> resume back from suspend. When MAC resume, usually it will invoke > >>> phy_start() where to change PHY state to PHY_UP, then trigger the > >>> stat> machine to run now. In phy_state_machine(), it will > >>> start/config auto-nego, then change PHY state to PHY_NOLINK, what to > >>> next is periodically check PHY link status. When MDIO bus resume, it > >>> will initialize PHY hardware, including soft_reset, what would > >>> soft_reset affect seems various from different PHYs. For KSZ8081RNB, > >>> when it in PHY_NOLINK state and then perform a soft reset, it will never > complete auto-nego. > >> > >> Why? That would need to be checked in detail. Maybe chip errata > >> documentation provides a hint. > >> > > > > The KSZ8081 spec says the following about bit BMCR_PDOWN: > > > > If software reset (Register 0.15) is > > used to exit power-down mode > > (Register 0.11 = 1), two software > > reset writes (Register 0.15 = 1) are > > required. The first write clears > > power-down mode; the second > > write resets the chip and re-latches > > the pin strapping pin values. > > > > Maybe this causes the issue you see and genphy_soft_reset() isn't > > appropriate for this PHY. Please re-test with the KSZ8081 soft reset > > following the spec comment. > > > > Interesting. Never expected that behavior. > Thanks for catching it. Skimmed through the datasheets/erratas. > This is what I found (micrel.c): > > 10/100: > 8001 - Unaffected? > 8021/8031 - Double reset after PDOWN. > 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset > solves the issue since errata says no error after reset but is also claiming that > only toggling PDOWN (may) or power will help. > 8051 - Double reset after PDOWN. > 8061 - Double reset after PDOWN. > 8081 - Double reset after PDOWN. > 8091 - Double reset after PDOWN. > > 10/100/1000: > Nothing in gigabit afaics. > > Switches: > 8862 - Not affected? > 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. > 8864 - Not affected? > 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. > 9477 - Errata. PDOWN broken. Will randomly cause link failure on adjacent links. > Do not use. > > This certainly explains a lot. Thanks for digging into it. As I discussed with you before, there is no problem with these two fixes if I did ifdown/ifup. Almost the same route with suspend/resume. Difference is that it will start state machine after initializing PHY. But when resume back, state machine is running before initializing PHY. I think the key is to figure out what would soft reset affect in 8081, there is no any hint in spec. Best Regards, Joakim Zhang > >>> > >>> This patch changes PHY state to PHY_UP when MDIO bus resume back, it > >>> should be reasonable after PHY hardware re-initialized. Also give > >>> state machine a chance to start/config auto-nego again. > >>> > >> > >> If the MAC driver calls phy_stop() on suspend, then phydev->suspended > >> is true and mdio_bus_phy_may_suspend() returns false. As a > >> consequence > >> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() > >> skips the PHY hw initialization. > >> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine() > >> that sets the state to PHY_UP. > >> > > > > Forgot that MDIO bus suspend is done before MAC driver suspend. > > Therefore disregard this part for now. > > > >> Having said that the current argumentation isn't convincing. I'm not > >> aware of such issues on other systems, therefore it's likely that > >> something is system-dependent. > >> > >> Please check the exact call sequence on your system, maybe it > >> provides a hint. > >> > >>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>> --- > >>> drivers/net/phy/phy_device.c | 7 +++++++ > >>> 1 file changed, 7 insertions(+) > >>> > >>> diff --git a/drivers/net/phy/phy_device.c > >>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 > >>> 100644 > >>> --- a/drivers/net/phy/phy_device.c > >>> +++ b/drivers/net/phy/phy_device.c > >>> @@ -306,6 +306,13 @@ static __maybe_unused int > mdio_bus_phy_resume(struct device *dev) > >>> ret = phy_resume(phydev); > >>> if (ret < 0) > >>> return ret; > >>> + > >>> + /* PHY state could be changed to PHY_NOLINK from MAC controller > resume > >>> + * rounte with phy_start(), here change to PHY_UP after > re-initializing > >>> + * PHY hardware, let PHY state machine to start/config auto-nego > again. > >>> + */ > >>> + phydev->state = PHY_UP; > >>> + > >>> no_resume: > >>> if (phydev->attached_dev && phydev->adjust_link) > >>> phy_start_machine(phydev); > >>> > >> > >
Hi Heiner, > -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月5日 20:10 > To: christian.melki@t2data.com; Joakim Zhang <qiangqing.zhang@nxp.com>; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 05.04.2021 10:43, Christian Melki wrote: > > On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut > >>>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will > >>>> soft reset PHY if PHY driver implements soft_reset callback. > >>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to > >>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After > >>>> these two patches, I found i.MX6UL 14x14 EVK which connected to > >>>> KSZ8081RNB doesn't work any more when system resume back, MAC > driver is fec_main.c. > >>>> > >>>> It's obvious that initializing PHY hardware when MDIO bus resume > >>>> back would introduce some regression when PHY implements > >>>> soft_reset. When I > >>> > >>> Why is this obvious? Please elaborate on why a soft reset should > >>> break something. > >>> > >>>> am debugging, I found PHY works fine if MAC doesn't support > >>>> suspend/resume or phy_stop()/phy_start() doesn't been called during > >>>> suspend/resume. This let me realize, PHY state machine > >>>> phy_state_machine() could do something breaks the PHY. > >>>> > >>>> As we known, MAC resume first and then MDIO bus resume when system > >>>> resume back from suspend. When MAC resume, usually it will invoke > >>>> phy_start() where to change PHY state to PHY_UP, then trigger the > >>>> stat> machine to run now. In phy_state_machine(), it will > >>>> start/config auto-nego, then change PHY state to PHY_NOLINK, what > >>>> to next is periodically check PHY link status. When MDIO bus > >>>> resume, it will initialize PHY hardware, including soft_reset, what > >>>> would soft_reset affect seems various from different PHYs. For > >>>> KSZ8081RNB, when it in PHY_NOLINK state and then perform a soft reset, > it will never complete auto-nego. > >>> > >>> Why? That would need to be checked in detail. Maybe chip errata > >>> documentation provides a hint. > >>> > >> > >> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >> > >> If software reset (Register 0.15) is > >> used to exit power-down mode > >> (Register 0.11 = 1), two software > >> reset writes (Register 0.15 = 1) are > >> required. The first write clears > >> power-down mode; the second > >> write resets the chip and re-latches > >> the pin strapping pin values. > >> > >> Maybe this causes the issue you see and genphy_soft_reset() isn't > >> appropriate for this PHY. Please re-test with the KSZ8081 soft reset > >> following the spec comment. > >> > > > > Interesting. Never expected that behavior. > > Thanks for catching it. Skimmed through the datasheets/erratas. > > This is what I found (micrel.c): > > > > 10/100: > > 8001 - Unaffected? > > 8021/8031 - Double reset after PDOWN. > > 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset > > solves the issue since errata says no error after reset but is also > > claiming that only toggling PDOWN (may) or power will help. > > 8051 - Double reset after PDOWN. > > 8061 - Double reset after PDOWN. > > 8081 - Double reset after PDOWN. > > 8091 - Double reset after PDOWN. > > > > 10/100/1000: > > Nothing in gigabit afaics. > > > > Switches: > > 8862 - Not affected? > > 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. > > 8864 - Not affected? > > 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. > > 9477 - Errata. PDOWN broken. Will randomly cause link failure on > > adjacent links. Do not use. > > > > This certainly explains a lot. > > > >>>> > >>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, > >>>> it should be reasonable after PHY hardware re-initialized. Also > >>>> give state machine a chance to start/config auto-nego again. > >>>> > >>> > >>> If the MAC driver calls phy_stop() on suspend, then > >>> phydev->suspended is true and mdio_bus_phy_may_suspend() returns > >>> false. As a consequence > >>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() > >>> skips the PHY hw initialization. > >>> Please also note that mdio_bus_phy_suspend() calls > >>> phy_stop_machine() that sets the state to PHY_UP. > >>> > >> > >> Forgot that MDIO bus suspend is done before MAC driver suspend. > >> Therefore disregard this part for now. > >> > >>> Having said that the current argumentation isn't convincing. I'm not > >>> aware of such issues on other systems, therefore it's likely that > >>> something is system-dependent. > >>> > >>> Please check the exact call sequence on your system, maybe it > >>> provides a hint. > >>> > >>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>>> --- > >>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>> 1 file changed, 7 insertions(+) > >>>> > >>>> diff --git a/drivers/net/phy/phy_device.c > >>>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 > >>>> 100644 > >>>> --- a/drivers/net/phy/phy_device.c > >>>> +++ b/drivers/net/phy/phy_device.c > >>>> @@ -306,6 +306,13 @@ static __maybe_unused int > mdio_bus_phy_resume(struct device *dev) > >>>> ret = phy_resume(phydev); > >>>> if (ret < 0) > >>>> return ret; > >>>> + > >>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller > resume > >>>> + * rounte with phy_start(), here change to PHY_UP after > re-initializing > >>>> + * PHY hardware, let PHY state machine to start/config auto-nego > again. > >>>> + */ > >>>> + phydev->state = PHY_UP; > >>>> + > >>>> no_resume: > >>>> if (phydev->attached_dev && phydev->adjust_link) > >>>> phy_start_machine(phydev); > >>>> > >>> > >> > > > > This is a quick draft of the modified soft reset for KSZ8081. > Some tests would be appreciated. > I test this patch at my side, unfortunately, it still can't work. I have not found what soft reset would affect in 8081 spec, is there ang common Description for different PHYs? Best Regards, Joakim Zhang > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index > a14a00328..4902235a8 100644 > --- a/drivers/net/phy/micrel.c > +++ b/drivers/net/phy/micrel.c > @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device > *phydev, > data[i] = kszphy_get_stat(phydev, i); } > > +int ksz8081_soft_reset(struct phy_device *phydev) { > + int bmcr, ret, val; > + > + phy_lock_mdio_bus(phydev); > + > + bmcr = __phy_read(phydev, MII_BMCR); > + if (bmcr < 0) > + return bmcr; > + > + bmcr |= BMCR_RESET; > + > + if (bmcr & BMCR_PDOWN) > + __phy_write(phydev, MII_BMCR, bmcr); > + > + if (phydev->autoneg == AUTONEG_ENABLE) > + bmcr |= BMCR_ANRESTART; > + > + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE); > + > + phy_unlock_mdio_bus(phydev); > + > + phydev->suspended = 0; > + > + ret = phy_read_poll_timeout(phydev, MII_BMCR, val, !(val & > BMCR_RESET), > + 50000, 600000, true); > + if (ret) > + return ret; > + > + /* BMCR may be reset to defaults */ > + if (phydev->autoneg == AUTONEG_DISABLE) > + ret = genphy_setup_forced(phydev); > + > + return ret; > +} > + > static int kszphy_suspend(struct phy_device *phydev) { > /* Disable PHY Interrupts */ > @@ -1303,7 +1339,7 @@ static struct phy_driver ksphy_driver[] = { > .driver_data = &ksz8081_type, > .probe = kszphy_probe, > .config_init = ksz8081_config_init, > - .soft_reset = genphy_soft_reset, > + .soft_reset = ksz8081_soft_reset, > .config_intr = kszphy_config_intr, > .handle_interrupt = kszphy_handle_interrupt, > .get_sset_count = kszphy_get_sset_count, > -- > 2.31.1
On 06.04.2021 04:07, Joakim Zhang wrote: > > Hi Heiner, > >> -----Original Message----- >> From: Heiner Kallweit <hkallweit1@gmail.com> >> Sent: 2021年4月5日 20:10 >> To: christian.melki@t2data.com; Joakim Zhang <qiangqing.zhang@nxp.com>; >> andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; >> kuba@kernel.org >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >> <linux-imx@nxp.com> >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume >> back >> >> On 05.04.2021 10:43, Christian Melki wrote: >>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut >>>>>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will >>>>>> soft reset PHY if PHY driver implements soft_reset callback. >>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to >>>>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After >>>>>> these two patches, I found i.MX6UL 14x14 EVK which connected to >>>>>> KSZ8081RNB doesn't work any more when system resume back, MAC >> driver is fec_main.c. >>>>>> >>>>>> It's obvious that initializing PHY hardware when MDIO bus resume >>>>>> back would introduce some regression when PHY implements >>>>>> soft_reset. When I >>>>> >>>>> Why is this obvious? Please elaborate on why a soft reset should >>>>> break something. >>>>> >>>>>> am debugging, I found PHY works fine if MAC doesn't support >>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called during >>>>>> suspend/resume. This let me realize, PHY state machine >>>>>> phy_state_machine() could do something breaks the PHY. >>>>>> >>>>>> As we known, MAC resume first and then MDIO bus resume when system >>>>>> resume back from suspend. When MAC resume, usually it will invoke >>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the >>>>>> stat> machine to run now. In phy_state_machine(), it will >>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, what >>>>>> to next is periodically check PHY link status. When MDIO bus >>>>>> resume, it will initialize PHY hardware, including soft_reset, what >>>>>> would soft_reset affect seems various from different PHYs. For >>>>>> KSZ8081RNB, when it in PHY_NOLINK state and then perform a soft reset, >> it will never complete auto-nego. >>>>> >>>>> Why? That would need to be checked in detail. Maybe chip errata >>>>> documentation provides a hint. >>>>> >>>> >>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>> >>>> If software reset (Register 0.15) is >>>> used to exit power-down mode >>>> (Register 0.11 = 1), two software >>>> reset writes (Register 0.15 = 1) are >>>> required. The first write clears >>>> power-down mode; the second >>>> write resets the chip and re-latches >>>> the pin strapping pin values. >>>> >>>> Maybe this causes the issue you see and genphy_soft_reset() isn't >>>> appropriate for this PHY. Please re-test with the KSZ8081 soft reset >>>> following the spec comment. >>>> >>> >>> Interesting. Never expected that behavior. >>> Thanks for catching it. Skimmed through the datasheets/erratas. >>> This is what I found (micrel.c): >>> >>> 10/100: >>> 8001 - Unaffected? >>> 8021/8031 - Double reset after PDOWN. >>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset >>> solves the issue since errata says no error after reset but is also >>> claiming that only toggling PDOWN (may) or power will help. >>> 8051 - Double reset after PDOWN. >>> 8061 - Double reset after PDOWN. >>> 8081 - Double reset after PDOWN. >>> 8091 - Double reset after PDOWN. >>> >>> 10/100/1000: >>> Nothing in gigabit afaics. >>> >>> Switches: >>> 8862 - Not affected? >>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>> 8864 - Not affected? >>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on >>> adjacent links. Do not use. >>> >>> This certainly explains a lot. >>> >>>>>> >>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, >>>>>> it should be reasonable after PHY hardware re-initialized. Also >>>>>> give state machine a chance to start/config auto-nego again. >>>>>> >>>>> >>>>> If the MAC driver calls phy_stop() on suspend, then >>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() returns >>>>> false. As a consequence >>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >>>>> skips the PHY hw initialization. >>>>> Please also note that mdio_bus_phy_suspend() calls >>>>> phy_stop_machine() that sets the state to PHY_UP. >>>>> >>>> >>>> Forgot that MDIO bus suspend is done before MAC driver suspend. >>>> Therefore disregard this part for now. >>>> >>>>> Having said that the current argumentation isn't convincing. I'm not >>>>> aware of such issues on other systems, therefore it's likely that >>>>> something is system-dependent. >>>>> >>>>> Please check the exact call sequence on your system, maybe it >>>>> provides a hint. >>>>> >>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>> --- >>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>> 1 file changed, 7 insertions(+) >>>>>> >>>>>> diff --git a/drivers/net/phy/phy_device.c >>>>>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 >>>>>> 100644 >>>>>> --- a/drivers/net/phy/phy_device.c >>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int >> mdio_bus_phy_resume(struct device *dev) >>>>>> ret = phy_resume(phydev); >>>>>> if (ret < 0) >>>>>> return ret; >>>>>> + >>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller >> resume >>>>>> + * rounte with phy_start(), here change to PHY_UP after >> re-initializing >>>>>> + * PHY hardware, let PHY state machine to start/config auto-nego >> again. >>>>>> + */ >>>>>> + phydev->state = PHY_UP; >>>>>> + >>>>>> no_resume: >>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>> phy_start_machine(phydev); >>>>>> >>>>> >>>> >>> >> >> This is a quick draft of the modified soft reset for KSZ8081. >> Some tests would be appreciated. >> > > I test this patch at my side, unfortunately, it still can't work. > > I have not found what soft reset would affect in 8081 spec, is there ang common > Description for different PHYs? > You can check the clause 22 spec: 802.3 22.2.4.1.1 Apart from that you can check BMCR value and which modes your PHY advertises after a soft reset. If the PHY indeed wouldn't restart aneg after a soft reset, then it would be the only one with this behavior I know. And I'd wonder why there aren't more such reports. > Best Regards, > Joakim Zhang >> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index >> a14a00328..4902235a8 100644 >> --- a/drivers/net/phy/micrel.c >> +++ b/drivers/net/phy/micrel.c >> @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device >> *phydev, >> data[i] = kszphy_get_stat(phydev, i); } >> >> +int ksz8081_soft_reset(struct phy_device *phydev) { >> + int bmcr, ret, val; >> + >> + phy_lock_mdio_bus(phydev); >> + >> + bmcr = __phy_read(phydev, MII_BMCR); >> + if (bmcr < 0) >> + return bmcr; >> + >> + bmcr |= BMCR_RESET; >> + >> + if (bmcr & BMCR_PDOWN) >> + __phy_write(phydev, MII_BMCR, bmcr); >> + >> + if (phydev->autoneg == AUTONEG_ENABLE) >> + bmcr |= BMCR_ANRESTART; >> + >> + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE); >> + >> + phy_unlock_mdio_bus(phydev); >> + >> + phydev->suspended = 0; >> + >> + ret = phy_read_poll_timeout(phydev, MII_BMCR, val, !(val & >> BMCR_RESET), >> + 50000, 600000, true); >> + if (ret) >> + return ret; >> + >> + /* BMCR may be reset to defaults */ >> + if (phydev->autoneg == AUTONEG_DISABLE) >> + ret = genphy_setup_forced(phydev); >> + >> + return ret; >> +} >> + >> static int kszphy_suspend(struct phy_device *phydev) { >> /* Disable PHY Interrupts */ >> @@ -1303,7 +1339,7 @@ static struct phy_driver ksphy_driver[] = { >> .driver_data = &ksz8081_type, >> .probe = kszphy_probe, >> .config_init = ksz8081_config_init, >> - .soft_reset = genphy_soft_reset, >> + .soft_reset = ksz8081_soft_reset, >> .config_intr = kszphy_config_intr, >> .handle_interrupt = kszphy_handle_interrupt, >> .get_sset_count = kszphy_get_sset_count, >> -- >> 2.31.1 >
> -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月6日 14:29 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 06.04.2021 04:07, Joakim Zhang wrote: > > > > Hi Heiner, > > > >> -----Original Message----- > >> From: Heiner Kallweit <hkallweit1@gmail.com> > >> Sent: 2021年4月5日 20:10 > >> To: christian.melki@t2data.com; Joakim Zhang > >> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; > >> davem@davemloft.net; kuba@kernel.org > >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >> dl-linux-imx <linux-imx@nxp.com> > >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus > >> resume back > >> > >> On 05.04.2021 10:43, Christian Melki wrote: > >>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may > >>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus resume, > >>>>>> it will soft reset PHY if PHY driver implements soft_reset callback. > >>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback > >>>>>> to genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. > >>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > >>>>>> connected to KSZ8081RNB doesn't work any more when system > resume > >>>>>> back, MAC > >> driver is fec_main.c. > >>>>>> > >>>>>> It's obvious that initializing PHY hardware when MDIO bus resume > >>>>>> back would introduce some regression when PHY implements > >>>>>> soft_reset. When I > >>>>> > >>>>> Why is this obvious? Please elaborate on why a soft reset should > >>>>> break something. > >>>>> > >>>>>> am debugging, I found PHY works fine if MAC doesn't support > >>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called > >>>>>> during suspend/resume. This let me realize, PHY state machine > >>>>>> phy_state_machine() could do something breaks the PHY. > >>>>>> > >>>>>> As we known, MAC resume first and then MDIO bus resume when > >>>>>> system resume back from suspend. When MAC resume, usually it will > >>>>>> invoke > >>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the > >>>>>> stat> machine to run now. In phy_state_machine(), it will > >>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, what > >>>>>> to next is periodically check PHY link status. When MDIO bus > >>>>>> resume, it will initialize PHY hardware, including soft_reset, > >>>>>> what would soft_reset affect seems various from different PHYs. > >>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a > >>>>>> soft reset, > >> it will never complete auto-nego. > >>>>> > >>>>> Why? That would need to be checked in detail. Maybe chip errata > >>>>> documentation provides a hint. > >>>>> > >>>> > >>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >>>> > >>>> If software reset (Register 0.15) is used to exit power-down mode > >>>> (Register 0.11 = 1), two software reset writes (Register 0.15 = 1) > >>>> are required. The first write clears power-down mode; the second > >>>> write resets the chip and re-latches the pin strapping pin values. > >>>> > >>>> Maybe this causes the issue you see and genphy_soft_reset() isn't > >>>> appropriate for this PHY. Please re-test with the KSZ8081 soft > >>>> reset following the spec comment. > >>>> > >>> > >>> Interesting. Never expected that behavior. > >>> Thanks for catching it. Skimmed through the datasheets/erratas. > >>> This is what I found (micrel.c): > >>> > >>> 10/100: > >>> 8001 - Unaffected? > >>> 8021/8031 - Double reset after PDOWN. > >>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if > >>> reset solves the issue since errata says no error after reset but is > >>> also claiming that only toggling PDOWN (may) or power will help. > >>> 8051 - Double reset after PDOWN. > >>> 8061 - Double reset after PDOWN. > >>> 8081 - Double reset after PDOWN. > >>> 8091 - Double reset after PDOWN. > >>> > >>> 10/100/1000: > >>> Nothing in gigabit afaics. > >>> > >>> Switches: > >>> 8862 - Not affected? > >>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. > >>> 8864 - Not affected? > >>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. > >>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on > >>> adjacent links. Do not use. > >>> > >>> This certainly explains a lot. > >>> > >>>>>> > >>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, > >>>>>> it should be reasonable after PHY hardware re-initialized. Also > >>>>>> give state machine a chance to start/config auto-nego again. > >>>>>> > >>>>> > >>>>> If the MAC driver calls phy_stop() on suspend, then > >>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() returns > >>>>> false. As a consequence > >>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() > >>>>> skips the PHY hw initialization. > >>>>> Please also note that mdio_bus_phy_suspend() calls > >>>>> phy_stop_machine() that sets the state to PHY_UP. > >>>>> > >>>> > >>>> Forgot that MDIO bus suspend is done before MAC driver suspend. > >>>> Therefore disregard this part for now. > >>>> > >>>>> Having said that the current argumentation isn't convincing. I'm > >>>>> not aware of such issues on other systems, therefore it's likely > >>>>> that something is system-dependent. > >>>>> > >>>>> Please check the exact call sequence on your system, maybe it > >>>>> provides a hint. > >>>>> > >>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>>>>> --- > >>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>>>> 1 file changed, 7 insertions(+) > >>>>>> > >>>>>> diff --git a/drivers/net/phy/phy_device.c > >>>>>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 > >>>>>> 100644 > >>>>>> --- a/drivers/net/phy/phy_device.c > >>>>>> +++ b/drivers/net/phy/phy_device.c > >>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > >> mdio_bus_phy_resume(struct device *dev) > >>>>>> ret = phy_resume(phydev); > >>>>>> if (ret < 0) > >>>>>> return ret; > >>>>>> + > >>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller > >> resume > >>>>>> + * rounte with phy_start(), here change to PHY_UP after > >> re-initializing > >>>>>> + * PHY hardware, let PHY state machine to start/config > >>>>>> +auto-nego > >> again. > >>>>>> + */ > >>>>>> + phydev->state = PHY_UP; > >>>>>> + > >>>>>> no_resume: > >>>>>> if (phydev->attached_dev && phydev->adjust_link) > >>>>>> phy_start_machine(phydev); > >>>>>> > >>>>> > >>>> > >>> > >> > >> This is a quick draft of the modified soft reset for KSZ8081. > >> Some tests would be appreciated. > >> > > > > I test this patch at my side, unfortunately, it still can't work. > > > > I have not found what soft reset would affect in 8081 spec, is there > > ang common Description for different PHYs? > > > > You can check the clause 22 spec: 802.3 22.2.4.1.1 > > Apart from that you can check BMCR value and which modes your PHY > advertises after a soft reset. If the PHY indeed wouldn't restart aneg after a > soft reset, then it would be the only one with this behavior I know. And I'd > wonder why there aren't more such reports. Hi Heiner, We have reasons to believe it is a weird behavior of KSZ8081. I have another two PHYs at my side, AR8031 and RTL8211FD, they can work fine if soft_reset implemented. As commit message described, phy_start() would change PHY state to PHY_UP finally to PHY_NOLINK when MAC resume. After MDIO bus resume back, it will periodically check link status. I did below code change to dump all PHY registers. --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -35,7 +35,7 @@ #include <net/genetlink.h> #include <net/sock.h> -#define PHY_STATE_TIME HZ +#define PHY_STATE_TIME (10 * HZ) #define PHY_STATE_STR(_state) \ case PHY_##_state: \ @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device *phydev) if (err) return err; + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); + printk("\n\n"); if (phydev->link && phydev->state != PHY_RUNNING) { phy_check_downshift(phydev); phydev->state = PHY_RUNNING; After MDIO bus resume back, results as below: [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data = 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset 0x08 data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] offset 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 data = 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset 0x1c data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] offset 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 [ 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 data = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] offset 0x04 data = 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset 0x06 data = 4 [ 120.242936] offset 0x07 data = 2001 [ 120.246792] offset 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data = 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ 120.320513] offset 0x1c data = 1 [ 120.336589] offset 0x1d data = 0 [ 120.340184] offset 0x1e data = 115 [ 120.355357] offset 0x1f data = 8190 [ 120.359112] [ 120.359112] [ 129.785396] offset 0x00 data = 1100 [ 129.789252] offset 0x01 data = 7849 [ 129.809951] offset 0x02 data = 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ 129.839398] offset 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ 129.858371] offset 0x08 data = 0 [ 129.873357] offset 0x09 data = 0 [ 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 data = 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 data = 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ 129.946830] offset 0x1c data = 0 [ 129.950424] offset 0x1d data = 0 [ 129.964585] offset 0x1e data = 0 [ 129.968192] offset 0x1f data = 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data = 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset 0x02 data = 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ 130.470789] offset 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ 130.488934] offset 0x08 data = 0 [ 130.504320] offset 0x09 data = 0 [ 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 data = 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 data = 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ 130.575158] offset 0x1c data = 1 [ 130.578754] offset 0x1d data = 0 [ 130.593543] offset 0x1e data = 115 [ 130.597312] offset 0x1f data = 8190 We can see that BMCR and BMSR changes from 0x1100,0x7849 to 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. Above process is auto-nego enabled, link is down, auto-nego is disabled, link is up. And auto-nego complete bit is always 0. PHY seems become unstable if soft reset after start/config auto-nego. I also want to fix it in micrel driver, but failed. Do you have any other insights that can help me further locate the issue? Thanks. Best Regards, Joakim Zhang [...]
On 06.04.2021 12:07, Joakim Zhang wrote: > >> -----Original Message----- >> From: Heiner Kallweit <hkallweit1@gmail.com> >> Sent: 2021年4月6日 14:29 >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; >> andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; >> kuba@kernel.org >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >> <linux-imx@nxp.com> >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume >> back >> >> On 06.04.2021 04:07, Joakim Zhang wrote: >>> >>> Hi Heiner, >>> >>>> -----Original Message----- >>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>> Sent: 2021年4月5日 20:10 >>>> To: christian.melki@t2data.com; Joakim Zhang >>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; >>>> davem@davemloft.net; kuba@kernel.org >>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>> dl-linux-imx <linux-imx@nxp.com> >>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus >>>> resume back >>>> >>>> On 05.04.2021 10:43, Christian Melki wrote: >>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may >>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus resume, >>>>>>>> it will soft reset PHY if PHY driver implements soft_reset callback. >>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback >>>>>>>> to genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. >>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which >>>>>>>> connected to KSZ8081RNB doesn't work any more when system >> resume >>>>>>>> back, MAC >>>> driver is fec_main.c. >>>>>>>> >>>>>>>> It's obvious that initializing PHY hardware when MDIO bus resume >>>>>>>> back would introduce some regression when PHY implements >>>>>>>> soft_reset. When I >>>>>>> >>>>>>> Why is this obvious? Please elaborate on why a soft reset should >>>>>>> break something. >>>>>>> >>>>>>>> am debugging, I found PHY works fine if MAC doesn't support >>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called >>>>>>>> during suspend/resume. This let me realize, PHY state machine >>>>>>>> phy_state_machine() could do something breaks the PHY. >>>>>>>> >>>>>>>> As we known, MAC resume first and then MDIO bus resume when >>>>>>>> system resume back from suspend. When MAC resume, usually it will >>>>>>>> invoke >>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the >>>>>>>> stat> machine to run now. In phy_state_machine(), it will >>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, what >>>>>>>> to next is periodically check PHY link status. When MDIO bus >>>>>>>> resume, it will initialize PHY hardware, including soft_reset, >>>>>>>> what would soft_reset affect seems various from different PHYs. >>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a >>>>>>>> soft reset, >>>> it will never complete auto-nego. >>>>>>> >>>>>>> Why? That would need to be checked in detail. Maybe chip errata >>>>>>> documentation provides a hint. >>>>>>> >>>>>> >>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>>>> >>>>>> If software reset (Register 0.15) is used to exit power-down mode >>>>>> (Register 0.11 = 1), two software reset writes (Register 0.15 = 1) >>>>>> are required. The first write clears power-down mode; the second >>>>>> write resets the chip and re-latches the pin strapping pin values. >>>>>> >>>>>> Maybe this causes the issue you see and genphy_soft_reset() isn't >>>>>> appropriate for this PHY. Please re-test with the KSZ8081 soft >>>>>> reset following the spec comment. >>>>>> >>>>> >>>>> Interesting. Never expected that behavior. >>>>> Thanks for catching it. Skimmed through the datasheets/erratas. >>>>> This is what I found (micrel.c): >>>>> >>>>> 10/100: >>>>> 8001 - Unaffected? >>>>> 8021/8031 - Double reset after PDOWN. >>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if >>>>> reset solves the issue since errata says no error after reset but is >>>>> also claiming that only toggling PDOWN (may) or power will help. >>>>> 8051 - Double reset after PDOWN. >>>>> 8061 - Double reset after PDOWN. >>>>> 8081 - Double reset after PDOWN. >>>>> 8091 - Double reset after PDOWN. >>>>> >>>>> 10/100/1000: >>>>> Nothing in gigabit afaics. >>>>> >>>>> Switches: >>>>> 8862 - Not affected? >>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>> 8864 - Not affected? >>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on >>>>> adjacent links. Do not use. >>>>> >>>>> This certainly explains a lot. >>>>> >>>>>>>> >>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, >>>>>>>> it should be reasonable after PHY hardware re-initialized. Also >>>>>>>> give state machine a chance to start/config auto-nego again. >>>>>>>> >>>>>>> >>>>>>> If the MAC driver calls phy_stop() on suspend, then >>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() returns >>>>>>> false. As a consequence >>>>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >>>>>>> skips the PHY hw initialization. >>>>>>> Please also note that mdio_bus_phy_suspend() calls >>>>>>> phy_stop_machine() that sets the state to PHY_UP. >>>>>>> >>>>>> >>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. >>>>>> Therefore disregard this part for now. >>>>>> >>>>>>> Having said that the current argumentation isn't convincing. I'm >>>>>>> not aware of such issues on other systems, therefore it's likely >>>>>>> that something is system-dependent. >>>>>>> >>>>>>> Please check the exact call sequence on your system, maybe it >>>>>>> provides a hint. >>>>>>> >>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>>>> --- >>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/net/phy/phy_device.c >>>>>>>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 >>>>>>>> 100644 >>>>>>>> --- a/drivers/net/phy/phy_device.c >>>>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int >>>> mdio_bus_phy_resume(struct device *dev) >>>>>>>> ret = phy_resume(phydev); >>>>>>>> if (ret < 0) >>>>>>>> return ret; >>>>>>>> + >>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller >>>> resume >>>>>>>> + * rounte with phy_start(), here change to PHY_UP after >>>> re-initializing >>>>>>>> + * PHY hardware, let PHY state machine to start/config >>>>>>>> +auto-nego >>>> again. >>>>>>>> + */ >>>>>>>> + phydev->state = PHY_UP; >>>>>>>> + >>>>>>>> no_resume: >>>>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>>>> phy_start_machine(phydev); >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> This is a quick draft of the modified soft reset for KSZ8081. >>>> Some tests would be appreciated. >>>> >>> >>> I test this patch at my side, unfortunately, it still can't work. >>> >>> I have not found what soft reset would affect in 8081 spec, is there >>> ang common Description for different PHYs? >>> >> >> You can check the clause 22 spec: 802.3 22.2.4.1.1 >> >> Apart from that you can check BMCR value and which modes your PHY >> advertises after a soft reset. If the PHY indeed wouldn't restart aneg after a >> soft reset, then it would be the only one with this behavior I know. And I'd >> wonder why there aren't more such reports. > > Hi Heiner, > > We have reasons to believe it is a weird behavior of KSZ8081. I have another two PHYs at my side, AR8031 and RTL8211FD, > they can work fine if soft_reset implemented. > > As commit message described, phy_start() would change PHY state to PHY_UP finally to PHY_NOLINK when MAC resume. > After MDIO bus resume back, it will periodically check link status. I did below code change to dump all PHY registers. > > --- a/drivers/net/phy/phy.c > +++ b/drivers/net/phy/phy.c > @@ -35,7 +35,7 @@ > #include <net/genetlink.h> > #include <net/sock.h> > > -#define PHY_STATE_TIME HZ > +#define PHY_STATE_TIME (10 * HZ) > > #define PHY_STATE_STR(_state) \ > case PHY_##_state: \ > @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device *phydev) > if (err) > return err; > > + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > + printk("\n\n"); > if (phydev->link && phydev->state != PHY_RUNNING) { > phy_check_downshift(phydev); > phydev->state = PHY_RUNNING; > > After MDIO bus resume back, results as below: > > [ 119.545383] offset 0x00 data = 1100 > [ 119.549237] offset 0x01 data = 7849 > [ 119.563125] offset 0x02 data = 22 > [ 119.566810] offset 0x03 data = 1560 > [ 119.570597] offset 0x04 data = 85e1 > [ 119.588016] offset 0x05 data = 0 > [ 119.592891] offset 0x06 data = 4 > [ 119.596452] offset 0x07 data = 2001 > [ 119.600233] offset 0x08 data = 0 > [ 119.617864] offset 0x09 data = 0 > [ 119.625990] offset 0x10 data = 0 > [ 119.629576] offset 0x11 data = 0 > [ 119.642735] offset 0x15 data = 0 > [ 119.646332] offset 0x16 data = 202 > [ 119.650030] offset 0x17 data = 5c02 > [ 119.668054] offset 0x18 data = 801 > [ 119.672997] offset 0x1b data = 0 > [ 119.676565] offset 0x1c data = 0 > [ 119.680084] offset 0x1d data = 0 > [ 119.698031] offset 0x1e data = 20 > [ 119.706246] offset 0x1f data = 8190 > [ 119.709984] > [ 119.709984] > [ 120.182120] offset 0x00 data = 100 > [ 120.185894] offset 0x01 data = 784d > [ 120.189681] offset 0x02 data = 22 > [ 120.206319] offset 0x03 data = 1560 > [ 120.210171] offset 0x04 data = 8061 > [ 120.225353] offset 0x05 data = 0 > [ 120.228948] offset 0x06 data = 4 > [ 120.242936] offset 0x07 data = 2001 > [ 120.246792] offset 0x08 data = 0 > [ 120.250313] offset 0x09 data = 0 > [ 120.266748] offset 0x10 data = 0 > [ 120.270335] offset 0x11 data = 0 > [ 120.284167] offset 0x15 data = 0 > [ 120.287760] offset 0x16 data = 202 > [ 120.301031] offset 0x17 data = 3c02 > [ 120.313209] offset 0x18 data = 801 > [ 120.316983] offset 0x1b data = 0 > [ 120.320513] offset 0x1c data = 1 > [ 120.336589] offset 0x1d data = 0 > [ 120.340184] offset 0x1e data = 115 > [ 120.355357] offset 0x1f data = 8190 > [ 120.359112] > [ 120.359112] > [ 129.785396] offset 0x00 data = 1100 > [ 129.789252] offset 0x01 data = 7849 > [ 129.809951] offset 0x02 data = 22 > [ 129.815018] offset 0x03 data = 1560 > [ 129.818845] offset 0x04 data = 85e1 > [ 129.835808] offset 0x05 data = 0 > [ 129.839398] offset 0x06 data = 4 > [ 129.854514] offset 0x07 data = 2001 > [ 129.858371] offset 0x08 data = 0 > [ 129.873357] offset 0x09 data = 0 > [ 129.876954] offset 0x10 data = 0 > [ 129.880472] offset 0x11 data = 0 > [ 129.896450] offset 0x15 data = 0 > [ 129.900038] offset 0x16 data = 202 > [ 129.915041] offset 0x17 data = 5c02 > [ 129.918889] offset 0x18 data = 801 > [ 129.932735] offset 0x1b data = 0 > [ 129.946830] offset 0x1c data = 0 > [ 129.950424] offset 0x1d data = 0 > [ 129.964585] offset 0x1e data = 0 > [ 129.968192] offset 0x1f data = 8190 > [ 129.972938] > [ 129.972938] > [ 130.425125] offset 0x00 data = 100 > [ 130.428889] offset 0x01 data = 784d > [ 130.442671] offset 0x02 data = 22 > [ 130.446360] offset 0x03 data = 1560 > [ 130.450142] offset 0x04 data = 8061 > [ 130.467207] offset 0x05 data = 0 > [ 130.470789] offset 0x06 data = 4 > [ 130.485071] offset 0x07 data = 2001 > [ 130.488934] offset 0x08 data = 0 > [ 130.504320] offset 0x09 data = 0 > [ 130.507911] offset 0x10 data = 0 > [ 130.520950] offset 0x11 data = 0 > [ 130.532865] offset 0x15 data = 0 > [ 130.536461] offset 0x16 data = 202 > [ 130.540156] offset 0x17 data = 3c02 > [ 130.557218] offset 0x18 data = 801 > [ 130.560987] offset 0x1b data = 0 > [ 130.575158] offset 0x1c data = 1 > [ 130.578754] offset 0x1d data = 0 > [ 130.593543] offset 0x1e data = 115 > [ 130.597312] offset 0x1f data = 8190 > > We can see that BMCR and BMSR changes from 0x1100,0x7849 to 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > Above process is auto-nego enabled, link is down, auto-nego is disabled, link is up. And auto-nego complete bit is always 0. > > PHY seems become unstable if soft reset after start/config auto-nego. I also want to fix it in micrel driver, but failed. > Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never complete for different reasons, e.g. no physical link. And even if we use a timeout this may add unwanted delays. > Do you have any other insights that can help me further locate the issue? Thanks. > I think current MAC/PHY PM handling isn't perfect. Often we have the following scenario: *suspend* 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback (typically involving phy_stop()) *resume* 1. MAC resume callback (typically involving phy_start()) 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() Calling phy_init_hw() after phy_start() doesn't look right. It seems to work in most cases, but there's a certain risk that phy_init_hw() overwrites something, e.g. the advertised modes. I think we have two valid scenarios: 1. phylib PM callbacks are used, then the MAC driver shouldn't touch the PHY in its PM callbacks, especially not call phy_stop/phy_start. 2. MAC PM callbacks take care also of the PHY. Then I think we would need a flag at the phy_device telling it to make the PHY PM callbacks a no-op. Andrew, Russell: What do you think? > Best Regards, > Joakim Zhang > > [...] > Heiner
On 06.04.2021 13:42, Heiner Kallweit wrote: > On 06.04.2021 12:07, Joakim Zhang wrote: >> >>> -----Original Message----- >>> From: Heiner Kallweit <hkallweit1@gmail.com> >>> Sent: 2021年4月6日 14:29 >>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; >>> andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; >>> kuba@kernel.org >>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >>> <linux-imx@nxp.com> >>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume >>> back >>> >>> On 06.04.2021 04:07, Joakim Zhang wrote: >>>> >>>> Hi Heiner, >>>> >>>>> -----Original Message----- >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>> Sent: 2021年4月5日 20:10 >>>>> To: christian.melki@t2data.com; Joakim Zhang >>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; >>>>> davem@davemloft.net; kuba@kernel.org >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>> dl-linux-imx <linux-imx@nxp.com> >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus >>>>> resume back >>>>> >>>>> On 05.04.2021 10:43, Christian Melki wrote: >>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may >>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus resume, >>>>>>>>> it will soft reset PHY if PHY driver implements soft_reset callback. >>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback >>>>>>>>> to genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. >>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which >>>>>>>>> connected to KSZ8081RNB doesn't work any more when system >>> resume >>>>>>>>> back, MAC >>>>> driver is fec_main.c. >>>>>>>>> >>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus resume >>>>>>>>> back would introduce some regression when PHY implements >>>>>>>>> soft_reset. When I >>>>>>>> >>>>>>>> Why is this obvious? Please elaborate on why a soft reset should >>>>>>>> break something. >>>>>>>> >>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support >>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called >>>>>>>>> during suspend/resume. This let me realize, PHY state machine >>>>>>>>> phy_state_machine() could do something breaks the PHY. >>>>>>>>> >>>>>>>>> As we known, MAC resume first and then MDIO bus resume when >>>>>>>>> system resume back from suspend. When MAC resume, usually it will >>>>>>>>> invoke >>>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the >>>>>>>>> stat> machine to run now. In phy_state_machine(), it will >>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, what >>>>>>>>> to next is periodically check PHY link status. When MDIO bus >>>>>>>>> resume, it will initialize PHY hardware, including soft_reset, >>>>>>>>> what would soft_reset affect seems various from different PHYs. >>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a >>>>>>>>> soft reset, >>>>> it will never complete auto-nego. >>>>>>>> >>>>>>>> Why? That would need to be checked in detail. Maybe chip errata >>>>>>>> documentation provides a hint. >>>>>>>> >>>>>>> >>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>>>>> >>>>>>> If software reset (Register 0.15) is used to exit power-down mode >>>>>>> (Register 0.11 = 1), two software reset writes (Register 0.15 = 1) >>>>>>> are required. The first write clears power-down mode; the second >>>>>>> write resets the chip and re-latches the pin strapping pin values. >>>>>>> >>>>>>> Maybe this causes the issue you see and genphy_soft_reset() isn't >>>>>>> appropriate for this PHY. Please re-test with the KSZ8081 soft >>>>>>> reset following the spec comment. >>>>>>> >>>>>> >>>>>> Interesting. Never expected that behavior. >>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. >>>>>> This is what I found (micrel.c): >>>>>> >>>>>> 10/100: >>>>>> 8001 - Unaffected? >>>>>> 8021/8031 - Double reset after PDOWN. >>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if >>>>>> reset solves the issue since errata says no error after reset but is >>>>>> also claiming that only toggling PDOWN (may) or power will help. >>>>>> 8051 - Double reset after PDOWN. >>>>>> 8061 - Double reset after PDOWN. >>>>>> 8081 - Double reset after PDOWN. >>>>>> 8091 - Double reset after PDOWN. >>>>>> >>>>>> 10/100/1000: >>>>>> Nothing in gigabit afaics. >>>>>> >>>>>> Switches: >>>>>> 8862 - Not affected? >>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>>> 8864 - Not affected? >>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on >>>>>> adjacent links. Do not use. >>>>>> >>>>>> This certainly explains a lot. >>>>>> >>>>>>>>> >>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, >>>>>>>>> it should be reasonable after PHY hardware re-initialized. Also >>>>>>>>> give state machine a chance to start/config auto-nego again. >>>>>>>>> >>>>>>>> >>>>>>>> If the MAC driver calls phy_stop() on suspend, then >>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() returns >>>>>>>> false. As a consequence >>>>>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume() >>>>>>>> skips the PHY hw initialization. >>>>>>>> Please also note that mdio_bus_phy_suspend() calls >>>>>>>> phy_stop_machine() that sets the state to PHY_UP. >>>>>>>> >>>>>>> >>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. >>>>>>> Therefore disregard this part for now. >>>>>>> >>>>>>>> Having said that the current argumentation isn't convincing. I'm >>>>>>>> not aware of such issues on other systems, therefore it's likely >>>>>>>> that something is system-dependent. >>>>>>>> >>>>>>>> Please check the exact call sequence on your system, maybe it >>>>>>>> provides a hint. >>>>>>>> >>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>>>>> --- >>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/drivers/net/phy/phy_device.c >>>>>>>>> b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 >>>>>>>>> 100644 >>>>>>>>> --- a/drivers/net/phy/phy_device.c >>>>>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int >>>>> mdio_bus_phy_resume(struct device *dev) >>>>>>>>> ret = phy_resume(phydev); >>>>>>>>> if (ret < 0) >>>>>>>>> return ret; >>>>>>>>> + >>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller >>>>> resume >>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after >>>>> re-initializing >>>>>>>>> + * PHY hardware, let PHY state machine to start/config >>>>>>>>> +auto-nego >>>>> again. >>>>>>>>> + */ >>>>>>>>> + phydev->state = PHY_UP; >>>>>>>>> + >>>>>>>>> no_resume: >>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>>>>> phy_start_machine(phydev); >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> This is a quick draft of the modified soft reset for KSZ8081. >>>>> Some tests would be appreciated. >>>>> >>>> >>>> I test this patch at my side, unfortunately, it still can't work. >>>> >>>> I have not found what soft reset would affect in 8081 spec, is there >>>> ang common Description for different PHYs? >>>> >>> >>> You can check the clause 22 spec: 802.3 22.2.4.1.1 >>> >>> Apart from that you can check BMCR value and which modes your PHY >>> advertises after a soft reset. If the PHY indeed wouldn't restart aneg after a >>> soft reset, then it would be the only one with this behavior I know. And I'd >>> wonder why there aren't more such reports. >> >> Hi Heiner, >> >> We have reasons to believe it is a weird behavior of KSZ8081. I have another two PHYs at my side, AR8031 and RTL8211FD, >> they can work fine if soft_reset implemented. >> >> As commit message described, phy_start() would change PHY state to PHY_UP finally to PHY_NOLINK when MAC resume. >> After MDIO bus resume back, it will periodically check link status. I did below code change to dump all PHY registers. >> >> --- a/drivers/net/phy/phy.c >> +++ b/drivers/net/phy/phy.c >> @@ -35,7 +35,7 @@ >> #include <net/genetlink.h> >> #include <net/sock.h> >> >> -#define PHY_STATE_TIME HZ >> +#define PHY_STATE_TIME (10 * HZ) >> >> #define PHY_STATE_STR(_state) \ >> case PHY_##_state: \ >> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device *phydev) >> if (err) >> return err; >> >> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); >> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); >> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); >> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); >> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); >> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); >> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); >> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); >> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); >> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); >> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); >> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); >> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); >> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); >> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); >> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); >> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); >> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); >> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); >> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); >> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); >> + printk("\n\n"); >> if (phydev->link && phydev->state != PHY_RUNNING) { >> phy_check_downshift(phydev); >> phydev->state = PHY_RUNNING; >> >> After MDIO bus resume back, results as below: >> >> [ 119.545383] offset 0x00 data = 1100 >> [ 119.549237] offset 0x01 data = 7849 >> [ 119.563125] offset 0x02 data = 22 >> [ 119.566810] offset 0x03 data = 1560 >> [ 119.570597] offset 0x04 data = 85e1 >> [ 119.588016] offset 0x05 data = 0 >> [ 119.592891] offset 0x06 data = 4 >> [ 119.596452] offset 0x07 data = 2001 >> [ 119.600233] offset 0x08 data = 0 >> [ 119.617864] offset 0x09 data = 0 >> [ 119.625990] offset 0x10 data = 0 >> [ 119.629576] offset 0x11 data = 0 >> [ 119.642735] offset 0x15 data = 0 >> [ 119.646332] offset 0x16 data = 202 >> [ 119.650030] offset 0x17 data = 5c02 >> [ 119.668054] offset 0x18 data = 801 >> [ 119.672997] offset 0x1b data = 0 >> [ 119.676565] offset 0x1c data = 0 >> [ 119.680084] offset 0x1d data = 0 >> [ 119.698031] offset 0x1e data = 20 >> [ 119.706246] offset 0x1f data = 8190 >> [ 119.709984] >> [ 119.709984] >> [ 120.182120] offset 0x00 data = 100 >> [ 120.185894] offset 0x01 data = 784d >> [ 120.189681] offset 0x02 data = 22 >> [ 120.206319] offset 0x03 data = 1560 >> [ 120.210171] offset 0x04 data = 8061 >> [ 120.225353] offset 0x05 data = 0 >> [ 120.228948] offset 0x06 data = 4 >> [ 120.242936] offset 0x07 data = 2001 >> [ 120.246792] offset 0x08 data = 0 >> [ 120.250313] offset 0x09 data = 0 >> [ 120.266748] offset 0x10 data = 0 >> [ 120.270335] offset 0x11 data = 0 >> [ 120.284167] offset 0x15 data = 0 >> [ 120.287760] offset 0x16 data = 202 >> [ 120.301031] offset 0x17 data = 3c02 >> [ 120.313209] offset 0x18 data = 801 >> [ 120.316983] offset 0x1b data = 0 >> [ 120.320513] offset 0x1c data = 1 >> [ 120.336589] offset 0x1d data = 0 >> [ 120.340184] offset 0x1e data = 115 >> [ 120.355357] offset 0x1f data = 8190 >> [ 120.359112] >> [ 120.359112] >> [ 129.785396] offset 0x00 data = 1100 >> [ 129.789252] offset 0x01 data = 7849 >> [ 129.809951] offset 0x02 data = 22 >> [ 129.815018] offset 0x03 data = 1560 >> [ 129.818845] offset 0x04 data = 85e1 >> [ 129.835808] offset 0x05 data = 0 >> [ 129.839398] offset 0x06 data = 4 >> [ 129.854514] offset 0x07 data = 2001 >> [ 129.858371] offset 0x08 data = 0 >> [ 129.873357] offset 0x09 data = 0 >> [ 129.876954] offset 0x10 data = 0 >> [ 129.880472] offset 0x11 data = 0 >> [ 129.896450] offset 0x15 data = 0 >> [ 129.900038] offset 0x16 data = 202 >> [ 129.915041] offset 0x17 data = 5c02 >> [ 129.918889] offset 0x18 data = 801 >> [ 129.932735] offset 0x1b data = 0 >> [ 129.946830] offset 0x1c data = 0 >> [ 129.950424] offset 0x1d data = 0 >> [ 129.964585] offset 0x1e data = 0 >> [ 129.968192] offset 0x1f data = 8190 >> [ 129.972938] >> [ 129.972938] >> [ 130.425125] offset 0x00 data = 100 >> [ 130.428889] offset 0x01 data = 784d >> [ 130.442671] offset 0x02 data = 22 >> [ 130.446360] offset 0x03 data = 1560 >> [ 130.450142] offset 0x04 data = 8061 >> [ 130.467207] offset 0x05 data = 0 >> [ 130.470789] offset 0x06 data = 4 >> [ 130.485071] offset 0x07 data = 2001 >> [ 130.488934] offset 0x08 data = 0 >> [ 130.504320] offset 0x09 data = 0 >> [ 130.507911] offset 0x10 data = 0 >> [ 130.520950] offset 0x11 data = 0 >> [ 130.532865] offset 0x15 data = 0 >> [ 130.536461] offset 0x16 data = 202 >> [ 130.540156] offset 0x17 data = 3c02 >> [ 130.557218] offset 0x18 data = 801 >> [ 130.560987] offset 0x1b data = 0 >> [ 130.575158] offset 0x1c data = 1 >> [ 130.578754] offset 0x1d data = 0 >> [ 130.593543] offset 0x1e data = 115 >> [ 130.597312] offset 0x1f data = 8190 >> >> We can see that BMCR and BMSR changes from 0x1100,0x7849 to 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. >> Above process is auto-nego enabled, link is down, auto-nego is disabled, link is up. And auto-nego complete bit is always 0. >> >> PHY seems become unstable if soft reset after start/config auto-nego. I also want to fix it in micrel driver, but failed. >> > > Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never > complete for different reasons, e.g. no physical link. And even if we use a > timeout this may add unwanted delays. > >> Do you have any other insights that can help me further locate the issue? Thanks. >> > > I think current MAC/PHY PM handling isn't perfect. Often we have the following > scenario: > > *suspend* > 1. PHY is suspended (mdio_bus_phy_suspend) > 2. MAC suspend callback (typically involving phy_stop()) > > *resume* > 1. MAC resume callback (typically involving phy_start()) > 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > > Calling phy_init_hw() after phy_start() doesn't look right. > It seems to work in most cases, but there's a certain risk > that phy_init_hw() overwrites something, e.g. the advertised > modes. > I think we have two valid scenarios: > > 1. phylib PM callbacks are used, then the MAC driver shouldn't > touch the PHY in its PM callbacks, especially not call > phy_stop/phy_start. > > 2. MAC PM callbacks take care also of the PHY. Then I think we would > need a flag at the phy_device telling it to make the PHY PM > callbacks a no-op. > > Andrew, Russell: What do you think? > >> Best Regards, >> Joakim Zhang >> >> [...] >> > Heiner > Based on scenario 1 you can also try the following. diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 3db882322..cdf9160fc 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct device *dev) if (netif_running(ndev)) { if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; - phy_stop(ndev->phydev); napi_disable(&fep->napi); netif_tx_lock_bh(ndev); netif_device_detach(ndev); @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct device *dev) netif_device_attach(ndev); netif_tx_unlock_bh(ndev); napi_enable(&fep->napi); - phy_start(ndev->phydev); } rtnl_unlock(); -- 2.31.1
On 4/6/2021 4:42 AM, Heiner Kallweit wrote: > > Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never > complete for different reasons, e.g. no physical link. And even if we use a > timeout this may add unwanted delays. > >> Do you have any other insights that can help me further locate the issue? Thanks. >> > > I think current MAC/PHY PM handling isn't perfect. Often we have the following > scenario: > > *suspend* > 1. PHY is suspended (mdio_bus_phy_suspend) > 2. MAC suspend callback (typically involving phy_stop()) > > *resume* > 1. MAC resume callback (typically involving phy_start()) > 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > > Calling phy_init_hw() after phy_start() doesn't look right. > It seems to work in most cases, but there's a certain risk > that phy_init_hw() overwrites something, e.g. the advertised > modes. > I think we have two valid scenarios: > > 1. phylib PM callbacks are used, then the MAC driver shouldn't > touch the PHY in its PM callbacks, especially not call > phy_stop/phy_start. > > 2. MAC PM callbacks take care also of the PHY. Then I think we would > need a flag at the phy_device telling it to make the PHY PM > callbacks a no-op. Maybe part of the problem is that the FEC is calling phy_{stop,start} in its suspend/resume callbacks instead of phy_{suspend,resume} which would play nice and tell the MDIO bus PM callbacks that the PHY has already been suspended. I am also suspicious about whether Wake-on-LAN actually works with the FEC, you cannot wake from LAN if the PHY is stopped and powered down. -- Florian
On 06.04.2021 20:32, Florian Fainelli wrote: > > > On 4/6/2021 4:42 AM, Heiner Kallweit wrote: >> >> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never >> complete for different reasons, e.g. no physical link. And even if we use a >> timeout this may add unwanted delays. >> >>> Do you have any other insights that can help me further locate the issue? Thanks. >>> >> >> I think current MAC/PHY PM handling isn't perfect. Often we have the following >> scenario: >> >> *suspend* >> 1. PHY is suspended (mdio_bus_phy_suspend) >> 2. MAC suspend callback (typically involving phy_stop()) >> >> *resume* >> 1. MAC resume callback (typically involving phy_start()) >> 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() >> >> Calling phy_init_hw() after phy_start() doesn't look right. >> It seems to work in most cases, but there's a certain risk >> that phy_init_hw() overwrites something, e.g. the advertised >> modes. >> I think we have two valid scenarios: >> >> 1. phylib PM callbacks are used, then the MAC driver shouldn't >> touch the PHY in its PM callbacks, especially not call >> phy_stop/phy_start. >> >> 2. MAC PM callbacks take care also of the PHY. Then I think we would >> need a flag at the phy_device telling it to make the PHY PM >> callbacks a no-op. > > Maybe part of the problem is that the FEC is calling phy_{stop,start} in > its suspend/resume callbacks instead of phy_{suspend,resume} which would > play nice and tell the MDIO bus PM callbacks that the PHY has already > been suspended. > This basically is what I just proposed to test. > I am also suspicious about whether Wake-on-LAN actually works with the > FEC, you cannot wake from LAN if the PHY is stopped and powered down. > phy_stop() calls phy_suspend() which checks for WoL. Therefore this should not be a problem.
On 4/6/2021 11:43 AM, Heiner Kallweit wrote: > On 06.04.2021 20:32, Florian Fainelli wrote: >> >> >> On 4/6/2021 4:42 AM, Heiner Kallweit wrote: >>> >>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never >>> complete for different reasons, e.g. no physical link. And even if we use a >>> timeout this may add unwanted delays. >>> >>>> Do you have any other insights that can help me further locate the issue? Thanks. >>>> >>> >>> I think current MAC/PHY PM handling isn't perfect. Often we have the following >>> scenario: >>> >>> *suspend* >>> 1. PHY is suspended (mdio_bus_phy_suspend) >>> 2. MAC suspend callback (typically involving phy_stop()) >>> >>> *resume* >>> 1. MAC resume callback (typically involving phy_start()) >>> 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() >>> >>> Calling phy_init_hw() after phy_start() doesn't look right. >>> It seems to work in most cases, but there's a certain risk >>> that phy_init_hw() overwrites something, e.g. the advertised >>> modes. >>> I think we have two valid scenarios: >>> >>> 1. phylib PM callbacks are used, then the MAC driver shouldn't >>> touch the PHY in its PM callbacks, especially not call >>> phy_stop/phy_start. >>> >>> 2. MAC PM callbacks take care also of the PHY. Then I think we would >>> need a flag at the phy_device telling it to make the PHY PM >>> callbacks a no-op. >> >> Maybe part of the problem is that the FEC is calling phy_{stop,start} in >> its suspend/resume callbacks instead of phy_{suspend,resume} which would >> play nice and tell the MDIO bus PM callbacks that the PHY has already >> been suspended. >> > This basically is what I just proposed to test. What you suggested to be tested is to let the MDIO bus PM callbacks deal with suspending the PHY, which is different from having the MAC do it explicitly, both would be interesting to try out. > >> I am also suspicious about whether Wake-on-LAN actually works with the >> FEC, you cannot wake from LAN if the PHY is stopped and powered down. >> > phy_stop() calls phy_suspend() which checks for WoL. Therefore this > should not be a problem. Indeed, and I had missed that phy_suspend() checks netdev->wol_enabled, thanks for reminding me. -- Florian
Hi Heiner, > -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月7日 2:22 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org; Russell King - ARM Linux <linux@armlinux.org.uk> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 06.04.2021 13:42, Heiner Kallweit wrote: > > On 06.04.2021 12:07, Joakim Zhang wrote: > >> > >>> -----Original Message----- > >>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>> Sent: 2021年4月6日 14:29 > >>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >>> davem@davemloft.net; kuba@kernel.org > >>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>> dl-linux-imx <linux-imx@nxp.com> > >>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>> bus resume back > >>> > >>> On 06.04.2021 04:07, Joakim Zhang wrote: > >>>> > >>>> Hi Heiner, > >>>> > >>>>> -----Original Message----- > >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>> Sent: 2021年4月5日 20:10 > >>>>> To: christian.melki@t2data.com; Joakim Zhang > >>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; > >>>>> davem@davemloft.net; kuba@kernel.org > >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>> dl-linux-imx <linux-imx@nxp.com> > >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>>>> bus resume back > >>>>> > >>>>> On 05.04.2021 10:43, Christian Melki wrote: > >>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram > may > >>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus > >>>>>>>>> resume, it will soft reset PHY if PHY driver implements soft_reset > callback. > >>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset > >>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds soft_reset for > KSZ8081. > >>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > >>>>>>>>> connected to KSZ8081RNB doesn't work any more when system > >>> resume > >>>>>>>>> back, MAC > >>>>> driver is fec_main.c. > >>>>>>>>> > >>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus > >>>>>>>>> resume back would introduce some regression when PHY > >>>>>>>>> implements soft_reset. When I > >>>>>>>> > >>>>>>>> Why is this obvious? Please elaborate on why a soft reset > >>>>>>>> should break something. > >>>>>>>> > >>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support > >>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called > >>>>>>>>> during suspend/resume. This let me realize, PHY state machine > >>>>>>>>> phy_state_machine() could do something breaks the PHY. > >>>>>>>>> > >>>>>>>>> As we known, MAC resume first and then MDIO bus resume when > >>>>>>>>> system resume back from suspend. When MAC resume, usually it > >>>>>>>>> will invoke > >>>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger > >>>>>>>>> the > >>>>>>>>> stat> machine to run now. In phy_state_machine(), it will > >>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, > >>>>>>>>> what to next is periodically check PHY link status. When MDIO > >>>>>>>>> bus resume, it will initialize PHY hardware, including > >>>>>>>>> soft_reset, what would soft_reset affect seems various from > different PHYs. > >>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a > >>>>>>>>> soft reset, > >>>>> it will never complete auto-nego. > >>>>>>>> > >>>>>>>> Why? That would need to be checked in detail. Maybe chip errata > >>>>>>>> documentation provides a hint. > >>>>>>>> > >>>>>>> > >>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >>>>>>> > >>>>>>> If software reset (Register 0.15) is used to exit power-down > >>>>>>> mode (Register 0.11 = 1), two software reset writes (Register > >>>>>>> 0.15 = 1) are required. The first write clears power-down mode; > >>>>>>> the second write resets the chip and re-latches the pin strapping pin > values. > >>>>>>> > >>>>>>> Maybe this causes the issue you see and genphy_soft_reset() > >>>>>>> isn't appropriate for this PHY. Please re-test with the KSZ8081 > >>>>>>> soft reset following the spec comment. > >>>>>>> > >>>>>> > >>>>>> Interesting. Never expected that behavior. > >>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. > >>>>>> This is what I found (micrel.c): > >>>>>> > >>>>>> 10/100: > >>>>>> 8001 - Unaffected? > >>>>>> 8021/8031 - Double reset after PDOWN. > >>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if > >>>>>> reset solves the issue since errata says no error after reset but > >>>>>> is also claiming that only toggling PDOWN (may) or power will help. > >>>>>> 8051 - Double reset after PDOWN. > >>>>>> 8061 - Double reset after PDOWN. > >>>>>> 8081 - Double reset after PDOWN. > >>>>>> 8091 - Double reset after PDOWN. > >>>>>> > >>>>>> 10/100/1000: > >>>>>> Nothing in gigabit afaics. > >>>>>> > >>>>>> Switches: > >>>>>> 8862 - Not affected? > >>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. > >>>>>> 8864 - Not affected? > >>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. > >>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on > >>>>>> adjacent links. Do not use. > >>>>>> > >>>>>> This certainly explains a lot. > >>>>>> > >>>>>>>>> > >>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume > >>>>>>>>> back, it should be reasonable after PHY hardware > >>>>>>>>> re-initialized. Also give state machine a chance to start/config > auto-nego again. > >>>>>>>>> > >>>>>>>> > >>>>>>>> If the MAC driver calls phy_stop() on suspend, then > >>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() > >>>>>>>> phydev->returns > >>>>>>>> false. As a consequence > >>>>>>>> phydev->suspended_by_mdio_bus is false and > >>>>>>>> phydev->mdio_bus_phy_resume() > >>>>>>>> skips the PHY hw initialization. > >>>>>>>> Please also note that mdio_bus_phy_suspend() calls > >>>>>>>> phy_stop_machine() that sets the state to PHY_UP. > >>>>>>>> > >>>>>>> > >>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. > >>>>>>> Therefore disregard this part for now. > >>>>>>> > >>>>>>>> Having said that the current argumentation isn't convincing. > >>>>>>>> I'm not aware of such issues on other systems, therefore it's > >>>>>>>> likely that something is system-dependent. > >>>>>>>> > >>>>>>>> Please check the exact call sequence on your system, maybe it > >>>>>>>> provides a hint. > >>>>>>>> > >>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>>>>>>>> --- > >>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>>>>>>> 1 file changed, 7 insertions(+) > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/net/phy/phy_device.c > >>>>>>>>> b/drivers/net/phy/phy_device.c index > >>>>>>>>> cc38e326405a..312a6f662481 > >>>>>>>>> 100644 > >>>>>>>>> --- a/drivers/net/phy/phy_device.c > >>>>>>>>> +++ b/drivers/net/phy/phy_device.c > >>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > >>>>> mdio_bus_phy_resume(struct device *dev) > >>>>>>>>> ret = phy_resume(phydev); > >>>>>>>>> if (ret < 0) > >>>>>>>>> return ret; > >>>>>>>>> + > >>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC > >>>>>>>>> +controller > >>>>> resume > >>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after > >>>>> re-initializing > >>>>>>>>> + * PHY hardware, let PHY state machine to start/config > >>>>>>>>> +auto-nego > >>>>> again. > >>>>>>>>> + */ > >>>>>>>>> + phydev->state = PHY_UP; > >>>>>>>>> + > >>>>>>>>> no_resume: > >>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) > >>>>>>>>> phy_start_machine(phydev); > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> This is a quick draft of the modified soft reset for KSZ8081. > >>>>> Some tests would be appreciated. > >>>>> > >>>> > >>>> I test this patch at my side, unfortunately, it still can't work. > >>>> > >>>> I have not found what soft reset would affect in 8081 spec, is > >>>> there ang common Description for different PHYs? > >>>> > >>> > >>> You can check the clause 22 spec: 802.3 22.2.4.1.1 > >>> > >>> Apart from that you can check BMCR value and which modes your PHY > >>> advertises after a soft reset. If the PHY indeed wouldn't restart > >>> aneg after a soft reset, then it would be the only one with this > >>> behavior I know. And I'd wonder why there aren't more such reports. > >> > >> Hi Heiner, > >> > >> We have reasons to believe it is a weird behavior of KSZ8081. I have > >> another two PHYs at my side, AR8031 and RTL8211FD, they can work fine if > soft_reset implemented. > >> > >> As commit message described, phy_start() would change PHY state to > PHY_UP finally to PHY_NOLINK when MAC resume. > >> After MDIO bus resume back, it will periodically check link status. I did below > code change to dump all PHY registers. > >> > >> --- a/drivers/net/phy/phy.c > >> +++ b/drivers/net/phy/phy.c > >> @@ -35,7 +35,7 @@ > >> #include <net/genetlink.h> > >> #include <net/sock.h> > >> > >> -#define PHY_STATE_TIME HZ > >> +#define PHY_STATE_TIME (10 * HZ) > >> > >> #define PHY_STATE_STR(_state) \ > >> case PHY_##_state: \ > >> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device > *phydev) > >> if (err) > >> return err; > >> > >> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > >> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > >> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > >> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > >> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > >> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > >> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > >> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > >> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > >> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > >> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > >> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > >> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > >> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > >> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > >> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > >> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > >> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > >> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > >> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > >> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > >> + printk("\n\n"); > >> if (phydev->link && phydev->state != PHY_RUNNING) { > >> phy_check_downshift(phydev); > >> phydev->state = PHY_RUNNING; > >> > >> After MDIO bus resume back, results as below: > >> > >> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 > >> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] > >> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ > >> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data = 4 > >> [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset 0x08 > >> data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] offset > >> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] > >> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ > >> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 data = > >> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset 0x1c > >> data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] offset > >> 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ 119.709984] > >> [ 119.709984] [ 120.182120] offset 0x00 data = 100 [ 120.185894] > >> offset 0x01 data = 784d [ 120.189681] offset 0x02 data = 22 [ > >> 120.206319] offset 0x03 data = 1560 [ 120.210171] offset 0x04 data = > >> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset 0x06 > >> data = 4 [ 120.242936] offset 0x07 data = 2001 [ 120.246792] offset > >> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] > >> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ > >> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data = > >> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset 0x18 > >> data = 801 [ 120.316983] offset 0x1b data = 0 [ 120.320513] offset > >> 0x1c data = 1 [ 120.336589] offset 0x1d data = 0 [ 120.340184] > >> offset 0x1e data = 115 [ 120.355357] offset 0x1f data = 8190 [ > >> 120.359112] [ 120.359112] [ 129.785396] offset 0x00 data = 1100 [ > >> 129.789252] offset 0x01 data = 7849 [ 129.809951] offset 0x02 data = > >> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset 0x04 > >> data = 85e1 [ 129.835808] offset 0x05 data = 0 [ 129.839398] offset > >> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ 129.858371] > >> offset 0x08 data = 0 [ 129.873357] offset 0x09 data = 0 [ > >> 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 data = 0 > >> [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 data = > >> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset 0x18 > >> data = 801 [ 129.932735] offset 0x1b data = 0 [ 129.946830] offset > >> 0x1c data = 0 [ 129.950424] offset 0x1d data = 0 [ 129.964585] > >> offset 0x1e data = 0 [ 129.968192] offset 0x1f data = 8190 [ > >> 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data = 100 [ > >> 130.428889] offset 0x01 data = 784d [ 130.442671] offset 0x02 data = > >> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset 0x04 > >> data = 8061 [ 130.467207] offset 0x05 data = 0 [ 130.470789] offset > >> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ 130.488934] > >> offset 0x08 data = 0 [ 130.504320] offset 0x09 data = 0 [ > >> 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 data = 0 > >> [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 data = > >> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset 0x18 > >> data = 801 [ 130.560987] offset 0x1b data = 0 [ 130.575158] offset > >> 0x1c data = 1 [ 130.578754] offset 0x1d data = 0 [ 130.593543] > >> offset 0x1e data = 115 [ 130.597312] offset 0x1f data = 8190 > >> > >> We can see that BMCR and BMSR changes from 0x1100,0x7849 to > 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > >> Above process is auto-nego enabled, link is down, auto-nego is disabled, link > is up. And auto-nego complete bit is always 0. > >> > >> PHY seems become unstable if soft reset after start/config auto-nego. I also > want to fix it in micrel driver, but failed. > >> > > > > Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg > > may never complete for different reasons, e.g. no physical link. And > > even if we use a timeout this may add unwanted delays. > > > >> Do you have any other insights that can help me further locate the issue? > Thanks. > >> > > > > I think current MAC/PHY PM handling isn't perfect. Often we have the > > following > > scenario: > > > > *suspend* > > 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback > > (typically involving phy_stop()) > > > > *resume* > > 1. MAC resume callback (typically involving phy_start()) 2. PHY is > > resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > > > > Calling phy_init_hw() after phy_start() doesn't look right. > > It seems to work in most cases, but there's a certain risk that > > phy_init_hw() overwrites something, e.g. the advertised modes. > > I think we have two valid scenarios: > > > > 1. phylib PM callbacks are used, then the MAC driver shouldn't > > touch the PHY in its PM callbacks, especially not call > > phy_stop/phy_start. > > > > 2. MAC PM callbacks take care also of the PHY. Then I think we would > > need a flag at the phy_device telling it to make the PHY PM > > callbacks a no-op. > > > > Andrew, Russell: What do you think? > > > >> Best Regards, > >> Joakim Zhang > >> > >> [...] > >> > > Heiner > > > > Based on scenario 1 you can also try the following. > > diff --git a/drivers/net/ethernet/freescale/fec_main.c > b/drivers/net/ethernet/freescale/fec_main.c > index 3db882322..cdf9160fc 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct > device *dev) > if (netif_running(ndev)) { > if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) > fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; > - phy_stop(ndev->phydev); > napi_disable(&fep->napi); > netif_tx_lock_bh(ndev); > netif_device_detach(ndev); > @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct > device *dev) > netif_device_attach(ndev); > netif_tx_unlock_bh(ndev); > napi_enable(&fep->napi); > - phy_start(ndev->phydev); > } > rtnl_unlock(); As I described in commit message: "When I am debugging, I found PHY works fine if MAC doesn't support suspend/resume or phy_stop()/phy_start() doesn't been called during suspend/resume. This let me realize, PHY state machine phy_state_machine() could do something breaks the PHY." So I have tried your above code change before, and it works. But it is a very common phenomenon that MAC suspend/resume invokes phy_stop/start or phylink_stop/start, and it's been around for a long time. As the scenarios you list, it is indeed unreasonable to soft reset or config PHY after calling phy_start_aneg() in state machine. PHY state should be PHY_UP after calling phy_init_hw(), It seems more reasonable. Best Regards, Joakim Zhang > -- > 2.31.1 >
On 07.04.2021 03:43, Joakim Zhang wrote: > > Hi Heiner, > >> -----Original Message----- >> From: Heiner Kallweit <hkallweit1@gmail.com> >> Sent: 2021年4月7日 2:22 >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; >> andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; >> kuba@kernel.org; Russell King - ARM Linux <linux@armlinux.org.uk> >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >> <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume >> back >> >> On 06.04.2021 13:42, Heiner Kallweit wrote: >>> On 06.04.2021 12:07, Joakim Zhang wrote: >>>> >>>>> -----Original Message----- >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>> Sent: 2021年4月6日 14:29 >>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; >>>>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; >>>>> davem@davemloft.net; kuba@kernel.org >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>> dl-linux-imx <linux-imx@nxp.com> >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO >>>>> bus resume back >>>>> >>>>> On 06.04.2021 04:07, Joakim Zhang wrote: >>>>>> >>>>>> Hi Heiner, >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>>>> Sent: 2021年4月5日 20:10 >>>>>>> To: christian.melki@t2data.com; Joakim Zhang >>>>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; linux@armlinux.org.uk; >>>>>>> davem@davemloft.net; kuba@kernel.org >>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>>>> dl-linux-imx <linux-imx@nxp.com> >>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO >>>>>>> bus resume back >>>>>>> >>>>>>> On 05.04.2021 10:43, Christian Melki wrote: >>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram >> may >>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus >>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements soft_reset >> callback. >>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset >>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds soft_reset for >> KSZ8081. >>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which >>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when system >>>>> resume >>>>>>>>>>> back, MAC >>>>>>> driver is fec_main.c. >>>>>>>>>>> >>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus >>>>>>>>>>> resume back would introduce some regression when PHY >>>>>>>>>>> implements soft_reset. When I >>>>>>>>>> >>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset >>>>>>>>>> should break something. >>>>>>>>>> >>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support >>>>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called >>>>>>>>>>> during suspend/resume. This let me realize, PHY state machine >>>>>>>>>>> phy_state_machine() could do something breaks the PHY. >>>>>>>>>>> >>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume when >>>>>>>>>>> system resume back from suspend. When MAC resume, usually it >>>>>>>>>>> will invoke >>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger >>>>>>>>>>> the >>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will >>>>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, >>>>>>>>>>> what to next is periodically check PHY link status. When MDIO >>>>>>>>>>> bus resume, it will initialize PHY hardware, including >>>>>>>>>>> soft_reset, what would soft_reset affect seems various from >> different PHYs. >>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform a >>>>>>>>>>> soft reset, >>>>>>> it will never complete auto-nego. >>>>>>>>>> >>>>>>>>>> Why? That would need to be checked in detail. Maybe chip errata >>>>>>>>>> documentation provides a hint. >>>>>>>>>> >>>>>>>>> >>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>>>>>>> >>>>>>>>> If software reset (Register 0.15) is used to exit power-down >>>>>>>>> mode (Register 0.11 = 1), two software reset writes (Register >>>>>>>>> 0.15 = 1) are required. The first write clears power-down mode; >>>>>>>>> the second write resets the chip and re-latches the pin strapping pin >> values. >>>>>>>>> >>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() >>>>>>>>> isn't appropriate for this PHY. Please re-test with the KSZ8081 >>>>>>>>> soft reset following the spec comment. >>>>>>>>> >>>>>>>> >>>>>>>> Interesting. Never expected that behavior. >>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. >>>>>>>> This is what I found (micrel.c): >>>>>>>> >>>>>>>> 10/100: >>>>>>>> 8001 - Unaffected? >>>>>>>> 8021/8031 - Double reset after PDOWN. >>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if >>>>>>>> reset solves the issue since errata says no error after reset but >>>>>>>> is also claiming that only toggling PDOWN (may) or power will help. >>>>>>>> 8051 - Double reset after PDOWN. >>>>>>>> 8061 - Double reset after PDOWN. >>>>>>>> 8081 - Double reset after PDOWN. >>>>>>>> 8091 - Double reset after PDOWN. >>>>>>>> >>>>>>>> 10/100/1000: >>>>>>>> Nothing in gigabit afaics. >>>>>>>> >>>>>>>> Switches: >>>>>>>> 8862 - Not affected? >>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>>>>> 8864 - Not affected? >>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists. >>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on >>>>>>>> adjacent links. Do not use. >>>>>>>> >>>>>>>> This certainly explains a lot. >>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume >>>>>>>>>>> back, it should be reasonable after PHY hardware >>>>>>>>>>> re-initialized. Also give state machine a chance to start/config >> auto-nego again. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then >>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() >>>>>>>>>> phydev->returns >>>>>>>>>> false. As a consequence >>>>>>>>>> phydev->suspended_by_mdio_bus is false and >>>>>>>>>> phydev->mdio_bus_phy_resume() >>>>>>>>>> skips the PHY hw initialization. >>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls >>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. >>>>>>>>> Therefore disregard this part for now. >>>>>>>>> >>>>>>>>>> Having said that the current argumentation isn't convincing. >>>>>>>>>> I'm not aware of such issues on other systems, therefore it's >>>>>>>>>> likely that something is system-dependent. >>>>>>>>>> >>>>>>>>>> Please check the exact call sequence on your system, maybe it >>>>>>>>>> provides a hint. >>>>>>>>>> >>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>>>>>>> --- >>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c >>>>>>>>>>> b/drivers/net/phy/phy_device.c index >>>>>>>>>>> cc38e326405a..312a6f662481 >>>>>>>>>>> 100644 >>>>>>>>>>> --- a/drivers/net/phy/phy_device.c >>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int >>>>>>> mdio_bus_phy_resume(struct device *dev) >>>>>>>>>>> ret = phy_resume(phydev); >>>>>>>>>>> if (ret < 0) >>>>>>>>>>> return ret; >>>>>>>>>>> + >>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC >>>>>>>>>>> +controller >>>>>>> resume >>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after >>>>>>> re-initializing >>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config >>>>>>>>>>> +auto-nego >>>>>>> again. >>>>>>>>>>> + */ >>>>>>>>>>> + phydev->state = PHY_UP; >>>>>>>>>>> + >>>>>>>>>>> no_resume: >>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>>>>>>> phy_start_machine(phydev); >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> This is a quick draft of the modified soft reset for KSZ8081. >>>>>>> Some tests would be appreciated. >>>>>>> >>>>>> >>>>>> I test this patch at my side, unfortunately, it still can't work. >>>>>> >>>>>> I have not found what soft reset would affect in 8081 spec, is >>>>>> there ang common Description for different PHYs? >>>>>> >>>>> >>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 >>>>> >>>>> Apart from that you can check BMCR value and which modes your PHY >>>>> advertises after a soft reset. If the PHY indeed wouldn't restart >>>>> aneg after a soft reset, then it would be the only one with this >>>>> behavior I know. And I'd wonder why there aren't more such reports. >>>> >>>> Hi Heiner, >>>> >>>> We have reasons to believe it is a weird behavior of KSZ8081. I have >>>> another two PHYs at my side, AR8031 and RTL8211FD, they can work fine if >> soft_reset implemented. >>>> >>>> As commit message described, phy_start() would change PHY state to >> PHY_UP finally to PHY_NOLINK when MAC resume. >>>> After MDIO bus resume back, it will periodically check link status. I did below >> code change to dump all PHY registers. >>>> >>>> --- a/drivers/net/phy/phy.c >>>> +++ b/drivers/net/phy/phy.c >>>> @@ -35,7 +35,7 @@ >>>> #include <net/genetlink.h> >>>> #include <net/sock.h> >>>> >>>> -#define PHY_STATE_TIME HZ >>>> +#define PHY_STATE_TIME (10 * HZ) >>>> >>>> #define PHY_STATE_STR(_state) \ >>>> case PHY_##_state: \ >>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct phy_device >> *phydev) >>>> if (err) >>>> return err; >>>> >>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); >>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); >>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); >>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); >>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); >>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); >>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); >>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); >>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); >>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); >>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); >>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); >>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); >>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); >>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); >>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); >>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); >>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); >>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); >>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); >>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); >>>> + printk("\n\n"); >>>> if (phydev->link && phydev->state != PHY_RUNNING) { >>>> phy_check_downshift(phydev); >>>> phydev->state = PHY_RUNNING; >>>> >>>> After MDIO bus resume back, results as below: >>>> >>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 >>>> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] >>>> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ >>>> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data = 4 >>>> [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset 0x08 >>>> data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] offset >>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] >>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ >>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 data = >>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset 0x1c >>>> data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] offset >>>> 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ 119.709984] >>>> [ 119.709984] [ 120.182120] offset 0x00 data = 100 [ 120.185894] >>>> offset 0x01 data = 784d [ 120.189681] offset 0x02 data = 22 [ >>>> 120.206319] offset 0x03 data = 1560 [ 120.210171] offset 0x04 data = >>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset 0x06 >>>> data = 4 [ 120.242936] offset 0x07 data = 2001 [ 120.246792] offset >>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] >>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ >>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data = >>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset 0x18 >>>> data = 801 [ 120.316983] offset 0x1b data = 0 [ 120.320513] offset >>>> 0x1c data = 1 [ 120.336589] offset 0x1d data = 0 [ 120.340184] >>>> offset 0x1e data = 115 [ 120.355357] offset 0x1f data = 8190 [ >>>> 120.359112] [ 120.359112] [ 129.785396] offset 0x00 data = 1100 [ >>>> 129.789252] offset 0x01 data = 7849 [ 129.809951] offset 0x02 data = >>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset 0x04 >>>> data = 85e1 [ 129.835808] offset 0x05 data = 0 [ 129.839398] offset >>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ 129.858371] >>>> offset 0x08 data = 0 [ 129.873357] offset 0x09 data = 0 [ >>>> 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 data = 0 >>>> [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 data = >>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset 0x18 >>>> data = 801 [ 129.932735] offset 0x1b data = 0 [ 129.946830] offset >>>> 0x1c data = 0 [ 129.950424] offset 0x1d data = 0 [ 129.964585] >>>> offset 0x1e data = 0 [ 129.968192] offset 0x1f data = 8190 [ >>>> 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data = 100 [ >>>> 130.428889] offset 0x01 data = 784d [ 130.442671] offset 0x02 data = >>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset 0x04 >>>> data = 8061 [ 130.467207] offset 0x05 data = 0 [ 130.470789] offset >>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ 130.488934] >>>> offset 0x08 data = 0 [ 130.504320] offset 0x09 data = 0 [ >>>> 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 data = 0 >>>> [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 data = >>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset 0x18 >>>> data = 801 [ 130.560987] offset 0x1b data = 0 [ 130.575158] offset >>>> 0x1c data = 1 [ 130.578754] offset 0x1d data = 0 [ 130.593543] >>>> offset 0x1e data = 115 [ 130.597312] offset 0x1f data = 8190 >>>> >>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to >> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. >>>> Above process is auto-nego enabled, link is down, auto-nego is disabled, link >> is up. And auto-nego complete bit is always 0. >>>> >>>> PHY seems become unstable if soft reset after start/config auto-nego. I also >> want to fix it in micrel driver, but failed. >>>> >>> >>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg >>> may never complete for different reasons, e.g. no physical link. And >>> even if we use a timeout this may add unwanted delays. >>> >>>> Do you have any other insights that can help me further locate the issue? >> Thanks. >>>> >>> >>> I think current MAC/PHY PM handling isn't perfect. Often we have the >>> following >>> scenario: >>> >>> *suspend* >>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback >>> (typically involving phy_stop()) >>> >>> *resume* >>> 1. MAC resume callback (typically involving phy_start()) 2. PHY is >>> resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() >>> >>> Calling phy_init_hw() after phy_start() doesn't look right. >>> It seems to work in most cases, but there's a certain risk that >>> phy_init_hw() overwrites something, e.g. the advertised modes. >>> I think we have two valid scenarios: >>> >>> 1. phylib PM callbacks are used, then the MAC driver shouldn't >>> touch the PHY in its PM callbacks, especially not call >>> phy_stop/phy_start. >>> >>> 2. MAC PM callbacks take care also of the PHY. Then I think we would >>> need a flag at the phy_device telling it to make the PHY PM >>> callbacks a no-op. >>> >>> Andrew, Russell: What do you think? >>> >>>> Best Regards, >>>> Joakim Zhang >>>> >>>> [...] >>>> >>> Heiner >>> >> >> Based on scenario 1 you can also try the following. >> >> diff --git a/drivers/net/ethernet/freescale/fec_main.c >> b/drivers/net/ethernet/freescale/fec_main.c >> index 3db882322..cdf9160fc 100644 >> --- a/drivers/net/ethernet/freescale/fec_main.c >> +++ b/drivers/net/ethernet/freescale/fec_main.c >> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct >> device *dev) >> if (netif_running(ndev)) { >> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) >> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; >> - phy_stop(ndev->phydev); >> napi_disable(&fep->napi); >> netif_tx_lock_bh(ndev); >> netif_device_detach(ndev); >> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct >> device *dev) >> netif_device_attach(ndev); >> netif_tx_unlock_bh(ndev); >> napi_enable(&fep->napi); >> - phy_start(ndev->phydev); >> } >> rtnl_unlock(); > > As I described in commit message: > > "When I am debugging, I found PHY works fine if MAC doesn't support suspend/resume or phy_stop()/phy_start() doesn't been called during suspend/resume. This let me realize, PHY state machine phy_state_machine() could do something breaks the PHY." > > So I have tried your above code change before, and it works. But it is a very common phenomenon that MAC suspend/resume invokes phy_stop/start or phylink_stop/start, and it's been around for a long time. > > As the scenarios you list, it is indeed unreasonable to soft reset or config PHY after calling phy_start_aneg() in state machine. PHY state should be PHY_UP after calling phy_init_hw(), It seems more reasonable. > > Best Regards, > Joakim Zhang >> -- >> 2.31.1 >> > Following is a draft patch for scenario 1: Make PHY PM a no-op if MAC manages PHY PM. Can you give it a try? diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index a009d1769..73d29fd5e 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -273,6 +273,9 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev) { struct phy_device *phydev = to_phy_device(dev); + if (phydev->mac_managed_pm) + return 0; + /* We must stop the state machine manually, otherwise it stops out of * control, possibly with the phydev->lock held. Upon resume, netdev * may call phy routines that try to grab the same lock, and that may @@ -294,6 +297,9 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) struct phy_device *phydev = to_phy_device(dev); int ret; + if (phydev->mac_managed_pm) + return 0; + if (!phydev->suspended_by_mdio_bus) goto no_resume; diff --git a/include/linux/phy.h b/include/linux/phy.h index 8e2cf84b2..46702af18 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -567,6 +567,7 @@ struct phy_device { unsigned loopback_enabled:1; unsigned downshifted_rate:1; unsigned is_on_sfp_module:1; + unsigned mac_managed_pm:1; unsigned autoneg:1; /* The most recently read link state */ -- 2.31.1 diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 3db882322..70aea9c27 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct net_device *ndev) fep->link = 0; fep->full_duplex = 0; + phy_dev->mac_managed_pm = 1; + phy_attached_info(phy_dev); return 0; @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct device *dev) netif_device_attach(ndev); netif_tx_unlock_bh(ndev); napi_enable(&fep->napi); + phy_init_hw(ndev->phydev); phy_start(ndev->phydev); } rtnl_unlock(); -- 2.31.1
Hi Heiner, > -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月7日 15:12 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 07.04.2021 03:43, Joakim Zhang wrote: > > > > Hi Heiner, > > > >> -----Original Message----- > >> From: Heiner Kallweit <hkallweit1@gmail.com> > >> Sent: 2021年4月7日 2:22 > >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >> davem@davemloft.net; kuba@kernel.org; Russell King - ARM Linux > >> <linux@armlinux.org.uk> > >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli > >> <f.fainelli@gmail.com> > >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus > >> resume back > >> > >> On 06.04.2021 13:42, Heiner Kallweit wrote: > >>> On 06.04.2021 12:07, Joakim Zhang wrote: > >>>> > >>>>> -----Original Message----- > >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>> Sent: 2021年4月6日 14:29 > >>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >>>>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >>>>> davem@davemloft.net; kuba@kernel.org > >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>> dl-linux-imx <linux-imx@nxp.com> > >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>>>> bus resume back > >>>>> > >>>>> On 06.04.2021 04:07, Joakim Zhang wrote: > >>>>>> > >>>>>> Hi Heiner, > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>>>> Sent: 2021年4月5日 20:10 > >>>>>>> To: christian.melki@t2data.com; Joakim Zhang > >>>>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; > >>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > >>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>>>> dl-linux-imx <linux-imx@nxp.com> > >>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > >>>>>>> MDIO bus resume back > >>>>>>> > >>>>>>> On 05.04.2021 10:43, Christian Melki wrote: > >>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram > >> may > >>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus > >>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements > >>>>>>>>>>> soft_reset > >> callback. > >>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset > >>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds soft_reset > >>>>>>>>>>> for > >> KSZ8081. > >>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > >>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when system > >>>>> resume > >>>>>>>>>>> back, MAC > >>>>>>> driver is fec_main.c. > >>>>>>>>>>> > >>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus > >>>>>>>>>>> resume back would introduce some regression when PHY > >>>>>>>>>>> implements soft_reset. When I > >>>>>>>>>> > >>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset > >>>>>>>>>> should break something. > >>>>>>>>>> > >>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't support > >>>>>>>>>>> suspend/resume or phy_stop()/phy_start() doesn't been called > >>>>>>>>>>> during suspend/resume. This let me realize, PHY state > >>>>>>>>>>> machine > >>>>>>>>>>> phy_state_machine() could do something breaks the PHY. > >>>>>>>>>>> > >>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume > when > >>>>>>>>>>> system resume back from suspend. When MAC resume, usually > it > >>>>>>>>>>> will invoke > >>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then > >>>>>>>>>>> trigger the > >>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will > >>>>>>>>>>> start/config auto-nego, then change PHY state to PHY_NOLINK, > >>>>>>>>>>> what to next is periodically check PHY link status. When > >>>>>>>>>>> MDIO bus resume, it will initialize PHY hardware, including > >>>>>>>>>>> soft_reset, what would soft_reset affect seems various from > >> different PHYs. > >>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then perform > >>>>>>>>>>> a soft reset, > >>>>>>> it will never complete auto-nego. > >>>>>>>>>> > >>>>>>>>>> Why? That would need to be checked in detail. Maybe chip > >>>>>>>>>> errata documentation provides a hint. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >>>>>>>>> > >>>>>>>>> If software reset (Register 0.15) is used to exit power-down > >>>>>>>>> mode (Register 0.11 = 1), two software reset writes (Register > >>>>>>>>> 0.15 = 1) are required. The first write clears power-down > >>>>>>>>> mode; the second write resets the chip and re-latches the pin > >>>>>>>>> strapping pin > >> values. > >>>>>>>>> > >>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() > >>>>>>>>> isn't appropriate for this PHY. Please re-test with the > >>>>>>>>> KSZ8081 soft reset following the spec comment. > >>>>>>>>> > >>>>>>>> > >>>>>>>> Interesting. Never expected that behavior. > >>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. > >>>>>>>> This is what I found (micrel.c): > >>>>>>>> > >>>>>>>> 10/100: > >>>>>>>> 8001 - Unaffected? > >>>>>>>> 8021/8031 - Double reset after PDOWN. > >>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if > >>>>>>>> reset solves the issue since errata says no error after reset > >>>>>>>> but is also claiming that only toggling PDOWN (may) or power will > help. > >>>>>>>> 8051 - Double reset after PDOWN. > >>>>>>>> 8061 - Double reset after PDOWN. > >>>>>>>> 8081 - Double reset after PDOWN. > >>>>>>>> 8091 - Double reset after PDOWN. > >>>>>>>> > >>>>>>>> 10/100/1000: > >>>>>>>> Nothing in gigabit afaics. > >>>>>>>> > >>>>>>>> Switches: > >>>>>>>> 8862 - Not affected? > >>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround > exists. > >>>>>>>> 8864 - Not affected? > >>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround > exists. > >>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure > >>>>>>>> on adjacent links. Do not use. > >>>>>>>> > >>>>>>>> This certainly explains a lot. > >>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus > resume > >>>>>>>>>>> back, it should be reasonable after PHY hardware > >>>>>>>>>>> re-initialized. Also give state machine a chance to > >>>>>>>>>>> start/config > >> auto-nego again. > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then > >>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() > >>>>>>>>>> phydev->returns > >>>>>>>>>> false. As a consequence > >>>>>>>>>> phydev->suspended_by_mdio_bus is false and > >>>>>>>>>> phydev->mdio_bus_phy_resume() > >>>>>>>>>> skips the PHY hw initialization. > >>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls > >>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend. > >>>>>>>>> Therefore disregard this part for now. > >>>>>>>>> > >>>>>>>>>> Having said that the current argumentation isn't convincing. > >>>>>>>>>> I'm not aware of such issues on other systems, therefore it's > >>>>>>>>>> likely that something is system-dependent. > >>>>>>>>>> > >>>>>>>>>> Please check the exact call sequence on your system, maybe it > >>>>>>>>>> provides a hint. > >>>>>>>>>> > >>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>>>>>>>>>> --- > >>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>>>>>>>>> 1 file changed, 7 insertions(+) > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c > >>>>>>>>>>> b/drivers/net/phy/phy_device.c index > >>>>>>>>>>> cc38e326405a..312a6f662481 > >>>>>>>>>>> 100644 > >>>>>>>>>>> --- a/drivers/net/phy/phy_device.c > >>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c > >>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > >>>>>>> mdio_bus_phy_resume(struct device *dev) > >>>>>>>>>>> ret = phy_resume(phydev); > >>>>>>>>>>> if (ret < 0) > >>>>>>>>>>> return ret; > >>>>>>>>>>> + > >>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC > >>>>>>>>>>> +controller > >>>>>>> resume > >>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after > >>>>>>> re-initializing > >>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config > >>>>>>>>>>> +auto-nego > >>>>>>> again. > >>>>>>>>>>> + */ > >>>>>>>>>>> + phydev->state = PHY_UP; > >>>>>>>>>>> + > >>>>>>>>>>> no_resume: > >>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) > >>>>>>>>>>> phy_start_machine(phydev); > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> This is a quick draft of the modified soft reset for KSZ8081. > >>>>>>> Some tests would be appreciated. > >>>>>>> > >>>>>> > >>>>>> I test this patch at my side, unfortunately, it still can't work. > >>>>>> > >>>>>> I have not found what soft reset would affect in 8081 spec, is > >>>>>> there ang common Description for different PHYs? > >>>>>> > >>>>> > >>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 > >>>>> > >>>>> Apart from that you can check BMCR value and which modes your PHY > >>>>> advertises after a soft reset. If the PHY indeed wouldn't restart > >>>>> aneg after a soft reset, then it would be the only one with this > >>>>> behavior I know. And I'd wonder why there aren't more such reports. > >>>> > >>>> Hi Heiner, > >>>> > >>>> We have reasons to believe it is a weird behavior of KSZ8081. I > >>>> have another two PHYs at my side, AR8031 and RTL8211FD, they can > >>>> work fine if > >> soft_reset implemented. > >>>> > >>>> As commit message described, phy_start() would change PHY state to > >> PHY_UP finally to PHY_NOLINK when MAC resume. > >>>> After MDIO bus resume back, it will periodically check link status. > >>>> I did below > >> code change to dump all PHY registers. > >>>> > >>>> --- a/drivers/net/phy/phy.c > >>>> +++ b/drivers/net/phy/phy.c > >>>> @@ -35,7 +35,7 @@ > >>>> #include <net/genetlink.h> > >>>> #include <net/sock.h> > >>>> > >>>> -#define PHY_STATE_TIME HZ > >>>> +#define PHY_STATE_TIME (10 * HZ) > >>>> > >>>> #define PHY_STATE_STR(_state) \ > >>>> case PHY_##_state: \ > >>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct > >>>> phy_device > >> *phydev) > >>>> if (err) > >>>> return err; > >>>> > >>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > >>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > >>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > >>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > >>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > >>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > >>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > >>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > >>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > >>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > >>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > >>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > >>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > >>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > >>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > >>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > >>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > >>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > >>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > >>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > >>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > >>>> + printk("\n\n"); > >>>> if (phydev->link && phydev->state != PHY_RUNNING) { > >>>> phy_check_downshift(phydev); > >>>> phydev->state = PHY_RUNNING; > >>>> > >>>> After MDIO bus resume back, results as below: > >>>> > >>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 > >>>> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] > >>>> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ > >>>> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data = > >>>> 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset 0x08 > >>>> data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] offset > >>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] > >>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ > >>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 data > >>>> = > >>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset 0x1c > >>>> data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] offset > >>>> 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ > >>>> 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 [ > >>>> 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 data > >>>> = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] offset > >>>> 0x04 data = > >>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset 0x06 > >>>> data = 4 [ 120.242936] offset 0x07 data = 2001 [ 120.246792] > >>>> offset > >>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] > >>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ > >>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data = > >>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset > >>>> 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ 120.320513] > >>>> offset 0x1c data = 1 [ 120.336589] offset 0x1d data = 0 [ > >>>> 120.340184] offset 0x1e data = 115 [ 120.355357] offset 0x1f data > >>>> = 8190 [ 120.359112] [ 120.359112] [ 129.785396] offset 0x00 data > >>>> = 1100 [ 129.789252] offset 0x01 data = 7849 [ 129.809951] offset > >>>> 0x02 data = > >>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset > >>>> 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ 129.839398] > >>>> offset > >>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ 129.858371] > >>>> offset 0x08 data = 0 [ 129.873357] offset 0x09 data = 0 [ > >>>> 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 data = > >>>> 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 > >>>> data = > >>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset > >>>> 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ 129.946830] > >>>> offset 0x1c data = 0 [ 129.950424] offset 0x1d data = 0 [ > >>>> 129.964585] offset 0x1e data = 0 [ 129.968192] offset 0x1f data = > >>>> 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data = > >>>> 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset > >>>> 0x02 data = > >>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset > >>>> 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ 130.470789] > >>>> offset > >>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ 130.488934] > >>>> offset 0x08 data = 0 [ 130.504320] offset 0x09 data = 0 [ > >>>> 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 data = > >>>> 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 > >>>> data = > >>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset > >>>> 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ 130.575158] > >>>> offset 0x1c data = 1 [ 130.578754] offset 0x1d data = 0 [ > >>>> 130.593543] offset 0x1e data = 115 [ 130.597312] offset 0x1f data > >>>> = 8190 > >>>> > >>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to > >> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > >>>> Above process is auto-nego enabled, link is down, auto-nego is > >>>> disabled, link > >> is up. And auto-nego complete bit is always 0. > >>>> > >>>> PHY seems become unstable if soft reset after start/config > >>>> auto-nego. I also > >> want to fix it in micrel driver, but failed. > >>>> > >>> > >>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg > >>> may never complete for different reasons, e.g. no physical link. And > >>> even if we use a timeout this may add unwanted delays. > >>> > >>>> Do you have any other insights that can help me further locate the issue? > >> Thanks. > >>>> > >>> > >>> I think current MAC/PHY PM handling isn't perfect. Often we have the > >>> following > >>> scenario: > >>> > >>> *suspend* > >>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback > >>> (typically involving phy_stop()) > >>> > >>> *resume* > >>> 1. MAC resume callback (typically involving phy_start()) 2. PHY is > >>> resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > >>> > >>> Calling phy_init_hw() after phy_start() doesn't look right. > >>> It seems to work in most cases, but there's a certain risk that > >>> phy_init_hw() overwrites something, e.g. the advertised modes. > >>> I think we have two valid scenarios: > >>> > >>> 1. phylib PM callbacks are used, then the MAC driver shouldn't > >>> touch the PHY in its PM callbacks, especially not call > >>> phy_stop/phy_start. > >>> > >>> 2. MAC PM callbacks take care also of the PHY. Then I think we would > >>> need a flag at the phy_device telling it to make the PHY PM > >>> callbacks a no-op. > >>> > >>> Andrew, Russell: What do you think? > >>> > >>>> Best Regards, > >>>> Joakim Zhang > >>>> > >>>> [...] > >>>> > >>> Heiner > >>> > >> > >> Based on scenario 1 you can also try the following. > >> > >> diff --git a/drivers/net/ethernet/freescale/fec_main.c > >> b/drivers/net/ethernet/freescale/fec_main.c > >> index 3db882322..cdf9160fc 100644 > >> --- a/drivers/net/ethernet/freescale/fec_main.c > >> +++ b/drivers/net/ethernet/freescale/fec_main.c > >> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct > >> device *dev) > >> if (netif_running(ndev)) { > >> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) > >> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; > >> - phy_stop(ndev->phydev); > >> napi_disable(&fep->napi); > >> netif_tx_lock_bh(ndev); > >> netif_device_detach(ndev); > >> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct > >> device *dev) > >> netif_device_attach(ndev); > >> netif_tx_unlock_bh(ndev); > >> napi_enable(&fep->napi); > >> - phy_start(ndev->phydev); > >> } > >> rtnl_unlock(); > > > > As I described in commit message: > > > > "When I am debugging, I found PHY works fine if MAC doesn't support > suspend/resume or phy_stop()/phy_start() doesn't been called during > suspend/resume. This let me realize, PHY state machine phy_state_machine() > could do something breaks the PHY." > > > > So I have tried your above code change before, and it works. But it is a very > common phenomenon that MAC suspend/resume invokes phy_stop/start or > phylink_stop/start, and it's been around for a long time. > > > > As the scenarios you list, it is indeed unreasonable to soft reset or config PHY > after calling phy_start_aneg() in state machine. PHY state should be PHY_UP > after calling phy_init_hw(), It seems more reasonable. > > > > Best Regards, > > Joakim Zhang > >> -- > >> 2.31.1 > >> > > > > Following is a draft patch for scenario 1: > Make PHY PM a no-op if MAC manages PHY PM. > Can you give it a try? > It can work for my case. Thanks. Best Regards, Joakim Zhang > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index > a009d1769..73d29fd5e 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -273,6 +273,9 @@ static __maybe_unused int > mdio_bus_phy_suspend(struct device *dev) { > struct phy_device *phydev = to_phy_device(dev); > > + if (phydev->mac_managed_pm) > + return 0; > + > /* We must stop the state machine manually, otherwise it stops out of > * control, possibly with the phydev->lock held. Upon resume, netdev > * may call phy routines that try to grab the same lock, and that may @@ > -294,6 +297,9 @@ static __maybe_unused int mdio_bus_phy_resume(struct > device *dev) > struct phy_device *phydev = to_phy_device(dev); > int ret; > > + if (phydev->mac_managed_pm) > + return 0; > + > if (!phydev->suspended_by_mdio_bus) > goto no_resume; > > diff --git a/include/linux/phy.h b/include/linux/phy.h index > 8e2cf84b2..46702af18 100644 > --- a/include/linux/phy.h > +++ b/include/linux/phy.h > @@ -567,6 +567,7 @@ struct phy_device { > unsigned loopback_enabled:1; > unsigned downshifted_rate:1; > unsigned is_on_sfp_module:1; > + unsigned mac_managed_pm:1; > > unsigned autoneg:1; > /* The most recently read link state */ > -- > 2.31.1 > > > > diff --git a/drivers/net/ethernet/freescale/fec_main.c > b/drivers/net/ethernet/freescale/fec_main.c > index 3db882322..70aea9c27 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct net_device > *ndev) > fep->link = 0; > fep->full_duplex = 0; > > + phy_dev->mac_managed_pm = 1; > + > phy_attached_info(phy_dev); > > return 0; > @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct > device *dev) > netif_device_attach(ndev); > netif_tx_unlock_bh(ndev); > napi_enable(&fep->napi); > + phy_init_hw(ndev->phydev); > phy_start(ndev->phydev); > } > rtnl_unlock(); > -- > 2.31.1
Hi Heiner, > -----Original Message----- > From: Joakim Zhang <qiangqing.zhang@nxp.com> > Sent: 2021年4月7日 15:46 > To: Heiner Kallweit <hkallweit1@gmail.com>; christian.melki@t2data.com; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> > Subject: RE: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > > Hi Heiner, > > > -----Original Message----- > > From: Heiner Kallweit <hkallweit1@gmail.com> > > Sent: 2021年4月7日 15:12 > > To: Joakim Zhang <qiangqing.zhang@nxp.com>; > > christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > > davem@davemloft.net; kuba@kernel.org > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > > <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> > > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus > > resume back > > > > On 07.04.2021 03:43, Joakim Zhang wrote: > > > > > > Hi Heiner, > > > > > >> -----Original Message----- > > >> From: Heiner Kallweit <hkallweit1@gmail.com> > > >> Sent: 2021年4月7日 2:22 > > >> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > > >> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > > >> davem@davemloft.net; kuba@kernel.org; Russell King - ARM Linux > > >> <linux@armlinux.org.uk> > > >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > > >> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli > > >> <f.fainelli@gmail.com> > > >> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > > >> bus resume back > > >> > > >> On 06.04.2021 13:42, Heiner Kallweit wrote: > > >>> On 06.04.2021 12:07, Joakim Zhang wrote: > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > > >>>>> Sent: 2021年4月6日 14:29 > > >>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > > >>>>> christian.melki@t2data.com; andrew@lunn.ch; > > >>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > > >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > > >>>>> dl-linux-imx <linux-imx@nxp.com> > > >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > > >>>>> MDIO bus resume back > > >>>>> > > >>>>> On 06.04.2021 04:07, Joakim Zhang wrote: > > >>>>>> > > >>>>>> Hi Heiner, > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > > >>>>>>> Sent: 2021年4月5日 20:10 > > >>>>>>> To: christian.melki@t2data.com; Joakim Zhang > > >>>>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; > > >>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > > >>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > > >>>>>>> dl-linux-imx <linux-imx@nxp.com> > > >>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > > >>>>>>> MDIO bus resume back > > >>>>>>> > > >>>>>>> On 05.04.2021 10:43, Christian Melki wrote: > > >>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > > >>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > > >>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > > >>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram > > >> may > > >>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus > > >>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements > > >>>>>>>>>>> soft_reset > > >> callback. > > >>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset > > >>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds > > >>>>>>>>>>> soft_reset for > > >> KSZ8081. > > >>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > > >>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when > system > > >>>>> resume > > >>>>>>>>>>> back, MAC > > >>>>>>> driver is fec_main.c. > > >>>>>>>>>>> > > >>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus > > >>>>>>>>>>> resume back would introduce some regression when PHY > > >>>>>>>>>>> implements soft_reset. When I > > >>>>>>>>>> > > >>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset > > >>>>>>>>>> should break something. > > >>>>>>>>>> > > >>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't > > >>>>>>>>>>> support suspend/resume or phy_stop()/phy_start() doesn't > > >>>>>>>>>>> been called during suspend/resume. This let me realize, > > >>>>>>>>>>> PHY state machine > > >>>>>>>>>>> phy_state_machine() could do something breaks the PHY. > > >>>>>>>>>>> > > >>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume > > when > > >>>>>>>>>>> system resume back from suspend. When MAC resume, usually > > it > > >>>>>>>>>>> will invoke > > >>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then > > >>>>>>>>>>> trigger the > > >>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will > > >>>>>>>>>>> start/config auto-nego, then change PHY state to > > >>>>>>>>>>> PHY_NOLINK, what to next is periodically check PHY link > > >>>>>>>>>>> status. When MDIO bus resume, it will initialize PHY > > >>>>>>>>>>> hardware, including soft_reset, what would soft_reset > > >>>>>>>>>>> affect seems various from > > >> different PHYs. > > >>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then > > >>>>>>>>>>> perform a soft reset, > > >>>>>>> it will never complete auto-nego. > > >>>>>>>>>> > > >>>>>>>>>> Why? That would need to be checked in detail. Maybe chip > > >>>>>>>>>> errata documentation provides a hint. > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > > >>>>>>>>> > > >>>>>>>>> If software reset (Register 0.15) is used to exit power-down > > >>>>>>>>> mode (Register 0.11 = 1), two software reset writes > > >>>>>>>>> (Register > > >>>>>>>>> 0.15 = 1) are required. The first write clears power-down > > >>>>>>>>> mode; the second write resets the chip and re-latches the > > >>>>>>>>> pin strapping pin > > >> values. > > >>>>>>>>> > > >>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() > > >>>>>>>>> isn't appropriate for this PHY. Please re-test with the > > >>>>>>>>> KSZ8081 soft reset following the spec comment. > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> Interesting. Never expected that behavior. > > >>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. > > >>>>>>>> This is what I found (micrel.c): > > >>>>>>>> > > >>>>>>>> 10/100: > > >>>>>>>> 8001 - Unaffected? > > >>>>>>>> 8021/8031 - Double reset after PDOWN. > > >>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear > > >>>>>>>> if reset solves the issue since errata says no error after > > >>>>>>>> reset but is also claiming that only toggling PDOWN (may) or > > >>>>>>>> power will > > help. > > >>>>>>>> 8051 - Double reset after PDOWN. > > >>>>>>>> 8061 - Double reset after PDOWN. > > >>>>>>>> 8081 - Double reset after PDOWN. > > >>>>>>>> 8091 - Double reset after PDOWN. > > >>>>>>>> > > >>>>>>>> 10/100/1000: > > >>>>>>>> Nothing in gigabit afaics. > > >>>>>>>> > > >>>>>>>> Switches: > > >>>>>>>> 8862 - Not affected? > > >>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround > > exists. > > >>>>>>>> 8864 - Not affected? > > >>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround > > exists. > > >>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure > > >>>>>>>> on adjacent links. Do not use. > > >>>>>>>> > > >>>>>>>> This certainly explains a lot. > > >>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus > > resume > > >>>>>>>>>>> back, it should be reasonable after PHY hardware > > >>>>>>>>>>> re-initialized. Also give state machine a chance to > > >>>>>>>>>>> start/config > > >> auto-nego again. > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then > > >>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() > > >>>>>>>>>> phydev->returns > > >>>>>>>>>> false. As a consequence > > >>>>>>>>>> phydev->suspended_by_mdio_bus is false and > > >>>>>>>>>> phydev->mdio_bus_phy_resume() > > >>>>>>>>>> skips the PHY hw initialization. > > >>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls > > >>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver > suspend. > > >>>>>>>>> Therefore disregard this part for now. > > >>>>>>>>> > > >>>>>>>>>> Having said that the current argumentation isn't convincing. > > >>>>>>>>>> I'm not aware of such issues on other systems, therefore > > >>>>>>>>>> it's likely that something is system-dependent. > > >>>>>>>>>> > > >>>>>>>>>> Please check the exact call sequence on your system, maybe > > >>>>>>>>>> it provides a hint. > > >>>>>>>>>> > > >>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > > >>>>>>>>>>> --- > > >>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > > >>>>>>>>>>> 1 file changed, 7 insertions(+) > > >>>>>>>>>>> > > >>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c > > >>>>>>>>>>> b/drivers/net/phy/phy_device.c index > > >>>>>>>>>>> cc38e326405a..312a6f662481 > > >>>>>>>>>>> 100644 > > >>>>>>>>>>> --- a/drivers/net/phy/phy_device.c > > >>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c > > >>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > > >>>>>>> mdio_bus_phy_resume(struct device *dev) > > >>>>>>>>>>> ret = phy_resume(phydev); > > >>>>>>>>>>> if (ret < 0) > > >>>>>>>>>>> return ret; > > >>>>>>>>>>> + > > >>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC > > >>>>>>>>>>> +controller > > >>>>>>> resume > > >>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after > > >>>>>>> re-initializing > > >>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config > > >>>>>>>>>>> +auto-nego > > >>>>>>> again. > > >>>>>>>>>>> + */ > > >>>>>>>>>>> + phydev->state = PHY_UP; > > >>>>>>>>>>> + > > >>>>>>>>>>> no_resume: > > >>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) > > >>>>>>>>>>> phy_start_machine(phydev); > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> This is a quick draft of the modified soft reset for KSZ8081. > > >>>>>>> Some tests would be appreciated. > > >>>>>>> > > >>>>>> > > >>>>>> I test this patch at my side, unfortunately, it still can't work. > > >>>>>> > > >>>>>> I have not found what soft reset would affect in 8081 spec, is > > >>>>>> there ang common Description for different PHYs? > > >>>>>> > > >>>>> > > >>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 > > >>>>> > > >>>>> Apart from that you can check BMCR value and which modes your > > >>>>> PHY advertises after a soft reset. If the PHY indeed wouldn't > > >>>>> restart aneg after a soft reset, then it would be the only one > > >>>>> with this behavior I know. And I'd wonder why there aren't more such > reports. > > >>>> > > >>>> Hi Heiner, > > >>>> > > >>>> We have reasons to believe it is a weird behavior of KSZ8081. I > > >>>> have another two PHYs at my side, AR8031 and RTL8211FD, they can > > >>>> work fine if > > >> soft_reset implemented. > > >>>> > > >>>> As commit message described, phy_start() would change PHY state > > >>>> to > > >> PHY_UP finally to PHY_NOLINK when MAC resume. > > >>>> After MDIO bus resume back, it will periodically check link status. > > >>>> I did below > > >> code change to dump all PHY registers. > > >>>> > > >>>> --- a/drivers/net/phy/phy.c > > >>>> +++ b/drivers/net/phy/phy.c > > >>>> @@ -35,7 +35,7 @@ > > >>>> #include <net/genetlink.h> > > >>>> #include <net/sock.h> > > >>>> > > >>>> -#define PHY_STATE_TIME HZ > > >>>> +#define PHY_STATE_TIME (10 * HZ) > > >>>> > > >>>> #define PHY_STATE_STR(_state) \ > > >>>> case PHY_##_state: \ > > >>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct > > >>>> phy_device > > >> *phydev) > > >>>> if (err) > > >>>> return err; > > >>>> > > >>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > > >>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > > >>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > > >>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > > >>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > > >>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > > >>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > > >>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > > >>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > > >>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > > >>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > > >>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > > >>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > > >>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > > >>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > > >>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > > >>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > > >>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > > >>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > > >>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > > >>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > > >>>> + printk("\n\n"); > > >>>> if (phydev->link && phydev->state != PHY_RUNNING) { > > >>>> phy_check_downshift(phydev); > > >>>> phydev->state = PHY_RUNNING; > > >>>> > > >>>> After MDIO bus resume back, results as below: > > >>>> > > >>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 > > >>>> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] > > >>>> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ > > >>>> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data > > >>>> = > > >>>> 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset > > >>>> 0x08 data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] > > >>>> offset > > >>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] > > >>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ > > >>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 > > >>>> data = > > >>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset > > >>>> 0x1c data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] > > >>>> offset 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ > > >>>> 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 > > >>>> [ 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 > > >>>> data = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] > > >>>> offset > > >>>> 0x04 data = > > >>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset > > >>>> 0x06 data = 4 [ 120.242936] offset 0x07 data = 2001 [ > > >>>> 120.246792] offset > > >>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] > > >>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ > > >>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data > > >>>> = > > >>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset > > >>>> 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ > > >>>> 120.320513] offset 0x1c data = 1 [ 120.336589] offset 0x1d data > > >>>> = 0 [ 120.340184] offset 0x1e data = 115 [ 120.355357] offset > > >>>> 0x1f data = 8190 [ 120.359112] [ 120.359112] [ 129.785396] > > >>>> offset 0x00 data = 1100 [ 129.789252] offset 0x01 data = 7849 [ > > >>>> 129.809951] offset > > >>>> 0x02 data = > > >>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset > > >>>> 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ > > >>>> 129.839398] offset > > >>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ > > >>>> 129.858371] offset 0x08 data = 0 [ 129.873357] offset 0x09 data > > >>>> = 0 [ 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 > > >>>> data = > > >>>> 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 > > >>>> data = > > >>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset > > >>>> 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ > > >>>> 129.946830] offset 0x1c data = 0 [ 129.950424] offset 0x1d data > > >>>> = 0 [ 129.964585] offset 0x1e data = 0 [ 129.968192] offset 0x1f > > >>>> data = > > >>>> 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data > > >>>> = > > >>>> 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset > > >>>> 0x02 data = > > >>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset > > >>>> 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ > > >>>> 130.470789] offset > > >>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ > > >>>> 130.488934] offset 0x08 data = 0 [ 130.504320] offset 0x09 data > > >>>> = 0 [ 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 > > >>>> data = > > >>>> 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 > > >>>> data = > > >>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset > > >>>> 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ > > >>>> 130.575158] offset 0x1c data = 1 [ 130.578754] offset 0x1d data > > >>>> = 0 [ 130.593543] offset 0x1e data = 115 [ 130.597312] offset > > >>>> 0x1f data = 8190 > > >>>> > > >>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to > > >> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > > >>>> Above process is auto-nego enabled, link is down, auto-nego is > > >>>> disabled, link > > >> is up. And auto-nego complete bit is always 0. > > >>>> > > >>>> PHY seems become unstable if soft reset after start/config > > >>>> auto-nego. I also > > >> want to fix it in micrel driver, but failed. > > >>>> > > >>> > > >>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. > > >>> Aneg may never complete for different reasons, e.g. no physical > > >>> link. And even if we use a timeout this may add unwanted delays. > > >>> > > >>>> Do you have any other insights that can help me further locate the > issue? > > >> Thanks. > > >>>> > > >>> > > >>> I think current MAC/PHY PM handling isn't perfect. Often we have > > >>> the following > > >>> scenario: > > >>> > > >>> *suspend* > > >>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback > > >>> (typically involving phy_stop()) > > >>> > > >>> *resume* > > >>> 1. MAC resume callback (typically involving phy_start()) 2. PHY is > > >>> resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > > >>> > > >>> Calling phy_init_hw() after phy_start() doesn't look right. > > >>> It seems to work in most cases, but there's a certain risk that > > >>> phy_init_hw() overwrites something, e.g. the advertised modes. > > >>> I think we have two valid scenarios: > > >>> > > >>> 1. phylib PM callbacks are used, then the MAC driver shouldn't > > >>> touch the PHY in its PM callbacks, especially not call > > >>> phy_stop/phy_start. > > >>> > > >>> 2. MAC PM callbacks take care also of the PHY. Then I think we would > > >>> need a flag at the phy_device telling it to make the PHY PM > > >>> callbacks a no-op. > > >>> > > >>> Andrew, Russell: What do you think? > > >>> > > >>>> Best Regards, > > >>>> Joakim Zhang > > >>>> > > >>>> [...] > > >>>> > > >>> Heiner > > >>> > > >> > > >> Based on scenario 1 you can also try the following. > > >> > > >> diff --git a/drivers/net/ethernet/freescale/fec_main.c > > >> b/drivers/net/ethernet/freescale/fec_main.c > > >> index 3db882322..cdf9160fc 100644 > > >> --- a/drivers/net/ethernet/freescale/fec_main.c > > >> +++ b/drivers/net/ethernet/freescale/fec_main.c > > >> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct > > >> device *dev) > > >> if (netif_running(ndev)) { > > >> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) > > >> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; > > >> - phy_stop(ndev->phydev); > > >> napi_disable(&fep->napi); > > >> netif_tx_lock_bh(ndev); > > >> netif_device_detach(ndev); > > >> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct > > >> device *dev) > > >> netif_device_attach(ndev); > > >> netif_tx_unlock_bh(ndev); > > >> napi_enable(&fep->napi); > > >> - phy_start(ndev->phydev); > > >> } > > >> rtnl_unlock(); > > > > > > As I described in commit message: > > > > > > "When I am debugging, I found PHY works fine if MAC doesn't support > > suspend/resume or phy_stop()/phy_start() doesn't been called during > > suspend/resume. This let me realize, PHY state machine > > phy_state_machine() could do something breaks the PHY." > > > > > > So I have tried your above code change before, and it works. But it > > > is a very > > common phenomenon that MAC suspend/resume invokes phy_stop/start or > > phylink_stop/start, and it's been around for a long time. > > > > > > As the scenarios you list, it is indeed unreasonable to soft reset > > > or config PHY > > after calling phy_start_aneg() in state machine. PHY state should be > > PHY_UP after calling phy_init_hw(), It seems more reasonable. > > > > > > Best Regards, > > > Joakim Zhang > > >> -- > > >> 2.31.1 > > >> > > > > > > > Following is a draft patch for scenario 1: > > Make PHY PM a no-op if MAC manages PHY PM. > > Can you give it a try? > > > > It can work for my case. Thanks. I have another question, if it is possible to change the suspend/resume sequence? MAC should suspend before MDIO bus, and MDIO bus should resume before MAC. For some MACs, they need PHY feed clocks. It seems more reasonable. Best Regards, Joakim Zhang > Best Regards, > Joakim Zhang > > diff --git a/drivers/net/phy/phy_device.c > > b/drivers/net/phy/phy_device.c index a009d1769..73d29fd5e 100644 > > --- a/drivers/net/phy/phy_device.c > > +++ b/drivers/net/phy/phy_device.c > > @@ -273,6 +273,9 @@ static __maybe_unused int > > mdio_bus_phy_suspend(struct device *dev) { > > struct phy_device *phydev = to_phy_device(dev); > > > > + if (phydev->mac_managed_pm) > > + return 0; > > + > > /* We must stop the state machine manually, otherwise it stops out of > > * control, possibly with the phydev->lock held. Upon resume, netdev > > * may call phy routines that try to grab the same lock, and that > > may @@ > > -294,6 +297,9 @@ static __maybe_unused int mdio_bus_phy_resume(struct > > device *dev) > > struct phy_device *phydev = to_phy_device(dev); > > int ret; > > > > + if (phydev->mac_managed_pm) > > + return 0; > > + > > if (!phydev->suspended_by_mdio_bus) > > goto no_resume; > > > > diff --git a/include/linux/phy.h b/include/linux/phy.h index > > 8e2cf84b2..46702af18 100644 > > --- a/include/linux/phy.h > > +++ b/include/linux/phy.h > > @@ -567,6 +567,7 @@ struct phy_device { > > unsigned loopback_enabled:1; > > unsigned downshifted_rate:1; > > unsigned is_on_sfp_module:1; > > + unsigned mac_managed_pm:1; > > > > unsigned autoneg:1; > > /* The most recently read link state */ > > -- > > 2.31.1 > > > > > > > > diff --git a/drivers/net/ethernet/freescale/fec_main.c > > b/drivers/net/ethernet/freescale/fec_main.c > > index 3db882322..70aea9c27 100644 > > --- a/drivers/net/ethernet/freescale/fec_main.c > > +++ b/drivers/net/ethernet/freescale/fec_main.c > > @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct net_device > > *ndev) > > fep->link = 0; > > fep->full_duplex = 0; > > > > + phy_dev->mac_managed_pm = 1; > > + > > phy_attached_info(phy_dev); > > > > return 0; > > @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct > > device *dev) > > netif_device_attach(ndev); > > netif_tx_unlock_bh(ndev); > > napi_enable(&fep->napi); > > + phy_init_hw(ndev->phydev); > > phy_start(ndev->phydev); > > } > > rtnl_unlock(); > > -- > > 2.31.1
On 07.04.2021 12:05, Joakim Zhang wrote: > > Hi Heiner, > >> -----Original Message----- >> From: Joakim Zhang <qiangqing.zhang@nxp.com> >> Sent: 2021年4月7日 15:46 >> To: Heiner Kallweit <hkallweit1@gmail.com>; christian.melki@t2data.com; >> andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; >> kuba@kernel.org >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >> <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> >> Subject: RE: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume >> back >> >> >> Hi Heiner, >> >>> -----Original Message----- >>> From: Heiner Kallweit <hkallweit1@gmail.com> >>> Sent: 2021年4月7日 15:12 >>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; >>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; >>> davem@davemloft.net; kuba@kernel.org >>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx >>> <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> >>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus >>> resume back >>> >>> On 07.04.2021 03:43, Joakim Zhang wrote: >>>> >>>> Hi Heiner, >>>> >>>>> -----Original Message----- >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>> Sent: 2021年4月7日 2:22 >>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; >>>>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; >>>>> davem@davemloft.net; kuba@kernel.org; Russell King - ARM Linux >>>>> <linux@armlinux.org.uk> >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli >>>>> <f.fainelli@gmail.com> >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO >>>>> bus resume back >>>>> >>>>> On 06.04.2021 13:42, Heiner Kallweit wrote: >>>>>> On 06.04.2021 12:07, Joakim Zhang wrote: >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>>>>> Sent: 2021年4月6日 14:29 >>>>>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; >>>>>>>> christian.melki@t2data.com; andrew@lunn.ch; >>>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org >>>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>>>>> dl-linux-imx <linux-imx@nxp.com> >>>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after >>>>>>>> MDIO bus resume back >>>>>>>> >>>>>>>> On 06.04.2021 04:07, Joakim Zhang wrote: >>>>>>>>> >>>>>>>>> Hi Heiner, >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>>>>>>>> Sent: 2021年4月5日 20:10 >>>>>>>>>> To: christian.melki@t2data.com; Joakim Zhang >>>>>>>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; >>>>>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org >>>>>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; >>>>>>>>>> dl-linux-imx <linux-imx@nxp.com> >>>>>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after >>>>>>>>>> MDIO bus resume back >>>>>>>>>> >>>>>>>>>> On 05.04.2021 10:43, Christian Melki wrote: >>>>>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: >>>>>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: >>>>>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: >>>>>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram >>>>> may >>>>>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus >>>>>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements >>>>>>>>>>>>>> soft_reset >>>>> callback. >>>>>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset >>>>>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds >>>>>>>>>>>>>> soft_reset for >>>>> KSZ8081. >>>>>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which >>>>>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when >> system >>>>>>>> resume >>>>>>>>>>>>>> back, MAC >>>>>>>>>> driver is fec_main.c. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus >>>>>>>>>>>>>> resume back would introduce some regression when PHY >>>>>>>>>>>>>> implements soft_reset. When I >>>>>>>>>>>>> >>>>>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset >>>>>>>>>>>>> should break something. >>>>>>>>>>>>> >>>>>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't >>>>>>>>>>>>>> support suspend/resume or phy_stop()/phy_start() doesn't >>>>>>>>>>>>>> been called during suspend/resume. This let me realize, >>>>>>>>>>>>>> PHY state machine >>>>>>>>>>>>>> phy_state_machine() could do something breaks the PHY. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume >>> when >>>>>>>>>>>>>> system resume back from suspend. When MAC resume, usually >>> it >>>>>>>>>>>>>> will invoke >>>>>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then >>>>>>>>>>>>>> trigger the >>>>>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will >>>>>>>>>>>>>> start/config auto-nego, then change PHY state to >>>>>>>>>>>>>> PHY_NOLINK, what to next is periodically check PHY link >>>>>>>>>>>>>> status. When MDIO bus resume, it will initialize PHY >>>>>>>>>>>>>> hardware, including soft_reset, what would soft_reset >>>>>>>>>>>>>> affect seems various from >>>>> different PHYs. >>>>>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then >>>>>>>>>>>>>> perform a soft reset, >>>>>>>>>> it will never complete auto-nego. >>>>>>>>>>>>> >>>>>>>>>>>>> Why? That would need to be checked in detail. Maybe chip >>>>>>>>>>>>> errata documentation provides a hint. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: >>>>>>>>>>>> >>>>>>>>>>>> If software reset (Register 0.15) is used to exit power-down >>>>>>>>>>>> mode (Register 0.11 = 1), two software reset writes >>>>>>>>>>>> (Register >>>>>>>>>>>> 0.15 = 1) are required. The first write clears power-down >>>>>>>>>>>> mode; the second write resets the chip and re-latches the >>>>>>>>>>>> pin strapping pin >>>>> values. >>>>>>>>>>>> >>>>>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() >>>>>>>>>>>> isn't appropriate for this PHY. Please re-test with the >>>>>>>>>>>> KSZ8081 soft reset following the spec comment. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Interesting. Never expected that behavior. >>>>>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. >>>>>>>>>>> This is what I found (micrel.c): >>>>>>>>>>> >>>>>>>>>>> 10/100: >>>>>>>>>>> 8001 - Unaffected? >>>>>>>>>>> 8021/8031 - Double reset after PDOWN. >>>>>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear >>>>>>>>>>> if reset solves the issue since errata says no error after >>>>>>>>>>> reset but is also claiming that only toggling PDOWN (may) or >>>>>>>>>>> power will >>> help. >>>>>>>>>>> 8051 - Double reset after PDOWN. >>>>>>>>>>> 8061 - Double reset after PDOWN. >>>>>>>>>>> 8081 - Double reset after PDOWN. >>>>>>>>>>> 8091 - Double reset after PDOWN. >>>>>>>>>>> >>>>>>>>>>> 10/100/1000: >>>>>>>>>>> Nothing in gigabit afaics. >>>>>>>>>>> >>>>>>>>>>> Switches: >>>>>>>>>>> 8862 - Not affected? >>>>>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround >>> exists. >>>>>>>>>>> 8864 - Not affected? >>>>>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround >>> exists. >>>>>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure >>>>>>>>>>> on adjacent links. Do not use. >>>>>>>>>>> >>>>>>>>>>> This certainly explains a lot. >>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus >>> resume >>>>>>>>>>>>>> back, it should be reasonable after PHY hardware >>>>>>>>>>>>>> re-initialized. Also give state machine a chance to >>>>>>>>>>>>>> start/config >>>>> auto-nego again. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then >>>>>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() >>>>>>>>>>>>> phydev->returns >>>>>>>>>>>>> false. As a consequence >>>>>>>>>>>>> phydev->suspended_by_mdio_bus is false and >>>>>>>>>>>>> phydev->mdio_bus_phy_resume() >>>>>>>>>>>>> skips the PHY hw initialization. >>>>>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls >>>>>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver >> suspend. >>>>>>>>>>>> Therefore disregard this part for now. >>>>>>>>>>>> >>>>>>>>>>>>> Having said that the current argumentation isn't convincing. >>>>>>>>>>>>> I'm not aware of such issues on other systems, therefore >>>>>>>>>>>>> it's likely that something is system-dependent. >>>>>>>>>>>>> >>>>>>>>>>>>> Please check the exact call sequence on your system, maybe >>>>>>>>>>>>> it provides a hint. >>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ >>>>>>>>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c >>>>>>>>>>>>>> b/drivers/net/phy/phy_device.c index >>>>>>>>>>>>>> cc38e326405a..312a6f662481 >>>>>>>>>>>>>> 100644 >>>>>>>>>>>>>> --- a/drivers/net/phy/phy_device.c >>>>>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c >>>>>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int >>>>>>>>>> mdio_bus_phy_resume(struct device *dev) >>>>>>>>>>>>>> ret = phy_resume(phydev); >>>>>>>>>>>>>> if (ret < 0) >>>>>>>>>>>>>> return ret; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC >>>>>>>>>>>>>> +controller >>>>>>>>>> resume >>>>>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after >>>>>>>>>> re-initializing >>>>>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config >>>>>>>>>>>>>> +auto-nego >>>>>>>>>> again. >>>>>>>>>>>>>> + */ >>>>>>>>>>>>>> + phydev->state = PHY_UP; >>>>>>>>>>>>>> + >>>>>>>>>>>>>> no_resume: >>>>>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) >>>>>>>>>>>>>> phy_start_machine(phydev); >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This is a quick draft of the modified soft reset for KSZ8081. >>>>>>>>>> Some tests would be appreciated. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I test this patch at my side, unfortunately, it still can't work. >>>>>>>>> >>>>>>>>> I have not found what soft reset would affect in 8081 spec, is >>>>>>>>> there ang common Description for different PHYs? >>>>>>>>> >>>>>>>> >>>>>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 >>>>>>>> >>>>>>>> Apart from that you can check BMCR value and which modes your >>>>>>>> PHY advertises after a soft reset. If the PHY indeed wouldn't >>>>>>>> restart aneg after a soft reset, then it would be the only one >>>>>>>> with this behavior I know. And I'd wonder why there aren't more such >> reports. >>>>>>> >>>>>>> Hi Heiner, >>>>>>> >>>>>>> We have reasons to believe it is a weird behavior of KSZ8081. I >>>>>>> have another two PHYs at my side, AR8031 and RTL8211FD, they can >>>>>>> work fine if >>>>> soft_reset implemented. >>>>>>> >>>>>>> As commit message described, phy_start() would change PHY state >>>>>>> to >>>>> PHY_UP finally to PHY_NOLINK when MAC resume. >>>>>>> After MDIO bus resume back, it will periodically check link status. >>>>>>> I did below >>>>> code change to dump all PHY registers. >>>>>>> >>>>>>> --- a/drivers/net/phy/phy.c >>>>>>> +++ b/drivers/net/phy/phy.c >>>>>>> @@ -35,7 +35,7 @@ >>>>>>> #include <net/genetlink.h> >>>>>>> #include <net/sock.h> >>>>>>> >>>>>>> -#define PHY_STATE_TIME HZ >>>>>>> +#define PHY_STATE_TIME (10 * HZ) >>>>>>> >>>>>>> #define PHY_STATE_STR(_state) \ >>>>>>> case PHY_##_state: \ >>>>>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct >>>>>>> phy_device >>>>> *phydev) >>>>>>> if (err) >>>>>>> return err; >>>>>>> >>>>>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); >>>>>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); >>>>>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); >>>>>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); >>>>>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); >>>>>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); >>>>>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); >>>>>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); >>>>>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); >>>>>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); >>>>>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); >>>>>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); >>>>>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); >>>>>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); >>>>>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); >>>>>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); >>>>>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); >>>>>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); >>>>>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); >>>>>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); >>>>>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); >>>>>>> + printk("\n\n"); >>>>>>> if (phydev->link && phydev->state != PHY_RUNNING) { >>>>>>> phy_check_downshift(phydev); >>>>>>> phydev->state = PHY_RUNNING; >>>>>>> >>>>>>> After MDIO bus resume back, results as below: >>>>>>> >>>>>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset 0x01 >>>>>>> data = 7849 [ 119.563125] offset 0x02 data = 22 [ 119.566810] >>>>>>> offset 0x03 data = 1560 [ 119.570597] offset 0x04 data = 85e1 [ >>>>>>> 119.588016] offset 0x05 data = 0 [ 119.592891] offset 0x06 data >>>>>>> = >>>>>>> 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset >>>>>>> 0x08 data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] >>>>>>> offset >>>>>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] >>>>>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ >>>>>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 >>>>>>> data = >>>>>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset >>>>>>> 0x1c data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] >>>>>>> offset 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ >>>>>>> 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 >>>>>>> [ 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 >>>>>>> data = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] >>>>>>> offset >>>>>>> 0x04 data = >>>>>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset >>>>>>> 0x06 data = 4 [ 120.242936] offset 0x07 data = 2001 [ >>>>>>> 120.246792] offset >>>>>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] >>>>>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ >>>>>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data >>>>>>> = >>>>>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset >>>>>>> 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ >>>>>>> 120.320513] offset 0x1c data = 1 [ 120.336589] offset 0x1d data >>>>>>> = 0 [ 120.340184] offset 0x1e data = 115 [ 120.355357] offset >>>>>>> 0x1f data = 8190 [ 120.359112] [ 120.359112] [ 129.785396] >>>>>>> offset 0x00 data = 1100 [ 129.789252] offset 0x01 data = 7849 [ >>>>>>> 129.809951] offset >>>>>>> 0x02 data = >>>>>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset >>>>>>> 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ >>>>>>> 129.839398] offset >>>>>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ >>>>>>> 129.858371] offset 0x08 data = 0 [ 129.873357] offset 0x09 data >>>>>>> = 0 [ 129.876954] offset 0x10 data = 0 [ 129.880472] offset 0x11 >>>>>>> data = >>>>>>> 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 >>>>>>> data = >>>>>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset >>>>>>> 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ >>>>>>> 129.946830] offset 0x1c data = 0 [ 129.950424] offset 0x1d data >>>>>>> = 0 [ 129.964585] offset 0x1e data = 0 [ 129.968192] offset 0x1f >>>>>>> data = >>>>>>> 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 data >>>>>>> = >>>>>>> 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset >>>>>>> 0x02 data = >>>>>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset >>>>>>> 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ >>>>>>> 130.470789] offset >>>>>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ >>>>>>> 130.488934] offset 0x08 data = 0 [ 130.504320] offset 0x09 data >>>>>>> = 0 [ 130.507911] offset 0x10 data = 0 [ 130.520950] offset 0x11 >>>>>>> data = >>>>>>> 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 >>>>>>> data = >>>>>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset >>>>>>> 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ >>>>>>> 130.575158] offset 0x1c data = 1 [ 130.578754] offset 0x1d data >>>>>>> = 0 [ 130.593543] offset 0x1e data = 115 [ 130.597312] offset >>>>>>> 0x1f data = 8190 >>>>>>> >>>>>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to >>>>> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. >>>>>>> Above process is auto-nego enabled, link is down, auto-nego is >>>>>>> disabled, link >>>>> is up. And auto-nego complete bit is always 0. >>>>>>> >>>>>>> PHY seems become unstable if soft reset after start/config >>>>>>> auto-nego. I also >>>>> want to fix it in micrel driver, but failed. >>>>>>> >>>>>> >>>>>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. >>>>>> Aneg may never complete for different reasons, e.g. no physical >>>>>> link. And even if we use a timeout this may add unwanted delays. >>>>>> >>>>>>> Do you have any other insights that can help me further locate the >> issue? >>>>> Thanks. >>>>>>> >>>>>> >>>>>> I think current MAC/PHY PM handling isn't perfect. Often we have >>>>>> the following >>>>>> scenario: >>>>>> >>>>>> *suspend* >>>>>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend callback >>>>>> (typically involving phy_stop()) >>>>>> >>>>>> *resume* >>>>>> 1. MAC resume callback (typically involving phy_start()) 2. PHY is >>>>>> resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() >>>>>> >>>>>> Calling phy_init_hw() after phy_start() doesn't look right. >>>>>> It seems to work in most cases, but there's a certain risk that >>>>>> phy_init_hw() overwrites something, e.g. the advertised modes. >>>>>> I think we have two valid scenarios: >>>>>> >>>>>> 1. phylib PM callbacks are used, then the MAC driver shouldn't >>>>>> touch the PHY in its PM callbacks, especially not call >>>>>> phy_stop/phy_start. >>>>>> >>>>>> 2. MAC PM callbacks take care also of the PHY. Then I think we would >>>>>> need a flag at the phy_device telling it to make the PHY PM >>>>>> callbacks a no-op. >>>>>> >>>>>> Andrew, Russell: What do you think? >>>>>> >>>>>>> Best Regards, >>>>>>> Joakim Zhang >>>>>>> >>>>>>> [...] >>>>>>> >>>>>> Heiner >>>>>> >>>>> >>>>> Based on scenario 1 you can also try the following. >>>>> >>>>> diff --git a/drivers/net/ethernet/freescale/fec_main.c >>>>> b/drivers/net/ethernet/freescale/fec_main.c >>>>> index 3db882322..cdf9160fc 100644 >>>>> --- a/drivers/net/ethernet/freescale/fec_main.c >>>>> +++ b/drivers/net/ethernet/freescale/fec_main.c >>>>> @@ -3805,7 +3805,6 @@ static int __maybe_unused fec_suspend(struct >>>>> device *dev) >>>>> if (netif_running(ndev)) { >>>>> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) >>>>> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; >>>>> - phy_stop(ndev->phydev); >>>>> napi_disable(&fep->napi); >>>>> netif_tx_lock_bh(ndev); >>>>> netif_device_detach(ndev); >>>>> @@ -3864,7 +3863,6 @@ static int __maybe_unused fec_resume(struct >>>>> device *dev) >>>>> netif_device_attach(ndev); >>>>> netif_tx_unlock_bh(ndev); >>>>> napi_enable(&fep->napi); >>>>> - phy_start(ndev->phydev); >>>>> } >>>>> rtnl_unlock(); >>>> >>>> As I described in commit message: >>>> >>>> "When I am debugging, I found PHY works fine if MAC doesn't support >>> suspend/resume or phy_stop()/phy_start() doesn't been called during >>> suspend/resume. This let me realize, PHY state machine >>> phy_state_machine() could do something breaks the PHY." >>>> >>>> So I have tried your above code change before, and it works. But it >>>> is a very >>> common phenomenon that MAC suspend/resume invokes phy_stop/start or >>> phylink_stop/start, and it's been around for a long time. >>>> >>>> As the scenarios you list, it is indeed unreasonable to soft reset >>>> or config PHY >>> after calling phy_start_aneg() in state machine. PHY state should be >>> PHY_UP after calling phy_init_hw(), It seems more reasonable. >>>> >>>> Best Regards, >>>> Joakim Zhang >>>>> -- >>>>> 2.31.1 >>>>> >>>> >>> >>> Following is a draft patch for scenario 1: >>> Make PHY PM a no-op if MAC manages PHY PM. >>> Can you give it a try? >>> >> >> It can work for my case. Thanks. > > I have another question, if it is possible to change the suspend/resume sequence? > MAC should suspend before MDIO bus, and MDIO bus should resume before MAC. For some MACs, they need PHY feed clocks. It seems more reasonable. > On the other hand we have cases where the PHY depends on the MAC. If the MAC suspends first, the MDIO bus (and therefore the PHY) may not be accessible any longer. Therefore it's not that easy. This speaks for my proposal to be able to make the PHY PM ops no-ops. Then we can handle MAC + PHY PM in the MAC PM callbacks and consider any such dependency. I don't think we can change the PM ordering, AFAIK the PM core does it based on registration order of devices. > Best Regards, > Joakim Zhang >> Best Regards, >> Joakim Zhang >>> diff --git a/drivers/net/phy/phy_device.c >>> b/drivers/net/phy/phy_device.c index a009d1769..73d29fd5e 100644 >>> --- a/drivers/net/phy/phy_device.c >>> +++ b/drivers/net/phy/phy_device.c >>> @@ -273,6 +273,9 @@ static __maybe_unused int >>> mdio_bus_phy_suspend(struct device *dev) { >>> struct phy_device *phydev = to_phy_device(dev); >>> >>> + if (phydev->mac_managed_pm) >>> + return 0; >>> + >>> /* We must stop the state machine manually, otherwise it stops out of >>> * control, possibly with the phydev->lock held. Upon resume, netdev >>> * may call phy routines that try to grab the same lock, and that >>> may @@ >>> -294,6 +297,9 @@ static __maybe_unused int mdio_bus_phy_resume(struct >>> device *dev) >>> struct phy_device *phydev = to_phy_device(dev); >>> int ret; >>> >>> + if (phydev->mac_managed_pm) >>> + return 0; >>> + >>> if (!phydev->suspended_by_mdio_bus) >>> goto no_resume; >>> >>> diff --git a/include/linux/phy.h b/include/linux/phy.h index >>> 8e2cf84b2..46702af18 100644 >>> --- a/include/linux/phy.h >>> +++ b/include/linux/phy.h >>> @@ -567,6 +567,7 @@ struct phy_device { >>> unsigned loopback_enabled:1; >>> unsigned downshifted_rate:1; >>> unsigned is_on_sfp_module:1; >>> + unsigned mac_managed_pm:1; >>> >>> unsigned autoneg:1; >>> /* The most recently read link state */ >>> -- >>> 2.31.1 >>> >>> >>> >>> diff --git a/drivers/net/ethernet/freescale/fec_main.c >>> b/drivers/net/ethernet/freescale/fec_main.c >>> index 3db882322..70aea9c27 100644 >>> --- a/drivers/net/ethernet/freescale/fec_main.c >>> +++ b/drivers/net/ethernet/freescale/fec_main.c >>> @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct net_device >>> *ndev) >>> fep->link = 0; >>> fep->full_duplex = 0; >>> >>> + phy_dev->mac_managed_pm = 1; >>> + >>> phy_attached_info(phy_dev); >>> >>> return 0; >>> @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct >>> device *dev) >>> netif_device_attach(ndev); >>> netif_tx_unlock_bh(ndev); >>> napi_enable(&fep->napi); >>> + phy_init_hw(ndev->phydev); >>> phy_start(ndev->phydev); >>> } >>> rtnl_unlock(); >>> -- >>> 2.31.1 >
> -----Original Message----- > From: Heiner Kallweit <hkallweit1@gmail.com> > Sent: 2021年4月7日 18:22 > To: Joakim Zhang <qiangqing.zhang@nxp.com>; christian.melki@t2data.com; > andrew@lunn.ch; linux@armlinux.org.uk; davem@davemloft.net; > kuba@kernel.org > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; Florian Fainelli <f.fainelli@gmail.com> > Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume > back > > On 07.04.2021 12:05, Joakim Zhang wrote: > > > > Hi Heiner, > > > >> -----Original Message----- > >> From: Joakim Zhang <qiangqing.zhang@nxp.com> > >> Sent: 2021年4月7日 15:46 > >> To: Heiner Kallweit <hkallweit1@gmail.com>; > >> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >> davem@davemloft.net; kuba@kernel.org > >> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli > >> <f.fainelli@gmail.com> > >> Subject: RE: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus > >> resume back > >> > >> > >> Hi Heiner, > >> > >>> -----Original Message----- > >>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>> Sent: 2021年4月7日 15:12 > >>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >>> davem@davemloft.net; kuba@kernel.org > >>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli > >>> <f.fainelli@gmail.com> > >>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>> bus resume back > >>> > >>> On 07.04.2021 03:43, Joakim Zhang wrote: > >>>> > >>>> Hi Heiner, > >>>> > >>>>> -----Original Message----- > >>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>> Sent: 2021年4月7日 2:22 > >>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >>>>> christian.melki@t2data.com; andrew@lunn.ch; linux@armlinux.org.uk; > >>>>> davem@davemloft.net; kuba@kernel.org; Russell King - ARM Linux > >>>>> <linux@armlinux.org.uk> > >>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>> dl-linux-imx <linux-imx@nxp.com>; Florian Fainelli > >>>>> <f.fainelli@gmail.com> > >>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO > >>>>> bus resume back > >>>>> > >>>>> On 06.04.2021 13:42, Heiner Kallweit wrote: > >>>>>> On 06.04.2021 12:07, Joakim Zhang wrote: > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>>>>> Sent: 2021年4月6日 14:29 > >>>>>>>> To: Joakim Zhang <qiangqing.zhang@nxp.com>; > >>>>>>>> christian.melki@t2data.com; andrew@lunn.ch; > >>>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > >>>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>>>>> dl-linux-imx <linux-imx@nxp.com> > >>>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > >>>>>>>> MDIO bus resume back > >>>>>>>> > >>>>>>>> On 06.04.2021 04:07, Joakim Zhang wrote: > >>>>>>>>> > >>>>>>>>> Hi Heiner, > >>>>>>>>> > >>>>>>>>>> -----Original Message----- > >>>>>>>>>> From: Heiner Kallweit <hkallweit1@gmail.com> > >>>>>>>>>> Sent: 2021年4月5日 20:10 > >>>>>>>>>> To: christian.melki@t2data.com; Joakim Zhang > >>>>>>>>>> <qiangqing.zhang@nxp.com>; andrew@lunn.ch; > >>>>>>>>>> linux@armlinux.org.uk; davem@davemloft.net; kuba@kernel.org > >>>>>>>>>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>>>>>>>> dl-linux-imx <linux-imx@nxp.com> > >>>>>>>>>> Subject: Re: [PATCH] net: phy: fix PHY possibly unwork after > >>>>>>>>>> MDIO bus resume back > >>>>>>>>>> > >>>>>>>>>> On 05.04.2021 10:43, Christian Melki wrote: > >>>>>>>>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote: > >>>>>>>>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote: > >>>>>>>>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote: > >>>>>>>>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that > >>>>>>>>>>>>>> suspend2ram > >>>>> may > >>>>>>>>>>>>>> cut off PHY power") invokes phy_init_hw() when MDIO bus > >>>>>>>>>>>>>> resume, it will soft reset PHY if PHY driver implements > >>>>>>>>>>>>>> soft_reset > >>>>> callback. > >>>>>>>>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset > >>>>>>>>>>>>>> callback to genphy_soft_reset for KSZ8081") adds > >>>>>>>>>>>>>> soft_reset for > >>>>> KSZ8081. > >>>>>>>>>>>>>> After these two patches, I found i.MX6UL 14x14 EVK which > >>>>>>>>>>>>>> connected to KSZ8081RNB doesn't work any more when > >> system > >>>>>>>> resume > >>>>>>>>>>>>>> back, MAC > >>>>>>>>>> driver is fec_main.c. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> It's obvious that initializing PHY hardware when MDIO bus > >>>>>>>>>>>>>> resume back would introduce some regression when PHY > >>>>>>>>>>>>>> implements soft_reset. When I > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why is this obvious? Please elaborate on why a soft reset > >>>>>>>>>>>>> should break something. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> am debugging, I found PHY works fine if MAC doesn't > >>>>>>>>>>>>>> support suspend/resume or phy_stop()/phy_start() doesn't > >>>>>>>>>>>>>> been called during suspend/resume. This let me realize, > >>>>>>>>>>>>>> PHY state machine > >>>>>>>>>>>>>> phy_state_machine() could do something breaks the PHY. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> As we known, MAC resume first and then MDIO bus resume > >>> when > >>>>>>>>>>>>>> system resume back from suspend. When MAC resume, > usually > >>> it > >>>>>>>>>>>>>> will invoke > >>>>>>>>>>>>>> phy_start() where to change PHY state to PHY_UP, then > >>>>>>>>>>>>>> trigger the > >>>>>>>>>>>>>> stat> machine to run now. In phy_state_machine(), it will > >>>>>>>>>>>>>> start/config auto-nego, then change PHY state to > >>>>>>>>>>>>>> PHY_NOLINK, what to next is periodically check PHY link > >>>>>>>>>>>>>> status. When MDIO bus resume, it will initialize PHY > >>>>>>>>>>>>>> hardware, including soft_reset, what would soft_reset > >>>>>>>>>>>>>> affect seems various from > >>>>> different PHYs. > >>>>>>>>>>>>>> For KSZ8081RNB, when it in PHY_NOLINK state and then > >>>>>>>>>>>>>> perform a soft reset, > >>>>>>>>>> it will never complete auto-nego. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why? That would need to be checked in detail. Maybe chip > >>>>>>>>>>>>> errata documentation provides a hint. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN: > >>>>>>>>>>>> > >>>>>>>>>>>> If software reset (Register 0.15) is used to exit > >>>>>>>>>>>> power-down mode (Register 0.11 = 1), two software reset > >>>>>>>>>>>> writes (Register > >>>>>>>>>>>> 0.15 = 1) are required. The first write clears power-down > >>>>>>>>>>>> mode; the second write resets the chip and re-latches the > >>>>>>>>>>>> pin strapping pin > >>>>> values. > >>>>>>>>>>>> > >>>>>>>>>>>> Maybe this causes the issue you see and genphy_soft_reset() > >>>>>>>>>>>> isn't appropriate for this PHY. Please re-test with the > >>>>>>>>>>>> KSZ8081 soft reset following the spec comment. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Interesting. Never expected that behavior. > >>>>>>>>>>> Thanks for catching it. Skimmed through the datasheets/erratas. > >>>>>>>>>>> This is what I found (micrel.c): > >>>>>>>>>>> > >>>>>>>>>>> 10/100: > >>>>>>>>>>> 8001 - Unaffected? > >>>>>>>>>>> 8021/8031 - Double reset after PDOWN. > >>>>>>>>>>> 8041 - Errata. PDOWN broken. Recommended do not use. > Unclear > >>>>>>>>>>> if reset solves the issue since errata says no error after > >>>>>>>>>>> reset but is also claiming that only toggling PDOWN (may) or > >>>>>>>>>>> power will > >>> help. > >>>>>>>>>>> 8051 - Double reset after PDOWN. > >>>>>>>>>>> 8061 - Double reset after PDOWN. > >>>>>>>>>>> 8081 - Double reset after PDOWN. > >>>>>>>>>>> 8091 - Double reset after PDOWN. > >>>>>>>>>>> > >>>>>>>>>>> 10/100/1000: > >>>>>>>>>>> Nothing in gigabit afaics. > >>>>>>>>>>> > >>>>>>>>>>> Switches: > >>>>>>>>>>> 8862 - Not affected? > >>>>>>>>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround > >>> exists. > >>>>>>>>>>> 8864 - Not affected? > >>>>>>>>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround > >>> exists. > >>>>>>>>>>> 9477 - Errata. PDOWN broken. Will randomly cause link > >>>>>>>>>>> failure on adjacent links. Do not use. > >>>>>>>>>>> > >>>>>>>>>>> This certainly explains a lot. > >>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus > >>> resume > >>>>>>>>>>>>>> back, it should be reasonable after PHY hardware > >>>>>>>>>>>>>> re-initialized. Also give state machine a chance to > >>>>>>>>>>>>>> start/config > >>>>> auto-nego again. > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> If the MAC driver calls phy_stop() on suspend, then > >>>>>>>>>>>>> phydev->suspended is true and mdio_bus_phy_may_suspend() > >>>>>>>>>>>>> phydev->returns > >>>>>>>>>>>>> false. As a consequence > >>>>>>>>>>>>> phydev->suspended_by_mdio_bus is false and > >>>>>>>>>>>>> phydev->mdio_bus_phy_resume() > >>>>>>>>>>>>> skips the PHY hw initialization. > >>>>>>>>>>>>> Please also note that mdio_bus_phy_suspend() calls > >>>>>>>>>>>>> phy_stop_machine() that sets the state to PHY_UP. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Forgot that MDIO bus suspend is done before MAC driver > >> suspend. > >>>>>>>>>>>> Therefore disregard this part for now. > >>>>>>>>>>>> > >>>>>>>>>>>>> Having said that the current argumentation isn't convincing. > >>>>>>>>>>>>> I'm not aware of such issues on other systems, therefore > >>>>>>>>>>>>> it's likely that something is system-dependent. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Please check the exact call sequence on your system, maybe > >>>>>>>>>>>>> it provides a hint. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> > >>>>>>>>>>>>>> --- > >>>>>>>>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++ > >>>>>>>>>>>>>> 1 file changed, 7 insertions(+) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c > >>>>>>>>>>>>>> b/drivers/net/phy/phy_device.c index > >>>>>>>>>>>>>> cc38e326405a..312a6f662481 > >>>>>>>>>>>>>> 100644 > >>>>>>>>>>>>>> --- a/drivers/net/phy/phy_device.c > >>>>>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c > >>>>>>>>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int > >>>>>>>>>> mdio_bus_phy_resume(struct device *dev) > >>>>>>>>>>>>>> ret = phy_resume(phydev); > >>>>>>>>>>>>>> if (ret < 0) > >>>>>>>>>>>>>> return ret; > >>>>>>>>>>>>>> + > >>>>>>>>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC > >>>>>>>>>>>>>> +controller > >>>>>>>>>> resume > >>>>>>>>>>>>>> + * rounte with phy_start(), here change to PHY_UP after > >>>>>>>>>> re-initializing > >>>>>>>>>>>>>> + * PHY hardware, let PHY state machine to start/config > >>>>>>>>>>>>>> +auto-nego > >>>>>>>>>> again. > >>>>>>>>>>>>>> + */ > >>>>>>>>>>>>>> + phydev->state = PHY_UP; > >>>>>>>>>>>>>> + > >>>>>>>>>>>>>> no_resume: > >>>>>>>>>>>>>> if (phydev->attached_dev && phydev->adjust_link) > >>>>>>>>>>>>>> phy_start_machine(phydev); > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This is a quick draft of the modified soft reset for KSZ8081. > >>>>>>>>>> Some tests would be appreciated. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> I test this patch at my side, unfortunately, it still can't work. > >>>>>>>>> > >>>>>>>>> I have not found what soft reset would affect in 8081 spec, is > >>>>>>>>> there ang common Description for different PHYs? > >>>>>>>>> > >>>>>>>> > >>>>>>>> You can check the clause 22 spec: 802.3 22.2.4.1.1 > >>>>>>>> > >>>>>>>> Apart from that you can check BMCR value and which modes your > >>>>>>>> PHY advertises after a soft reset. If the PHY indeed wouldn't > >>>>>>>> restart aneg after a soft reset, then it would be the only one > >>>>>>>> with this behavior I know. And I'd wonder why there aren't more > >>>>>>>> such > >> reports. > >>>>>>> > >>>>>>> Hi Heiner, > >>>>>>> > >>>>>>> We have reasons to believe it is a weird behavior of KSZ8081. I > >>>>>>> have another two PHYs at my side, AR8031 and RTL8211FD, they can > >>>>>>> work fine if > >>>>> soft_reset implemented. > >>>>>>> > >>>>>>> As commit message described, phy_start() would change PHY state > >>>>>>> to > >>>>> PHY_UP finally to PHY_NOLINK when MAC resume. > >>>>>>> After MDIO bus resume back, it will periodically check link status. > >>>>>>> I did below > >>>>> code change to dump all PHY registers. > >>>>>>> > >>>>>>> --- a/drivers/net/phy/phy.c > >>>>>>> +++ b/drivers/net/phy/phy.c > >>>>>>> @@ -35,7 +35,7 @@ > >>>>>>> #include <net/genetlink.h> > >>>>>>> #include <net/sock.h> > >>>>>>> > >>>>>>> -#define PHY_STATE_TIME HZ > >>>>>>> +#define PHY_STATE_TIME (10 * HZ) > >>>>>>> > >>>>>>> #define PHY_STATE_STR(_state) \ > >>>>>>> case PHY_##_state: \ > >>>>>>> @@ -738,6 +738,28 @@ static int phy_check_link_status(struct > >>>>>>> phy_device > >>>>> *phydev) > >>>>>>> if (err) > >>>>>>> return err; > >>>>>>> > >>>>>>> + printk("offset 0x00 data = %0x\n", phy_read(phydev, 0x00)); > >>>>>>> + printk("offset 0x01 data = %0x\n", phy_read(phydev, 0x01)); > >>>>>>> + printk("offset 0x02 data = %0x\n", phy_read(phydev, 0x02)); > >>>>>>> + printk("offset 0x03 data = %0x\n", phy_read(phydev, 0x03)); > >>>>>>> + printk("offset 0x04 data = %0x\n", phy_read(phydev, 0x04)); > >>>>>>> + printk("offset 0x05 data = %0x\n", phy_read(phydev, 0x05)); > >>>>>>> + printk("offset 0x06 data = %0x\n", phy_read(phydev, 0x06)); > >>>>>>> + printk("offset 0x07 data = %0x\n", phy_read(phydev, 0x07)); > >>>>>>> + printk("offset 0x08 data = %0x\n", phy_read(phydev, 0x08)); > >>>>>>> + printk("offset 0x09 data = %0x\n", phy_read(phydev, 0x09)); > >>>>>>> + printk("offset 0x10 data = %0x\n", phy_read(phydev, 0x10)); > >>>>>>> + printk("offset 0x11 data = %0x\n", phy_read(phydev, 0x11)); > >>>>>>> + printk("offset 0x15 data = %0x\n", phy_read(phydev, 0x15)); > >>>>>>> + printk("offset 0x16 data = %0x\n", phy_read(phydev, 0x16)); > >>>>>>> + printk("offset 0x17 data = %0x\n", phy_read(phydev, 0x17)); > >>>>>>> + printk("offset 0x18 data = %0x\n", phy_read(phydev, 0x18)); > >>>>>>> + printk("offset 0x1b data = %0x\n", phy_read(phydev, 0x1b)); > >>>>>>> + printk("offset 0x1c data = %0x\n", phy_read(phydev, 0x1c)); > >>>>>>> + printk("offset 0x1d data = %0x\n", phy_read(phydev, 0x1d)); > >>>>>>> + printk("offset 0x1e data = %0x\n", phy_read(phydev, 0x1e)); > >>>>>>> + printk("offset 0x1f data = %0x\n", phy_read(phydev, 0x1f)); > >>>>>>> + printk("\n\n"); > >>>>>>> if (phydev->link && phydev->state != PHY_RUNNING) { > >>>>>>> phy_check_downshift(phydev); > >>>>>>> phydev->state = PHY_RUNNING; > >>>>>>> > >>>>>>> After MDIO bus resume back, results as below: > >>>>>>> > >>>>>>> [ 119.545383] offset 0x00 data = 1100 [ 119.549237] offset > >>>>>>> 0x01 data = 7849 [ 119.563125] offset 0x02 data = 22 [ > >>>>>>> 119.566810] offset 0x03 data = 1560 [ 119.570597] offset 0x04 > >>>>>>> data = 85e1 [ 119.588016] offset 0x05 data = 0 [ 119.592891] > >>>>>>> offset 0x06 data = > >>>>>>> 4 [ 119.596452] offset 0x07 data = 2001 [ 119.600233] offset > >>>>>>> 0x08 data = 0 [ 119.617864] offset 0x09 data = 0 [ 119.625990] > >>>>>>> offset > >>>>>>> 0x10 data = 0 [ 119.629576] offset 0x11 data = 0 [ 119.642735] > >>>>>>> offset 0x15 data = 0 [ 119.646332] offset 0x16 data = 202 [ > >>>>>>> 119.650030] offset 0x17 data = 5c02 [ 119.668054] offset 0x18 > >>>>>>> data = > >>>>>>> 801 [ 119.672997] offset 0x1b data = 0 [ 119.676565] offset > >>>>>>> 0x1c data = 0 [ 119.680084] offset 0x1d data = 0 [ 119.698031] > >>>>>>> offset 0x1e data = 20 [ 119.706246] offset 0x1f data = 8190 [ > >>>>>>> 119.709984] [ 119.709984] [ 120.182120] offset 0x00 data = 100 > >>>>>>> [ 120.185894] offset 0x01 data = 784d [ 120.189681] offset 0x02 > >>>>>>> data = 22 [ 120.206319] offset 0x03 data = 1560 [ 120.210171] > >>>>>>> offset > >>>>>>> 0x04 data = > >>>>>>> 8061 [ 120.225353] offset 0x05 data = 0 [ 120.228948] offset > >>>>>>> 0x06 data = 4 [ 120.242936] offset 0x07 data = 2001 [ > >>>>>>> 120.246792] offset > >>>>>>> 0x08 data = 0 [ 120.250313] offset 0x09 data = 0 [ 120.266748] > >>>>>>> offset 0x10 data = 0 [ 120.270335] offset 0x11 data = 0 [ > >>>>>>> 120.284167] offset 0x15 data = 0 [ 120.287760] offset 0x16 data > >>>>>>> = > >>>>>>> 202 [ 120.301031] offset 0x17 data = 3c02 [ 120.313209] offset > >>>>>>> 0x18 data = 801 [ 120.316983] offset 0x1b data = 0 [ > >>>>>>> 120.320513] offset 0x1c data = 1 [ 120.336589] offset 0x1d data > >>>>>>> = 0 [ 120.340184] offset 0x1e data = 115 [ 120.355357] offset > >>>>>>> 0x1f data = 8190 [ 120.359112] [ 120.359112] [ 129.785396] > >>>>>>> offset 0x00 data = 1100 [ 129.789252] offset 0x01 data = 7849 [ > >>>>>>> 129.809951] offset > >>>>>>> 0x02 data = > >>>>>>> 22 [ 129.815018] offset 0x03 data = 1560 [ 129.818845] offset > >>>>>>> 0x04 data = 85e1 [ 129.835808] offset 0x05 data = 0 [ > >>>>>>> 129.839398] offset > >>>>>>> 0x06 data = 4 [ 129.854514] offset 0x07 data = 2001 [ > >>>>>>> 129.858371] offset 0x08 data = 0 [ 129.873357] offset 0x09 data > >>>>>>> = 0 [ 129.876954] offset 0x10 data = 0 [ 129.880472] offset > >>>>>>> 0x11 data = > >>>>>>> 0 [ 129.896450] offset 0x15 data = 0 [ 129.900038] offset 0x16 > >>>>>>> data = > >>>>>>> 202 [ 129.915041] offset 0x17 data = 5c02 [ 129.918889] offset > >>>>>>> 0x18 data = 801 [ 129.932735] offset 0x1b data = 0 [ > >>>>>>> 129.946830] offset 0x1c data = 0 [ 129.950424] offset 0x1d data > >>>>>>> = 0 [ 129.964585] offset 0x1e data = 0 [ 129.968192] offset > >>>>>>> 0x1f data = > >>>>>>> 8190 [ 129.972938] [ 129.972938] [ 130.425125] offset 0x00 > >>>>>>> data = > >>>>>>> 100 [ 130.428889] offset 0x01 data = 784d [ 130.442671] offset > >>>>>>> 0x02 data = > >>>>>>> 22 [ 130.446360] offset 0x03 data = 1560 [ 130.450142] offset > >>>>>>> 0x04 data = 8061 [ 130.467207] offset 0x05 data = 0 [ > >>>>>>> 130.470789] offset > >>>>>>> 0x06 data = 4 [ 130.485071] offset 0x07 data = 2001 [ > >>>>>>> 130.488934] offset 0x08 data = 0 [ 130.504320] offset 0x09 data > >>>>>>> = 0 [ 130.507911] offset 0x10 data = 0 [ 130.520950] offset > >>>>>>> 0x11 data = > >>>>>>> 0 [ 130.532865] offset 0x15 data = 0 [ 130.536461] offset 0x16 > >>>>>>> data = > >>>>>>> 202 [ 130.540156] offset 0x17 data = 3c02 [ 130.557218] offset > >>>>>>> 0x18 data = 801 [ 130.560987] offset 0x1b data = 0 [ > >>>>>>> 130.575158] offset 0x1c data = 1 [ 130.578754] offset 0x1d data > >>>>>>> = 0 [ 130.593543] offset 0x1e data = 115 [ 130.597312] offset > >>>>>>> 0x1f data = 8190 > >>>>>>> > >>>>>>> We can see that BMCR and BMSR changes from 0x1100,0x7849 to > >>>>> 0x100,0x784d (BMCR[12] bit and BMSR[2]), and loop it. > >>>>>>> Above process is auto-nego enabled, link is down, auto-nego is > >>>>>>> disabled, link > >>>>> is up. And auto-nego complete bit is always 0. > >>>>>>> > >>>>>>> PHY seems become unstable if soft reset after start/config > >>>>>>> auto-nego. I also > >>>>> want to fix it in micrel driver, but failed. > >>>>>>> > >>>>>> > >>>>>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. > >>>>>> Aneg may never complete for different reasons, e.g. no physical > >>>>>> link. And even if we use a timeout this may add unwanted delays. > >>>>>> > >>>>>>> Do you have any other insights that can help me further locate > >>>>>>> the > >> issue? > >>>>> Thanks. > >>>>>>> > >>>>>> > >>>>>> I think current MAC/PHY PM handling isn't perfect. Often we have > >>>>>> the following > >>>>>> scenario: > >>>>>> > >>>>>> *suspend* > >>>>>> 1. PHY is suspended (mdio_bus_phy_suspend) 2. MAC suspend > >>>>>> callback (typically involving phy_stop()) > >>>>>> > >>>>>> *resume* > >>>>>> 1. MAC resume callback (typically involving phy_start()) 2. PHY > >>>>>> is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw() > >>>>>> > >>>>>> Calling phy_init_hw() after phy_start() doesn't look right. > >>>>>> It seems to work in most cases, but there's a certain risk that > >>>>>> phy_init_hw() overwrites something, e.g. the advertised modes. > >>>>>> I think we have two valid scenarios: > >>>>>> > >>>>>> 1. phylib PM callbacks are used, then the MAC driver shouldn't > >>>>>> touch the PHY in its PM callbacks, especially not call > >>>>>> phy_stop/phy_start. > >>>>>> > >>>>>> 2. MAC PM callbacks take care also of the PHY. Then I think we would > >>>>>> need a flag at the phy_device telling it to make the PHY PM > >>>>>> callbacks a no-op. > >>>>>> > >>>>>> Andrew, Russell: What do you think? > >>>>>> > >>>>>>> Best Regards, > >>>>>>> Joakim Zhang > >>>>>>> > >>>>>>> [...] > >>>>>>> > >>>>>> Heiner > >>>>>> > >>>>> > >>>>> Based on scenario 1 you can also try the following. > >>>>> > >>>>> diff --git a/drivers/net/ethernet/freescale/fec_main.c > >>>>> b/drivers/net/ethernet/freescale/fec_main.c > >>>>> index 3db882322..cdf9160fc 100644 > >>>>> --- a/drivers/net/ethernet/freescale/fec_main.c > >>>>> +++ b/drivers/net/ethernet/freescale/fec_main.c > >>>>> @@ -3805,7 +3805,6 @@ static int __maybe_unused > fec_suspend(struct > >>>>> device *dev) > >>>>> if (netif_running(ndev)) { > >>>>> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE) > >>>>> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON; > >>>>> - phy_stop(ndev->phydev); > >>>>> napi_disable(&fep->napi); > >>>>> netif_tx_lock_bh(ndev); > >>>>> netif_device_detach(ndev); > >>>>> @@ -3864,7 +3863,6 @@ static int __maybe_unused > fec_resume(struct > >>>>> device *dev) > >>>>> netif_device_attach(ndev); > >>>>> netif_tx_unlock_bh(ndev); > >>>>> napi_enable(&fep->napi); > >>>>> - phy_start(ndev->phydev); > >>>>> } > >>>>> rtnl_unlock(); > >>>> > >>>> As I described in commit message: > >>>> > >>>> "When I am debugging, I found PHY works fine if MAC doesn't support > >>> suspend/resume or phy_stop()/phy_start() doesn't been called during > >>> suspend/resume. This let me realize, PHY state machine > >>> phy_state_machine() could do something breaks the PHY." > >>>> > >>>> So I have tried your above code change before, and it works. But it > >>>> is a very > >>> common phenomenon that MAC suspend/resume invokes phy_stop/start > or > >>> phylink_stop/start, and it's been around for a long time. > >>>> > >>>> As the scenarios you list, it is indeed unreasonable to soft reset > >>>> or config PHY > >>> after calling phy_start_aneg() in state machine. PHY state should be > >>> PHY_UP after calling phy_init_hw(), It seems more reasonable. > >>>> > >>>> Best Regards, > >>>> Joakim Zhang > >>>>> -- > >>>>> 2.31.1 > >>>>> > >>>> > >>> > >>> Following is a draft patch for scenario 1: > >>> Make PHY PM a no-op if MAC manages PHY PM. > >>> Can you give it a try? > >>> > >> > >> It can work for my case. Thanks. > > > > I have another question, if it is possible to change the suspend/resume > sequence? > > MAC should suspend before MDIO bus, and MDIO bus should resume before > MAC. For some MACs, they need PHY feed clocks. It seems more reasonable. > > > > On the other hand we have cases where the PHY depends on the MAC. > If the MAC suspends first, the MDIO bus (and therefore the PHY) may not be > accessible any longer. Therefore it's not that easy. Yes, right. > This speaks for my proposal to be able to make the PHY PM ops no-ops. > Then we can handle MAC + PHY PM in the MAC PM callbacks and consider any > such dependency. Yes, this can cover different cases. > I don't think we can change the PM ordering, AFAIK the PM core does it based > on registration order of devices. Learn more. Thanks. Do you plan to send a formal patch for this? Best Regards, Joakim Zhang > > > Best Regards, > > Joakim Zhang > >> Best Regards, > >> Joakim Zhang > >>> diff --git a/drivers/net/phy/phy_device.c > >>> b/drivers/net/phy/phy_device.c index a009d1769..73d29fd5e 100644 > >>> --- a/drivers/net/phy/phy_device.c > >>> +++ b/drivers/net/phy/phy_device.c > >>> @@ -273,6 +273,9 @@ static __maybe_unused int > >>> mdio_bus_phy_suspend(struct device *dev) { > >>> struct phy_device *phydev = to_phy_device(dev); > >>> > >>> + if (phydev->mac_managed_pm) > >>> + return 0; > >>> + > >>> /* We must stop the state machine manually, otherwise it stops out > of > >>> * control, possibly with the phydev->lock held. Upon resume, netdev > >>> * may call phy routines that try to grab the same lock, and that > >>> may @@ > >>> -294,6 +297,9 @@ static __maybe_unused int > >>> mdio_bus_phy_resume(struct device *dev) > >>> struct phy_device *phydev = to_phy_device(dev); > >>> int ret; > >>> > >>> + if (phydev->mac_managed_pm) > >>> + return 0; > >>> + > >>> if (!phydev->suspended_by_mdio_bus) > >>> goto no_resume; > >>> > >>> diff --git a/include/linux/phy.h b/include/linux/phy.h index > >>> 8e2cf84b2..46702af18 100644 > >>> --- a/include/linux/phy.h > >>> +++ b/include/linux/phy.h > >>> @@ -567,6 +567,7 @@ struct phy_device { > >>> unsigned loopback_enabled:1; > >>> unsigned downshifted_rate:1; > >>> unsigned is_on_sfp_module:1; > >>> + unsigned mac_managed_pm:1; > >>> > >>> unsigned autoneg:1; > >>> /* The most recently read link state */ > >>> -- > >>> 2.31.1 > >>> > >>> > >>> > >>> diff --git a/drivers/net/ethernet/freescale/fec_main.c > >>> b/drivers/net/ethernet/freescale/fec_main.c > >>> index 3db882322..70aea9c27 100644 > >>> --- a/drivers/net/ethernet/freescale/fec_main.c > >>> +++ b/drivers/net/ethernet/freescale/fec_main.c > >>> @@ -2048,6 +2048,8 @@ static int fec_enet_mii_probe(struct > >>> net_device > >>> *ndev) > >>> fep->link = 0; > >>> fep->full_duplex = 0; > >>> > >>> + phy_dev->mac_managed_pm = 1; > >>> + > >>> phy_attached_info(phy_dev); > >>> > >>> return 0; > >>> @@ -3864,6 +3866,7 @@ static int __maybe_unused fec_resume(struct > >>> device *dev) > >>> netif_device_attach(ndev); > >>> netif_tx_unlock_bh(ndev); > >>> napi_enable(&fep->napi); > >>> + phy_init_hw(ndev->phydev); > >>> phy_start(ndev->phydev); > >>> } > >>> rtnl_unlock(); > >>> -- > >>> 2.31.1 > >
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index cc38e326405a..312a6f662481 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) ret = phy_resume(phydev); if (ret < 0) return ret; + + /* PHY state could be changed to PHY_NOLINK from MAC controller resume + * rounte with phy_start(), here change to PHY_UP after re-initializing + * PHY hardware, let PHY state machine to start/config auto-nego again. + */ + phydev->state = PHY_UP; + no_resume: if (phydev->attached_dev && phydev->adjust_link) phy_start_machine(phydev);
commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut off PHY power") invokes phy_init_hw() when MDIO bus resume, it will soft reset PHY if PHY driver implements soft_reset callback. commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these two patches, I found i.MX6UL 14x14 EVK which connected to KSZ8081RNB doesn't work any more when system resume back, MAC driver is fec_main.c. It's obvious that initializing PHY hardware when MDIO bus resume back would introduce some regression when PHY implements soft_reset. When I am debugging, I found PHY works fine if MAC doesn't support suspend/resume or phy_stop()/phy_start() doesn't been called during suspend/resume. This let me realize, PHY state machine phy_state_machine() could do something breaks the PHY. As we known, MAC resume first and then MDIO bus resume when system resume back from suspend. When MAC resume, usually it will invoke phy_start() where to change PHY state to PHY_UP, then trigger the state machine to run now. In phy_state_machine(), it will start/config auto-nego, then change PHY state to PHY_NOLINK, what to next is periodically check PHY link status. When MDIO bus resume, it will initialize PHY hardware, including soft_reset, what would soft_reset affect seems various from different PHYs. For KSZ8081RNB, when it in PHY_NOLINK state and then perform a soft reset, it will never complete auto-nego. This patch changes PHY state to PHY_UP when MDIO bus resume back, it should be reasonable after PHY hardware re-initialized. Also give state machine a chance to start/config auto-nego again. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> --- drivers/net/phy/phy_device.c | 7 +++++++ 1 file changed, 7 insertions(+)