Message ID | cover.1654680790.git.lukas@wunner.de |
---|---|
Headers | show |
Series | PHY interruptus horribilis | expand |
On Wed, Jun 8, 2022 at 11:57 AM Lukas Wunner <lukas@wunner.de> wrote: > > Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(), > but subsequent interrupts may retrigger it: > > They may have been left enabled to facilitate wakeup and are not > quiesced until the ->suspend_noirq() phase. Unwanted interrupts may > hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(), > as well as between dpm_resume_noirq() and mdio_bus_phy_resume(). > > Retriggering the phy_state_machine() through an interrupt is not only > undesirable for the reason given in mdio_bus_phy_suspend() (freezing it > midway with phydev->lock held), but also because the PHY may be > inaccessible after it's suspended: Accesses to USB-attached PHYs are > blocked once usb_suspend_both() clears the can_submit flag and PHYs on > PCI network cards may become inaccessible upon suspend as well. > > Amend phy_interrupt() to avoid triggering the state machine if the PHY > is suspended. Signal wakeup instead if the attached net_device or its > parent has been configured as a wakeup source. (Those conditions are > identical to mdio_bus_phy_may_suspend().) Postpone handling of the > interrupt until the PHY has resumed. > > Before stopping the phy_state_machine() in mdio_bus_phy_suspend(), > wait for a concurrent phy_interrupt() to run to completion. That is > necessary because phy_interrupt() may have checked the PHY's suspend > status before the system sleep transition commenced and it may thus > retrigger the state machine after it was stopped. > > Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(), > wait for a concurrent phy_interrupt() to complete to ensure that > interrupts which it postponed are properly rerun. > > Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/ > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Signed-off-by: Lukas Wunner <lukas@wunner.de> Looks reasonable to me. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/net/phy/phy.c | 23 +++++++++++++++++++++++ > drivers/net/phy/phy_device.c | 23 +++++++++++++++++++++++ > include/linux/phy.h | 6 ++++++ > 3 files changed, 52 insertions(+) > > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c > index ef62f357b76d..8d3ee3a6495b 100644 > --- a/drivers/net/phy/phy.c > +++ b/drivers/net/phy/phy.c > @@ -31,6 +31,7 @@ > #include <linux/io.h> > #include <linux/uaccess.h> > #include <linux/atomic.h> > +#include <linux/suspend.h> > #include <net/netlink.h> > #include <net/genetlink.h> > #include <net/sock.h> > @@ -976,6 +977,28 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) > struct phy_driver *drv = phydev->drv; > irqreturn_t ret; > > + /* Wakeup interrupts may occur during a system sleep transition. > + * Postpone handling until the PHY has resumed. > + */ > + if (IS_ENABLED(CONFIG_PM_SLEEP) && phydev->irq_suspended) { > + struct net_device *netdev = phydev->attached_dev; > + > + if (netdev) { > + struct device *parent = netdev->dev.parent; > + > + if (netdev->wol_enabled) > + pm_system_wakeup(); > + else if (device_may_wakeup(&netdev->dev)) > + pm_wakeup_dev_event(&netdev->dev, 0, true); > + else if (parent && device_may_wakeup(parent)) > + pm_wakeup_dev_event(parent, 0, true); > + } > + > + phydev->irq_rerun = 1; > + disable_irq_nosync(irq); > + return IRQ_HANDLED; > + } > + > mutex_lock(&phydev->lock); > ret = drv->handle_interrupt(phydev); > mutex_unlock(&phydev->lock); > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 431a8719c635..46acddd865a7 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -278,6 +278,15 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev) > if (phydev->mac_managed_pm) > return 0; > > + /* Wakeup interrupts may occur during the system sleep transition when > + * the PHY is inaccessible. Set flag to postpone handling until the PHY > + * has resumed. Wait for concurrent interrupt handler to complete. > + */ > + if (phy_interrupt_is_valid(phydev)) { > + phydev->irq_suspended = 1; > + synchronize_irq(phydev->irq); > + } > + > /* We must stop the state machine manually, otherwise it stops out of > * control, possibly with the phydev->lock held. Upon resume, netdev > * may call phy routines that try to grab the same lock, and that may > @@ -315,6 +324,20 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) > if (ret < 0) > return ret; > no_resume: > + if (phy_interrupt_is_valid(phydev)) { > + phydev->irq_suspended = 0; > + synchronize_irq(phydev->irq); > + > + /* Rerun interrupts which were postponed by phy_interrupt() > + * because they occurred during the system sleep transition. > + */ > + if (phydev->irq_rerun) { > + phydev->irq_rerun = 0; > + enable_irq(phydev->irq); > + irq_wake_thread(phydev->irq, phydev); > + } > + } > + > if (phydev->attached_dev && phydev->adjust_link) > phy_start_machine(phydev); > > diff --git a/include/linux/phy.h b/include/linux/phy.h > index 508f1149665b..b09f7d36cff2 100644 > --- a/include/linux/phy.h > +++ b/include/linux/phy.h > @@ -572,6 +572,10 @@ struct macsec_ops; > * @mdix_ctrl: User setting of crossover > * @pma_extable: Cached value of PMA/PMD Extended Abilities Register > * @interrupts: Flag interrupts have been enabled > + * @irq_suspended: Flag indicating PHY is suspended and therefore interrupt > + * handling shall be postponed until PHY has resumed > + * @irq_rerun: Flag indicating interrupts occurred while PHY was suspended, > + * requiring a rerun of the interrupt handler after resume > * @interface: enum phy_interface_t value > * @skb: Netlink message for cable diagnostics > * @nest: Netlink nest used for cable diagnostics > @@ -626,6 +630,8 @@ struct phy_device { > > /* Interrupts are enabled */ > unsigned interrupts:1; > + unsigned irq_suspended:1; > + unsigned irq_rerun:1; > > enum phy_state state; > > -- > 2.35.2 >
On Wed, Jun 8, 2022 at 11:52 AM Lukas Wunner <lukas@wunner.de> wrote: > > Andrew Lunn (PHY maintainer) asked me to resend this patch and cc the > IRQ maintainer. I'm also cc'ing PM maintainers for good measure. > > The patch addresses an issue with PHY interrupts occurring during a > system sleep transition after the PHY has already been suspended. > > The IRQ subsystem uses an internal flag IRQD_WAKEUP_ARMED to avoid > handling such interrupts, but it's not set until suspend_device_irqs() > is called during the ->suspend_noirq() phase. That's too late in this > case as PHYs are suspended in the ->suspend() phase. And there's > no external interface to set the flag earlier. Yes, it is not there intentionally. Strictly speaking, IRQD_WAKEUP_ARMED is there to indicate to the IRQ subsystem that the given IRQ is a system wakeup one and has been left enabled specifically in order to signal system wakeup. It allows the IRQ to trigger between suspend_device_irqs() and resume_device_irqs() exactly once, which causes the system to wake up from suspend-to-idle (that's the primary use case for it) or aborts system suspends in progress. As you have noticed, it is set automatically by suspend_device_irqs() if the given IRQ has IRQD_WAKEUP_STATE which is the case when it has been enabled for system wakeup. > As I'm lacking access to the flag, I'm open coding its functionality > in this patch. Is this the correct approach or should I instead look > into providing an external interface to the flag? The idea is that the regular IRQ "action" handler will run before suspend_device_irqs(), so it should take care of signaling wakeup if need be. IRQD_WAKEUP_ARMED to trigger wakeup when IRQ "action" handlers don't run. > Side note: suspend_device_irqs() and resume_device_irqs() have been > exported since forever even though there's no module user... Well, that's a mistake.
On Wed, Jun 8, 2022 at 12:35 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Jun 8, 2022 at 11:52 AM Lukas Wunner <lukas@wunner.de> wrote: > > > > Andrew Lunn (PHY maintainer) asked me to resend this patch and cc the > > IRQ maintainer. I'm also cc'ing PM maintainers for good measure. > > > > The patch addresses an issue with PHY interrupts occurring during a > > system sleep transition after the PHY has already been suspended. > > > > The IRQ subsystem uses an internal flag IRQD_WAKEUP_ARMED to avoid > > handling such interrupts, but it's not set until suspend_device_irqs() > > is called during the ->suspend_noirq() phase. That's too late in this > > case as PHYs are suspended in the ->suspend() phase. And there's > > no external interface to set the flag earlier. > > Yes, it is not there intentionally. > > Strictly speaking, IRQD_WAKEUP_ARMED is there to indicate to the IRQ > subsystem that the given IRQ is a system wakeup one and has been left > enabled specifically in order to signal system wakeup. It allows the > IRQ to trigger between suspend_device_irqs() and resume_device_irqs() > exactly once, which causes the system to wake up from suspend-to-idle > (that's the primary use case for it) or aborts system suspends in > progress. > > As you have noticed, it is set automatically by suspend_device_irqs() > if the given IRQ has IRQD_WAKEUP_STATE which is the case when it has > been enabled for system wakeup. > > > As I'm lacking access to the flag, I'm open coding its functionality > > in this patch. Is this the correct approach or should I instead look > > into providing an external interface to the flag? > > The idea is that the regular IRQ "action" handler will run before > suspend_device_irqs(), so it should take care of signaling wakeup if > need be. IRQD_WAKEUP_ARMED to trigger wakeup when IRQ "action" > handlers don't run. That said IMV there could be a wrapper around suspend_device_irq() to mark a specific IRQ as "suspended" before running suspend_device_irqs(), but that would require adding "suspend depth" to struct irq_desc, so the IRQ is not "resumed" prematurely by resume_device_irqs(). And there needs to be an analogous wrapper around resume_irq() for the resume path. Does the single prospective user justify this?