mbox series

[net,v2,0/1] PHY interruptus horribilis

Message ID cover.1654680790.git.lukas@wunner.de
Headers show
Series PHY interruptus horribilis | expand

Message

Lukas Wunner June 8, 2022, 9:52 a.m. UTC
Andrew Lunn (PHY maintainer) asked me to resend this patch and cc the
IRQ maintainer.  I'm also cc'ing PM maintainers for good measure.

The patch addresses an issue with PHY interrupts occurring during a
system sleep transition after the PHY has already been suspended.

The IRQ subsystem uses an internal flag IRQD_WAKEUP_ARMED to avoid
handling such interrupts, but it's not set until suspend_device_irqs()
is called during the ->suspend_noirq() phase.  That's too late in this
case as PHYs are suspended in the ->suspend() phase.  And there's
no external interface to set the flag earlier.

As I'm lacking access to the flag, I'm open coding its functionality
in this patch.  Is this the correct approach or should I instead look
into providing an external interface to the flag?

Side note: suspend_device_irqs() and resume_device_irqs() have been
exported since forever even though there's no module user...

Thanks!

Changes since v1:
* Extend rationale in the commit message.
* Drop Fixes tag, add Tested-by tag (Marek).

Link to v1:
https://lore.kernel.org/netdev/688f559346ea747d3b47a4d16ef8277e093f9ebe.1653556322.git.lukas@wunner.de/

Lukas Wunner (1):
  net: phy: Don't trigger state machine while in suspend

 drivers/net/phy/phy.c        | 23 +++++++++++++++++++++++
 drivers/net/phy/phy_device.c | 23 +++++++++++++++++++++++
 include/linux/phy.h          |  6 ++++++
 3 files changed, 52 insertions(+)

Comments

Rafael J. Wysocki June 8, 2022, 10:11 a.m. UTC | #1
On Wed, Jun 8, 2022 at 11:57 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(),
> but subsequent interrupts may retrigger it:
>
> They may have been left enabled to facilitate wakeup and are not
> quiesced until the ->suspend_noirq() phase.  Unwanted interrupts may
> hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(),
> as well as between dpm_resume_noirq() and mdio_bus_phy_resume().
>
> Retriggering the phy_state_machine() through an interrupt is not only
> undesirable for the reason given in mdio_bus_phy_suspend() (freezing it
> midway with phydev->lock held), but also because the PHY may be
> inaccessible after it's suspended:  Accesses to USB-attached PHYs are
> blocked once usb_suspend_both() clears the can_submit flag and PHYs on
> PCI network cards may become inaccessible upon suspend as well.
>
> Amend phy_interrupt() to avoid triggering the state machine if the PHY
> is suspended.  Signal wakeup instead if the attached net_device or its
> parent has been configured as a wakeup source.  (Those conditions are
> identical to mdio_bus_phy_may_suspend().)  Postpone handling of the
> interrupt until the PHY has resumed.
>
> Before stopping the phy_state_machine() in mdio_bus_phy_suspend(),
> wait for a concurrent phy_interrupt() to run to completion.  That is
> necessary because phy_interrupt() may have checked the PHY's suspend
> status before the system sleep transition commenced and it may thus
> retrigger the state machine after it was stopped.
>
> Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(),
> wait for a concurrent phy_interrupt() to complete to ensure that
> interrupts which it postponed are properly rerun.
>
> Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Looks reasonable to me.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/net/phy/phy.c        | 23 +++++++++++++++++++++++
>  drivers/net/phy/phy_device.c | 23 +++++++++++++++++++++++
>  include/linux/phy.h          |  6 ++++++
>  3 files changed, 52 insertions(+)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index ef62f357b76d..8d3ee3a6495b 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -31,6 +31,7 @@
>  #include <linux/io.h>
>  #include <linux/uaccess.h>
>  #include <linux/atomic.h>
> +#include <linux/suspend.h>
>  #include <net/netlink.h>
>  #include <net/genetlink.h>
>  #include <net/sock.h>
> @@ -976,6 +977,28 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>         struct phy_driver *drv = phydev->drv;
>         irqreturn_t ret;
>
> +       /* Wakeup interrupts may occur during a system sleep transition.
> +        * Postpone handling until the PHY has resumed.
> +        */
> +       if (IS_ENABLED(CONFIG_PM_SLEEP) && phydev->irq_suspended) {
> +               struct net_device *netdev = phydev->attached_dev;
> +
> +               if (netdev) {
> +                       struct device *parent = netdev->dev.parent;
> +
> +                       if (netdev->wol_enabled)
> +                               pm_system_wakeup();
> +                       else if (device_may_wakeup(&netdev->dev))
> +                               pm_wakeup_dev_event(&netdev->dev, 0, true);
> +                       else if (parent && device_may_wakeup(parent))
> +                               pm_wakeup_dev_event(parent, 0, true);
> +               }
> +
> +               phydev->irq_rerun = 1;
> +               disable_irq_nosync(irq);
> +               return IRQ_HANDLED;
> +       }
> +
>         mutex_lock(&phydev->lock);
>         ret = drv->handle_interrupt(phydev);
>         mutex_unlock(&phydev->lock);
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 431a8719c635..46acddd865a7 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -278,6 +278,15 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
>         if (phydev->mac_managed_pm)
>                 return 0;
>
> +       /* Wakeup interrupts may occur during the system sleep transition when
> +        * the PHY is inaccessible. Set flag to postpone handling until the PHY
> +        * has resumed. Wait for concurrent interrupt handler to complete.
> +        */
> +       if (phy_interrupt_is_valid(phydev)) {
> +               phydev->irq_suspended = 1;
> +               synchronize_irq(phydev->irq);
> +       }
> +
>         /* We must stop the state machine manually, otherwise it stops out of
>          * control, possibly with the phydev->lock held. Upon resume, netdev
>          * may call phy routines that try to grab the same lock, and that may
> @@ -315,6 +324,20 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
>         if (ret < 0)
>                 return ret;
>  no_resume:
> +       if (phy_interrupt_is_valid(phydev)) {
> +               phydev->irq_suspended = 0;
> +               synchronize_irq(phydev->irq);
> +
> +               /* Rerun interrupts which were postponed by phy_interrupt()
> +                * because they occurred during the system sleep transition.
> +                */
> +               if (phydev->irq_rerun) {
> +                       phydev->irq_rerun = 0;
> +                       enable_irq(phydev->irq);
> +                       irq_wake_thread(phydev->irq, phydev);
> +               }
> +       }
> +
>         if (phydev->attached_dev && phydev->adjust_link)
>                 phy_start_machine(phydev);
>
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 508f1149665b..b09f7d36cff2 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -572,6 +572,10 @@ struct macsec_ops;
>   * @mdix_ctrl: User setting of crossover
>   * @pma_extable: Cached value of PMA/PMD Extended Abilities Register
>   * @interrupts: Flag interrupts have been enabled
> + * @irq_suspended: Flag indicating PHY is suspended and therefore interrupt
> + *                 handling shall be postponed until PHY has resumed
> + * @irq_rerun: Flag indicating interrupts occurred while PHY was suspended,
> + *             requiring a rerun of the interrupt handler after resume
>   * @interface: enum phy_interface_t value
>   * @skb: Netlink message for cable diagnostics
>   * @nest: Netlink nest used for cable diagnostics
> @@ -626,6 +630,8 @@ struct phy_device {
>
>         /* Interrupts are enabled */
>         unsigned interrupts:1;
> +       unsigned irq_suspended:1;
> +       unsigned irq_rerun:1;
>
>         enum phy_state state;
>
> --
> 2.35.2
>
Rafael J. Wysocki June 8, 2022, 10:35 a.m. UTC | #2
On Wed, Jun 8, 2022 at 11:52 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> Andrew Lunn (PHY maintainer) asked me to resend this patch and cc the
> IRQ maintainer.  I'm also cc'ing PM maintainers for good measure.
>
> The patch addresses an issue with PHY interrupts occurring during a
> system sleep transition after the PHY has already been suspended.
>
> The IRQ subsystem uses an internal flag IRQD_WAKEUP_ARMED to avoid
> handling such interrupts, but it's not set until suspend_device_irqs()
> is called during the ->suspend_noirq() phase.  That's too late in this
> case as PHYs are suspended in the ->suspend() phase.  And there's
> no external interface to set the flag earlier.

Yes, it is not there intentionally.

Strictly speaking, IRQD_WAKEUP_ARMED is there to indicate to the IRQ
subsystem that the given IRQ is a system wakeup one and has been left
enabled specifically in order to signal system wakeup.  It allows the
IRQ to trigger between suspend_device_irqs() and resume_device_irqs()
exactly once, which causes the system to wake up from suspend-to-idle
(that's the primary use case for it) or aborts system suspends in
progress.

As you have noticed, it is set automatically by suspend_device_irqs()
if the given IRQ has IRQD_WAKEUP_STATE which is the case when it has
been enabled for system wakeup.

> As I'm lacking access to the flag, I'm open coding its functionality
> in this patch.  Is this the correct approach or should I instead look
> into providing an external interface to the flag?

The idea is that the regular IRQ "action" handler will run before
suspend_device_irqs(), so it should take care of signaling wakeup if
need be.  IRQD_WAKEUP_ARMED to trigger wakeup when IRQ "action"
handlers don't run.

> Side note: suspend_device_irqs() and resume_device_irqs() have been
> exported since forever even though there's no module user...

Well, that's a mistake.
Rafael J. Wysocki June 8, 2022, 11:09 a.m. UTC | #3
On Wed, Jun 8, 2022 at 12:35 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jun 8, 2022 at 11:52 AM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > Andrew Lunn (PHY maintainer) asked me to resend this patch and cc the
> > IRQ maintainer.  I'm also cc'ing PM maintainers for good measure.
> >
> > The patch addresses an issue with PHY interrupts occurring during a
> > system sleep transition after the PHY has already been suspended.
> >
> > The IRQ subsystem uses an internal flag IRQD_WAKEUP_ARMED to avoid
> > handling such interrupts, but it's not set until suspend_device_irqs()
> > is called during the ->suspend_noirq() phase.  That's too late in this
> > case as PHYs are suspended in the ->suspend() phase.  And there's
> > no external interface to set the flag earlier.
>
> Yes, it is not there intentionally.
>
> Strictly speaking, IRQD_WAKEUP_ARMED is there to indicate to the IRQ
> subsystem that the given IRQ is a system wakeup one and has been left
> enabled specifically in order to signal system wakeup.  It allows the
> IRQ to trigger between suspend_device_irqs() and resume_device_irqs()
> exactly once, which causes the system to wake up from suspend-to-idle
> (that's the primary use case for it) or aborts system suspends in
> progress.
>
> As you have noticed, it is set automatically by suspend_device_irqs()
> if the given IRQ has IRQD_WAKEUP_STATE which is the case when it has
> been enabled for system wakeup.
>
> > As I'm lacking access to the flag, I'm open coding its functionality
> > in this patch.  Is this the correct approach or should I instead look
> > into providing an external interface to the flag?
>
> The idea is that the regular IRQ "action" handler will run before
> suspend_device_irqs(), so it should take care of signaling wakeup if
> need be.  IRQD_WAKEUP_ARMED to trigger wakeup when IRQ "action"
> handlers don't run.

That said IMV there could be a wrapper around suspend_device_irq() to
mark a specific IRQ as "suspended" before running
suspend_device_irqs(), but that would require adding "suspend depth"
to struct irq_desc, so the IRQ is not "resumed" prematurely by
resume_device_irqs().  And there needs to be an analogous wrapper
around resume_irq() for the resume path.

Does the single prospective user justify this?