diff mbox series

net: phy: leds: Trigger leds only if PHY speed is known

Message ID 20210716141142.12710-1-iivanov@suse.de
State New
Headers show
Series net: phy: leds: Trigger leds only if PHY speed is known | expand

Commit Message

Ivan T. Ivanov July 16, 2021, 2:11 p.m. UTC
This prevents "No phy led trigger registered for speed(-1)"
alert message which is coused by phy_led_trigger_chage_speed()
being called during attaching phy to net_device where phy device
speed could be still unknown.

Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
---
 drivers/net/phy/phy_led_triggers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Russell King (Oracle) July 19, 2021, 3:29 p.m. UTC | #1
On Fri, Jul 16, 2021 at 09:32:00PM +0300, Ivan T. Ivanov wrote:
> Hi,

> 

> Quoting Andrew Lunn (2021-07-16 18:19:58)

> > On Fri, Jul 16, 2021 at 05:11:42PM +0300, Ivan T. Ivanov wrote:

> > > This prevents "No phy led trigger registered for speed(-1)"

> > > alert message which is coused by phy_led_trigger_chage_speed()

> > > being called during attaching phy to net_device where phy device

> > > speed could be still unknown.

> > 

> > Hi Ivan

> > 

> > It seems odd that when attaching the PHY we have link, but not the

> > speed. What PHY is this?

> 

> This is lan78xx on RPi3B+

> 

> > 

> > > -     if (phy->speed == 0)

> > > +     if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)

> > >               return;

> > 

> > This change makes sense. But i'm wondering if the original logic is

> > sound. We have link, but no speed information.

> 

> Well, probably my interpretation was not correct. The most probable

> call to phy_led_trigger_change_speed() which couses this alert is

> phy_attach_direct() -> phy_led_triggers_register(), I think. I am

> not sure that we have link at this stage or not.


This does sound weird.

When a phy_device is allocated, it's explicitly initialised with:

	dev->speed = SPEED_UNKNOWN;
	dev->duplex = DUPLEX_UNKNOWN;
	dev->link = 0;
	dev->state = PHY_DOWN;

so, unless something is causing state to be read before we've attached
the phy to a network device, this is how this state should remain. I
wonder why you are seeing dev->link be non-zero.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Russell King (Oracle) Aug. 9, 2021, 2:16 p.m. UTC | #2
On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:
> I have sent new patch[1] which I think is proper fix for this.

> 

> [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u


Thanks.

I haven't reviewed the driver, but the patch itself LGTM from the
point of view that phy_read_status() should definitely only be
called with phydev->lock held.

I think we also need the "Doing it all yourself" section in
Documentation/networking/phy.rst fixed to specify that if you
call this function, you must be holding phydev->lock.

Lastly, I'm wondering how many other places call phy_read_status()
without holding phydev->lock - sounds like something that needs a
kernel-wide review, and then possibly we should introduce a lockdep
check for this in phy_read_status() to catch any new introductions.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Andrew Lunn Aug. 11, 2021, 2:39 p.m. UTC | #3
> I think will be easier if we protect public phylib API internally with

> lock, otherwise it is easy to make mistakes, obviously. But still this

> will not protect users which directly dereference phy_device members.


Anybody directly dereference phy_device members, rather than going
through the API, is directly responsible for any breakage they
cause. This is not a supported use case.

       Andrew
Russell King (Oracle) Aug. 11, 2021, 3:10 p.m. UTC | #4
On Wed, Aug 11, 2021 at 12:51:04PM +0300, Ivan T. Ivanov wrote:
> There are a few callers of this, but then we have a few callers of

> genphy_read_status() too, which are outside just implementing 

> phy_driver->read_status() and don't use locking.


I think we need to strongly discourage people using the genphy*
functions directly from anything except phylib driver code. Any
such use is a layering violation.

I think it does make sense to have a set of lower-level API
functions that do the locking necessary, rather than having the
phylib locking spread out across multiple network drivers. It's
better to have it in one place.

> Then there a few users of phy_init_eee() which uses phy_read_status(),

> again without locking.


That is kind-of a special case - phy_init_eee() can be called by
network drivers from within the phylib adjust_link() callback, and
this callback is made while holding the phydev's lock. So those
cases are safe.

If phy_init_eee() is called outside of that, and the lock isn't
taken, then yes, it's buggy.

This all said, I can't say that I have particularly liked the
phy_init_eee() API. It seems to mix interface-up like configuration
(do we wish clocks to stop in EEE) with working out whether EEE
should be enabled for the speed/duplex that we've just read from
the PHY. However, most users of this function are being called as a
result of a link-up event when we already know the link parameters,
so we shouldn't be re-interrogating the PHY at this point. It seems
to me to be entirely unnecessary.

> I think will be easier if we protect public phylib API internally with

> lock, otherwise it is easy to make mistakes, obviously. But still this

> will not protect users which directly dereference phy_device members.


As Andrew says... but there are some members that network drivers
have to access due to the design of phylib, such as speed, duplex,
*pause, link, etc. Indeed these can change at any time when
phydev->lock is not held due to the action of phylib polling or
link interrupts from the PHY. So, accessing them outside of the
adjust_link() callback without holding the lock is racy.

Note that phylink's usage is similarly safe to the adjust_link()
callback; its access to these members is similarly protected by
phydev->lock taken in the phylib state machine - we use the
slightly lower-level phy_link_change() hook which avoids some of
the help that phylib provides to network drivers (which phylink
really doesn't want phylib managing.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Andrew Lunn Aug. 11, 2021, 10:23 p.m. UTC | #5
On Mon, Aug 09, 2021 at 03:16:33PM +0100, Russell King (Oracle) wrote:
> On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:

> > I have sent new patch[1] which I think is proper fix for this.

> > 

> > [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u

> 

> Thanks.

> 

> I haven't reviewed the driver, but the patch itself LGTM from the

> point of view that phy_read_status() should definitely only be

> called with phydev->lock held.


I'm cooking up a patchset which makes phy_read_status() take the
lock. I don't see any external callers taking the lock, so all the
changes are internal to phylib.

The change is however made a bit more complex by phy_read_status()
being in a header file, not phy.c. I wounder if there is some build
dependencies, modules vs built in. So my first patch simply moves it
into phy.c no other change. I will push it to github and let 0-day
chew on it for a while and see if it finds any build failures.

     Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
index f550576eb9da..4d6497c45ae4 100644
--- a/drivers/net/phy/phy_led_triggers.c
+++ b/drivers/net/phy/phy_led_triggers.c
@@ -33,7 +33,7 @@  void phy_led_trigger_change_speed(struct phy_device *phy)
 	if (!phy->link)
 		return phy_led_trigger_no_link(phy);
 
-	if (phy->speed == 0)
+	if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)
 		return;
 
 	plt = phy_speed_to_led_trigger(phy, phy->speed);