Message ID | 20210716141142.12710-1-iivanov@suse.de |
---|---|
State | New |
Headers | show |
Series | net: phy: leds: Trigger leds only if PHY speed is known | expand |
On Fri, Jul 16, 2021 at 09:32:00PM +0300, Ivan T. Ivanov wrote: > Hi, > > Quoting Andrew Lunn (2021-07-16 18:19:58) > > On Fri, Jul 16, 2021 at 05:11:42PM +0300, Ivan T. Ivanov wrote: > > > This prevents "No phy led trigger registered for speed(-1)" > > > alert message which is coused by phy_led_trigger_chage_speed() > > > being called during attaching phy to net_device where phy device > > > speed could be still unknown. > > > > Hi Ivan > > > > It seems odd that when attaching the PHY we have link, but not the > > speed. What PHY is this? > > This is lan78xx on RPi3B+ > > > > > > - if (phy->speed == 0) > > > + if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN) > > > return; > > > > This change makes sense. But i'm wondering if the original logic is > > sound. We have link, but no speed information. > > Well, probably my interpretation was not correct. The most probable > call to phy_led_trigger_change_speed() which couses this alert is > phy_attach_direct() -> phy_led_triggers_register(), I think. I am > not sure that we have link at this stage or not. This does sound weird. When a phy_device is allocated, it's explicitly initialised with: dev->speed = SPEED_UNKNOWN; dev->duplex = DUPLEX_UNKNOWN; dev->link = 0; dev->state = PHY_DOWN; so, unless something is causing state to be read before we've attached the phy to a network device, this is how this state should remain. I wonder why you are seeing dev->link be non-zero. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote: > I have sent new patch[1] which I think is proper fix for this. > > [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u Thanks. I haven't reviewed the driver, but the patch itself LGTM from the point of view that phy_read_status() should definitely only be called with phydev->lock held. I think we also need the "Doing it all yourself" section in Documentation/networking/phy.rst fixed to specify that if you call this function, you must be holding phydev->lock. Lastly, I'm wondering how many other places call phy_read_status() without holding phydev->lock - sounds like something that needs a kernel-wide review, and then possibly we should introduce a lockdep check for this in phy_read_status() to catch any new introductions. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> I think will be easier if we protect public phylib API internally with > lock, otherwise it is easy to make mistakes, obviously. But still this > will not protect users which directly dereference phy_device members. Anybody directly dereference phy_device members, rather than going through the API, is directly responsible for any breakage they cause. This is not a supported use case. Andrew
On Wed, Aug 11, 2021 at 12:51:04PM +0300, Ivan T. Ivanov wrote: > There are a few callers of this, but then we have a few callers of > genphy_read_status() too, which are outside just implementing > phy_driver->read_status() and don't use locking. I think we need to strongly discourage people using the genphy* functions directly from anything except phylib driver code. Any such use is a layering violation. I think it does make sense to have a set of lower-level API functions that do the locking necessary, rather than having the phylib locking spread out across multiple network drivers. It's better to have it in one place. > Then there a few users of phy_init_eee() which uses phy_read_status(), > again without locking. That is kind-of a special case - phy_init_eee() can be called by network drivers from within the phylib adjust_link() callback, and this callback is made while holding the phydev's lock. So those cases are safe. If phy_init_eee() is called outside of that, and the lock isn't taken, then yes, it's buggy. This all said, I can't say that I have particularly liked the phy_init_eee() API. It seems to mix interface-up like configuration (do we wish clocks to stop in EEE) with working out whether EEE should be enabled for the speed/duplex that we've just read from the PHY. However, most users of this function are being called as a result of a link-up event when we already know the link parameters, so we shouldn't be re-interrogating the PHY at this point. It seems to me to be entirely unnecessary. > I think will be easier if we protect public phylib API internally with > lock, otherwise it is easy to make mistakes, obviously. But still this > will not protect users which directly dereference phy_device members. As Andrew says... but there are some members that network drivers have to access due to the design of phylib, such as speed, duplex, *pause, link, etc. Indeed these can change at any time when phydev->lock is not held due to the action of phylib polling or link interrupts from the PHY. So, accessing them outside of the adjust_link() callback without holding the lock is racy. Note that phylink's usage is similarly safe to the adjust_link() callback; its access to these members is similarly protected by phydev->lock taken in the phylib state machine - we use the slightly lower-level phy_link_change() hook which avoids some of the help that phylib provides to network drivers (which phylink really doesn't want phylib managing.) -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Aug 09, 2021 at 03:16:33PM +0100, Russell King (Oracle) wrote: > On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote: > > I have sent new patch[1] which I think is proper fix for this. > > > > [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u > > Thanks. > > I haven't reviewed the driver, but the patch itself LGTM from the > point of view that phy_read_status() should definitely only be > called with phydev->lock held. I'm cooking up a patchset which makes phy_read_status() take the lock. I don't see any external callers taking the lock, so all the changes are internal to phylib. The change is however made a bit more complex by phy_read_status() being in a header file, not phy.c. I wounder if there is some build dependencies, modules vs built in. So my first patch simply moves it into phy.c no other change. I will push it to github and let 0-day chew on it for a while and see if it finds any build failures. Andrew
diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c index f550576eb9da..4d6497c45ae4 100644 --- a/drivers/net/phy/phy_led_triggers.c +++ b/drivers/net/phy/phy_led_triggers.c @@ -33,7 +33,7 @@ void phy_led_trigger_change_speed(struct phy_device *phy) if (!phy->link) return phy_led_trigger_no_link(phy); - if (phy->speed == 0) + if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN) return; plt = phy_speed_to_led_trigger(phy, phy->speed);
This prevents "No phy led trigger registered for speed(-1)" alert message which is coused by phy_led_trigger_chage_speed() being called during attaching phy to net_device where phy device speed could be still unknown. Signed-off-by: Ivan T. Ivanov <iivanov@suse.de> --- drivers/net/phy/phy_led_triggers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)