net: phy: leds: Trigger leds only if PHY speed is known

Message ID	20210716141142.12710-1-iivanov@suse.de
State	New
Headers	show Return-Path: <netdev-owner@kernel.org> From: "Ivan T. Ivanov" <iivanov@suse.de> To: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com>, Russell King <linux@armlinux.org.uk>, "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] net: phy: leds: Trigger leds only if PHY speed is known Date: Fri, 16 Jul 2021 17:11:42 +0300 Message-Id: <20210716141142.12710-1-iivanov@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	net: phy: leds: Trigger leds only if PHY speed is known \| expand net: phy: leds: Trigger leds only if PHY speed is known

Ivan T . Ivanov July 16, 2021, 2:11 p.m. UTC

This prevents "No phy led trigger registered for speed(-1)"
alert message which is coused by phy_led_trigger_chage_speed()
being called during attaching phy to net_device where phy device
speed could be still unknown.

Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
---
 drivers/net/phy/phy_led_triggers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Russell King (Oracle) July 19, 2021, 3:29 p.m. UTC | #1

On Fri, Jul 16, 2021 at 09:32:00PM +0300, Ivan T. Ivanov wrote:
> Hi,

> 

> Quoting Andrew Lunn (2021-07-16 18:19:58)

> > On Fri, Jul 16, 2021 at 05:11:42PM +0300, Ivan T. Ivanov wrote:

> > > This prevents "No phy led trigger registered for speed(-1)"

> > > alert message which is coused by phy_led_trigger_chage_speed()

> > > being called during attaching phy to net_device where phy device

> > > speed could be still unknown.

> > 

> > Hi Ivan

> > 

> > It seems odd that when attaching the PHY we have link, but not the

> > speed. What PHY is this?

> 

> This is lan78xx on RPi3B+

> 

> > 

> > > -     if (phy->speed == 0)

> > > +     if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)

> > >               return;

> > 

> > This change makes sense. But i'm wondering if the original logic is

> > sound. We have link, but no speed information.

> 

> Well, probably my interpretation was not correct. The most probable

> call to phy_led_trigger_change_speed() which couses this alert is

> phy_attach_direct() -> phy_led_triggers_register(), I think. I am

> not sure that we have link at this stage or not.


This does sound weird.

When a phy_device is allocated, it's explicitly initialised with:

	dev->speed = SPEED_UNKNOWN;
	dev->duplex = DUPLEX_UNKNOWN;
	dev->link = 0;
	dev->state = PHY_DOWN;

so, unless something is causing state to be read before we've attached
the phy to a network device, this is how this state should remain. I
wonder why you are seeing dev->link be non-zero.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Russell King (Oracle) Aug. 9, 2021, 2:16 p.m. UTC | #2

On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:
> I have sent new patch[1] which I think is proper fix for this.

> 

> [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u

Thanks.

I haven't reviewed the driver, but the patch itself LGTM from the
point of view that phy_read_status() should definitely only be
called with phydev->lock held.

I think we also need the "Doing it all yourself" section in
Documentation/networking/phy.rst fixed to specify that if you
call this function, you must be holding phydev->lock.

Lastly, I'm wondering how many other places call phy_read_status()
without holding phydev->lock - sounds like something that needs a
kernel-wide review, and then possibly we should introduce a lockdep
check for this in phy_read_status() to catch any new introductions.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Andrew Lunn Aug. 11, 2021, 2:39 p.m. UTC | #3

> I think will be easier if we protect public phylib API internally with

> lock, otherwise it is easy to make mistakes, obviously. But still this

> will not protect users which directly dereference phy_device members.


Anybody directly dereference phy_device members, rather than going
through the API, is directly responsible for any breakage they
cause. This is not a supported use case.

       Andrew

Russell King (Oracle) Aug. 11, 2021, 3:10 p.m. UTC | #4

On Wed, Aug 11, 2021 at 12:51:04PM +0300, Ivan T. Ivanov wrote:
> There are a few callers of this, but then we have a few callers of

> genphy_read_status() too, which are outside just implementing 

> phy_driver->read_status() and don't use locking.

I think we need to strongly discourage people using the genphy*
functions directly from anything except phylib driver code. Any
such use is a layering violation.

I think it does make sense to have a set of lower-level API
functions that do the locking necessary, rather than having the
phylib locking spread out across multiple network drivers. It's
better to have it in one place.

> Then there a few users of phy_init_eee() which uses phy_read_status(),

> again without locking.

That is kind-of a special case - phy_init_eee() can be called by
network drivers from within the phylib adjust_link() callback, and
this callback is made while holding the phydev's lock. So those
cases are safe.

If phy_init_eee() is called outside of that, and the lock isn't
taken, then yes, it's buggy.

This all said, I can't say that I have particularly liked the
phy_init_eee() API. It seems to mix interface-up like configuration
(do we wish clocks to stop in EEE) with working out whether EEE
should be enabled for the speed/duplex that we've just read from
the PHY. However, most users of this function are being called as a
result of a link-up event when we already know the link parameters,
so we shouldn't be re-interrogating the PHY at this point. It seems
to me to be entirely unnecessary.

> I think will be easier if we protect public phylib API internally with

> lock, otherwise it is easy to make mistakes, obviously. But still this

> will not protect users which directly dereference phy_device members.

As Andrew says... but there are some members that network drivers
have to access due to the design of phylib, such as speed, duplex,
*pause, link, etc. Indeed these can change at any time when
phydev->lock is not held due to the action of phylib polling or
link interrupts from the PHY. So, accessing them outside of the
adjust_link() callback without holding the lock is racy.

Note that phylink's usage is similarly safe to the adjust_link()
callback; its access to these members is similarly protected by
phydev->lock taken in the phylib state machine - we use the
slightly lower-level phy_link_change() hook which avoids some of
the help that phylib provides to network drivers (which phylink
really doesn't want phylib managing.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Andrew Lunn Aug. 11, 2021, 10:23 p.m. UTC | #5

On Mon, Aug 09, 2021 at 03:16:33PM +0100, Russell King (Oracle) wrote:
> On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:

> > I have sent new patch[1] which I think is proper fix for this.

> > 

> > [1] https://lore.kernel.org/netdev/20210804081339.19909-1-iivanov@suse.de/T/#u

> 

> Thanks.

> 

> I haven't reviewed the driver, but the patch itself LGTM from the

> point of view that phy_read_status() should definitely only be

> called with phydev->lock held.

I'm cooking up a patchset which makes phy_read_status() take the
lock. I don't see any external callers taking the lock, so all the
changes are internal to phylib.

The change is however made a bit more complex by phy_read_status()
being in a header file, not phy.c. I wounder if there is some build
dependencies, modules vs built in. So my first patch simply moves it
into phy.c no other change. I will push it to github and let 0-day
chew on it for a while and see if it finds any build failures.

     Andrew

net: phy: leds: Trigger leds only if PHY speed is known

Commit Message

Comments

Patch