mbox series

[net,0/4,pull,request] Intel Wired LAN Driver Updates 2021-01-28

Message ID 20210128213851.2499012-1-anthony.l.nguyen@intel.com
Headers show
Series Intel Wired LAN Driver Updates 2021-01-28 | expand

Message

Tony Nguyen Jan. 28, 2021, 9:38 p.m. UTC
This series contains updates to igc and i40e drivers.

Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend
as an attempted read will cause errors.

Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for
writing shadow RAM as this could miss a timeout. Also propagates the return
value for Flow Control configuration to properly pass on errors for igc.

Aleksandr reverts commit 2ad1274fa35a ("i40e: don't report link up for a VF
who hasn't enabled") as this can cause link flapping.

The following are changes since commit 44a674d6f79867d5652026f1cc11f7ba8a390183:
  Merge tag 'mlx5-fixes-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE

Aleksandr Loktionov (1):
  i40e: Revert "i40e: don't report link up for a VF who hasn't enabled
    queues"

Kai-Heng Feng (1):
  igc: Report speed and duplex as unknown when device is runtime
    suspended

Kevin Lo (2):
  igc: set the default return value to -IGC_ERR_NVM in
    igc_write_nvm_srwr
  igc: check return value of ret_val in igc_config_fc_after_link_up

 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 13 +------------
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  1 -
 drivers/net/ethernet/intel/igc/igc_ethtool.c       |  3 ++-
 drivers/net/ethernet/intel/igc/igc_i225.c          |  3 +--
 drivers/net/ethernet/intel/igc/igc_mac.c           |  2 +-
 5 files changed, 5 insertions(+), 17 deletions

Comments

Jakub Kicinski Jan. 30, 2021, 6:22 a.m. UTC | #1
On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:
> From: Kai-Heng Feng <kai.heng.feng@canonical.com>

> 

> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown

> when device is runtime suspended"), if we try to read speed and duplex

> sysfs while the device is runtime suspended, igc will complain and

> stops working:


> The more generic approach will be wrap get_link_ksettings() with begin()

> and complete() callbacks, and calls runtime resume and runtime suspend

> routine respectively. However, igc is like igb, runtime resume routine

> uses rtnl_lock() which upper ethtool layer also uses.

> 

> So to prevent a deadlock on rtnl, take a different approach, use

> pm_runtime_suspended() to avoid reading register while device is runtime

> suspended.


Is someone working on the full fix to how PM operates?

There is another rd32(IGC_STATUS) in this file which I don't think 
is protected either.
Sasha Neftin Jan. 30, 2021, 2 p.m. UTC | #2
On 1/30/2021 08:22, Jakub Kicinski wrote:
> On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:

>> From: Kai-Heng Feng <kai.heng.feng@canonical.com>

>>

>> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown

>> when device is runtime suspended"), if we try to read speed and duplex

>> sysfs while the device is runtime suspended, igc will complain and

>> stops working:

> 

>> The more generic approach will be wrap get_link_ksettings() with begin()

>> and complete() callbacks, and calls runtime resume and runtime suspend

>> routine respectively. However, igc is like igb, runtime resume routine

>> uses rtnl_lock() which upper ethtool layer also uses.

>>

>> So to prevent a deadlock on rtnl, take a different approach, use

>> pm_runtime_suspended() to avoid reading register while device is runtime

>> suspended.

> 

> Is someone working on the full fix to how PM operates?

> 

> There is another rd32(IGC_STATUS) in this file which I don't think

> is protected either.

Hello Jakub,
What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs? 
While the device in D3 state there is no configuration space registers 
access.
sasha
>
Jakub Kicinski Jan. 30, 2021, 6:12 p.m. UTC | #3
On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:
> On 1/30/2021 08:22, Jakub Kicinski wrote:

> > On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:  

> >> From: Kai-Heng Feng <kai.heng.feng@canonical.com>

> >>

> >> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown

> >> when device is runtime suspended"), if we try to read speed and duplex

> >> sysfs while the device is runtime suspended, igc will complain and

> >> stops working:  

> >   

> >> The more generic approach will be wrap get_link_ksettings() with begin()

> >> and complete() callbacks, and calls runtime resume and runtime suspend

> >> routine respectively. However, igc is like igb, runtime resume routine

> >> uses rtnl_lock() which upper ethtool layer also uses.

> >>

> >> So to prevent a deadlock on rtnl, take a different approach, use

> >> pm_runtime_suspended() to avoid reading register while device is runtime

> >> suspended.  

> > 

> > Is someone working on the full fix to how PM operates?

> > 

> > There is another rd32(IGC_STATUS) in this file which I don't think

> > is protected either.  

>

> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs? 


Yes.

> While the device in D3 state there is no configuration space registers 

> access.


That's to say similar stack trace will be generated to the one fixed
here, if someone runs ethtool -d, correct? I don't see anything
checking runtime there either.

To be clear I'm not asking for this to be addressed in this series.
Rather for a strong commitment that PM handling will be restructured.
It seems to me you should depend on refcounting / locking that the PM
subsystem does more rather than involving rtnl_lock.
Sasha Neftin Jan. 31, 2021, 10:22 a.m. UTC | #4
On 1/30/2021 20:12, Jakub Kicinski wrote:
> On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:

>> On 1/30/2021 08:22, Jakub Kicinski wrote:

>>> On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:

>>>> From: Kai-Heng Feng <kai.heng.feng@canonical.com>

>>>>

>>>> Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown

>>>> when device is runtime suspended"), if we try to read speed and duplex

>>>> sysfs while the device is runtime suspended, igc will complain and

>>>> stops working:

>>>    

>>>> The more generic approach will be wrap get_link_ksettings() with begin()

>>>> and complete() callbacks, and calls runtime resume and runtime suspend

>>>> routine respectively. However, igc is like igb, runtime resume routine

>>>> uses rtnl_lock() which upper ethtool layer also uses.

>>>>

>>>> So to prevent a deadlock on rtnl, take a different approach, use

>>>> pm_runtime_suspended() to avoid reading register while device is runtime

>>>> suspended.

>>>

>>> Is someone working on the full fix to how PM operates?

>>>

>>> There is another rd32(IGC_STATUS) in this file which I don't think

>>> is protected either.

>>

>> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs?

> 

> Yes.

> 

>> While the device in D3 state there is no configuration space registers

>> access.

> 

> That's to say similar stack trace will be generated to the one fixed

> here, if someone runs ethtool -d, correct? I don't see anything

> checking runtime there either.

> 

yes.
This problem crosses many drivers. (not only igb, igc,...)

specific to this one (igc), can we check 'netif_running at begin of the 
_get_regs method:
if (!netif_running(netdev))
	return;
what do you think? (only OS can put device to the D3)

> To be clear I'm not asking for this to be addressed in this series.

> Rather for a strong commitment that PM handling will be restructured.

> It seems to me you should depend on refcounting / locking that the PM

> subsystem does more rather than involving rtnl_lock.

>
Jakub Kicinski Feb. 1, 2021, 10:04 p.m. UTC | #5
On Sun, 31 Jan 2021 12:22:25 +0200 Neftin, Sasha wrote:
> On 1/30/2021 20:12, Jakub Kicinski wrote:

> > On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:  

> >> What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs?  

> > 

> > Yes.

> >   

> >> While the device in D3 state there is no configuration space registers

> >> access.  

> > 

> > That's to say similar stack trace will be generated to the one fixed

> > here, if someone runs ethtool -d, correct? I don't see anything

> > checking runtime there either.

> >   

> yes.

> This problem crosses many drivers. (not only igb, igc,...)

> 

> specific to this one (igc), can we check 'netif_running at begin of the 

> _get_regs method:

> if (!netif_running(netdev))

> 	return;

> what do you think? (only OS can put device to the D3)


That'd address the particular issue we noticed in the 5min review of
this patch, but similar, less obvious problems may still be lurking?
I wish I knew more about PM so I could suggest a solution. It'd be
ideal to avoid the rtnl_lock calls in resume, so that the driver can
just wake up the device from within the callbacks. Maybe embedded
experts can chime in and suggest how it's usually handled..