[0/4] Bug fixes to amd-xgbe driver

Message ID 20210212180010.221129-1-Shyam-sundar.S-k@amd.com

Message

Shyam Sundar S K Feb. 12, 2021, 6 p.m. UTC
This series contains general fixes for the amd-xgbe driver, mostly
addressing mailbox communication failures and improving the link
stability of the amd-xgbe device.

Shyam Sundar S K (4):
  amd-xgbe: Reset the PHY rx data path when mailbox command timeout
  amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
  amd-xgbe: Reset link when the link never comes back
  amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP

 drivers/net/ethernet/amd/xgbe/xgbe-common.h | 13 +++++++
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c    |  1 +
 drivers/net/ethernet/amd/xgbe/xgbe-mdio.c   |  3 +-
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 39 ++++++++++++++++++++-
 4 files changed, 53 insertions(+), 3 deletions(-)

Comments

Tom Lendacky Feb. 12, 2021, 6:48 p.m. UTC | #1
On 2/12/21 12:00 PM, Shyam Sundar S K wrote:
> Current driver calls the netif_carrier_off() during the later point in
> time to tear down the link which causes the netdev watchdog to timeout.

This is a bit confusing...  how about:

The current driver calls netif_carrier_off() late in the link tear down 
which can result in a netdev watchdog timeout.

> 
> Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
> would avoids the warning.

s/would//

> 
>   ------------[ cut here ]------------
>   NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
>   WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
>   Modules linked in: amd_xgbe(E)  amd-xgbe 0000:03:00.2 enp3s0f2: Link is Down
>   CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E
>   Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
>   RIP: 0010:dev_watchdog+0x20d/0x220
>   Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
>   RSP: 0018:ffff90cfc28c3e88 EFLAGS: 00010286
>   RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
>   RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff90cfc28d63c0
>   RBP: ffff90cfb977845c R08: 0000000000000050 R09: 0000000000196018
>   R10: ffff90cfc28c3ef8 R11: 0000000000000000 R12: ffff90cfb9778000
>   R13: 0000000000000003 R14: ffff90cfb9778480 R15: 0000000000000010
>   FS:  0000000000000000(0000) GS:ffff90cfc28c0000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f240ff2d9d0 CR3: 00000001e3e0a000 CR4: 00000000003406e0
>   Call Trace:
>    <IRQ>
>    ? pfifo_fast_reset+0x100/0x100
>    call_timer_fn+0x2b/0x130
>    run_timer_softirq+0x3e8/0x440
>    ? enqueue_hrtimer+0x39/0x90
> 
> Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>

Same comment about Co-developed-by: here as previous patch.

With the above comments addressed,

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>   drivers/net/ethernet/amd/xgbe/xgbe-drv.c  | 1 +
>   drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 1 -
>   2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 2709a2db5657..395eb0b52680 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -1368,6 +1368,7 @@ static void xgbe_stop(struct xgbe_prv_data *pdata)
>   		return;
>   
>   	netif_tx_stop_all_queues(netdev);
> +	netif_carrier_off(pdata->netdev);
>   
>   	xgbe_stop_timers(pdata);
>   	flush_workqueue(pdata->dev_workqueue);
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> index 93ef5a30cb8d..19ee4db0156d 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> @@ -1396,7 +1396,6 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata)
>   	pdata->phy_if.phy_impl.stop(pdata);
>   
>   	pdata->phy.link = 0;
> -	netif_carrier_off(pdata->netdev);
>   
>   	xgbe_phy_adjust_link(pdata);
>   }
>
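To illustrate why the ordering discussed above matters, here is a minimal userspace sketch, not kernel code. It assumes a simplified model of the transmit watchdog, which only reports a timeout while the carrier is still up and a stopped queue has gone too long without transmitting. The `mock_` names and the tick-based time comparison are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few net_device fields the model needs. */
struct mock_netdev {
	bool carrier_ok;              /* models netif_carrier_ok() */
	bool tx_stopped;              /* models a stopped transmit queue */
	unsigned long trans_start;    /* tick of the last transmit */
	unsigned long watchdog_timeo; /* allowed ticks without progress */
};

/* Models netif_tx_stop_all_queues() for a single queue. */
static void mock_stop_queues(struct mock_netdev *dev)
{
	assert(dev != NULL);
	dev->tx_stopped = true;
}

/* Models netif_carrier_off(). */
static void mock_carrier_off(struct mock_netdev *dev)
{
	assert(dev != NULL);
	dev->carrier_ok = false;
}

/* Returns true if the modeled watchdog would report a timeout at 'now'. */
static bool mock_watchdog_fires(const struct mock_netdev *dev,
				unsigned long now)
{
	assert(dev != NULL);

	/* With the carrier down, the queues are not inspected at all. */
	if (!dev->carrier_ok)
		return false;

	return dev->tx_stopped &&
	       (now - dev->trans_start) > dev->watchdog_timeo;
}
```

In this model, stopping the queues while the carrier stays up leaves a window in which the watchdog can fire. Dropping the carrier right after stopping the queues closes that window, which is the ordering the patch moves to in xgbe_stop().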
Tom Lendacky Feb. 12, 2021, 7:01 p.m. UTC | #2
On 2/12/21 12:00 PM, Shyam Sundar S K wrote:
> Normally, auto negotiation and reconnect should be automatically done by
> the hardware. But there seems to be an issue where auto negotiation has
> to be restarted manually. This happens because of link training and so
> even though still connected to the partner the link never "comes back".
> This would need a reset to recover.

This last sentence is unclear. Do you mean that auto-negotiation needs to
be restarted?

Please mention this pertains only to a backplane connection mode.

> 
> Also, a change in xgbe-mdio is needed to get ethtool to recognize the
> link down and get the link change message.
> 
> Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>

Same comment about Co-developed-by: as previous patch.

With those addressed,

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>   drivers/net/ethernet/amd/xgbe/xgbe-mdio.c   | 2 +-
>   drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 8 ++++++++
>   2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> index 19ee4db0156d..4e97b4869522 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
> @@ -1345,7 +1345,7 @@ static void xgbe_phy_status(struct xgbe_prv_data *pdata)
>   							     &an_restart);
>   	if (an_restart) {
>   		xgbe_phy_config_aneg(pdata);
> -		return;
> +		goto adjust_link;
>   	}
>   
>   	if (pdata->phy.link) {
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> index 489f1f86df99..1bb468ac9635 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> @@ -2607,6 +2607,14 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
>   	if (reg & MDIO_STAT1_LSTATUS)
>   		return 1;
>   
> +	if (pdata->phy.autoneg == AUTONEG_ENABLE &&
> +	    phy_data->port_mode == XGBE_PORT_MODE_BACKPLANE) {
> +		if (!test_bit(XGBE_LINK_INIT, &pdata->dev_state)) {
> +			netif_carrier_off(pdata->netdev);
> +			*an_restart = 1;
> +		}
> +	}
> +
>   	/* No link, attempt a receiver reset cycle */
>   	if (phy_data->rrc_count++ > XGBE_RRC_FREQUENCY) {
>   		phy_data->rrc_count = 0;
>
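The condition added to xgbe_phy_link_status() in the hunk above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the driver's code: the `mock_` types and fields are hypothetical stand-ins for the driver state, and the receiver reset cycle that follows in the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port modes; only the backplane case matters here. */
enum mock_port_mode { MOCK_PORT_MODE_BACKPLANE, MOCK_PORT_MODE_SFP };

/* Hypothetical stand-in for the driver state the check reads. */
struct mock_phy {
	bool link_status_bit;          /* models MDIO_STAT1_LSTATUS */
	bool autoneg_enabled;          /* models autoneg == AUTONEG_ENABLE */
	enum mock_port_mode port_mode;
	bool link_init;                /* models XGBE_LINK_INIT in dev_state */
	bool carrier_ok;               /* models the netdev carrier state */
};

/*
 * Returns 1 if the link is up, 0 otherwise. When the link is down in
 * backplane mode with auto-negotiation enabled and link init is not in
 * progress, the carrier is dropped and an AN restart is requested.
 */
static int mock_link_status(struct mock_phy *phy, int *an_restart)
{
	assert(phy != NULL && an_restart != NULL);

	*an_restart = 0;

	if (phy->link_status_bit)
		return 1;

	if (phy->autoneg_enabled &&
	    phy->port_mode == MOCK_PORT_MODE_BACKPLANE &&
	    !phy->link_init) {
		phy->carrier_ok = false;
		*an_restart = 1;
	}

	return 0;
}
```

A caller that sees `an_restart` set would reconfigure auto-negotiation and then still run its link adjustment step, which corresponds to the `goto adjust_link` change in xgbe_phy_status() and is what lets the link-down state be reported.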