ibmvnic: Continue with reset if set link down failed

Message ID: 20210406034752.12840-1-drt@linux.ibm.com
State: New
Series: ibmvnic: Continue with reset if set link down failed

Commit Message

Dany Madden April 6, 2021, 3:47 a.m. UTC
When an adapter is going through a reset, it may be in an unstable state that
makes a request to set link down fail. In such a case, the adapter needs
to continue on with the reset to bring itself back to a stable state.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Rick Lindsley April 6, 2021, 5:10 a.m. UTC | #1
On 4/5/21 8:47 PM, Dany Madden wrote:
> When an adapter is going through a reset, it may be in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with the reset to bring itself back to a stable state.
> 
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>

Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Lijun Pan April 6, 2021, 6:46 a.m. UTC | #2
> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
> 
> When an adapter is going through a reset, it may be in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with the reset to bring itself back to a stable state.
> 
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 9c6438d3b3a5..e4f01a7099a0 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> 			rtnl_unlock();
> 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> 			rtnl_lock();
> -			if (rc)
> -				goto out;
> +			if (rc) {
> +				netdev_dbg(netdev,
> +					   "Setting link down failed rc=%d. Continue anyway\n", rc);
> +			}

What’s the point of checking the return code if it can be neglected anyway?
If we really don’t care if set_link_state succeeds or not, we don’t even need to call
set_link_state() here.
It seems more correct to me that we find out why set_link_state fails and fix it from that end.

Lijun

> 
> 			if (adapter->state == VNIC_OPEN) {
> 				/* When we dropped rtnl, ibmvnic_open() got
> -- 
> 2.26.2
>
Dany Madden April 7, 2021, 7:03 p.m. UTC | #3
On 2021-04-05 23:46, Lijun Pan wrote:
>> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>>
>> When an adapter is going through a reset, it may be in an unstable state that
>> makes a request to set link down fail. In such a case, the adapter needs
>> to continue on with the reset to bring itself back to a stable state.
>>
>> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>> Signed-off-by: Dany Madden <drt@linux.ibm.com>
>> ---
>> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 9c6438d3b3a5..e4f01a7099a0 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>> 			rtnl_unlock();
>> 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>> 			rtnl_lock();
>> -			if (rc)
>> -				goto out;
>> +			if (rc) {
>> +				netdev_dbg(netdev,
>> +					   "Setting link down failed rc=%d. Continue anyway\n", rc);
>> +			}
>
> What’s the point of checking the return code if it can be neglected anyway?
> If we really don’t care if set_link_state succeeds or not, we don’t even need to call
> set_link_state() here.
> It seems more correct to me that we find out why set_link_state fails and fix it from that end.

We know why set link state failed: the CRQ is no longer active at this
point, so it is not possible to send a link down request to the VIOS. If
the driver exits here, the adapter will be left in an inoperable state. If
it continues on to reinitialize the CRQ, it can complete the reset and come up.
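
To make the two outcomes concrete, here is a small user-space model of that
step. It is only a sketch and not the driver code: struct fake_adapter,
fake_set_link_down(), fake_reinit_crq() and fake_reset() are hypothetical
stand-ins invented for this illustration, and the model assumes the CRQ
re-init always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for adapter state; not the real ibmvnic structure. */
struct fake_adapter {
	bool crq_active;	/* can the CRQ carry requests to the VIOS? */
	bool operational;	/* did the reset bring the adapter back up? */
};

/* Models a link down request, which cannot be sent on an inactive CRQ. */
static int fake_set_link_down(struct fake_adapter *a)
{
	return a->crq_active ? 0 : -1;
}

/* Models re-registering the CRQ (assumed to always succeed in this sketch). */
static int fake_reinit_crq(struct fake_adapter *a)
{
	a->crq_active = true;
	return 0;
}

/*
 * abort_on_link_error == true models the old "goto out" behaviour;
 * false models the patched behaviour of logging and continuing.
 */
static int fake_reset(struct fake_adapter *a, bool abort_on_link_error)
{
	int rc = fake_set_link_down(a);

	if (rc) {
		if (abort_on_link_error)
			return rc;	/* adapter is left inoperable */
		fprintf(stderr,
			"Setting link down failed rc=%d. Continue anyway\n", rc);
	}

	rc = fake_reinit_crq(a);
	if (rc)
		return rc;

	a->operational = true;
	return 0;
}
```

Starting from an adapter with crq_active == false, fake_reset(&a, true)
returns an error and leaves a.operational false, while fake_reset(&a, false)
returns 0 and leaves it true.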

Prior to submitting this patch, we ran a 17-hour test and a 24-hour test
(LPM+failover) on 10 vnics. We saw that:

17 hours, hit 4 times
- 3 times the driver was able to continue on to re-init the CRQ and bring
the adapter up.
- 1 time the driver failed to re-init the CRQ because the last reset had
failed and released the CRQ. A subsequent hard reset from a transport event
(failover) succeeded.

24 hours, hit 10 times
- 7 times the driver was able to continue on to re-init the CRQ and bring
the adapter up.
- 3 times the driver failed to re-init the CRQ because the last reset had
failed and released the CRQ. Subsequent hard resets from a transport event
(failover or LPM) succeeded.

In both runs, with the patch, all 10 vnics continued to work as expected.
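
For anyone who wants to check the counts, the numbers above can be tallied
with a few lines of C. This is just an illustrative sketch: the struct and
its field names are made up here, and the values are copied from the two
runs reported above.

```c
#include <assert.h>
#include <stddef.h>

/* One stress run as reported above; field names are illustrative only. */
struct run_result {
	int hours;
	int hits;			/* set link down failed during a reset */
	int recovered_directly;		/* reset continued and CRQ re-init worked */
	int recovered_by_hard_reset;	/* CRQ re-init failed, later hard reset worked */
};

static const struct run_result runs[] = {
	{ 17,  4, 3, 1 },
	{ 24, 10, 7, 3 },
};

#define NUM_RUNS (sizeof(runs) / sizeof(runs[0]))

/* Returns 1 if both outcomes together account for every hit in the run. */
static int run_is_consistent(const struct run_result *r)
{
	return r->recovered_directly + r->recovered_by_hard_reset == r->hits;
}

/* Total number of hits across all runs. */
static int total_hits(void)
{
	int sum = 0;

	for (size_t i = 0; i < NUM_RUNS; i++)
		sum += runs[i].hits;
	return sum;
}

/* Total number of hits where the same reset went on to recover the adapter. */
static int total_recovered_directly(void)
{
	int sum = 0;

	for (size_t i = 0; i < NUM_RUNS; i++)
		sum += runs[i].recovered_directly;
	return sum;
}
```

Across both runs that is 14 hits, 10 of which recovered within the same
reset and 4 of which needed the subsequent hard reset.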

>
> Lijun
>
>>
>> 			if (adapter->state == VNIC_OPEN) {
>> 				/* When we dropped rtnl, ibmvnic_open() got
>> --
>> 2.26.2
>>
Sukadev Bhattiprolu April 8, 2021, 6:03 a.m. UTC | #4
Dany Madden [drt@linux.ibm.com] wrote:
> When an adapter is going through a reset, it may be in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with the reset to bring itself back to a stable state.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>

Given that the likely reason for set_link_state() failing is that the
CRQ is inactive and that we will attempt to free the CRQ and re-register
it in ibmvnic_reset_crq() further down, I think it's okay to ignore the
error here.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Dany Madden April 12, 2021, 4:25 p.m. UTC | #5
On 2021-04-05 23:46, Lijun Pan wrote:
>> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>>
>> When an adapter is going through a reset, it may be in an unstable state that
>> makes a request to set link down fail. In such a case, the adapter needs
>> to continue on with the reset to bring itself back to a stable state.
>>
>> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>> Signed-off-by: Dany Madden <drt@linux.ibm.com>
>> ---
>> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 9c6438d3b3a5..e4f01a7099a0 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>> 			rtnl_unlock();
>> 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>> 			rtnl_lock();
>> -			if (rc)
>> -				goto out;
>> +			if (rc) {
>> +				netdev_dbg(netdev,
>> +					   "Setting link down failed rc=%d. Continue anyway\n", rc);
>> +			}
>
> What’s the point of checking the return code if it can be neglected anyway?
> If we really don’t care if set_link_state succeeds or not, we don’t even need to call
> set_link_state() here.
> It seems more correct to me that we find out why set_link_state fails and fix it from that end.
>
> Lijun
>
>>
>> 			if (adapter->state == VNIC_OPEN) {
>> 				/* When we dropped rtnl, ibmvnic_open() got
>> --
>> 2.26.2
>>
Dany Madden April 12, 2021, 4:27 p.m. UTC | #6
On 2021-04-07 12:03, Dany Madden wrote:
> On 2021-04-05 23:46, Lijun Pan wrote:
>>> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>>>
>>> When an adapter is going through a reset, it may be in an unstable state that
>>> makes a request to set link down fail. In such a case, the adapter needs
>>> to continue on with the reset to bring itself back to a stable state.
>>>
>>> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>>> Signed-off-by: Dany Madden <drt@linux.ibm.com>
>>> ---
>>> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>>> index 9c6438d3b3a5..e4f01a7099a0 100644
>>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>>> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>>> 			rtnl_unlock();
>>> 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>>> 			rtnl_lock();
>>> -			if (rc)
>>> -				goto out;
>>> +			if (rc) {
>>> +				netdev_dbg(netdev,
>>> +					   "Setting link down failed rc=%d. Continue anyway\n", rc);
>>> +			}
>>
>> What’s the point of checking the return code if it can be neglected anyway?
>> If we really don’t care if set_link_state succeeds or not, we don’t even need to call
>> set_link_state() here.
>> It seems more correct to me that we find out why set_link_state fails and fix it from that end.
>
> We know why set link state failed: the CRQ is no longer active at this
> point, so it is not possible to send a link down request to the VIOS. If
> the driver exits here, the adapter will be left in an inoperable state. If
> it continues on to reinitialize the CRQ, it can complete the reset and come up.
>
> Prior to submitting this patch, we ran a 17-hour test and a 24-hour test
> (LPM+failover) on 10 vnics. We saw that:
>
> 17 hours, hit 4 times
> - 3 times the driver was able to continue on to re-init the CRQ and bring
> the adapter up.
> - 1 time the driver failed to re-init the CRQ because the last reset had
> failed and released the CRQ. A subsequent hard reset from a transport event
> (failover) succeeded.
>
> 24 hours, hit 10 times
> - 7 times the driver was able to continue on to re-init the CRQ and bring
> the adapter up.
> - 3 times the driver failed to re-init the CRQ because the last reset had
> failed and released the CRQ. Subsequent hard resets from a transport event
> (failover or LPM) succeeded.
>
> In both runs, with the patch, all 10 vnics continued to work as expected.

Is there anything else that we need to address before this is accepted?

Dany

>
>>
>> Lijun
>>
>>>
>>> 			if (adapter->state == VNIC_OPEN) {
>>> 				/* When we dropped rtnl, ibmvnic_open() got
>>> --
>>> 2.26.2
>>>

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 9c6438d3b3a5..e4f01a7099a0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1976,8 +1976,10 @@  static int do_reset(struct ibmvnic_adapter *adapter,
 			rtnl_unlock();
 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
 			rtnl_lock();
-			if (rc)
-				goto out;
+			if (rc) {
+				netdev_dbg(netdev,
+					   "Setting link down failed rc=%d. Continue anyway\n", rc);
+			}
 
 			if (adapter->state == VNIC_OPEN) {
 				/* When we dropped rtnl, ibmvnic_open() got