Message ID | 20210203050650.680656-1-sukadev@linux.ibm.com |
---|---|
State | Superseded |
Headers | show |
Series | [v2,1/2] ibmvnic: fix a race between open and reset | expand |
On Wed, Feb 3, 2021 at 12:10 AM Sukadev Bhattiprolu <sukadev@linux.ibm.com> wrote: > > If two or more instances of 'ip link set' commands race and first one > already brings the interface up (or down), the subsequent instances > can simply return without redoing the up/down operation. > > Fixes: ed651a10875f ("ibmvnic: Updated reset handling") > Reported-by: Abdul Haleem <abdhalee@in.ibm.com> > Tested-by: Abdul Haleem <abdhalee@in.ibm.com> > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> > > --- > Changelog[v2] For consistency with ibmvnic_open() use "goto out" and return > from end of function. Did you find the code path that triggers this? In v1 we discussed how the usual ip link path should not call the driver twice based on IFF_UP.
Willem de Bruijn [willemdebruijn.kernel@gmail.com] wrote: > On Wed, Feb 3, 2021 at 12:10 AM Sukadev Bhattiprolu > <sukadev@linux.ibm.com> wrote: > > > > If two or more instances of 'ip link set' commands race and first one > > already brings the interface up (or down), the subsequent instances > > can simply return without redoing the up/down operation. > > > > Fixes: ed651a10875f ("ibmvnic: Updated reset handling") > > Reported-by: Abdul Haleem <abdhalee@in.ibm.com> > > Tested-by: Abdul Haleem <abdhalee@in.ibm.com> > > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> > > > > --- > > Changelog[v2] For consistency with ibmvnic_open() use "goto out" and return > > from end of function. > > Did you find the code path that triggers this? Not yet. I need to find time on the system to repro/debug that > > In v1 we discussed how the usual ip link path should not call the > driver twice based on IFF_UP. Yes, we can hold this patch for now if necessary. Hopefully Patch 1/2 is ok. Sukadev
On Wed, 3 Feb 2021 12:56:52 -0800 Sukadev Bhattiprolu wrote: > Willem de Bruijn [willemdebruijn.kernel@gmail.com] wrote: > > On Wed, Feb 3, 2021 at 12:10 AM Sukadev Bhattiprolu > > <sukadev@linux.ibm.com> wrote: > > > > > > If two or more instances of 'ip link set' commands race and first one > > > already brings the interface up (or down), the subsequent instances > > > can simply return without redoing the up/down operation. > > > > > > Fixes: ed651a10875f ("ibmvnic: Updated reset handling") > > > Reported-by: Abdul Haleem <abdhalee@in.ibm.com> > > > Tested-by: Abdul Haleem <abdhalee@in.ibm.com> > > > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> > > > > > > --- > > > Changelog[v2] For consistency with ibmvnic_open() use "goto out" and return > > > from end of function. > > > > Did you find the code path that triggers this? > > Not yet. I need to find time on the system to repro/debug that > > > > In v1 we discussed how the usual ip link path should not call the > > driver twice based on IFF_UP. > > Yes, we can hold this patch for now if necessary. Hopefully Patch 1/2 is > ok. Patch 1 does not apply to net as is. Please rebase and fix up the comments the way I fixed them for "ibmvnic: Clear failover_pending if unable to schedule" Thanks!
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 0ed169ef1cfc..78d244aeee69 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1197,12 +1197,26 @@ static int ibmvnic_open(struct net_device *netdev) struct ibmvnic_adapter *adapter = netdev_priv(netdev); int rc; - /* If device failover is pending, just set device state and return. - * Device operation will be handled by reset routine. + ASSERT_RTNL(); + + /** + * If device failover is pending or we are about to reset, just set + * device state and return. Device operation will be handled by reset + * routine. + * + * It should be safe to overwrite the adapter->state here. Since + * we hold the rtnl, either the reset has not actually started or + * the rtnl got dropped during the set_link_state() in do_reset(). + * In the former case, no one else is changing the state (again we + * have the rtnl) and in the latter case, do_reset() will detect and + * honor our setting below. */ - if (adapter->failover_pending) { + if (adapter->failover_pending || (test_bit(0, &adapter->resetting))) { + netdev_dbg(netdev, "[S:%d FOP:%d] Resetting, deferring open\n", + adapter->state, adapter->failover_pending); adapter->state = VNIC_OPEN; - return 0; + rc = 0; + goto out; } if (adapter->state != VNIC_CLOSED) { @@ -1222,10 +1236,12 @@ static int ibmvnic_open(struct net_device *netdev) out: /* - * If open fails due to a pending failover, set device state and - * return. Device operation will be handled by reset routine. + * If open failed and there is a pending failover or in-progress reset, + * set device state and return. Device operation will be handled by + * reset routine. See also comments above regarding rtnl. */ - if (rc && adapter->failover_pending) { + if (rc && + (adapter->failover_pending || (test_bit(0, &adapter->resetting)))) { adapter->state = VNIC_OPEN; rc = 0; } @@ -1954,6 +1970,14 @@ static int do_reset(struct ibmvnic_adapter *adapter, if (rwi->reset_reason == VNIC_RESET_FAILOVER) adapter->failover_pending = false; + /* read the state and check (again) after getting rtnl */ + reset_state = adapter->state; + + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { + rc = -EBUSY; + goto out; + } + netif_carrier_off(netdev); old_num_rx_queues = adapter->req_rx_queues; @@ -1984,7 +2008,26 @@ static int do_reset(struct ibmvnic_adapter *adapter, if (rc) goto out; + if (adapter->state == VNIC_OPEN) { + /** + * When we dropped rtnl, ibmvnic_open() got + * it and noticed that we are resetting and + * set the adapter state to OPEN. Update our + * new "target" state, and resume the reset + * from VNIC_CLOSING state. + */ + netdev_dbg(netdev, + "Open changed state from %d, updating.\n", + reset_state); + reset_state = VNIC_OPEN; + adapter->state = VNIC_CLOSING; + } + if (adapter->state != VNIC_CLOSING) { + /** + * If someone else changed the adapter state + * when we dropped the rtnl, fail the reset + */ rc = -1; goto out; } @@ -2133,6 +2176,14 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, netdev_dbg(adapter->netdev, "Hard resetting driver (%d)\n", rwi->reset_reason); + /* read the state and check (again) after getting rtnl */ + reset_state = adapter->state; + + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { + rc = -EBUSY; + goto out; + } + netif_carrier_off(netdev); adapter->reset_reason = rwi->reset_reason;