diff mbox series

net/ncsi: Use real net-device for response handler

Message ID 20201220123957.1694-1-wangzhiqiang.bj@bytedance.com
State Superseded
Headers show
Series net/ncsi: Use real net-device for response handler | expand

Commit Message

John Wang Dec. 20, 2020, 12:39 p.m. UTC
When aggregating ncsi interfaces and dedicated interfaces to bond
interfaces, the ncsi response handler will use the wrong net device to
find ncsi_dev, so that the ncsi interface will not work properly.
Here, we use the net device registered to packet_type to fix it.

Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com>
---
 net/ncsi/ncsi-rsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Joel Stanley Dec. 22, 2020, 6:13 a.m. UTC | #1
On Sun, 20 Dec 2020 at 12:40, John Wang <wangzhiqiang.bj@bytedance.com> wrote:
>

> When aggregating ncsi interfaces and dedicated interfaces to bond

> interfaces, the ncsi response handler will use the wrong net device to

> find ncsi_dev, so that the ncsi interface will not work properly.

> Here, we use the net device registered to packet_type to fix it.

>

> Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")

> Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com>


Can you show me how to reproduce this?

I don't know the ncsi or net code well enough to know if this is the
correct fix. If you are confident it is correct then I have no
objections.

Cheers,

Joel

> ---

>  net/ncsi/ncsi-rsp.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

>

> diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c

> index a94bb59793f0..60ae32682904 100644

> --- a/net/ncsi/ncsi-rsp.c

> +++ b/net/ncsi/ncsi-rsp.c

> @@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,

>         int payload, i, ret;

>

>         /* Find the NCSI device */

> -       nd = ncsi_find_dev(dev);

> +       nd = ncsi_find_dev(pt->dev);

>         ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;

>         if (!ndp)

>                 return -ENODEV;

> --

> 2.25.1

>
Samuel Mendoza-Jonas Dec. 22, 2020, 6:38 p.m. UTC | #2
On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote:
> On Sun, 20 Dec 2020 at 12:40, John Wang <

> wangzhiqiang.bj@bytedance.com> wrote:

> > 

> > When aggregating ncsi interfaces and dedicated interfaces to bond

> > interfaces, the ncsi response handler will use the wrong net device

> > to

> > find ncsi_dev, so that the ncsi interface will not work properly.

> > Here, we use the net device registered to packet_type to fix it.

> > 

> > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")

> > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com>

> 

> Can you show me how to reproduce this?

> 

> I don't know the ncsi or net code well enough to know if this is the

> correct fix. If you are confident it is correct then I have no

> objections.


This looks like it is probably right; pt->dev will be the original
device from ncsi_register_dev(), if a response comes in to
ncsi_rcv_rsp() associated with a different device then the driver will
fail to find the correct ncsi_dev_priv. An example of the broken case
would be good to see though.

Cheers,
Sam

> 

> Cheers,

> 

> Joel

> 

> > ---

> >  net/ncsi/ncsi-rsp.c | 2 +-

> >  1 file changed, 1 insertion(+), 1 deletion(-)

> > 

> > diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c

> > index a94bb59793f0..60ae32682904 100644

> > --- a/net/ncsi/ncsi-rsp.c

> > +++ b/net/ncsi/ncsi-rsp.c

> > @@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct

> > net_device *dev,

> >         int payload, i, ret;

> > 

> >         /* Find the NCSI device */

> > -       nd = ncsi_find_dev(dev);

> > +       nd = ncsi_find_dev(pt->dev);

> >         ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;

> >         if (!ndp)

> >                 return -ENODEV;

> > --

> > 2.25.1

> >
Jakub Kicinski Dec. 23, 2020, 2:24 a.m. UTC | #3
On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote:
> On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote:

> > On Sun, 20 Dec 2020 at 12:40, John Wang wrote:

> > > When aggregating ncsi interfaces and dedicated interfaces to bond

> > > interfaces, the ncsi response handler will use the wrong net device

> > > to

> > > find ncsi_dev, so that the ncsi interface will not work properly.

> > > Here, we use the net device registered to packet_type to fix it.

> > > 

> > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")

> > > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com>  


This sounds like exactly the case for which orig_dev was introduced.
I think you should use the orig_dev argument, rather than pt->dev.

Can you test if that works?

> > Can you show me how to reproduce this?

> > 

> > I don't know the ncsi or net code well enough to know if this is the

> > correct fix. If you are confident it is correct then I have no

> > objections.  

> 

> This looks like it is probably right; pt->dev will be the original

> device from ncsi_register_dev(), if a response comes in to

> ncsi_rcv_rsp() associated with a different device then the driver will

> fail to find the correct ncsi_dev_priv. An example of the broken case

> would be good to see though.


From the description sounds like the case is whenever the ncsi
interface is in a bond, the netdev from the second argument is 
the bond not the interface from which the frame came. It should 
be possible to repro even with only one interface on the system,
create a bond or a team and add the ncsi interface to it.

Does that make sense? I'm likely missing the subtleties here.
John Wang Dec. 23, 2020, 5:29 a.m. UTC | #4
On Wed, Dec 23, 2020 at 10:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

> On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote:

> > On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote:

> > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote:

> > > > When aggregating ncsi interfaces and dedicated interfaces to bond

> > > > interfaces, the ncsi response handler will use the wrong net device

> > > > to

> > > > find ncsi_dev, so that the ncsi interface will not work properly.

> > > > Here, we use the net device registered to packet_type to fix it.

> > > >

> > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")

> > > > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com>

>

> This sounds like exactly the case for which orig_dev was introduced.

> I think you should use the orig_dev argument, rather than pt->dev.


will send a v2

>

> Can you test if that works?


Yes,  it works.

>

> > > Can you show me how to reproduce this?


On g220a, eth1 is the dedicated interface, eth0 is the ncsi interface

kernel cfg:
CONFIG_BONDING=y

cat /etc/systemd/network/00-bmc-bond1.netdev
[NetDev]
Name=bond1
Description=Bond eth0 and eth1
Kind=bond

[Bond]
Mode=active-backup

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth0
[Network]
Bond=bond1

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth1
[Network]
Bond=bond1
PrimarySlave=true

ip addr
....
6: bond1: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue qlen 1000
    link/ether b4:05:5d:8f:6a:ad brd ff:ff:ff:ff:ff:ff
    inet 169.254.11.178/16 brd 169.254.255.255 scope link bond1
       valid_lft forever preferred_lft forever
    inet 192.168.1.108/24 brd 192.168.1.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet 10.2.16.118/24 brd 10.2.16.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b605:5dff:fe8f:6aad/64 scope link
...


Without this patch:
After bmc boots:
echo eth0 > /sys/class/net/bond1/bonding/active_slave
admin@g220a:~#
admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[  105.964357] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.096 ms
64 bytes from 10.2.16.1: seq=1 ttl=255 time=2.143 ms
64 bytes from 10.2.16.1: seq=2 ttl=255 time=2.111 ms
[  112.642734] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out!
64 bytes from 10.2.16.1: seq=3 ttl=255 time=2.039 ms
64 bytes from 10.2.16.1: seq=4 ttl=255 time=2.037 ms
[  117.842814] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with
link found, configuring channel 0
[  134.482746] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out!
[  139.682820] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with
link found, configuring channel 0

with this patch:
After bmc boots:

admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[58332.123754] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.279 ms
...
...
64 bytes from 10.2.16.1: seq=N ttl=255 time=2.037 ms



> > >

> > > I don't know the ncsi or net code well enough to know if this is the

> > > correct fix. If you are confident it is correct then I have no

> > > objections.

> >

> > This looks like it is probably right; pt->dev will be the original

> > device from ncsi_register_dev(), if a response comes in to

> > ncsi_rcv_rsp() associated with a different device then the driver will

> > fail to find the correct ncsi_dev_priv. An example of the broken case

> > would be good to see though.

>

> From the description sounds like the case is whenever the ncsi

> interface is in a bond, the netdev from the second argument is

> the bond not the interface from which the frame came. It should

> be possible to repro even with only one interface on the system,

> create a bond or a team and add the ncsi interface to it.

>

> Does that make sense? I'm likely missing the subtleties here.


:)  I guess so.
diff mbox series

Patch

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index a94bb59793f0..60ae32682904 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -1120,7 +1120,7 @@  int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 	int payload, i, ret;
 
 	/* Find the NCSI device */
-	nd = ncsi_find_dev(dev);
+	nd = ncsi_find_dev(pt->dev);
 	ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;
 	if (!ndp)
 		return -ENODEV;