Message ID | 20201220123957.1694-1-wangzhiqiang.bj@bytedance.com |
---|---|
State | Superseded |
Headers | show |
Series | net/ncsi: Use real net-device for response handler | expand |
On Sun, 20 Dec 2020 at 12:40, John Wang <wangzhiqiang.bj@bytedance.com> wrote: > > When aggregating ncsi interfaces and dedicated interfaces to bond > interfaces, the ncsi response handler will use the wrong net device to > find ncsi_dev, so that the ncsi interface will not work properly. > Here, we use the net device registered to packet_type to fix it. > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> Can you show me how to reproduce this? I don't know the ncsi or net code well enough to know if this is the correct fix. If you are confident it is correct then I have no objections. Cheers, Joel > --- > net/ncsi/ncsi-rsp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c > index a94bb59793f0..60ae32682904 100644 > --- a/net/ncsi/ncsi-rsp.c > +++ b/net/ncsi/ncsi-rsp.c > @@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev, > int payload, i, ret; > > /* Find the NCSI device */ > - nd = ncsi_find_dev(dev); > + nd = ncsi_find_dev(pt->dev); > ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL; > if (!ndp) > return -ENODEV; > -- > 2.25.1 >
On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote: > On Sun, 20 Dec 2020 at 12:40, John Wang < > wangzhiqiang.bj@bytedance.com> wrote: > > > > When aggregating ncsi interfaces and dedicated interfaces to bond > > interfaces, the ncsi response handler will use the wrong net device > > to > > find ncsi_dev, so that the ncsi interface will not work properly. > > Here, we use the net device registered to packet_type to fix it. > > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") > > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> > > Can you show me how to reproduce this? > > I don't know the ncsi or net code well enough to know if this is the > correct fix. If you are confident it is correct then I have no > objections. This looks like it is probably right; pt->dev will be the original device from ncsi_register_dev(), if a response comes in to ncsi_rcv_rsp() associated with a different device then the driver will fail to find the correct ncsi_dev_priv. An example of the broken case would be good to see though. Cheers, Sam > > Cheers, > > Joel > > > --- > > net/ncsi/ncsi-rsp.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c > > index a94bb59793f0..60ae32682904 100644 > > --- a/net/ncsi/ncsi-rsp.c > > +++ b/net/ncsi/ncsi-rsp.c > > @@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct > > net_device *dev, > > int payload, i, ret; > > > > /* Find the NCSI device */ > > - nd = ncsi_find_dev(dev); > > + nd = ncsi_find_dev(pt->dev); > > ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL; > > if (!ndp) > > return -ENODEV; > > -- > > 2.25.1 > >
On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote: > On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote: > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote: > > > When aggregating ncsi interfaces and dedicated interfaces to bond > > > interfaces, the ncsi response handler will use the wrong net device > > > to > > > find ncsi_dev, so that the ncsi interface will not work properly. > > > Here, we use the net device registered to packet_type to fix it. > > > > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") > > > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> This sounds like exactly the case for which orig_dev was introduced. I think you should use the orig_dev argument, rather than pt->dev. Can you test if that works? > > Can you show me how to reproduce this? > > > > I don't know the ncsi or net code well enough to know if this is the > > correct fix. If you are confident it is correct then I have no > > objections. > > This looks like it is probably right; pt->dev will be the original > device from ncsi_register_dev(), if a response comes in to > ncsi_rcv_rsp() associated with a different device then the driver will > fail to find the correct ncsi_dev_priv. An example of the broken case > would be good to see though. From the description sounds like the case is whenever the ncsi interface is in a bond, the netdev from the second argument is the bond not the interface from which the frame came. It should be possible to repro even with only one interface on the system, create a bond or a team and add the ncsi interface to it. Does that make sense? I'm likely missing the subtleties here.
On Wed, Dec 23, 2020 at 10:25 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote: > > On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote: > > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote: > > > > When aggregating ncsi interfaces and dedicated interfaces to bond > > > > interfaces, the ncsi response handler will use the wrong net device > > > > to > > > > find ncsi_dev, so that the ncsi interface will not work properly. > > > > Here, we use the net device registered to packet_type to fix it. > > > > > > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") > > > > Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> > > This sounds like exactly the case for which orig_dev was introduced. > I think you should use the orig_dev argument, rather than pt->dev. will send a v2 > > Can you test if that works? Yes, it works. > > > > Can you show me how to reproduce this? On g220a, eth1 is the dedicated interface, eth0 is the ncsi interface kernel cfg: CONFIG_BONDING=y cat /etc/systemd/network/00-bmc-bond1.netdev [NetDev] Name=bond1 Description=Bond eth0 and eth1 Kind=bond [Bond] Mode=active-backup cat /etc/systemd/network/00-bmc-eth0.network [Match] Name=eth0 [Network] Bond=bond1 cat /etc/systemd/network/00-bmc-eth0.network [Match] Name=eth1 [Network] Bond=bond1 PrimarySlave=true ip addr .... 6: bond1: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue qlen 1000 link/ether b4:05:5d:8f:6a:ad brd ff:ff:ff:ff:ff:ff inet 169.254.11.178/16 brd 169.254.255.255 scope link bond1 valid_lft forever preferred_lft forever inet 192.168.1.108/24 brd 192.168.1.255 scope global bond1 valid_lft forever preferred_lft forever inet 10.2.16.118/24 brd 10.2.16.255 scope global bond1 valid_lft forever preferred_lft forever inet6 fe80::b605:5dff:fe8f:6aad/64 scope link ... Without this patch: After bmc boots: echo eth0 > /sys/class/net/bond1/bonding/active_slave admin@g220a:~# admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave [ 105.964357] bond1: (slave eth0): making interface the new active one admin@g220a:~# ping 10.2.16.1 PING 10.2.16.1 (10.2.16.1): 56 data bytes 64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.096 ms 64 bytes from 10.2.16.1: seq=1 ttl=255 time=2.143 ms 64 bytes from 10.2.16.1: seq=2 ttl=255 time=2.111 ms [ 112.642734] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out! 64 bytes from 10.2.16.1: seq=3 ttl=255 time=2.039 ms 64 bytes from 10.2.16.1: seq=4 ttl=255 time=2.037 ms [ 117.842814] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with link found, configuring channel 0 [ 134.482746] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out! [ 139.682820] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with link found, configuring channel 0 with this patch: After bmc boots: admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave [58332.123754] bond1: (slave eth0): making interface the new active one admin@g220a:~# ping 10.2.16.1 PING 10.2.16.1 (10.2.16.1): 56 data bytes 64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.279 ms ... ... 64 bytes from 10.2.16.1: seq=N ttl=255 time=2.037 ms > > > > > > I don't know the ncsi or net code well enough to know if this is the > > > correct fix. If you are confident it is correct then I have no > > > objections. > > > > This looks like it is probably right; pt->dev will be the original > > device from ncsi_register_dev(), if a response comes in to > > ncsi_rcv_rsp() associated with a different device then the driver will > > fail to find the correct ncsi_dev_priv. An example of the broken case > > would be good to see though. > > From the description sounds like the case is whenever the ncsi > interface is in a bond, the netdev from the second argument is > the bond not the interface from which the frame came. It should > be possible to repro even with only one interface on the system, > create a bond or a team and add the ncsi interface to it. > > Does that make sense? I'm likely missing the subtleties here. :) I guess so.
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c index a94bb59793f0..60ae32682904 100644 --- a/net/ncsi/ncsi-rsp.c +++ b/net/ncsi/ncsi-rsp.c @@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev, int payload, i, ret; /* Find the NCSI device */ - nd = ncsi_find_dev(dev); + nd = ncsi_find_dev(pt->dev); ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL; if (!ndp) return -ENODEV;
When aggregating ncsi interfaces and dedicated interfaces to bond interfaces, the ncsi response handler will use the wrong net device to find ncsi_dev, so that the ncsi interface will not work properly. Here, we use the net device registered to packet_type to fix it. Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler") Signed-off-by: John Wang <wangzhiqiang.bj@bytedance.com> --- net/ncsi/ncsi-rsp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)