Message ID | 20210714191723.31294-3-LinoSanfilippo@gmx.de |
---|---|
State | New |
Headers | show |
Series | Fixes for KSZ DSA switch | expand |
Hi Lino, On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote: > If the checksum calculation is offloaded to the network device (e.g due to > NETIF_F_HW_CSUM inherited from the DSA master device), the calculated > layer 4 checksum is incorrect. This is since the DSA tag which is placed > after the layer 4 data is seen as a part of the data portion and thus > errorneously included into the checksum calculation. > To avoid this, always calculate the layer 4 checksum in software. > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> > --- This needs to be solved more generically for all tail taggers. Let me try out a few things tomorrow and come with a proposal.
On Wed, Jul 14, 2021 at 10:15:20PM +0200, Andrew Lunn wrote: > On Wed, Jul 14, 2021 at 10:48:12PM +0300, Vladimir Oltean wrote: > > Hi Lino, > > > > On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote: > > > If the checksum calculation is offloaded to the network device (e.g due to > > > NETIF_F_HW_CSUM inherited from the DSA master device), the calculated > > > layer 4 checksum is incorrect. This is since the DSA tag which is placed > > > after the layer 4 data is seen as a part of the data portion and thus > > > errorneously included into the checksum calculation. > > > To avoid this, always calculate the layer 4 checksum in software. > > > > > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> > > > --- > > > > This needs to be solved more generically for all tail taggers. Let me > > try out a few things tomorrow and come with a proposal. > > Maybe the skb_linearize() is also a generic problem, since many of the > tag drivers are using skb_put()? It looks like skb_linearize() is > cheap because checking if the skb is already linear is cheap. So maybe > we want to do it unconditionally? Yeah, but we should let the stack deal with both issues in validate_xmit_skb(). There is a skb_needs_linearize() call which returns false because the DSA interface inherits NETIF_F_SG from the master via dsa_slave_create(): slave_dev->features = master->vlan_features | NETIF_F_HW_TC; Arguably that's the problem right there, we shouldn't inherit neither NETIF_F_HW_CSUM nor NETIF_F_SG from the list of features inheritable by 8021q uppers. - If we inherit NETIF_F_SG we obligate ourselves to call skb_linearize() for tail taggers on xmit. DSA probably doesn't break anything for header taggers though even if the xmit skb is paged, since the DSA header would always be part of the skb head, so this is a feature we could keep for them. - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is actively detrimential to keep this feature enabled, as proven my Lino. As for header taggers, I fail to see how this would be helpful, since the DSA master would always fail to see the real IP header (it has been pushed to the right by the DSA tag), and therefore, the DSA master offload would be effectively bypassed. So no point, really, in inheriting it in the first place in any situation. Lino, to fix these bugs by letting validate_xmit_skb() know what works for DSA and what doesn't, could you please: (a) move the current slave_dev->features assignment to dsa_slave_setup_tagger()? We now support changing the tagging protocol at runtime, and everything that depends on what the tagging protocol is (in this case, tag_ops->needed_headroom vs tag_ops->needed_tailroom) should be put in that function. (b) unconditionally clear NETIF_F_HW_CSUM from slave_dev->features, after inheriting the vlan_features from the master? (c) clear NETIF_F_SG from slave_dev->features if we have a non-zero tag_ops->needed_tailroom? Thanks. By the way I didn't get the chance to test anything, sorry if there is any mistake in my reasoning.
Hi Vladimir, > Gesendet: Donnerstag, 15. Juli 2021 um 08:54 Uhr > Von: "Vladimir Oltean" <olteanv@gmail.com> > An: "Andrew Lunn" <andrew@lunn.ch> > Cc: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, f.fainelli@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org > Betreff: Re: [PATCH 2/2] net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum > > On Wed, Jul 14, 2021 at 10:15:20PM +0200, Andrew Lunn wrote: > > On Wed, Jul 14, 2021 at 10:48:12PM +0300, Vladimir Oltean wrote: > > > Hi Lino, > > > > > > On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote: > > > > If the checksum calculation is offloaded to the network device (e.g due to > > > > NETIF_F_HW_CSUM inherited from the DSA master device), the calculated > > > > layer 4 checksum is incorrect. This is since the DSA tag which is placed > > > > after the layer 4 data is seen as a part of the data portion and thus > > > > errorneously included into the checksum calculation. > > > > To avoid this, always calculate the layer 4 checksum in software. > > > > > > > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> > > > > --- > > > > > > This needs to be solved more generically for all tail taggers. Let me > > > try out a few things tomorrow and come with a proposal. > > > > Maybe the skb_linearize() is also a generic problem, since many of the > > tag drivers are using skb_put()? It looks like skb_linearize() is > > cheap because checking if the skb is already linear is cheap. So maybe > > we want to do it unconditionally? > > Yeah, but we should let the stack deal with both issues in validate_xmit_skb(). > There is a skb_needs_linearize() call which returns false because the > DSA interface inherits NETIF_F_SG from the master via dsa_slave_create(): > > slave_dev->features = master->vlan_features | NETIF_F_HW_TC; > > Arguably that's the problem right there, we shouldn't inherit neither > NETIF_F_HW_CSUM nor NETIF_F_SG from the list of features inheritable by > 8021q uppers. > > - If we inherit NETIF_F_SG we obligate ourselves to call skb_linearize() > for tail taggers on xmit. DSA probably doesn't break anything for > header taggers though even if the xmit skb is paged, since the DSA > header would always be part of the skb head, so this is a feature we > could keep for them. > - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is > actively detrimential to keep this feature enabled, as proven my Lino. > As for header taggers, I fail to see how this would be helpful, since > the DSA master would always fail to see the real IP header (it has > been pushed to the right by the DSA tag), and therefore, the DSA > master offload would be effectively bypassed. So no point, really, in > inheriting it in the first place in any situation. > > Lino, to fix these bugs by letting validate_xmit_skb() know what works > for DSA and what doesn't, could you please: > > (a) move the current slave_dev->features assignment to > dsa_slave_setup_tagger()? We now support changing the tagging > protocol at runtime, and everything that depends on what the tagging > protocol is (in this case, tag_ops->needed_headroom vs > tag_ops->needed_tailroom) should be put in that function. > (b) unconditionally clear NETIF_F_HW_CSUM from slave_dev->features, > after inheriting the vlan_features from the master? > (c) clear NETIF_F_SG from slave_dev->features if we have a non-zero > tag_ops->needed_tailroom? > Sure, I will test this solution. But I think NETIF_F_FRAGLIST should also be cleared in this case, right? Regards, Lino
On Thu, Jul 15, 2021 at 01:16:12PM +0200, Lino Sanfilippo wrote: > Sure, I will test this solution. But I think NETIF_F_FRAGLIST should also be > cleared in this case, right? Hmm, interesting question. I think only hns3 makes meaningful use of NETIF_F_FRAGLIST, right? I'm looking at hns3_fill_skb_to_desc(). Other drivers seem to set it for ridiculous reasons - looking at commit 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.") - they set NETIF_F_FRAGLIST and then linearize the skb chain anyway. The claimed 4x throughput benefit probably has to do with less skbs traversing the stack? I don't know. Anyway, it is hard to imagine all the things that could go wrong with chains of IP fragments on a DSA interface, precisely because I have so few examples to look at. I would say, header taggers are probably fine, tail taggers not so much, so apply the same treatment as for NETIF_F_SG?
> - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is > actively detrimential to keep this feature enabled, as proven my Lino. > As for header taggers, I fail to see how this would be helpful, since > the DSA master would always fail to see the real IP header (it has > been pushed to the right by the DSA tag), and therefore, the DSA > master offload would be effectively bypassed. The Marvell MACs know about DSA and should be able to perform hardware checksumming. It is a long time since i looked at how this works, but i think there is a field in the descriptor which gets set with the offset to the IP header, so it work for DSA as well as EDSA. I _think_ Broadcom MACs also know about Broadcom tags and can do the right thing. So we need to be a bit careful here to prevent performance regressions for same vendor MAC+Switch combinations. Andrew
On Thu, Jul 15, 2021 at 03:04:31PM +0200, Lino Sanfilippo wrote: > Please note that skb_put() asserts that the SKB is linearized. So I think we > should rather clear both NETIF_F_FRAGLIST and NETIF_F_SG unconditionally since also > header taggers use some form of skb_put() dont they? The tail taggers use skb_put() as part of the routine to make room for the tail tag. Some of the header taggers use __skb_put_padto() when the packets are too small (under ETH_ZLEN). When they are so small they are definitely linear already. We don't have a third form/use of skb_put().
On Thu, Jul 15, 2021 at 03:08:53PM +0200, Andrew Lunn wrote: > > - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is > > actively detrimential to keep this feature enabled, as proven my Lino. > > As for header taggers, I fail to see how this would be helpful, since > > the DSA master would always fail to see the real IP header (it has > > been pushed to the right by the DSA tag), and therefore, the DSA > > master offload would be effectively bypassed. > > The Marvell MACs know about DSA and should be able to perform hardware > checksumming. It is a long time since i looked at how this works, but > i think there is a field in the descriptor which gets set with the > offset to the IP header, so it work for DSA as well as EDSA. > > I _think_ Broadcom MACs also know about Broadcom tags and can do the > right thing. > > So we need to be a bit careful here to prevent performance regressions > for same vendor MAC+Switch combinations. Tell me more (show me some code). Do Marvell Ethernet controllers which support TX checksumming with Marvell switches do different things depending on whether DSA or EDSA is used? Because we can currently toggle between DSA and EDSA at runtime. This new information means we can only accept Lino's patch 2/2 as-is for the "net" tree, otherwise we will introduce regressions one way or another. It will only be a partial fix for the particular case of KSZ switches which probably have no DSA master counterpart to support TX checksumming. I expect Marvell switches to be equally broken on the Broadcom genet controller? No one will provide the TX checksum in that case. And that is not even "fixable" without devising a system where custom code can be run per {tagger, DSA master} pair, and this includes the case where the tagging protocol changes at runtime.
> Tell me more (show me some code). https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta.c#L1747 and https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta.c#L1944 It uses skb_network_offset(skb) to know where the real header is. This should work independent of DSA or EDSA. mvpp2_main.c looks to have something similar. The older mv643xx_eth.c also has something, but it is more subtle. Ah, found it: https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mv643xx_eth.c#L683 > I expect Marvell switches to be equally broken on the Broadcom genet > controller? Maybe. Depends on how genet works. A Broadcom switch connected to a Marvell MAC probably works, since the code is generic. It should work for any switch which uses head tagging, although mv643xx_eth.c is limited to 4 or 8 byte tags. Andrew
On 7/15/21 6:08 AM, Andrew Lunn wrote: >> - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is >> actively detrimential to keep this feature enabled, as proven my Lino. >> As for header taggers, I fail to see how this would be helpful, since >> the DSA master would always fail to see the real IP header (it has >> been pushed to the right by the DSA tag), and therefore, the DSA >> master offload would be effectively bypassed. > > The Marvell MACs know about DSA and should be able to perform hardware > checksumming. It is a long time since i looked at how this works, but > i think there is a field in the descriptor which gets set with the > offset to the IP header, so it work for DSA as well as EDSA. > > I _think_ Broadcom MACs also know about Broadcom tags and can do the > right thing. Yes, the SYSTEMPORT MAC as well as bgmac to some extent will automagically work, but they have to be told to expect a Broadcom tag in order to generate an appropriate checksum on receive. This is why we have this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/bcmsysport.c#n144 Likewise, for TX: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/bcmsysport.c#n172 -- Florian
Hi, > Gesendet: Donnerstag, 15. Juli 2021 um 16:36 Uhr > Von: "Vladimir Oltean" <olteanv@gmail.com> > An: "Andrew Lunn" <andrew@lunn.ch> > Cc: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, f.fainelli@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org > Betreff: Re: [PATCH 2/2] net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum > > On Thu, Jul 15, 2021 at 03:08:53PM +0200, Andrew Lunn wrote: > > > - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is > > > actively detrimential to keep this feature enabled, as proven my Lino. > > > As for header taggers, I fail to see how this would be helpful, since > > > the DSA master would always fail to see the real IP header (it has > > > been pushed to the right by the DSA tag), and therefore, the DSA > > > master offload would be effectively bypassed. > > > > The Marvell MACs know about DSA and should be able to perform hardware > > checksumming. It is a long time since i looked at how this works, but > > i think there is a field in the descriptor which gets set with the > > offset to the IP header, so it work for DSA as well as EDSA. > > > > I _think_ Broadcom MACs also know about Broadcom tags and can do the > > right thing. > > > > So we need to be a bit careful here to prevent performance regressions > > for same vendor MAC+Switch combinations. > > Tell me more (show me some code). Do Marvell Ethernet controllers which > support TX checksumming with Marvell switches do different things > depending on whether DSA or EDSA is used? Because we can currently > toggle between DSA and EDSA at runtime. > > This new information means we can only accept Lino's patch 2/2 as-is for > the "net" tree, otherwise we will introduce regressions one way or > another. It will only be a partial fix for the particular case of KSZ > switches which probably have no DSA master counterpart to support TX > checksumming. > Should I then resend the series with patch 1 handling the NETIF_F_SG and NETIF_F_FRAGLIST flags (i.e. deleting them if tailroom is needed) in dsa_slave_setup_tagger() as you suggested and patch 2 as it is? Regards, Lino
On Mon, Jul 19, 2021 at 10:20:13AM +0200, Lino Sanfilippo wrote: > Should I then resend the series with patch 1 handling the NETIF_F_SG and > NETIF_F_FRAGLIST flags (i.e. deleting them if tailroom is needed) in > dsa_slave_setup_tagger() as you suggested and patch 2 as it is? Yup. The TX checksum offload flag on DSA interfaces is definitely net-next material if we want to do it properly, so we can revisit it there. Thanks.
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 364f509d7cd7..d59f0e7019e2 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -56,6 +56,9 @@ static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev) if (skb_linearize(skb)) return NULL; + if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) + return NULL; + /* Tag encoding */ tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -120,6 +123,9 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, if (skb_linearize(skb)) return NULL; + if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) + return NULL; + /* Tag encoding */ tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -173,6 +179,9 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, if (skb_linearize(skb)) return NULL; + if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) + return NULL; + /* Tag encoding */ tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); addr = skb_mac_header(skb);
If the checksum calculation is offloaded to the network device (e.g due to NETIF_F_HW_CSUM inherited from the DSA master device), the calculated layer 4 checksum is incorrect. This is since the DSA tag which is placed after the layer 4 data is seen as a part of the data portion and thus errorneously included into the checksum calculation. To avoid this, always calculate the layer 4 checksum in software. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> --- net/dsa/tag_ksz.c | 9 +++++++++ 1 file changed, 9 insertions(+)