diff mbox series

[2/2] net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum

Message ID 20210714191723.31294-3-LinoSanfilippo@gmx.de
State New
Headers show
Series Fixes for KSZ DSA switch | expand

Commit Message

Lino Sanfilippo July 14, 2021, 7:17 p.m. UTC
If the checksum calculation is offloaded to the network device (e.g due to
NETIF_F_HW_CSUM inherited from the DSA master device), the calculated
layer 4 checksum is incorrect. This is since the DSA tag which is placed
after the layer 4 data is seen as a part of the data portion and thus
errorneously included into the checksum calculation.
To avoid this, always calculate the layer 4 checksum in software.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 net/dsa/tag_ksz.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Vladimir Oltean July 14, 2021, 7:48 p.m. UTC | #1
Hi Lino,

On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote:
> If the checksum calculation is offloaded to the network device (e.g due to
> NETIF_F_HW_CSUM inherited from the DSA master device), the calculated
> layer 4 checksum is incorrect. This is since the DSA tag which is placed
> after the layer 4 data is seen as a part of the data portion and thus
> errorneously included into the checksum calculation.
> To avoid this, always calculate the layer 4 checksum in software.
> 
> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> ---

This needs to be solved more generically for all tail taggers. Let me
try out a few things tomorrow and come with a proposal.
Vladimir Oltean July 15, 2021, 6:54 a.m. UTC | #2
On Wed, Jul 14, 2021 at 10:15:20PM +0200, Andrew Lunn wrote:
> On Wed, Jul 14, 2021 at 10:48:12PM +0300, Vladimir Oltean wrote:

> > Hi Lino,

> > 

> > On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote:

> > > If the checksum calculation is offloaded to the network device (e.g due to

> > > NETIF_F_HW_CSUM inherited from the DSA master device), the calculated

> > > layer 4 checksum is incorrect. This is since the DSA tag which is placed

> > > after the layer 4 data is seen as a part of the data portion and thus

> > > errorneously included into the checksum calculation.

> > > To avoid this, always calculate the layer 4 checksum in software.

> > > 

> > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>

> > > ---

> > 

> > This needs to be solved more generically for all tail taggers. Let me

> > try out a few things tomorrow and come with a proposal.

> 

> Maybe the skb_linearize() is also a generic problem, since many of the

> tag drivers are using skb_put()? It looks like skb_linearize() is

> cheap because checking if the skb is already linear is cheap. So maybe

> we want to do it unconditionally?


Yeah, but we should let the stack deal with both issues in validate_xmit_skb().
There is a skb_needs_linearize() call which returns false because the
DSA interface inherits NETIF_F_SG from the master via dsa_slave_create():

	slave_dev->features = master->vlan_features | NETIF_F_HW_TC;

Arguably that's the problem right there, we shouldn't inherit neither
NETIF_F_HW_CSUM nor NETIF_F_SG from the list of features inheritable by
8021q uppers.

- If we inherit NETIF_F_SG we obligate ourselves to call skb_linearize()
  for tail taggers on xmit. DSA probably doesn't break anything for
  header taggers though even if the xmit skb is paged, since the DSA
  header would always be part of the skb head, so this is a feature we
  could keep for them.
- If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is
  actively detrimential to keep this feature enabled, as proven my Lino.
  As for header taggers, I fail to see how this would be helpful, since
  the DSA master would always fail to see the real IP header (it has
  been pushed to the right by the DSA tag), and therefore, the DSA
  master offload would be effectively bypassed. So no point, really, in
  inheriting it in the first place in any situation.

Lino, to fix these bugs by letting validate_xmit_skb() know what works
for DSA and what doesn't, could you please:

(a) move the current slave_dev->features assignment to
    dsa_slave_setup_tagger()? We now support changing the tagging
    protocol at runtime, and everything that depends on what the tagging
    protocol is (in this case, tag_ops->needed_headroom vs
    tag_ops->needed_tailroom) should be put in that function.
(b) unconditionally clear NETIF_F_HW_CSUM from slave_dev->features,
    after inheriting the vlan_features from the master?
(c) clear NETIF_F_SG from slave_dev->features if we have a non-zero
    tag_ops->needed_tailroom?

Thanks.

By the way I didn't get the chance to test anything, sorry if there is
any mistake in my reasoning.
Lino Sanfilippo July 15, 2021, 11:16 a.m. UTC | #3
Hi Vladimir,

> Gesendet: Donnerstag, 15. Juli 2021 um 08:54 Uhr
> Von: "Vladimir Oltean" <olteanv@gmail.com>
> An: "Andrew Lunn" <andrew@lunn.ch>
> Cc: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, f.fainelli@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Betreff: Re: [PATCH 2/2] net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum
>
> On Wed, Jul 14, 2021 at 10:15:20PM +0200, Andrew Lunn wrote:
> > On Wed, Jul 14, 2021 at 10:48:12PM +0300, Vladimir Oltean wrote:
> > > Hi Lino,
> > >
> > > On Wed, Jul 14, 2021 at 09:17:23PM +0200, Lino Sanfilippo wrote:
> > > > If the checksum calculation is offloaded to the network device (e.g due to
> > > > NETIF_F_HW_CSUM inherited from the DSA master device), the calculated
> > > > layer 4 checksum is incorrect. This is since the DSA tag which is placed
> > > > after the layer 4 data is seen as a part of the data portion and thus
> > > > errorneously included into the checksum calculation.
> > > > To avoid this, always calculate the layer 4 checksum in software.
> > > >
> > > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > > > ---
> > >
> > > This needs to be solved more generically for all tail taggers. Let me
> > > try out a few things tomorrow and come with a proposal.
> >
> > Maybe the skb_linearize() is also a generic problem, since many of the
> > tag drivers are using skb_put()? It looks like skb_linearize() is
> > cheap because checking if the skb is already linear is cheap. So maybe
> > we want to do it unconditionally?
>
> Yeah, but we should let the stack deal with both issues in validate_xmit_skb().
> There is a skb_needs_linearize() call which returns false because the
> DSA interface inherits NETIF_F_SG from the master via dsa_slave_create():
>
> 	slave_dev->features = master->vlan_features | NETIF_F_HW_TC;
>
> Arguably that's the problem right there, we shouldn't inherit neither
> NETIF_F_HW_CSUM nor NETIF_F_SG from the list of features inheritable by
> 8021q uppers.
>
> - If we inherit NETIF_F_SG we obligate ourselves to call skb_linearize()
>   for tail taggers on xmit. DSA probably doesn't break anything for
>   header taggers though even if the xmit skb is paged, since the DSA
>   header would always be part of the skb head, so this is a feature we
>   could keep for them.
> - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is
>   actively detrimential to keep this feature enabled, as proven my Lino.
>   As for header taggers, I fail to see how this would be helpful, since
>   the DSA master would always fail to see the real IP header (it has
>   been pushed to the right by the DSA tag), and therefore, the DSA
>   master offload would be effectively bypassed. So no point, really, in
>   inheriting it in the first place in any situation.
>
> Lino, to fix these bugs by letting validate_xmit_skb() know what works
> for DSA and what doesn't, could you please:
>
> (a) move the current slave_dev->features assignment to
>     dsa_slave_setup_tagger()? We now support changing the tagging
>     protocol at runtime, and everything that depends on what the tagging
>     protocol is (in this case, tag_ops->needed_headroom vs
>     tag_ops->needed_tailroom) should be put in that function.
> (b) unconditionally clear NETIF_F_HW_CSUM from slave_dev->features,
>     after inheriting the vlan_features from the master?
> (c) clear NETIF_F_SG from slave_dev->features if we have a non-zero
>     tag_ops->needed_tailroom?
>

Sure, I will test this solution. But I think NETIF_F_FRAGLIST should also be
cleared in this case, right?


Regards,
Lino
Vladimir Oltean July 15, 2021, 11:49 a.m. UTC | #4
On Thu, Jul 15, 2021 at 01:16:12PM +0200, Lino Sanfilippo wrote:
> Sure, I will test this solution. But I think NETIF_F_FRAGLIST should also be

> cleared in this case, right?


Hmm, interesting question. I think only hns3 makes meaningful use of
NETIF_F_FRAGLIST, right? I'm looking at hns3_fill_skb_to_desc().
Other drivers seem to set it for ridiculous reasons - looking at commit
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.") -
they set NETIF_F_FRAGLIST and then linearize the skb chain anyway. The
claimed 4x throughput benefit probably has to do with less skbs
traversing the stack? I don't know.

Anyway, it is hard to imagine all the things that could go wrong with
chains of IP fragments on a DSA interface, precisely because I have so
few examples to look at. I would say, header taggers are probably fine,
tail taggers not so much, so apply the same treatment as for NETIF_F_SG?
Andrew Lunn July 15, 2021, 1:08 p.m. UTC | #5
> - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is

>   actively detrimential to keep this feature enabled, as proven my Lino.

>   As for header taggers, I fail to see how this would be helpful, since

>   the DSA master would always fail to see the real IP header (it has

>   been pushed to the right by the DSA tag), and therefore, the DSA

>   master offload would be effectively bypassed.


The Marvell MACs know about DSA and should be able to perform hardware
checksumming. It is a long time since i looked at how this works, but
i think there is a field in the descriptor which gets set with the
offset to the IP header, so it work for DSA as well as EDSA.

I _think_ Broadcom MACs also know about Broadcom tags and can do the
right thing.

So we need to be a bit careful here to prevent performance regressions
for same vendor MAC+Switch combinations.

    Andrew
Vladimir Oltean July 15, 2021, 1:12 p.m. UTC | #6
On Thu, Jul 15, 2021 at 03:04:31PM +0200, Lino Sanfilippo wrote:
> Please note that skb_put() asserts that the SKB is linearized. So I think we

> should rather clear both NETIF_F_FRAGLIST and NETIF_F_SG unconditionally since also

> header taggers use some form of skb_put() dont they?


The tail taggers use skb_put() as part of the routine to make room for
the tail tag.

Some of the header taggers use __skb_put_padto() when the packets are
too small (under ETH_ZLEN). When they are so small they are definitely
linear already.

We don't have a third form/use of skb_put().
Vladimir Oltean July 15, 2021, 2:36 p.m. UTC | #7
On Thu, Jul 15, 2021 at 03:08:53PM +0200, Andrew Lunn wrote:
> > - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is

> >   actively detrimential to keep this feature enabled, as proven my Lino.

> >   As for header taggers, I fail to see how this would be helpful, since

> >   the DSA master would always fail to see the real IP header (it has

> >   been pushed to the right by the DSA tag), and therefore, the DSA

> >   master offload would be effectively bypassed.

>

> The Marvell MACs know about DSA and should be able to perform hardware

> checksumming. It is a long time since i looked at how this works, but

> i think there is a field in the descriptor which gets set with the

> offset to the IP header, so it work for DSA as well as EDSA.

>

> I _think_ Broadcom MACs also know about Broadcom tags and can do the

> right thing.

>

> So we need to be a bit careful here to prevent performance regressions

> for same vendor MAC+Switch combinations.


Tell me more (show me some code). Do Marvell Ethernet controllers which
support TX checksumming with Marvell switches do different things
depending on whether DSA or EDSA is used? Because we can currently
toggle between DSA and EDSA at runtime.

This new information means we can only accept Lino's patch 2/2 as-is for
the "net" tree, otherwise we will introduce regressions one way or
another. It will only be a partial fix for the particular case of KSZ
switches which probably have no DSA master counterpart to support TX
checksumming.

I expect Marvell switches to be equally broken on the Broadcom genet
controller? No one will provide the TX checksum in that case. And that
is not even "fixable" without devising a system where custom code can be
run per {tagger, DSA master} pair, and this includes the case where the
tagging protocol changes at runtime.
Andrew Lunn July 15, 2021, 3:50 p.m. UTC | #8
> Tell me more (show me some code).


https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta.c#L1747

and

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta.c#L1944

It uses skb_network_offset(skb) to know where the real header is. This
should work independent of DSA or EDSA.

mvpp2_main.c looks to have something similar. The older mv643xx_eth.c
also has something, but it is more subtle. Ah, found it:

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/marvell/mv643xx_eth.c#L683

> I expect Marvell switches to be equally broken on the Broadcom genet

> controller?


Maybe. Depends on how genet works. A Broadcom switch connected to a
Marvell MAC probably works, since the code is generic. It should work
for any switch which uses head tagging, although mv643xx_eth.c is
limited to 4 or 8 byte tags.

	Andrew
Florian Fainelli July 15, 2021, 4:24 p.m. UTC | #9
On 7/15/21 6:08 AM, Andrew Lunn wrote:
>> - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is

>>   actively detrimential to keep this feature enabled, as proven my Lino.

>>   As for header taggers, I fail to see how this would be helpful, since

>>   the DSA master would always fail to see the real IP header (it has

>>   been pushed to the right by the DSA tag), and therefore, the DSA

>>   master offload would be effectively bypassed.

> 

> The Marvell MACs know about DSA and should be able to perform hardware

> checksumming. It is a long time since i looked at how this works, but

> i think there is a field in the descriptor which gets set with the

> offset to the IP header, so it work for DSA as well as EDSA.

> 

> I _think_ Broadcom MACs also know about Broadcom tags and can do the

> right thing.


Yes, the SYSTEMPORT MAC as well as bgmac to some extent will
automagically work, but they have to be told to expect a Broadcom tag in
order to generate an appropriate checksum on receive. This is why we
have this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/bcmsysport.c#n144

Likewise, for TX:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/bcmsysport.c#n172
-- 
Florian
Lino Sanfilippo July 19, 2021, 8:20 a.m. UTC | #10
Hi,

> Gesendet: Donnerstag, 15. Juli 2021 um 16:36 Uhr
> Von: "Vladimir Oltean" <olteanv@gmail.com>
> An: "Andrew Lunn" <andrew@lunn.ch>
> Cc: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, f.fainelli@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Betreff: Re: [PATCH 2/2] net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum
>
> On Thu, Jul 15, 2021 at 03:08:53PM +0200, Andrew Lunn wrote:
> > > - If we inherit NETIF_F_HW_CSUM from the master for tail taggers, it is
> > >   actively detrimential to keep this feature enabled, as proven my Lino.
> > >   As for header taggers, I fail to see how this would be helpful, since
> > >   the DSA master would always fail to see the real IP header (it has
> > >   been pushed to the right by the DSA tag), and therefore, the DSA
> > >   master offload would be effectively bypassed.
> >
> > The Marvell MACs know about DSA and should be able to perform hardware
> > checksumming. It is a long time since i looked at how this works, but
> > i think there is a field in the descriptor which gets set with the
> > offset to the IP header, so it work for DSA as well as EDSA.
> >
> > I _think_ Broadcom MACs also know about Broadcom tags and can do the
> > right thing.
> >
> > So we need to be a bit careful here to prevent performance regressions
> > for same vendor MAC+Switch combinations.
>
> Tell me more (show me some code). Do Marvell Ethernet controllers which
> support TX checksumming with Marvell switches do different things
> depending on whether DSA or EDSA is used? Because we can currently
> toggle between DSA and EDSA at runtime.
>
> This new information means we can only accept Lino's patch 2/2 as-is for
> the "net" tree, otherwise we will introduce regressions one way or
> another. It will only be a partial fix for the particular case of KSZ
> switches which probably have no DSA master counterpart to support TX
> checksumming.
>

Should I then resend the series with patch 1 handling the NETIF_F_SG and
NETIF_F_FRAGLIST flags (i.e. deleting them if tailroom is needed) in
dsa_slave_setup_tagger() as you suggested and patch 2 as it is?

Regards,
Lino
Vladimir Oltean July 19, 2021, 8:24 a.m. UTC | #11
On Mon, Jul 19, 2021 at 10:20:13AM +0200, Lino Sanfilippo wrote:
> Should I then resend the series with patch 1 handling the NETIF_F_SG and

> NETIF_F_FRAGLIST flags (i.e. deleting them if tailroom is needed) in

> dsa_slave_setup_tagger() as you suggested and patch 2 as it is?


Yup. The TX checksum offload flag on DSA interfaces is definitely
net-next material if we want to do it properly, so we can revisit it
there. Thanks.
diff mbox series

Patch

diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 364f509d7cd7..d59f0e7019e2 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -56,6 +56,9 @@  static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (skb_linearize(skb))
 		return NULL;
 
+	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+		return NULL;
+
 	/* Tag encoding */
 	tag = skb_put(skb, KSZ_INGRESS_TAG_LEN);
 	addr = skb_mac_header(skb);
@@ -120,6 +123,9 @@  static struct sk_buff *ksz9477_xmit(struct sk_buff *skb,
 	if (skb_linearize(skb))
 		return NULL;
 
+	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+		return NULL;
+
 	/* Tag encoding */
 	tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN);
 	addr = skb_mac_header(skb);
@@ -173,6 +179,9 @@  static struct sk_buff *ksz9893_xmit(struct sk_buff *skb,
 	if (skb_linearize(skb))
 		return NULL;
 
+	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
+		return NULL;
+
 	/* Tag encoding */
 	tag = skb_put(skb, KSZ_INGRESS_TAG_LEN);
 	addr = skb_mac_header(skb);