diff mbox series

[PATCHv2,net-next] ip_gre: remove CRC flag from dev features in gre_gso_segment

Message ID d8dc3cd362915974426d8274bb8ac6970a2096bb.1610371323.git.lucien.xin@gmail.com
State Superseded
Headers show
Series [PATCHv2,net-next] ip_gre: remove CRC flag from dev features in gre_gso_segment | expand

Commit Message

Xin Long Jan. 11, 2021, 1:22 p.m. UTC
This patch is to let it always do CRC checksum in sctp_gso_segment()
by removing CRC flag from the dev features in gre_gso_segment() for
SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support
sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.

It could set csum/csum_start in GSO CB properly in sctp_gso_segment()
after that commit, so it would do checksum with gso_make_checksum()
in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute
gre csum for sctp over gre tunnels") can be reverted now.

Note that the current HWs like igb NIC can only handle the SCTP CRC
when it's in the outer packet, not in the inner packet like in this
case, so here it removes CRC flag from the dev features even when
need_csum is false.

v1->v2:
  - improve the changelog.
  - fix "rev xmas tree" in varibles declaration.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/gre_offload.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

Comments

Alexander Duyck Jan. 11, 2021, 4:48 p.m. UTC | #1
On Mon, Jan 11, 2021 at 5:22 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> This patch is to let it always do CRC checksum in sctp_gso_segment()
> by removing CRC flag from the dev features in gre_gso_segment() for
> SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support
> sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.
>
> It could set csum/csum_start in GSO CB properly in sctp_gso_segment()
> after that commit, so it would do checksum with gso_make_checksum()
> in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute
> gre csum for sctp over gre tunnels") can be reverted now.
>
> Note that the current HWs like igb NIC can only handle the SCTP CRC
> when it's in the outer packet, not in the inner packet like in this
> case, so here it removes CRC flag from the dev features even when
> need_csum is false.

So the limitation in igb is not the hardware but the driver
configuration. When I had coded things up I put in a limitation on the
igb_tx_csum code that it would have to validate that the protocol we
are requesting an SCTP CRC offload since it is a different calculation
than a 1's complement checksum. Since igb doesn't support tunnels we
limited that check to the outer headers.

We could probably enable this for tunnels as long as the tunnel isn't
requesting an outer checksum offload from the driver.

> v1->v2:
>   - improve the changelog.
>   - fix "rev xmas tree" in varibles declaration.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/ipv4/gre_offload.c | 15 ++++-----------
>  1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index e0a2465..a681306 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -15,10 +15,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>                                        netdev_features_t features)
>  {
>         int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
> -       bool need_csum, need_recompute_csum, gso_partial;
>         struct sk_buff *segs = ERR_PTR(-EINVAL);
>         u16 mac_offset = skb->mac_header;
>         __be16 protocol = skb->protocol;
> +       bool need_csum, gso_partial;
>         u16 mac_len = skb->mac_len;
>         int gre_offset, outer_hlen;
>
> @@ -41,10 +41,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>         skb->protocol = skb->inner_protocol;
>
>         need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);
> -       need_recompute_csum = skb->csum_not_inet;
>         skb->encap_hdr_csum = need_csum;
>
>         features &= skb->dev->hw_enc_features;
> +       /* CRC checksum can't be handled by HW when SCTP is the inner proto. */
> +       features &= ~NETIF_F_SCTP_CRC;
>
>         /* segment inner packet. */
>         segs = skb_mac_gso_segment(skb, features);

Do we have NICs that are advertising NETIF_S_SCTP_CRC as part of their
hw_enc_features and then not supporting it? Based on your comment
above it seems like you are masking this out because hardware is
advertising features it doesn't actually support. I'm just wondering
if that is the case or if this is something where this should be
cleared if need_csum is set since we only support one level of
checksum offload.

> @@ -99,15 +100,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>                 }
>
>                 *(pcsum + 1) = 0;
> -               if (need_recompute_csum && !skb_is_gso(skb)) {
> -                       __wsum csum;
> -
> -                       csum = skb_checksum(skb, gre_offset,
> -                                           skb->len - gre_offset, 0);
> -                       *pcsum = csum_fold(csum);
> -               } else {
> -                       *pcsum = gso_make_checksum(skb, 0);
> -               }
> +               *pcsum = gso_make_checksum(skb, 0);
>         } while ((skb = skb->next));
>  out:
>         return segs;
> --
> 2.1.0
>
Xin Long Jan. 12, 2021, 5:14 a.m. UTC | #2
On Tue, Jan 12, 2021 at 12:48 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>

> On Mon, Jan 11, 2021 at 5:22 AM Xin Long <lucien.xin@gmail.com> wrote:

> >

> > This patch is to let it always do CRC checksum in sctp_gso_segment()

> > by removing CRC flag from the dev features in gre_gso_segment() for

> > SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support

> > sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.

> >

> > It could set csum/csum_start in GSO CB properly in sctp_gso_segment()

> > after that commit, so it would do checksum with gso_make_checksum()

> > in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute

> > gre csum for sctp over gre tunnels") can be reverted now.

> >

> > Note that the current HWs like igb NIC can only handle the SCTP CRC

> > when it's in the outer packet, not in the inner packet like in this

> > case, so here it removes CRC flag from the dev features even when

> > need_csum is false.

>

> So the limitation in igb is not the hardware but the driver

> configuration. When I had coded things up I put in a limitation on the

> igb_tx_csum code that it would have to validate that the protocol we

> are requesting an SCTP CRC offload since it is a different calculation

> than a 1's complement checksum. Since igb doesn't support tunnels we

> limited that check to the outer headers.

Ah.. I see, thanks.
>

> We could probably enable this for tunnels as long as the tunnel isn't

> requesting an outer checksum offload from the driver.

I think in igb_tx_csum(), by checking skb->csum_not_inet would be enough
to validate that is a SCTP request:
-               if (((first->protocol == htons(ETH_P_IP)) &&
-                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||
-                   ((first->protocol == htons(ETH_P_IPV6)) &&
-                    igb_ipv6_csum_is_sctp(skb))) {
+               if (skb->csum_not_inet) {
                        type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;
                        break;
                }

Otherwise, we will need to parse the packet a little bit, as it does in
hns3_get_l4_protocol().

>

> > v1->v2:

> >   - improve the changelog.

> >   - fix "rev xmas tree" in varibles declaration.

> >

> > Signed-off-by: Xin Long <lucien.xin@gmail.com>

> > ---

> >  net/ipv4/gre_offload.c | 15 ++++-----------

> >  1 file changed, 4 insertions(+), 11 deletions(-)

> >

> > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c

> > index e0a2465..a681306 100644

> > --- a/net/ipv4/gre_offload.c

> > +++ b/net/ipv4/gre_offload.c

> > @@ -15,10 +15,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> >                                        netdev_features_t features)

> >  {

> >         int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);

> > -       bool need_csum, need_recompute_csum, gso_partial;

> >         struct sk_buff *segs = ERR_PTR(-EINVAL);

> >         u16 mac_offset = skb->mac_header;

> >         __be16 protocol = skb->protocol;

> > +       bool need_csum, gso_partial;

> >         u16 mac_len = skb->mac_len;

> >         int gre_offset, outer_hlen;

> >

> > @@ -41,10 +41,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> >         skb->protocol = skb->inner_protocol;

> >

> >         need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);

> > -       need_recompute_csum = skb->csum_not_inet;

> >         skb->encap_hdr_csum = need_csum;

> >

> >         features &= skb->dev->hw_enc_features;

> > +       /* CRC checksum can't be handled by HW when SCTP is the inner proto. */

> > +       features &= ~NETIF_F_SCTP_CRC;

> >

> >         /* segment inner packet. */

> >         segs = skb_mac_gso_segment(skb, features);

>

> Do we have NICs that are advertising NETIF_S_SCTP_CRC as part of their

> hw_enc_features and then not supporting it? Based on your comment

Yes, igb/igbvf/igc/ixgbe/ixgbevf, they have a similar code of SCTP
proto validation.

> above it seems like you are masking this out because hardware is

> advertising features it doesn't actually support. I'm just wondering

> if that is the case or if this is something where this should be

> cleared if need_csum is set since we only support one level of

> checksum offload.

Since only these drivers only do SCTP proto validation, and "only
one level checksum offload" issue only exists when inner packet
is SCTP packet, clearing NETIF_F_SCTP_CRC should be enough.

But seems to fix the drivers will be better, as hw_enc_features should
tell the correct features for inner proto. wdyt?

(Just note udp tunneling SCTP doesn't have this issue, as the outer
 udp checksum is always required by RFC)

>

> > @@ -99,15 +100,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> >                 }

> >

> >                 *(pcsum + 1) = 0;

> > -               if (need_recompute_csum && !skb_is_gso(skb)) {

> > -                       __wsum csum;

> > -

> > -                       csum = skb_checksum(skb, gre_offset,

> > -                                           skb->len - gre_offset, 0);

> > -                       *pcsum = csum_fold(csum);

> > -               } else {

> > -                       *pcsum = gso_make_checksum(skb, 0);

> > -               }

> > +               *pcsum = gso_make_checksum(skb, 0);

> >         } while ((skb = skb->next));

> >  out:

> >         return segs;

> > --

> > 2.1.0

> >
Alexander Duyck Jan. 13, 2021, 2:11 a.m. UTC | #3
On Mon, Jan 11, 2021 at 9:14 PM Xin Long <lucien.xin@gmail.com> wrote:
>

> On Tue, Jan 12, 2021 at 12:48 AM Alexander Duyck

> <alexander.duyck@gmail.com> wrote:

> >

> > On Mon, Jan 11, 2021 at 5:22 AM Xin Long <lucien.xin@gmail.com> wrote:

> > >

> > > This patch is to let it always do CRC checksum in sctp_gso_segment()

> > > by removing CRC flag from the dev features in gre_gso_segment() for

> > > SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support

> > > sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.

> > >

> > > It could set csum/csum_start in GSO CB properly in sctp_gso_segment()

> > > after that commit, so it would do checksum with gso_make_checksum()

> > > in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute

> > > gre csum for sctp over gre tunnels") can be reverted now.

> > >

> > > Note that the current HWs like igb NIC can only handle the SCTP CRC

> > > when it's in the outer packet, not in the inner packet like in this

> > > case, so here it removes CRC flag from the dev features even when

> > > need_csum is false.

> >

> > So the limitation in igb is not the hardware but the driver

> > configuration. When I had coded things up I put in a limitation on the

> > igb_tx_csum code that it would have to validate that the protocol we

> > are requesting an SCTP CRC offload since it is a different calculation

> > than a 1's complement checksum. Since igb doesn't support tunnels we

> > limited that check to the outer headers.

> Ah.. I see, thanks.

> >

> > We could probably enable this for tunnels as long as the tunnel isn't

> > requesting an outer checksum offload from the driver.

> I think in igb_tx_csum(), by checking skb->csum_not_inet would be enough

> to validate that is a SCTP request:

> -               if (((first->protocol == htons(ETH_P_IP)) &&

> -                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||

> -                   ((first->protocol == htons(ETH_P_IPV6)) &&

> -                    igb_ipv6_csum_is_sctp(skb))) {

> +               if (skb->csum_not_inet) {

>                         type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;

>                         break;

>                 }

>


So if I may ask. Why go with something like csum_not_inet instead of
specifying something like crc32_csum? I'm just wondering if there are
any other non-1's complement checksums that we are dealing with?

One thing we might want to do to make eventual backporting for this
easier would be to add an accessor inline function. Maybe something
like a skb_csum_is_crc32() so that for older kernels the function
could just be defined to return false since the csum_not_inet may not
exist.

> Otherwise, we will need to parse the packet a little bit, as it does in

> hns3_get_l4_protocol().


Agreed. If the csum_not_inet means it is a crc32 checksum then we
could just look at the offsets and as long as they are correct for
sctp we could just go forward with what we have.

> >

> > > v1->v2:

> > >   - improve the changelog.

> > >   - fix "rev xmas tree" in varibles declaration.

> > >

> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>

> > > ---

> > >  net/ipv4/gre_offload.c | 15 ++++-----------

> > >  1 file changed, 4 insertions(+), 11 deletions(-)

> > >

> > > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c

> > > index e0a2465..a681306 100644

> > > --- a/net/ipv4/gre_offload.c

> > > +++ b/net/ipv4/gre_offload.c

> > > @@ -15,10 +15,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > >                                        netdev_features_t features)

> > >  {

> > >         int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);

> > > -       bool need_csum, need_recompute_csum, gso_partial;

> > >         struct sk_buff *segs = ERR_PTR(-EINVAL);

> > >         u16 mac_offset = skb->mac_header;

> > >         __be16 protocol = skb->protocol;

> > > +       bool need_csum, gso_partial;

> > >         u16 mac_len = skb->mac_len;

> > >         int gre_offset, outer_hlen;

> > >

> > > @@ -41,10 +41,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > >         skb->protocol = skb->inner_protocol;

> > >

> > >         need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);

> > > -       need_recompute_csum = skb->csum_not_inet;

> > >         skb->encap_hdr_csum = need_csum;

> > >

> > >         features &= skb->dev->hw_enc_features;

> > > +       /* CRC checksum can't be handled by HW when SCTP is the inner proto. */

> > > +       features &= ~NETIF_F_SCTP_CRC;

> > >

> > >         /* segment inner packet. */

> > >         segs = skb_mac_gso_segment(skb, features);

> >

> > Do we have NICs that are advertising NETIF_S_SCTP_CRC as part of their

> > hw_enc_features and then not supporting it? Based on your comment

> Yes, igb/igbvf/igc/ixgbe/ixgbevf, they have a similar code of SCTP

> proto validation.


Yeah, it is old code. It was added in 4.6 before tunnels supported
SCTP_CRC I am guessing. It looks like csum_not_inet wasn't added until
4.13. So it would probably be best to fix the drivers since the driver
code is outdated.

> > above it seems like you are masking this out because hardware is

> > advertising features it doesn't actually support. I'm just wondering

> > if that is the case or if this is something where this should be

> > cleared if need_csum is set since we only support one level of

> > checksum offload.

> Since only these drivers only do SCTP proto validation, and "only

> one level checksum offload" issue only exists when inner packet

> is SCTP packet, clearing NETIF_F_SCTP_CRC should be enough.

>

> But seems to fix the drivers will be better, as hw_enc_features should

> tell the correct features for inner proto. wdyt?


Yes, it would be better to fix the drivers. However the one limitation
is that this will only work when we don't have an outer checksum in
place. If we have an outer checksum then we have to compute the crc32
checksum and then offload the outer checksum if we can.

> (Just note udp tunneling SCTP doesn't have this issue, as the outer

>  udp checksum is always required by RFC)


Thanks. I wasn't aware of that.

> >

> > > @@ -99,15 +100,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > >                 }

> > >

> > >                 *(pcsum + 1) = 0;

> > > -               if (need_recompute_csum && !skb_is_gso(skb)) {

> > > -                       __wsum csum;

> > > -

> > > -                       csum = skb_checksum(skb, gre_offset,

> > > -                                           skb->len - gre_offset, 0);

> > > -                       *pcsum = csum_fold(csum);

> > > -               } else {

> > > -                       *pcsum = gso_make_checksum(skb, 0);

> > > -               }

> > > +               *pcsum = gso_make_checksum(skb, 0);

> > >         } while ((skb = skb->next));

> > >  out:

> > >         return segs;

> > > --

> > > 2.1.0

> > >
Xin Long Jan. 13, 2021, 9:46 a.m. UTC | #4
On Wed, Jan 13, 2021 at 10:11 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>

> On Mon, Jan 11, 2021 at 9:14 PM Xin Long <lucien.xin@gmail.com> wrote:

> >

> > On Tue, Jan 12, 2021 at 12:48 AM Alexander Duyck

> > <alexander.duyck@gmail.com> wrote:

> > >

> > > On Mon, Jan 11, 2021 at 5:22 AM Xin Long <lucien.xin@gmail.com> wrote:

> > > >

> > > > This patch is to let it always do CRC checksum in sctp_gso_segment()

> > > > by removing CRC flag from the dev features in gre_gso_segment() for

> > > > SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support

> > > > sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.

> > > >

> > > > It could set csum/csum_start in GSO CB properly in sctp_gso_segment()

> > > > after that commit, so it would do checksum with gso_make_checksum()

> > > > in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute

> > > > gre csum for sctp over gre tunnels") can be reverted now.

> > > >

> > > > Note that the current HWs like igb NIC can only handle the SCTP CRC

> > > > when it's in the outer packet, not in the inner packet like in this

> > > > case, so here it removes CRC flag from the dev features even when

> > > > need_csum is false.

> > >

> > > So the limitation in igb is not the hardware but the driver

> > > configuration. When I had coded things up I put in a limitation on the

> > > igb_tx_csum code that it would have to validate that the protocol we

> > > are requesting an SCTP CRC offload since it is a different calculation

> > > than a 1's complement checksum. Since igb doesn't support tunnels we

> > > limited that check to the outer headers.

> > Ah.. I see, thanks.

> > >

> > > We could probably enable this for tunnels as long as the tunnel isn't

> > > requesting an outer checksum offload from the driver.

> > I think in igb_tx_csum(), by checking skb->csum_not_inet would be enough

> > to validate that is a SCTP request:

> > -               if (((first->protocol == htons(ETH_P_IP)) &&

> > -                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||

> > -                   ((first->protocol == htons(ETH_P_IPV6)) &&

> > -                    igb_ipv6_csum_is_sctp(skb))) {

> > +               if (skb->csum_not_inet) {

> >                         type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;

> >                         break;

> >                 }

> >

>

> So if I may ask. Why go with something like csum_not_inet instead of

> specifying something like crc32_csum? I'm just wondering if there are

> any other non-1's complement checksums that we are dealing with?

I don't think there is, here is the thread of that patch:

https://lore.kernel.org/netdev/CALx6S36rem=OuN_At6qYA=se5cpuYM1N2R8efoaszvo8b8Tz5A@mail.gmail.com/

I'm writing GRE checksum, and trying to change csum_not_inet:1 to
csum_type:2, by doing the below, no bit hole occurs:

-       __u8                    csum_not_inet:1;
-       __u8                    dst_pending_confirm:1;
+       __u8                    csum_type:2;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
        __u8                    ndisc_nodetype:2;
 #endif
+       __u8                    dst_pending_confirm:1;

and in skb_csum_hwoffload_help():
 int skb_csum_hwoffload_help(struct sk_buff *skb,
                            const netdev_features_t features)
 {
-       if (unlikely(skb->csum_not_inet))
-               return !!(features & NETIF_F_SCTP_CRC) ? 0 :
-                       skb_crc32c_csum_help(skb);
+       if (likely(!skb->csum_type))
+               return !!(features & NETIF_F_CSUM_MASK) ? 0 :
skb_checksum_help(skb);
+
+       if (skb->csum_type == CSUM_T_GENERIC) {
+               return !!(features & NETIF_F_HW_CSUM) ? 0 :
skb_checksum_help(skb);
+       } else if (skb->csum_type == CSUM_T_SCTP_CRC) {
+               return !!(features & NETIF_F_SCTP_CRC) ? 0 :
skb_crc32c_csum_help(skb);
+       } else {
+               pr_warn("Wrong csum type: %d\n", skb->csum_type);
+               return 1;
+       }

then the driver fix will be:
        case offsetof(struct sctphdr, checksum):
                /* validate that this is actually an SCTP request */
-               if (((first->protocol == htons(ETH_P_IP)) &&
-                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||
-                   ((first->protocol == htons(ETH_P_IPV6)) &&
-                    igb_ipv6_csum_is_sctp(skb))) {
+               if (skb->csum_type == CSUM_T_SCTP_CRC) {
                        type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;
                        break;
                }

then the gre csum set will be:
+                               skb->csum_type = CSUM_T_GENERIC;
+                               skb->ip_summed = CHECKSUM_PARTIAL;
+                               skb->csum_start =
skb_transport_header(skb) - skb->head;
+                               skb->csum_offset = sizeof(*greh);

>

> One thing we might want to do to make eventual backporting for this

> easier would be to add an accessor inline function. Maybe something

> like a skb_csum_is_crc32() so that for older kernels the function

> could just be defined to return false since the csum_not_inet may not

> exist.

>

> > Otherwise, we will need to parse the packet a little bit, as it does in

> > hns3_get_l4_protocol().

>

> Agreed. If the csum_not_inet means it is a crc32 checksum then we

> could just look at the offsets and as long as they are correct for

> sctp we could just go forward with what we have.

>

> > >

> > > > v1->v2:

> > > >   - improve the changelog.

> > > >   - fix "rev xmas tree" in varibles declaration.

> > > >

> > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>

> > > > ---

> > > >  net/ipv4/gre_offload.c | 15 ++++-----------

> > > >  1 file changed, 4 insertions(+), 11 deletions(-)

> > > >

> > > > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c

> > > > index e0a2465..a681306 100644

> > > > --- a/net/ipv4/gre_offload.c

> > > > +++ b/net/ipv4/gre_offload.c

> > > > @@ -15,10 +15,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > > >                                        netdev_features_t features)

> > > >  {

> > > >         int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);

> > > > -       bool need_csum, need_recompute_csum, gso_partial;

> > > >         struct sk_buff *segs = ERR_PTR(-EINVAL);

> > > >         u16 mac_offset = skb->mac_header;

> > > >         __be16 protocol = skb->protocol;

> > > > +       bool need_csum, gso_partial;

> > > >         u16 mac_len = skb->mac_len;

> > > >         int gre_offset, outer_hlen;

> > > >

> > > > @@ -41,10 +41,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > > >         skb->protocol = skb->inner_protocol;

> > > >

> > > >         need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);

> > > > -       need_recompute_csum = skb->csum_not_inet;

> > > >         skb->encap_hdr_csum = need_csum;

> > > >

> > > >         features &= skb->dev->hw_enc_features;

> > > > +       /* CRC checksum can't be handled by HW when SCTP is the inner proto. */

> > > > +       features &= ~NETIF_F_SCTP_CRC;

> > > >

> > > >         /* segment inner packet. */

> > > >         segs = skb_mac_gso_segment(skb, features);

> > >

> > > Do we have NICs that are advertising NETIF_S_SCTP_CRC as part of their

> > > hw_enc_features and then not supporting it? Based on your comment

> > Yes, igb/igbvf/igc/ixgbe/ixgbevf, they have a similar code of SCTP

> > proto validation.

>

> Yeah, it is old code. It was added in 4.6 before tunnels supported

> SCTP_CRC I am guessing. It looks like csum_not_inet wasn't added until

> 4.13. So it would probably be best to fix the drivers since the driver

> code is outdated.

>

> > > above it seems like you are masking this out because hardware is

> > > advertising features it doesn't actually support. I'm just wondering

> > > if that is the case or if this is something where this should be

> > > cleared if need_csum is set since we only support one level of

> > > checksum offload.

> > Since only these drivers only do SCTP proto validation, and "only

> > one level checksum offload" issue only exists when inner packet

> > is SCTP packet, clearing NETIF_F_SCTP_CRC should be enough.

> >

> > But seems to fix the drivers will be better, as hw_enc_features should

> > tell the correct features for inner proto. wdyt?

>

> Yes, it would be better to fix the drivers. However the one limitation

> is that this will only work when we don't have an outer checksum in

> place. If we have an outer checksum then we have to compute the crc32

> checksum and then offload the outer checksum if we can.

>

> > (Just note udp tunneling SCTP doesn't have this issue, as the outer

> >  udp checksum is always required by RFC)

But sctp over Vxlan/Geneve may still use noudpcsum, so need_csum
may still be false in there.

vxlan and geneve is not supporting fraglist, which sctp hw gso requires.
I will add NETIF_F_FRAGLIST flag for udp tunnel device in another patch.

Thanks.

>

> Thanks. I wasn't aware of that.

>

> > >

> > > > @@ -99,15 +100,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > > >                 }

> > > >

> > > >                 *(pcsum + 1) = 0;

> > > > -               if (need_recompute_csum && !skb_is_gso(skb)) {

> > > > -                       __wsum csum;

> > > > -

> > > > -                       csum = skb_checksum(skb, gre_offset,

> > > > -                                           skb->len - gre_offset, 0);

> > > > -                       *pcsum = csum_fold(csum);

> > > > -               } else {

> > > > -                       *pcsum = gso_make_checksum(skb, 0);

> > > > -               }

> > > > +               *pcsum = gso_make_checksum(skb, 0);

> > > >         } while ((skb = skb->next));

> > > >  out:

> > > >         return segs;

> > > > --

> > > > 2.1.0

> > > >
Alexander Duyck Jan. 13, 2021, 9:55 p.m. UTC | #5
On Wed, Jan 13, 2021 at 1:46 AM Xin Long <lucien.xin@gmail.com> wrote:
>

> On Wed, Jan 13, 2021 at 10:11 AM Alexander Duyck

> <alexander.duyck@gmail.com> wrote:

> >

> > On Mon, Jan 11, 2021 at 9:14 PM Xin Long <lucien.xin@gmail.com> wrote:

> > >

> > > On Tue, Jan 12, 2021 at 12:48 AM Alexander Duyck

> > > <alexander.duyck@gmail.com> wrote:

> > > >

> > > > On Mon, Jan 11, 2021 at 5:22 AM Xin Long <lucien.xin@gmail.com> wrote:

> > > > >

> > > > > This patch is to let it always do CRC checksum in sctp_gso_segment()

> > > > > by removing CRC flag from the dev features in gre_gso_segment() for

> > > > > SCTP over GRE, just as it does in Commit 527beb8ef9c0 ("udp: support

> > > > > sctp over udp in skb_udp_tunnel_segment") for SCTP over UDP.

> > > > >

> > > > > It could set csum/csum_start in GSO CB properly in sctp_gso_segment()

> > > > > after that commit, so it would do checksum with gso_make_checksum()

> > > > > in gre_gso_segment(), and Commit 622e32b7d4a6 ("net: gre: recompute

> > > > > gre csum for sctp over gre tunnels") can be reverted now.

> > > > >

> > > > > Note that the current HWs like igb NIC can only handle the SCTP CRC

> > > > > when it's in the outer packet, not in the inner packet like in this

> > > > > case, so here it removes CRC flag from the dev features even when

> > > > > need_csum is false.

> > > >

> > > > So the limitation in igb is not the hardware but the driver

> > > > configuration. When I had coded things up I put in a limitation on the

> > > > igb_tx_csum code that it would have to validate that the protocol we

> > > > are requesting an SCTP CRC offload since it is a different calculation

> > > > than a 1's complement checksum. Since igb doesn't support tunnels we

> > > > limited that check to the outer headers.

> > > Ah.. I see, thanks.

> > > >

> > > > We could probably enable this for tunnels as long as the tunnel isn't

> > > > requesting an outer checksum offload from the driver.

> > > I think in igb_tx_csum(), by checking skb->csum_not_inet would be enough

> > > to validate that is a SCTP request:

> > > -               if (((first->protocol == htons(ETH_P_IP)) &&

> > > -                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||

> > > -                   ((first->protocol == htons(ETH_P_IPV6)) &&

> > > -                    igb_ipv6_csum_is_sctp(skb))) {

> > > +               if (skb->csum_not_inet) {

> > >                         type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;

> > >                         break;

> > >                 }

> > >

> >

> > So if I may ask. Why go with something like csum_not_inet instead of

> > specifying something like crc32_csum? I'm just wondering if there are

> > any other non-1's complement checksums that we are dealing with?

> I don't think there is, here is the thread of that patch:

>

> https://lore.kernel.org/netdev/CALx6S36rem=OuN_At6qYA=se5cpuYM1N2R8efoaszvo8b8Tz5A@mail.gmail.com/

>

> I'm writing GRE checksum, and trying to change csum_not_inet:1 to

> csum_type:2, by doing the below, no bit hole occurs:

>

> -       __u8                    csum_not_inet:1;

> -       __u8                    dst_pending_confirm:1;

> +       __u8                    csum_type:2;

>  #ifdef CONFIG_IPV6_NDISC_NODETYPE

>         __u8                    ndisc_nodetype:2;

>  #endif

> +       __u8                    dst_pending_confirm:1;

>


If we only have two types of checksum we probably don't need to add a
new bit. We can basically leave it as a single bit with 0 being the
inet style checksum, and 1 being a crc32. The main thing I would want
to be able to verify is that we are specifically talking about a crc32
checksum so that if the type of checksum and offset match then we can
go forward with the hardware offload. If you need to add additional
types you could probably add them later.

> and in skb_csum_hwoffload_help():

>  int skb_csum_hwoffload_help(struct sk_buff *skb,

>                             const netdev_features_t features)

>  {

> -       if (unlikely(skb->csum_not_inet))

> -               return !!(features & NETIF_F_SCTP_CRC) ? 0 :

> -                       skb_crc32c_csum_help(skb);


Based on this it would seem like csum_not_inet is more of a sctp_crc
specific flag.

> +       if (likely(!skb->csum_type))

> +               return !!(features & NETIF_F_CSUM_MASK) ? 0 :

> skb_checksum_help(skb);

> +

> +       if (skb->csum_type == CSUM_T_GENERIC) {

> +               return !!(features & NETIF_F_HW_CSUM) ? 0 :

> skb_checksum_help(skb);

> +       } else if (skb->csum_type == CSUM_T_SCTP_CRC) {

> +               return !!(features & NETIF_F_SCTP_CRC) ? 0 :

> skb_crc32c_csum_help(skb);

> +       } else {

> +               pr_warn("Wrong csum type: %d\n", skb->csum_type);

> +               return 1;

> +       }

>

> then the driver fix will be:

>         case offsetof(struct sctphdr, checksum):

>                 /* validate that this is actually an SCTP request */

> -               if (((first->protocol == htons(ETH_P_IP)) &&

> -                    (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||

> -                   ((first->protocol == htons(ETH_P_IPV6)) &&

> -                    igb_ipv6_csum_is_sctp(skb))) {

> +               if (skb->csum_type == CSUM_T_SCTP_CRC) {

>                         type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;

>                         break;

>                 }

>


So the one thing here is that I would prefer to avoid referencing
whatever our solution is directly. I would rather have a function that
performs this test. So what you end up with is something like:
static inline bool skb_csum_is_sctp(struct sk_buff *skb)
{
    return skb->csum_type == CSUM_T_SCTP_CRC;
}

Then it simplifies the driver code and will make backports easier as
they would just have to do something like:
    if (skb_csum_is_sctp(skb)) {
        type_tucmd = E1000_ADVTXD_TUCMD_L4T_SCTP;
        break;
    }

For older kernels the driver can do their own version of
skb_csum_is_sctp and still work correctly.


> then the gre csum set will be:

> +                               skb->csum_type = CSUM_T_GENERIC;

> +                               skb->ip_summed = CHECKSUM_PARTIAL;

> +                               skb->csum_start =

> skb_transport_header(skb) - skb->head;

> +                               skb->csum_offset = sizeof(*greh);

>

> >

> > One thing we might want to do to make eventual backporting for this

> > easier would be to add an accessor inline function. Maybe something

> > like a skb_csum_is_crc32() so that for older kernels the function

> > could just be defined to return false since the csum_not_inet may not

> > exist.

> >

> > > Otherwise, we will need to parse the packet a little bit, as it does in

> > > hns3_get_l4_protocol().

> >

> > Agreed. If the csum_not_inet means it is a crc32 checksum then we

> > could just look at the offsets and as long as they are correct for

> > sctp we could just go forward with what we have.

> >

> > > >

> > > > > v1->v2:

> > > > >   - improve the changelog.

> > > > >   - fix "rev xmas tree" in varibles declaration.

> > > > >

> > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>

> > > > > ---

> > > > >  net/ipv4/gre_offload.c | 15 ++++-----------

> > > > >  1 file changed, 4 insertions(+), 11 deletions(-)

> > > > >

> > > > > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c

> > > > > index e0a2465..a681306 100644

> > > > > --- a/net/ipv4/gre_offload.c

> > > > > +++ b/net/ipv4/gre_offload.c

> > > > > @@ -15,10 +15,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > > > >                                        netdev_features_t features)

> > > > >  {

> > > > >         int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);

> > > > > -       bool need_csum, need_recompute_csum, gso_partial;

> > > > >         struct sk_buff *segs = ERR_PTR(-EINVAL);

> > > > >         u16 mac_offset = skb->mac_header;

> > > > >         __be16 protocol = skb->protocol;

> > > > > +       bool need_csum, gso_partial;

> > > > >         u16 mac_len = skb->mac_len;

> > > > >         int gre_offset, outer_hlen;

> > > > >

> > > > > @@ -41,10 +41,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,

> > > > >         skb->protocol = skb->inner_protocol;

> > > > >

> > > > >         need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);

> > > > > -       need_recompute_csum = skb->csum_not_inet;

> > > > >         skb->encap_hdr_csum = need_csum;

> > > > >

> > > > >         features &= skb->dev->hw_enc_features;

> > > > > +       /* CRC checksum can't be handled by HW when SCTP is the inner proto. */

> > > > > +       features &= ~NETIF_F_SCTP_CRC;

> > > > >

> > > > >         /* segment inner packet. */

> > > > >         segs = skb_mac_gso_segment(skb, features);

> > > >

> > > > Do we have NICs that are advertising NETIF_S_SCTP_CRC as part of their

> > > > hw_enc_features and then not supporting it? Based on your comment

> > > Yes, igb/igbvf/igc/ixgbe/ixgbevf, they have a similar code of SCTP

> > > proto validation.

> >

> > Yeah, it is old code. It was added in 4.6 before tunnels supported

> > SCTP_CRC I am guessing. It looks like csum_not_inet wasn't added until

> > 4.13. So it would probably be best to fix the drivers since the driver

> > code is outdated.

> >

> > > > above it seems like you are masking this out because hardware is

> > > > advertising features it doesn't actually support. I'm just wondering

> > > > if that is the case or if this is something where this should be

> > > > cleared if need_csum is set since we only support one level of

> > > > checksum offload.

> > > Since only these drivers only do SCTP proto validation, and "only

> > > one level checksum offload" issue only exists when inner packet

> > > is SCTP packet, clearing NETIF_F_SCTP_CRC should be enough.

> > >

> > > But seems to fix the drivers will be better, as hw_enc_features should

> > > tell the correct features for inner proto. wdyt?

> >

> > Yes, it would be better to fix the drivers. However the one limitation

> > is that this will only work when we don't have an outer checksum in

> > place. If we have an outer checksum then we have to compute the crc32

> > checksum and then offload the outer checksum if we can.

> >

> > > (Just note udp tunneling SCTP doesn't have this issue, as the outer

> > >  udp checksum is always required by RFC)

> But sctp over Vxlan/Geneve may still use noudpcsum, so need_csum

> may still be false in there.

>

> vxlan and geneve is not supporting fraglist, which sctp hw gso requires.

> I will add NETIF_F_FRAGLIST flag for udp tunnel device in another patch.

>

> Thanks.


Okay, so in those cases we still have SCTP over UDP without an outer
checksum then. I agree that it makes sense to allow offloading the
SCTP CRC32 in the noudpcsum case since we should only have one item to
offload.
diff mbox series

Patch

diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index e0a2465..a681306 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -15,10 +15,10 @@  static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 				       netdev_features_t features)
 {
 	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
-	bool need_csum, need_recompute_csum, gso_partial;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	u16 mac_offset = skb->mac_header;
 	__be16 protocol = skb->protocol;
+	bool need_csum, gso_partial;
 	u16 mac_len = skb->mac_len;
 	int gre_offset, outer_hlen;
 
@@ -41,10 +41,11 @@  static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 	skb->protocol = skb->inner_protocol;
 
 	need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);
-	need_recompute_csum = skb->csum_not_inet;
 	skb->encap_hdr_csum = need_csum;
 
 	features &= skb->dev->hw_enc_features;
+	/* CRC checksum can't be handled by HW when SCTP is the inner proto. */
+	features &= ~NETIF_F_SCTP_CRC;
 
 	/* segment inner packet. */
 	segs = skb_mac_gso_segment(skb, features);
@@ -99,15 +100,7 @@  static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 		}
 
 		*(pcsum + 1) = 0;
-		if (need_recompute_csum && !skb_is_gso(skb)) {
-			__wsum csum;
-
-			csum = skb_checksum(skb, gre_offset,
-					    skb->len - gre_offset, 0);
-			*pcsum = csum_fold(csum);
-		} else {
-			*pcsum = gso_make_checksum(skb, 0);
-		}
+		*pcsum = gso_make_checksum(skb, 0);
 	} while ((skb = skb->next));
 out:
 	return segs;