diff mbox series

[net,v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly

Message ID 20201011191129.991-1-xiyou.wangcong@gmail.com
State New
Headers show
Series [net,v2] ip_gre: set dev->hard_header_len and dev->needed_headroom properly | expand

Commit Message

Cong Wang Oct. 11, 2020, 7:11 p.m. UTC
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it
conditionally. When it is set, it assumes the outer IP header is
already created before ipgre_xmit().

This is not true when we send packets through a raw packet socket,
where L2 headers are supposed to be constructed by user. Packet
socket calls dev_validate_header() to validate the header. But
GRE tunnel does not set dev->hard_header_len, so that check can
be simply bypassed, therefore uninit memory could be passed down
to ipgre_xmit(). Similar for dev->needed_headroom.

dev->hard_header_len is supposed to be the length of the header
created by dev->header_ops->create(), so it should be used whenever
header_ops is set, and dev->needed_headroom should be used when it
is not set.

Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com
Cc: Xie He <xie.he.0141@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
Note, there are some other suspicious use of dev->hard_header_len in
the same file, but let's leave them to a separate patch if really
needed.

 net/ipv4/ip_gre.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Xie He Oct. 11, 2020, 8:32 p.m. UTC | #1
On Sun, Oct 11, 2020 at 12:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>

> GRE tunnel has its own header_ops, ipgre_header_ops, and sets it

> conditionally. When it is set, it assumes the outer IP header is

> already created before ipgre_xmit().

>

> This is not true when we send packets through a raw packet socket,

> where L2 headers are supposed to be constructed by user. Packet

> socket calls dev_validate_header() to validate the header. But

> GRE tunnel does not set dev->hard_header_len, so that check can

> be simply bypassed, therefore uninit memory could be passed down

> to ipgre_xmit(). Similar for dev->needed_headroom.

>

> dev->hard_header_len is supposed to be the length of the header

> created by dev->header_ops->create(), so it should be used whenever

> header_ops is set, and dev->needed_headroom should be used when it

> is not set.


Hi, thanks for attempting to fix this tunnel. Are we still considering
removing header_ops->create?

As said in my email sent previously today, I want to remove
header_ops->create because 1) this keeps the un-exposed headers of GRE
devices consistent with those of GRETAP devices, and 2) I think the
GRE header (and the headers before the GRE header) is not actually the
L2 header of the tunnel (the Wikipedia page for "Generic Routing
Encapsulation" doesn't consider this protocol to be at L2 either).

I'm not sure if you still agree to remove header_ops->create. Do you
still agree but think it'd be better to do that in a separate patch?

Removing header_ops->create would simplify the fixing of the issue you
are trying to fix, too, because that way we would no longer need to
use header_ops or hard_header_len. Also, I'm worried that changing
hard_header_len (or needed_headroom) in ipgre_link_update would have
racing issues. If we remove header_ops, we no longer need to use
hard_header_len and we can just set needed_headroom to the maximum
value, so that we no longer need to update them in ipgre_link_update.
Willem de Bruijn Oct. 11, 2020, 9 p.m. UTC | #2
On Sun, Oct 11, 2020 at 3:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> GRE tunnel has its own header_ops, ipgre_header_ops, and sets it
> conditionally. When it is set, it assumes the outer IP header is
> already created before ipgre_xmit().
>
> This is not true when we send packets through a raw packet socket,
> where L2 headers are supposed to be constructed by user. Packet
> socket calls dev_validate_header() to validate the header. But
> GRE tunnel does not set dev->hard_header_len, so that check can
> be simply bypassed, therefore uninit memory could be passed down
> to ipgre_xmit(). Similar for dev->needed_headroom.
>
> dev->hard_header_len is supposed to be the length of the header
> created by dev->header_ops->create(), so it should be used whenever
> header_ops is set, and dev->needed_headroom should be used when it
> is not set.
>
> Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com
> Cc: Xie He <xie.he.0141@gmail.com>
> Cc: William Tu <u9012063@gmail.com>
> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> Note, there are some other suspicious use of dev->hard_header_len in
> the same file, but let's leave them to a separate patch if really
> needed.

There is agreement that hard_header_len should be the length of link
layer headers visible to the upper layers, needed_headroom the
additional room required for headers that are not exposed, i.e., those
pushed inside ndo_start_xmit.

The link layer header length also has to agree with the interface
hardware type (ARPHRD_..).

Tunnel devices have not always been consistent in this, but today
"bare" ip tunnel devices without additional headers (ipip, sit, ..) do
match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
geneve also conform to this. Known exception that probably needs to be
addressed is sit, which still advertises LL_MAX_HEADER and so has
exposed quite a few syzkaller issues. Side note, it is not entirely
clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
why they are needed.

GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
The second makes sense, as it appears as an Ethernet device. The first
should match "bare" ip tunnel devices, if following the above logic.
Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
header around in collect medata mode") implements. It changes
dev->type to ARPHRD_NONE in collect_md mode.

Some of the inconsistency comes from the various modes of the GRE
driver. Which brings us to ipgre_header_ops. It is set only in two
special cases.

Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
added ipgre_header_ops.parse to be able to receive the inner ip source
address with PF_PACKET recvfrom. And apparently relies on
ipgre_header_ops.create to be able to set an address, which implies
SOCK_DGRAM.

The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
implementation starts with the beautiful comment "/* Nice toy.
Unfortunately, useless in real life :-)". From the rest of that
detailed comment, it is not clear to me why it would need to expose
the headers. The example does not use packet sockets.

A packet socket cannot know devices details such as which configurable
mode a device may be in. And different modes conflict with the basic
rule that for a given well defined link layer type, i.e., dev->type,
header length can be expected to be consistent. In an ideal world
these exceptions would not exist, therefore.

Unfortunately, this is legacy behavior that will have to continue to
be supported.

I agree that then swapping dev->needed_headroom for
dev->hard_header_len at init is then a good approach.

Underlying functions like ip_tunnel_xmit can modify needed_headroom at
runtime, not sure how that affects runtime changes in
ipgre_link_update:

"
        max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr)
                        + rt->dst.header_len + ip_encap_hlen(&tunnel->encap);
        if (max_headroom > dev->needed_headroom)
                dev->needed_headroom = max_headroom;
"

>  net/ipv4/ip_gre.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 4e31f23e4117..82fee0010353 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -626,8 +626,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
>
>         if (dev->header_ops) {
>                 /* Need space for new headers */
> -               if (skb_cow_head(skb, dev->needed_headroom -
> -                                     (tunnel->hlen + sizeof(struct iphdr))))
> +               if (skb_cow_head(skb, dev->hard_header_len))
>                         goto free_skb;
>
>                 tnl_params = (const struct iphdr *)skb->data;
> @@ -748,7 +747,11 @@ static void ipgre_link_update(struct net_device *dev, bool set_mtu)
>         len = tunnel->tun_hlen - len;
>         tunnel->hlen = tunnel->hlen + len;
>
> -       dev->needed_headroom = dev->needed_headroom + len;
> +       if (dev->header_ops)
> +               dev->hard_header_len += len;
> +       else
> +               dev->needed_headroom += len;
> +
>         if (set_mtu)
>                 dev->mtu = max_t(int, dev->mtu - len, 68);
>
> @@ -944,6 +947,7 @@ static void __gre_tunnel_init(struct net_device *dev)
>         tunnel->parms.iph.protocol = IPPROTO_GRE;
>
>         tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
> +       dev->needed_headroom = tunnel->hlen + sizeof(tunnel->parms.iph);
>
>         dev->features           |= GRE_FEATURES;
>         dev->hw_features        |= GRE_FEATURES;
> @@ -987,10 +991,14 @@ static int ipgre_tunnel_init(struct net_device *dev)
>                                 return -EINVAL;
>                         dev->flags = IFF_BROADCAST;
>                         dev->header_ops = &ipgre_header_ops;
> +                       dev->hard_header_len = tunnel->hlen + sizeof(*iph);
> +                       dev->needed_headroom = 0;

here and below, perhaps more descriptive of what is being done, something like:

/* If device has header_ops.create, then headers are part of hard_header_len */
swap(dev->needed_headroom, dev->hard_header_len)

>                 }
>  #endif
>         } else if (!tunnel->collect_md) {
>                 dev->header_ops = &ipgre_header_ops;
> +               dev->hard_header_len = tunnel->hlen + sizeof(*iph);
> +               dev->needed_headroom = 0;
>         }
>
>         return ip_tunnel_init(dev);
> --
> 2.28.0
>
Willem de Bruijn Oct. 11, 2020, 9:06 p.m. UTC | #3
On Sun, Oct 11, 2020 at 4:42 PM Xie He <xie.he.0141@gmail.com> wrote:
>

> On Sun, Oct 11, 2020 at 12:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> >

> > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it

> > conditionally. When it is set, it assumes the outer IP header is

> > already created before ipgre_xmit().

> >

> > This is not true when we send packets through a raw packet socket,

> > where L2 headers are supposed to be constructed by user. Packet

> > socket calls dev_validate_header() to validate the header. But

> > GRE tunnel does not set dev->hard_header_len, so that check can

> > be simply bypassed, therefore uninit memory could be passed down

> > to ipgre_xmit(). Similar for dev->needed_headroom.

> >

> > dev->hard_header_len is supposed to be the length of the header

> > created by dev->header_ops->create(), so it should be used whenever

> > header_ops is set, and dev->needed_headroom should be used when it

> > is not set.

>

> Hi, thanks for attempting to fix this tunnel. Are we still considering

> removing header_ops->create?

>

> As said in my email sent previously today, I want to remove

> header_ops->create because 1) this keeps the un-exposed headers of GRE

> devices consistent with those of GRETAP devices, and 2) I think the

> GRE header (and the headers before the GRE header) is not actually the

> L2 header of the tunnel (the Wikipedia page for "Generic Routing

> Encapsulation" doesn't consider this protocol to be at L2 either).

>

> I'm not sure if you still agree to remove header_ops->create. Do you

> still agree but think it'd be better to do that in a separate patch?

>

> Removing header_ops->create would simplify the fixing of the issue you

> are trying to fix, too, because that way we would no longer need to

> use header_ops or hard_header_len. Also, I'm worried that changing

> hard_header_len (or needed_headroom) in ipgre_link_update would have

> racing issues. If we remove header_ops, we no longer need to use

> hard_header_len and we can just set needed_headroom to the maximum

> value, so that we no longer need to update them in ipgre_link_update.


Our messages crossed.

It seems there are legacy expectations that sendto/recvfrom packet
sockets allow writing/reading the outer IP address, as of commit
6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address"). That is
the express purpose of that commit.

The behavior is inconsistent with other tunnels, as you also point
out, and probably only rarely used if at all. I would love to get rid
of it, but given that we cannot be certain that it is unused, I'm afraid
that we have to continue to support this special case.
Willem de Bruijn Oct. 11, 2020, 9:25 p.m. UTC | #4
On Sun, Oct 11, 2020 at 5:00 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Sun, Oct 11, 2020 at 3:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> >

> > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it

> > conditionally. When it is set, it assumes the outer IP header is

> > already created before ipgre_xmit().

> >

> > This is not true when we send packets through a raw packet socket,

> > where L2 headers are supposed to be constructed by user. Packet

> > socket calls dev_validate_header() to validate the header. But

> > GRE tunnel does not set dev->hard_header_len, so that check can

> > be simply bypassed, therefore uninit memory could be passed down

> > to ipgre_xmit(). Similar for dev->needed_headroom.

> >

> > dev->hard_header_len is supposed to be the length of the header

> > created by dev->header_ops->create(), so it should be used whenever

> > header_ops is set, and dev->needed_headroom should be used when it

> > is not set.

> >

> > Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com

> > Cc: Xie He <xie.he.0141@gmail.com>

> > Cc: William Tu <u9012063@gmail.com>

> > Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>

> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>



> I agree that then swapping dev->needed_headroom for

> dev->hard_header_len at init is then a good approach.

>

> Underlying functions like ip_tunnel_xmit can modify needed_headroom at

> runtime, not sure how that affects runtime changes in

> ipgre_link_update:


Never mind, the patch only selects between needed_headroom or
hard_header_link at ipgre_tunnel_init.

> > @@ -987,10 +991,14 @@ static int ipgre_tunnel_init(struct net_device *dev)

> >                                 return -EINVAL;

> >                         dev->flags = IFF_BROADCAST;

> >                         dev->header_ops = &ipgre_header_ops;

> > +                       dev->hard_header_len = tunnel->hlen + sizeof(*iph);

> > +                       dev->needed_headroom = 0;

>

> here and below, perhaps more descriptive of what is being done, something like:

>

> /* If device has header_ops.create, then headers are part of hard_header_len */

> swap(dev->needed_headroom, dev->hard_header_len)


Arguably nitpicking. Otherwise, patch looks great to me, thanks, so

Acked-by: Willem de Bruijn <willemb@google.com>
Xie He Oct. 11, 2020, 9:50 p.m. UTC | #5
On Sun, Oct 11, 2020 at 2:07 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sun, Oct 11, 2020 at 4:42 PM Xie He <xie.he.0141@gmail.com> wrote:
> >
> > Hi, thanks for attempting to fix this tunnel. Are we still considering
> > removing header_ops->create?
> >
> > As said in my email sent previously today, I want to remove
> > header_ops->create because 1) this keeps the un-exposed headers of GRE
> > devices consistent with those of GRETAP devices, and 2) I think the
> > GRE header (and the headers before the GRE header) is not actually the
> > L2 header of the tunnel (the Wikipedia page for "Generic Routing
> > Encapsulation" doesn't consider this protocol to be at L2 either).
> >
> > I'm not sure if you still agree to remove header_ops->create. Do you
> > still agree but think it'd be better to do that in a separate patch?
> >
> > Removing header_ops->create would simplify the fixing of the issue you
> > are trying to fix, too, because that way we would no longer need to
> > use header_ops or hard_header_len. Also, I'm worried that changing
> > hard_header_len (or needed_headroom) in ipgre_link_update would have
> > racing issues. If we remove header_ops, we no longer need to use
> > hard_header_len and we can just set needed_headroom to the maximum
> > value, so that we no longer need to update them in ipgre_link_update.
>
> Our messages crossed.
>
> It seems there are legacy expectations that sendto/recvfrom packet
> sockets allow writing/reading the outer IP address, as of commit
> 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address"). That is
> the express purpose of that commit.
>
> The behavior is inconsistent with other tunnels, as you also point
> out, and probably only rarely used if at all. I would love to get rid
> of it, but given that we cannot be certain that it is unused, I'm afraid
> that we have to continue to support this special case.

OK. At least we agree that in an ideal world header_ops for GRE
devices should be removed.

Thanks, Willem.
Xie He Oct. 11, 2020, 10:03 p.m. UTC | #6
On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> There is agreement that hard_header_len should be the length of link

> layer headers visible to the upper layers, needed_headroom the

> additional room required for headers that are not exposed, i.e., those

> pushed inside ndo_start_xmit.

>

> The link layer header length also has to agree with the interface

> hardware type (ARPHRD_..).

>

> Tunnel devices have not always been consistent in this, but today

> "bare" ip tunnel devices without additional headers (ipip, sit, ..) do

> match this and advertise 0 byte hard_header_len. Bareudp, vxlan and

> geneve also conform to this. Known exception that probably needs to be

> addressed is sit, which still advertises LL_MAX_HEADER and so has

> exposed quite a few syzkaller issues. Side note, it is not entirely

> clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and

> why they are needed.

>

> GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.

> The second makes sense, as it appears as an Ethernet device. The first

> should match "bare" ip tunnel devices, if following the above logic.

> Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE

> header around in collect medata mode") implements. It changes

> dev->type to ARPHRD_NONE in collect_md mode.

>

> Some of the inconsistency comes from the various modes of the GRE

> driver. Which brings us to ipgre_header_ops. It is set only in two

> special cases.

>

> Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")

> added ipgre_header_ops.parse to be able to receive the inner ip source

> address with PF_PACKET recvfrom. And apparently relies on

> ipgre_header_ops.create to be able to set an address, which implies

> SOCK_DGRAM.

>

> The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its

> implementation starts with the beautiful comment "/* Nice toy.

> Unfortunately, useless in real life :-)". From the rest of that

> detailed comment, it is not clear to me why it would need to expose

> the headers. The example does not use packet sockets.

>

> A packet socket cannot know devices details such as which configurable

> mode a device may be in. And different modes conflict with the basic

> rule that for a given well defined link layer type, i.e., dev->type,

> header length can be expected to be consistent. In an ideal world

> these exceptions would not exist, therefore.


Nice explanation of the situation. I agree with you.

Thanks!
Xie He Oct. 11, 2020, 10:45 p.m. UTC | #7
On Sun, Oct 11, 2020 at 12:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> @@ -626,8 +626,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
>
>         if (dev->header_ops) {
>                 /* Need space for new headers */
> -               if (skb_cow_head(skb, dev->needed_headroom -
> -                                     (tunnel->hlen + sizeof(struct iphdr))))
> +               if (skb_cow_head(skb, dev->hard_header_len))
>                         goto free_skb;
>
>                 tnl_params = (const struct iphdr *)skb->data;

As I understand, the skb_cow functions are for ensuring enough header
space before skb->data. (Right?) However, at this stage our skb->data
is already at the outer IP header, I think we don't need to request
additional header space before the outer IP header.
Cong Wang Oct. 12, 2020, 10:26 p.m. UTC | #8
On Sun, Oct 11, 2020 at 3:46 PM Xie He <xie.he.0141@gmail.com> wrote:
>

> On Sun, Oct 11, 2020 at 12:11 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> >

> > @@ -626,8 +626,7 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,

> >

> >         if (dev->header_ops) {

> >                 /* Need space for new headers */

> > -               if (skb_cow_head(skb, dev->needed_headroom -

> > -                                     (tunnel->hlen + sizeof(struct iphdr))))

> > +               if (skb_cow_head(skb, dev->hard_header_len))

> >                         goto free_skb;

> >

> >                 tnl_params = (const struct iphdr *)skb->data;

>

> As I understand, the skb_cow functions are for ensuring enough header

> space before skb->data. (Right?) However, at this stage our skb->data

> is already at the outer IP header, I think we don't need to request

> additional header space before the outer IP header.


Good point, I thought skb_headroom() == dev->hard_header_len,
but skb->data already points to the tunnel header like you said, so
we should pass 0 to skb_cow_head() here.

Thanks.
Xie He Oct. 14, 2020, 8:51 a.m. UTC | #9
On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> There is agreement that hard_header_len should be the length of link
> layer headers visible to the upper layers, needed_headroom the
> additional room required for headers that are not exposed, i.e., those
> pushed inside ndo_start_xmit.
>
> The link layer header length also has to agree with the interface
> hardware type (ARPHRD_..).
>
> Tunnel devices have not always been consistent in this, but today
> "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> geneve also conform to this. Known exception that probably needs to be
> addressed is sit, which still advertises LL_MAX_HEADER and so has
> exposed quite a few syzkaller issues. Side note, it is not entirely
> clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> why they are needed.
>
> GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> The second makes sense, as it appears as an Ethernet device. The first
> should match "bare" ip tunnel devices, if following the above logic.
> Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> header around in collect medata mode") implements. It changes
> dev->type to ARPHRD_NONE in collect_md mode.
>
> Some of the inconsistency comes from the various modes of the GRE
> driver. Which brings us to ipgre_header_ops. It is set only in two
> special cases.
>
> Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> added ipgre_header_ops.parse to be able to receive the inner ip source
> address with PF_PACKET recvfrom. And apparently relies on
> ipgre_header_ops.create to be able to set an address, which implies
> SOCK_DGRAM.
>
> The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> implementation starts with the beautiful comment "/* Nice toy.
> Unfortunately, useless in real life :-)". From the rest of that
> detailed comment, it is not clear to me why it would need to expose
> the headers. The example does not use packet sockets.
>
> A packet socket cannot know devices details such as which configurable
> mode a device may be in. And different modes conflict with the basic
> rule that for a given well defined link layer type, i.e., dev->type,
> header length can be expected to be consistent. In an ideal world
> these exceptions would not exist, therefore.
>
> Unfortunately, this is legacy behavior that will have to continue to
> be supported.

Thanks for your explanation. So header_ops for GRE devices is only
used in 2 special situations. In normal situations, header_ops is not
used for GRE devices. And we consider not using header_ops should be
the ideal arrangement for GRE devices.

Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
devices that use header_ops? I guess changing dev->type will not
affect the interface to the user space? This way we can solve the
problem of the same dev->type having different hard_header_len values.

Also, for the second special situation, if there's no obvious reason
to use header_ops, maybe we can consider removing header_ops for this
situation.
Willem de Bruijn Oct. 14, 2020, 3:12 p.m. UTC | #10
On Wed, Oct 14, 2020 at 4:52 AM Xie He <xie.he.0141@gmail.com> wrote:
>

> On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

> >

> > There is agreement that hard_header_len should be the length of link

> > layer headers visible to the upper layers, needed_headroom the

> > additional room required for headers that are not exposed, i.e., those

> > pushed inside ndo_start_xmit.

> >

> > The link layer header length also has to agree with the interface

> > hardware type (ARPHRD_..).

> >

> > Tunnel devices have not always been consistent in this, but today

> > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do

> > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and

> > geneve also conform to this. Known exception that probably needs to be

> > addressed is sit, which still advertises LL_MAX_HEADER and so has

> > exposed quite a few syzkaller issues. Side note, it is not entirely

> > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and

> > why they are needed.

> >

> > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.

> > The second makes sense, as it appears as an Ethernet device. The first

> > should match "bare" ip tunnel devices, if following the above logic.

> > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE

> > header around in collect medata mode") implements. It changes

> > dev->type to ARPHRD_NONE in collect_md mode.

> >

> > Some of the inconsistency comes from the various modes of the GRE

> > driver. Which brings us to ipgre_header_ops. It is set only in two

> > special cases.

> >

> > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")

> > added ipgre_header_ops.parse to be able to receive the inner ip source

> > address with PF_PACKET recvfrom. And apparently relies on

> > ipgre_header_ops.create to be able to set an address, which implies

> > SOCK_DGRAM.

> >

> > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its

> > implementation starts with the beautiful comment "/* Nice toy.

> > Unfortunately, useless in real life :-)". From the rest of that

> > detailed comment, it is not clear to me why it would need to expose

> > the headers. The example does not use packet sockets.

> >

> > A packet socket cannot know devices details such as which configurable

> > mode a device may be in. And different modes conflict with the basic

> > rule that for a given well defined link layer type, i.e., dev->type,

> > header length can be expected to be consistent. In an ideal world

> > these exceptions would not exist, therefore.

> >

> > Unfortunately, this is legacy behavior that will have to continue to

> > be supported.

>

> Thanks for your explanation. So header_ops for GRE devices is only

> used in 2 special situations. In normal situations, header_ops is not

> used for GRE devices. And we consider not using header_ops should be

> the ideal arrangement for GRE devices.

>

> Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE

> devices that use header_ops? I guess changing dev->type will not

> affect the interface to the user space? This way we can solve the

> problem of the same dev->type having different hard_header_len values.


But does that address any real issue?

If anything, it would make sense to keep ARHPHRD_IPGRE for tunnels
that expect headers and switch to ARPHRD_NONE for those that do not.
As the collect_md commit I mentioned above does.

> Also, for the second special situation, if there's no obvious reason

> to use header_ops, maybe we can consider removing header_ops for this

> situation.


Unfortunately, there's no knowing if some application is using this
broadcast mode *with* a process using packet sockets.
Xie He Oct. 14, 2020, 7:47 p.m. UTC | #11
On Wed, Oct 14, 2020 at 8:12 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 4:52 AM Xie He <xie.he.0141@gmail.com> wrote:
> >
> > On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > There is agreement that hard_header_len should be the length of link
> > > layer headers visible to the upper layers, needed_headroom the
> > > additional room required for headers that are not exposed, i.e., those
> > > pushed inside ndo_start_xmit.
> > >
> > > The link layer header length also has to agree with the interface
> > > hardware type (ARPHRD_..).
> > >
> > > Tunnel devices have not always been consistent in this, but today
> > > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > > geneve also conform to this. Known exception that probably needs to be
> > > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > > exposed quite a few syzkaller issues. Side note, it is not entirely
> > > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > > why they are needed.
> > >
> > > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > > The second makes sense, as it appears as an Ethernet device. The first
> > > should match "bare" ip tunnel devices, if following the above logic.
> > > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > > header around in collect medata mode") implements. It changes
> > > dev->type to ARPHRD_NONE in collect_md mode.
> > >
> > > Some of the inconsistency comes from the various modes of the GRE
> > > driver. Which brings us to ipgre_header_ops. It is set only in two
> > > special cases.
> > >
> > > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > > added ipgre_header_ops.parse to be able to receive the inner ip source
> > > address with PF_PACKET recvfrom. And apparently relies on
> > > ipgre_header_ops.create to be able to set an address, which implies
> > > SOCK_DGRAM.
> > >
> > > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > > implementation starts with the beautiful comment "/* Nice toy.
> > > Unfortunately, useless in real life :-)". From the rest of that
> > > detailed comment, it is not clear to me why it would need to expose
> > > the headers. The example does not use packet sockets.
> > >
> > > A packet socket cannot know devices details such as which configurable
> > > mode a device may be in. And different modes conflict with the basic
> > > rule that for a given well defined link layer type, i.e., dev->type,
> > > header length can be expected to be consistent. In an ideal world
> > > these exceptions would not exist, therefore.
> > >
> > > Unfortunately, this is legacy behavior that will have to continue to
> > > be supported.
> >
> > Thanks for your explanation. So header_ops for GRE devices is only
> > used in 2 special situations. In normal situations, header_ops is not
> > used for GRE devices. And we consider not using header_ops should be
> > the ideal arrangement for GRE devices.
> >
> > Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> > devices that use header_ops? I guess changing dev->type will not
> > affect the interface to the user space? This way we can solve the
> > problem of the same dev->type having different hard_header_len values.
>
> But does that address any real issue?

It doesn't address any issue visible when using. Just to solve the
problem of the same dev->type having different hard_header_len values
which you mentioned. Making this change will not affect the user in
any way. So I think it is valuable to make this change.

> If anything, it would make sense to keep ARHPHRD_IPGRE for tunnels
> that expect headers and switch to ARPHRD_NONE for those that do not.
> As the collect_md commit I mentioned above does.

I thought we agreed that ideally GRE devices would not have
header_ops. Currently GRE devices (in normal situations) indeed do not
use header_ops (and use ARHPHRD_IPGRE as dev->type). I think we should
keep this behavior.

To solve the problem of the same dev->type having different
hard_header_len values which you mentioned. I think we should create a
new dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use
header_ops.

Also, for collect_md, I think we should use ARHPHRD_IPGRE. I see no
reason to use ARPHRD_NONE.

> > Also, for the second special situation, if there's no obvious reason
> > to use header_ops, maybe we can consider removing header_ops for this
> > situation.
>
> Unfortunately, there's no knowing if some application is using this
> broadcast mode *with* a process using packet sockets.

We can't always keep the interface to the user space unchanged when
fixing problems. When we fix drivers by adding hard_header_len or
removing hard_header_len, we ARE changing the interface. I did these
fixes a lot. I also changed skb->protocol when sending skbs for some
drivers, which in fact was also changing the interface. It is not
possible to keep the interface strictly unchanged, otherwise a lot of
problems will be impossible to fix.
Willem de Bruijn Oct. 14, 2020, 8:19 p.m. UTC | #12
On Wed, Oct 14, 2020 at 3:48 PM Xie He <xie.he.0141@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 8:12 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Wed, Oct 14, 2020 at 4:52 AM Xie He <xie.he.0141@gmail.com> wrote:
> > >
> > > On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > There is agreement that hard_header_len should be the length of link
> > > > layer headers visible to the upper layers, needed_headroom the
> > > > additional room required for headers that are not exposed, i.e., those
> > > > pushed inside ndo_start_xmit.
> > > >
> > > > The link layer header length also has to agree with the interface
> > > > hardware type (ARPHRD_..).
> > > >
> > > > Tunnel devices have not always been consistent in this, but today
> > > > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > > > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > > > geneve also conform to this. Known exception that probably needs to be
> > > > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > > > exposed quite a few syzkaller issues. Side note, it is not entirely
> > > > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > > > why they are needed.
> > > >
> > > > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > > > The second makes sense, as it appears as an Ethernet device. The first
> > > > should match "bare" ip tunnel devices, if following the above logic.
> > > > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > > > header around in collect medata mode") implements. It changes
> > > > dev->type to ARPHRD_NONE in collect_md mode.
> > > >
> > > > Some of the inconsistency comes from the various modes of the GRE
> > > > driver. Which brings us to ipgre_header_ops. It is set only in two
> > > > special cases.
> > > >
> > > > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > > > added ipgre_header_ops.parse to be able to receive the inner ip source
> > > > address with PF_PACKET recvfrom. And apparently relies on
> > > > ipgre_header_ops.create to be able to set an address, which implies
> > > > SOCK_DGRAM.
> > > >
> > > > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > > > implementation starts with the beautiful comment "/* Nice toy.
> > > > Unfortunately, useless in real life :-)". From the rest of that
> > > > detailed comment, it is not clear to me why it would need to expose
> > > > the headers. The example does not use packet sockets.
> > > >
> > > > A packet socket cannot know devices details such as which configurable
> > > > mode a device may be in. And different modes conflict with the basic
> > > > rule that for a given well defined link layer type, i.e., dev->type,
> > > > header length can be expected to be consistent. In an ideal world
> > > > these exceptions would not exist, therefore.
> > > >
> > > > Unfortunately, this is legacy behavior that will have to continue to
> > > > be supported.
> > >
> > > Thanks for your explanation. So header_ops for GRE devices is only
> > > used in 2 special situations. In normal situations, header_ops is not
> > > used for GRE devices. And we consider not using header_ops should be
> > > the ideal arrangement for GRE devices.
> > >
> > > Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> > > devices that use header_ops? I guess changing dev->type will not
> > > affect the interface to the user space? This way we can solve the
> > > problem of the same dev->type having different hard_header_len values.
> >
> > But does that address any real issue?
>
> It doesn't address any issue visible when using. Just to solve the
> problem of the same dev->type having different hard_header_len values
> which you mentioned. Making this change will not affect the user in
> any way. So I think it is valuable to make this change.
>
> > If anything, it would make sense to keep ARHPHRD_IPGRE for tunnels
> > that expect headers and switch to ARPHRD_NONE for those that do not.
> > As the collect_md commit I mentioned above does.
>
> I thought we agreed that ideally GRE devices would not have
> header_ops. Currently GRE devices (in normal situations) indeed do not
> use header_ops (and use ARHPHRD_IPGRE as dev->type). I think we should
> keep this behavior.
>
> To solve the problem of the same dev->type having different
> hard_header_len values which you mentioned. I think we should create a
> new dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use
> header_ops.
>
> Also, for collect_md, I think we should use ARHPHRD_IPGRE. I see no
> reason to use ARPHRD_NONE.

What does ARPHRD_IPGRE define beyond ARPHRD_NONE? And same for
ARPHRD_TUNNEL variants. If they are indistinguishable, they are the
same and might as well have the same label.

> > > Also, for the second special situation, if there's no obvious reason
> > > to use header_ops, maybe we can consider removing header_ops for this
> > > situation.
> >
> > Unfortunately, there's no knowing if some application is using this
> > broadcast mode *with* a process using packet sockets.
>
> We can't always keep the interface to the user space unchanged when
> fixing problems. When we fix drivers by adding hard_header_len or
> removing hard_header_len, we ARE changing the interface. I did these
> fixes a lot. I also changed skb->protocol when sending skbs for some
> drivers, which in fact was also changing the interface. It is not
> possible to keep the interface strictly unchanged, otherwise a lot of
> problems will be impossible to fix.

I understand that for bug fixes this is sometimes unavoidable. I don't
think code cleanup is reason enough, though.
Xie He Oct. 15, 2020, 1:38 a.m. UTC | #13
On Wed, Oct 14, 2020 at 1:19 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Wed, Oct 14, 2020 at 3:48 PM Xie He <xie.he.0141@gmail.com> wrote:

> >

> > I thought we agreed that ideally GRE devices would not have

> > header_ops. Currently GRE devices (in normal situations) indeed do not

> > use header_ops (and use ARHPHRD_IPGRE as dev->type). I think we should

> > keep this behavior.

> >

> > To solve the problem of the same dev->type having different

> > hard_header_len values which you mentioned. I think we should create a

> > new dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use

> > header_ops.

> >

> > Also, for collect_md, I think we should use ARHPHRD_IPGRE. I see no

> > reason to use ARPHRD_NONE.

>

> What does ARPHRD_IPGRE define beyond ARPHRD_NONE? And same for

> ARPHRD_TUNNEL variants. If they are indistinguishable, they are the

> same and might as well have the same label.


It is indeed reasonable to keep devices indistinguishable to each
other having the same dev->type label. But I see a lot of devices in
the kernel without header_ops having different dev->type labels. For
example, ARPHRD_SLIP should be the same as ARPHRD_RAWIP. One feature
distinguishing these devices might be their dev->mtu.

GRE devices may have their special dev->mtu determined by the maximum
IP packet size and the GRE header length.

For ARPHRD_TUNNEL, it may also have its own dev->mtu. I also see it
has header_ops->parse_protocol (but it doesn't have
header_ops->create).
Xie He Oct. 15, 2020, 2:24 a.m. UTC | #14
On Wed, Oct 14, 2020 at 6:38 PM Xie He <xie.he.0141@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 1:19 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Wed, Oct 14, 2020 at 3:48 PM Xie He <xie.he.0141@gmail.com> wrote:
> > >
> > > I thought we agreed that ideally GRE devices would not have
> > > header_ops. Currently GRE devices (in normal situations) indeed do not
> > > use header_ops (and use ARHPHRD_IPGRE as dev->type). I think we should
> > > keep this behavior.
> > >
> > > To solve the problem of the same dev->type having different
> > > hard_header_len values which you mentioned. I think we should create a
> > > new dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use
> > > header_ops.
> > >
> > > Also, for collect_md, I think we should use ARHPHRD_IPGRE. I see no
> > > reason to use ARPHRD_NONE.
> >
> > What does ARPHRD_IPGRE define beyond ARPHRD_NONE? And same for
> > ARPHRD_TUNNEL variants. If they are indistinguishable, they are the
> > same and might as well have the same label.
>
> It is indeed reasonable to keep devices indistinguishable to each
> other having the same dev->type label. But I see a lot of devices in
> the kernel without header_ops having different dev->type labels. For
> example, ARPHRD_SLIP should be the same as ARPHRD_RAWIP. One feature
> distinguishing these devices might be their dev->mtu.
>
> GRE devices may have their special dev->mtu determined by the maximum
> IP packet size and the GRE header length.
>
> For ARPHRD_TUNNEL, it may also have its own dev->mtu. I also see it
> has header_ops->parse_protocol (but it doesn't have
> header_ops->create).

Actually I think dev->type can be seen from user space. For example,
when you type "ip link", it will display the link type for you. So I
think it is useful to keep different dev->type labels without merging
them even if they appear to have no difference.
Willem de Bruijn Oct. 15, 2020, 1:42 p.m. UTC | #15
On Wed, Oct 14, 2020 at 10:25 PM Xie He <xie.he.0141@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 6:38 PM Xie He <xie.he.0141@gmail.com> wrote:
> >
> > On Wed, Oct 14, 2020 at 1:19 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Wed, Oct 14, 2020 at 3:48 PM Xie He <xie.he.0141@gmail.com> wrote:
> > > >
> > > > I thought we agreed that ideally GRE devices would not have
> > > > header_ops. Currently GRE devices (in normal situations) indeed do not
> > > > use header_ops (and use ARHPHRD_IPGRE as dev->type). I think we should
> > > > keep this behavior.
> > > >
> > > > To solve the problem of the same dev->type having different
> > > > hard_header_len values which you mentioned. I think we should create a
> > > > new dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use
> > > > header_ops.
> > > >
> > > > Also, for collect_md, I think we should use ARHPHRD_IPGRE. I see no
> > > > reason to use ARPHRD_NONE.
> > >
> > > What does ARPHRD_IPGRE define beyond ARPHRD_NONE? And same for
> > > ARPHRD_TUNNEL variants. If they are indistinguishable, they are the
> > > same and might as well have the same label.
> >
> > It is indeed reasonable to keep devices indistinguishable to each
> > other having the same dev->type label. But I see a lot of devices in
> > the kernel without header_ops having different dev->type labels. For
> > example, ARPHRD_SLIP should be the same as ARPHRD_RAWIP. One feature
> > distinguishing these devices might be their dev->mtu.
> >
> > GRE devices may have their special dev->mtu determined by the maximum
> > IP packet size and the GRE header length.
> >
> > For ARPHRD_TUNNEL, it may also have its own dev->mtu. I also see it
> > has header_ops->parse_protocol (but it doesn't have
> > header_ops->create).
>
> Actually I think dev->type can be seen from user space. For example,
> when you type "ip link", it will display the link type for you. So I
> think it is useful to keep different dev->type labels without merging
> them even if they appear to have no difference.

Ah, indeed. These constants are matched in iproute2 in lib/ll_types.c
to string representations.

Good catch. Yes, then they have to stay as is.
Xie He Oct. 15, 2020, 7:19 p.m. UTC | #16
On Thu, Oct 15, 2020 at 6:42 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Wed, Oct 14, 2020 at 10:25 PM Xie He <xie.he.0141@gmail.com> wrote:

> >

> > Actually I think dev->type can be seen from user space. For example,

> > when you type "ip link", it will display the link type for you. So I

> > think it is useful to keep different dev->type labels without merging

> > them even if they appear to have no difference.

>

> Ah, indeed. These constants are matched in iproute2 in lib/ll_types.c

> to string representations.

>

> Good catch. Yes, then they have to stay as is.


So in this case it may be better to keep dev->type as ARHPHRD_IPGRE
for GRE devices with or without header_ops. This way we can avoid the
need to update iproute2.

We can still consider changing GRE devices in collect_md mode from
ARPHRD_NONE to ARHPHRD_IPGRE. The original author who changed it to
ARPHRD_NONE might assume ARHPHRD_IPGRE is for GRE devices with
header_ops, so he felt he needed to distinguish GRE devices without
header_ops by dev->type. However, ARHPHRD_IPGRE is actually already
used for GRE devices with header_ops AND without header_ops. So it
doesn't hurt to use ARHPHRD_IPGRE for GRE devices in collect_md mode,
too. This would also make iproute2 to correctly display GRE devices in
collect_md mode as GRE devices.
Willem de Bruijn Oct. 15, 2020, 7:56 p.m. UTC | #17
On Thu, Oct 15, 2020 at 3:19 PM Xie He <xie.he.0141@gmail.com> wrote:
>

> On Thu, Oct 15, 2020 at 6:42 AM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

> >

> > On Wed, Oct 14, 2020 at 10:25 PM Xie He <xie.he.0141@gmail.com> wrote:

> > >

> > > Actually I think dev->type can be seen from user space. For example,

> > > when you type "ip link", it will display the link type for you. So I

> > > think it is useful to keep different dev->type labels without merging

> > > them even if they appear to have no difference.

> >

> > Ah, indeed. These constants are matched in iproute2 in lib/ll_types.c

> > to string representations.

> >

> > Good catch. Yes, then they have to stay as is.

>

> So in this case it may be better to keep dev->type as ARHPHRD_IPGRE

> for GRE devices with or without header_ops. This way we can avoid the

> need to update iproute2.

>

> We can still consider changing GRE devices in collect_md mode from

> ARPHRD_NONE to ARHPHRD_IPGRE. The original author who changed it to

> ARPHRD_NONE might assume ARHPHRD_IPGRE is for GRE devices with

> header_ops, so he felt he needed to distinguish GRE devices without

> header_ops by dev->type. However, ARHPHRD_IPGRE is actually already

> used for GRE devices with header_ops AND without header_ops. So it

> doesn't hurt to use ARHPHRD_IPGRE for GRE devices in collect_md mode,

> too. This would also make iproute2 to correctly display GRE devices in

> collect_md mode as GRE devices.


That's true. Whenever you make user visible changes there is some
chance of breakage. I'm not sure it's worth the risk here. The present
state is perhaps not ideal, but not broken.
diff mbox series

Patch

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4e31f23e4117..82fee0010353 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -626,8 +626,7 @@  static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
 
 	if (dev->header_ops) {
 		/* Need space for new headers */
-		if (skb_cow_head(skb, dev->needed_headroom -
-				      (tunnel->hlen + sizeof(struct iphdr))))
+		if (skb_cow_head(skb, dev->hard_header_len))
 			goto free_skb;
 
 		tnl_params = (const struct iphdr *)skb->data;
@@ -748,7 +747,11 @@  static void ipgre_link_update(struct net_device *dev, bool set_mtu)
 	len = tunnel->tun_hlen - len;
 	tunnel->hlen = tunnel->hlen + len;
 
-	dev->needed_headroom = dev->needed_headroom + len;
+	if (dev->header_ops)
+		dev->hard_header_len += len;
+	else
+		dev->needed_headroom += len;
+
 	if (set_mtu)
 		dev->mtu = max_t(int, dev->mtu - len, 68);
 
@@ -944,6 +947,7 @@  static void __gre_tunnel_init(struct net_device *dev)
 	tunnel->parms.iph.protocol = IPPROTO_GRE;
 
 	tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
+	dev->needed_headroom = tunnel->hlen + sizeof(tunnel->parms.iph);
 
 	dev->features		|= GRE_FEATURES;
 	dev->hw_features	|= GRE_FEATURES;
@@ -987,10 +991,14 @@  static int ipgre_tunnel_init(struct net_device *dev)
 				return -EINVAL;
 			dev->flags = IFF_BROADCAST;
 			dev->header_ops = &ipgre_header_ops;
+			dev->hard_header_len = tunnel->hlen + sizeof(*iph);
+			dev->needed_headroom = 0;
 		}
 #endif
 	} else if (!tunnel->collect_md) {
 		dev->header_ops = &ipgre_header_ops;
+		dev->hard_header_len = tunnel->hlen + sizeof(*iph);
+		dev->needed_headroom = 0;
 	}
 
 	return ip_tunnel_init(dev);