Message ID | b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com |
---|---|
State | New |
Headers | show |
Series | [net] ipv4: Fix tos mask in inet_rtm_getroute() | expand |
On 11/26/20 11:09 AM, Guillaume Nault wrote: > When inet_rtm_getroute() was converted to use the RCU variants of > ip_route_input() and ip_route_output_key(), the TOS parameters > stopped being masked with IPTOS_RT_MASK before doing the route lookup. > > As a result, "ip route get" can return a different route than what > would be used when sending real packets. > > For example: > > $ ip route add 192.0.2.11/32 dev eth0 > $ ip route add unreachable 192.0.2.11/32 tos 2 > $ ip route get 192.0.2.11 tos 2 > RTNETLINK answers: No route to host > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would > actually be routed using the first route: > > $ ping -c 1 -Q 2 192.0.2.11 > PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms > > --- 192.0.2.11 ping statistics --- > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to > return results consistent with real route lookups. > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") > Signed-off-by: Guillaume Nault <gnault@redhat.com> > --- > net/ipv4/route.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote: > On 11/26/20 11:09 AM, Guillaume Nault wrote: > > When inet_rtm_getroute() was converted to use the RCU variants of > > ip_route_input() and ip_route_output_key(), the TOS parameters > > stopped being masked with IPTOS_RT_MASK before doing the route lookup. > > > > As a result, "ip route get" can return a different route than what > > would be used when sending real packets. > > > > For example: > > > > $ ip route add 192.0.2.11/32 dev eth0 > > $ ip route add unreachable 192.0.2.11/32 tos 2 > > $ ip route get 192.0.2.11 tos 2 > > RTNETLINK answers: No route to host > > > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would > > actually be routed using the first route: > > > > $ ping -c 1 -Q 2 192.0.2.11 > > PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. > > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms > > > > --- 192.0.2.11 ping statistics --- > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > > rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms > > > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to > > return results consistent with real route lookups. > > > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") > > Signed-off-by: Guillaume Nault <gnault@redhat.com> > > Reviewed-by: David Ahern <dsahern@kernel.org> Applied, thanks! Should the discrepancy between the behavior of ip_route_input_rcu() and ip_route_input() be addressed, possibly?
On Sat, Nov 28, 2020 at 01:17:16PM -0800, Jakub Kicinski wrote: > On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote: > > On 11/26/20 11:09 AM, Guillaume Nault wrote: > > > When inet_rtm_getroute() was converted to use the RCU variants of > > > ip_route_input() and ip_route_output_key(), the TOS parameters > > > stopped being masked with IPTOS_RT_MASK before doing the route lookup. > > > > > > As a result, "ip route get" can return a different route than what > > > would be used when sending real packets. > > > > > > For example: > > > > > > $ ip route add 192.0.2.11/32 dev eth0 > > > $ ip route add unreachable 192.0.2.11/32 tos 2 > > > $ ip route get 192.0.2.11 tos 2 > > > RTNETLINK answers: No route to host > > > > > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would > > > actually be routed using the first route: > > > > > > $ ping -c 1 -Q 2 192.0.2.11 > > > PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. > > > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms > > > > > > --- 192.0.2.11 ping statistics --- > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > > > rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms > > > > > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to > > > return results consistent with real route lookups. > > > > > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") > > > Signed-off-by: Guillaume Nault <gnault@redhat.com> > > > > Reviewed-by: David Ahern <dsahern@kernel.org> > > Applied, thanks! > > Should the discrepancy between the behavior of ip_route_input_rcu() and > ip_route_input() be addressed, possibly? Do you mean masking TOS with IPTOS_RT_MASK directly in ip_route_input_rcu(), instead of in the callers? After this patch, all callers apply IPTOS_RT_MASK before calling ip_route_input_rcu(). So, yes, that could be easily consolidated there, and I'll do that after net merges into net-next. More generally, my long term plan is indeed to do mask the TOS in central places, to get consistent behaviour across the networking stack. However, generally speaking, I need to be careful not to break any established behaviour. I'm mostly worried about the ECN bits. I guess that any caller that doesn't mask these bits has a bug (as that may break ECN, which is there since a long time). However, there are many code paths to audit before we can be sure. The end goal is to fully support DSCP. Once we'll be sure that no code path can possibly intreprete an ECN bit as TOS, we'll can safely drop all those obsolete TOS* masks and macros from the kernel code and simply mask out the ECN bits (thus preserving the whole DSCP space). Please note that this is background work for me. Expect slow (but hopefully regular) progress from me.
On Sun, 29 Nov 2020 13:54:16 +0100 Guillaume Nault wrote: > On Sat, Nov 28, 2020 at 01:17:16PM -0800, Jakub Kicinski wrote: > > On Sat, 28 Nov 2020 10:03:42 -0700 David Ahern wrote: > > > On 11/26/20 11:09 AM, Guillaume Nault wrote: > > > > When inet_rtm_getroute() was converted to use the RCU variants of > > > > ip_route_input() and ip_route_output_key(), the TOS parameters > > > > stopped being masked with IPTOS_RT_MASK before doing the route lookup. > > > > > > > > As a result, "ip route get" can return a different route than what > > > > would be used when sending real packets. > > > > > > > > For example: > > > > > > > > $ ip route add 192.0.2.11/32 dev eth0 > > > > $ ip route add unreachable 192.0.2.11/32 tos 2 > > > > $ ip route get 192.0.2.11 tos 2 > > > > RTNETLINK answers: No route to host > > > > > > > > But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would > > > > actually be routed using the first route: > > > > > > > > $ ping -c 1 -Q 2 192.0.2.11 > > > > PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. > > > > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms > > > > > > > > --- 192.0.2.11 ping statistics --- > > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > > > > rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms > > > > > > > > This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to > > > > return results consistent with real route lookups. > > > > > > > > Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") > > > > Signed-off-by: Guillaume Nault <gnault@redhat.com> > > > > > > Reviewed-by: David Ahern <dsahern@kernel.org> > > > > Applied, thanks! > > > > Should the discrepancy between the behavior of ip_route_input_rcu() and > > ip_route_input() be addressed, possibly? > > Do you mean masking TOS with IPTOS_RT_MASK directly in > ip_route_input_rcu(), instead of in the callers? > > After this patch, all callers apply IPTOS_RT_MASK before calling > ip_route_input_rcu(). So, yes, that could be easily consolidated there, > and I'll do that after net merges into net-next. > > More generally, my long term plan is indeed to do mask the TOS in > central places, to get consistent behaviour across the networking > stack. However, generally speaking, I need to be careful not to break > any established behaviour. > > I'm mostly worried about the ECN bits. I guess that any caller that > doesn't mask these bits has a bug (as that may break ECN, which is > there since a long time). However, there are many code paths to audit > before we can be sure. > > The end goal is to fully support DSCP. Once we'll be sure that no > code path can possibly intreprete an ECN bit as TOS, we'll can safely > drop all those obsolete TOS* masks and macros from the kernel code and > simply mask out the ECN bits (thus preserving the whole DSCP space). Sounds great! > Please note that this is background work for me. Expect slow (but > hopefully regular) progress from me.
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index dc2a399cd9f4..9f43abeac3a8 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -3222,7 +3222,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, fl4.daddr = dst; fl4.saddr = src; - fl4.flowi4_tos = rtm->rtm_tos; + fl4.flowi4_tos = rtm->rtm_tos & IPTOS_RT_MASK; fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0; fl4.flowi4_mark = mark; fl4.flowi4_uid = uid; @@ -3246,8 +3246,9 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, fl4.flowi4_iif = iif; /* for rt_fill_info */ skb->dev = dev; skb->mark = mark; - err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos, - dev, &res); + err = ip_route_input_rcu(skb, dst, src, + rtm->rtm_tos & IPTOS_RT_MASK, dev, + &res); rt = skb_rtable(skb); if (err == 0 && rt->dst.error)
When inet_rtm_getroute() was converted to use the RCU variants of ip_route_input() and ip_route_output_key(), the TOS parameters stopped being masked with IPTOS_RT_MASK before doing the route lookup. As a result, "ip route get" can return a different route than what would be used when sending real packets. For example: $ ip route add 192.0.2.11/32 dev eth0 $ ip route add unreachable 192.0.2.11/32 tos 2 $ ip route get 192.0.2.11 tos 2 RTNETLINK answers: No route to host But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would actually be routed using the first route: $ ping -c 1 -Q 2 192.0.2.11 PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms --- 192.0.2.11 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to return results consistent with real route lookups. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Signed-off-by: Guillaume Nault <gnault@redhat.com> --- net/ipv4/route.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)