Message ID | 1609892546-11389-1-git-send-email-stranche@quicinc.com |
---|---|
State | New |
Headers | show |
Series | [net,1/2] net: ipv6: fib: flush exceptions when purging route | expand |
On 1/5/21 5:22 PM, Sean Tranchetti wrote: > From: Sean Tranchetti <stranche@codeaurora.org> > > Route removal is handled by two code paths. The main removal path is via > fib6_del_route() which will handle purging any PMTU exceptions from the > cache, removing all per-cpu copies of the DST entry used by the route, and > releasing the fib6_info struct. > > The second removal location is during fib6_add_rt2node() during a route > replacement operation. This path also calls fib6_purge_rt() to handle > cleaning up the per-cpu copies of the DST entries and releasing the > fib6_info associated with the older route, but it does not flush any PMTU > exceptions that the older route had. Since the older route is removed from > the tree during the replacement, we lose any way of accessing it again. > > As these lingering DSTs and the fib6_info struct are holding references to > the underlying netdevice struct as well, unregistering that device from the > kernel can never complete. > I think the right fixes tag is: Fixes: 2b760fcf5cfb3 ("ipv6: hook up exception table to store dst cache") cc'ed author of that patch. > Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> > --- > net/ipv6/ip6_fib.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 605cdd3..f43e275 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1025,6 +1025,8 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn, > { > struct fib6_table *table = rt->fib6_table; > > + /* Flush all cached dst in exception table */ > + rt6_flush_exceptions(rt); > fib6_drop_pcpu_from(rt, table); > > if (rt->nh && !list_empty(&rt->nh_list)) > @@ -1927,9 +1929,6 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, > net->ipv6.rt6_stats->fib_rt_entries--; > net->ipv6.rt6_stats->fib_discarded_routes++; > > - /* Flush all cached dst in exception table */ > - rt6_flush_exceptions(rt); > - > /* Reset round-robin state, if necessary */ > if (rcu_access_pointer(fn->rr_ptr) == rt) > fn->rr_ptr = NULL; > Reviewed-by: David Ahern <dsahern@kernel.org>
On 1/5/21 5:22 PM, Sean Tranchetti wrote: > From: Sean Tranchetti <stranche@codeaurora.org> > > Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change. > > These tests explicitly test for a recently discovered problem in the > IPv6 routing framework where PMTU exceptions were not properly released > when replacing a route via "ip route change ...". > > After creating PMTU exceptions, the route from the device A to R1 will be > replaced with a new route, then device A will be deleted. If the PMTU > exceptions were properly cleaned up by the kernel, this device deletion > will succeed. Otherwise, the unregistration of the device will stall, and > messages such as the following will be logged in dmesg: > > unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4 > > Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> > --- > tools/testing/selftests/net/pmtu.sh | 71 +++++++++++++++++++++++++++++++++++-- > 1 file changed, 69 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh > index 464e31e..64cd2e2 100755 > --- a/tools/testing/selftests/net/pmtu.sh > +++ b/tools/testing/selftests/net/pmtu.sh > @@ -162,7 +162,15 @@ > # - list_flush_ipv6_exception > # Using the same topology as in pmtu_ipv6, create exceptions, and check > # they are shown when listing exception caches, gone after flushing them > - > +# > +# - pmtu_ipv4_route_change > +# Use the same topology as in pmtu_ipv4, but issue a route replacement > +# command and delete the corresponding device afterward. This tests for > +# proper cleanup of the PMTU exceptions by the route replacement path. > +# Device unregistration should complete successfully > +# > +# - pmtu_ipv6_route_change > +# Same as above but with IPv6 > > # Kselftest framework requirement - SKIP code is 4. > ksft_skip=4 > @@ -224,7 +232,9 @@ tests=" > cleanup_ipv4_exception ipv4: cleanup of cached exceptions 1 > cleanup_ipv6_exception ipv6: cleanup of cached exceptions 1 > list_flush_ipv4_exception ipv4: list and flush cached exceptions 1 > - list_flush_ipv6_exception ipv6: list and flush cached exceptions 1" > + list_flush_ipv6_exception ipv6: list and flush cached exceptions 1 > + pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1 > + pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1" > > NS_A="ns-A" > NS_B="ns-B" > @@ -1782,6 +1792,63 @@ test_list_flush_ipv6_exception() { > return ${fail} > } > > +test_pmtu_ipvX_route_change() { > + family=${1} > + > + setup namespaces routing || return 2 > + trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \ > + "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \ > + "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \ > + "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2 > + > + if [ ${family} -eq 4 ]; then > + ping=ping > + dst1="${prefix4}.${b_r1}.1" > + dst2="${prefix4}.${b_r2}.1" > + gw="${prefix4}.${a_r1}.2" > + else > + ping=${ping6} > + dst1="${prefix6}:${b_r1}::1" > + dst2="${prefix6}:${b_r2}::1" > + gw="${prefix6}:${a_r1}::2" > + fi > + > + # Set up initial MTU values > + mtu "${ns_a}" veth_A-R1 2000 > + mtu "${ns_r1}" veth_R1-A 2000 > + mtu "${ns_r1}" veth_R1-B 1400 > + mtu "${ns_b}" veth_B-R1 1400 > + > + mtu "${ns_a}" veth_A-R2 2000 > + mtu "${ns_r2}" veth_R2-A 2000 > + mtu "${ns_r2}" veth_R2-B 1500 > + mtu "${ns_b}" veth_B-R2 1500 > + > + # Create route exceptions > + run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst1} > + run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst2} > + > + # Check that exceptions have been created with the correct PMTU > + pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})" > + check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1 > + pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})" > + check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1 > + > + # Replace the route from A to R1 > + run_cmd ${ns_a} ip route change default via ${gw} > + > + # Delete the device in A > + run_cmd ${ns_a} ip link del "veth_A-R1" > +} > + > +test_pmtu_ipv4_route_change() { > + test_pmtu_ipvX_route_change 4 > +} > + > +test_pmtu_ipv6_route_change() { > + test_pmtu_ipvX_route_change 6 > +} > + > usage() { > echo > echo "$0 [OPTIONS] [TEST]..." > Thanks for adding the tests. Reviewed-by: David Ahern <dsahern@kernel.org>
On Wed, 6 Jan 2021 09:55:09 -0700 David Ahern wrote: > On 1/5/21 5:22 PM, Sean Tranchetti wrote: > > From: Sean Tranchetti <stranche@codeaurora.org> > > > > Route removal is handled by two code paths. The main removal path is via > > fib6_del_route() which will handle purging any PMTU exceptions from the > > cache, removing all per-cpu copies of the DST entry used by the route, and > > releasing the fib6_info struct. > > > > The second removal location is during fib6_add_rt2node() during a route > > replacement operation. This path also calls fib6_purge_rt() to handle > > cleaning up the per-cpu copies of the DST entries and releasing the > > fib6_info associated with the older route, but it does not flush any PMTU > > exceptions that the older route had. Since the older route is removed from > > the tree during the replacement, we lose any way of accessing it again. > > > > As these lingering DSTs and the fib6_info struct are holding references to > > the underlying netdevice struct as well, unregistering that device from the > > kernel can never complete. > > I think the right fixes tag is: > > Fixes: 2b760fcf5cfb3 ("ipv6: hook up exception table to store dst cache") > > cc'ed author of that patch. > > > Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> > > Reviewed-by: David Ahern <dsahern@kernel.org> Applied, thanks!
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 605cdd3..f43e275 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1025,6 +1025,8 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn, { struct fib6_table *table = rt->fib6_table; + /* Flush all cached dst in exception table */ + rt6_flush_exceptions(rt); fib6_drop_pcpu_from(rt, table); if (rt->nh && !list_empty(&rt->nh_list)) @@ -1927,9 +1929,6 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn, net->ipv6.rt6_stats->fib_rt_entries--; net->ipv6.rt6_stats->fib_discarded_routes++; - /* Flush all cached dst in exception table */ - rt6_flush_exceptions(rt); - /* Reset round-robin state, if necessary */ if (rcu_access_pointer(fn->rr_ptr) == rt) fn->rr_ptr = NULL;