diff mbox series

[net,1/2] net: ipv6: fib: flush exceptions when purging route

Message ID 1609892546-11389-1-git-send-email-stranche@quicinc.com
State New
Headers show
Series [net,1/2] net: ipv6: fib: flush exceptions when purging route | expand

Commit Message

Sean Tranchetti Jan. 6, 2021, 12:22 a.m. UTC
From: Sean Tranchetti <stranche@codeaurora.org>

Route removal is handled by two code paths. The main removal path is via
fib6_del_route() which will handle purging any PMTU exceptions from the
cache, removing all per-cpu copies of the DST entry used by the route, and
releasing the fib6_info struct.

The second removal location is during fib6_add_rt2node() during a route
replacement operation. This path also calls fib6_purge_rt() to handle
cleaning up the per-cpu copies of the DST entries and releasing the
fib6_info associated with the older route, but it does not flush any PMTU
exceptions that the older route had. Since the older route is removed from
the tree during the replacement, we lose any way of accessing it again.

As these lingering DSTs and the fib6_info struct are holding references to
the underlying netdevice struct as well, unregistering that device from the
kernel can never complete.

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
---
 net/ipv6/ip6_fib.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

David Ahern Jan. 6, 2021, 4:55 p.m. UTC | #1
On 1/5/21 5:22 PM, Sean Tranchetti wrote:
> From: Sean Tranchetti <stranche@codeaurora.org>

> 

> Route removal is handled by two code paths. The main removal path is via

> fib6_del_route() which will handle purging any PMTU exceptions from the

> cache, removing all per-cpu copies of the DST entry used by the route, and

> releasing the fib6_info struct.

> 

> The second removal location is during fib6_add_rt2node() during a route

> replacement operation. This path also calls fib6_purge_rt() to handle

> cleaning up the per-cpu copies of the DST entries and releasing the

> fib6_info associated with the older route, but it does not flush any PMTU

> exceptions that the older route had. Since the older route is removed from

> the tree during the replacement, we lose any way of accessing it again.

> 

> As these lingering DSTs and the fib6_info struct are holding references to

> the underlying netdevice struct as well, unregistering that device from the

> kernel can never complete.

> 


I think the right fixes tag is:

Fixes: 2b760fcf5cfb3 ("ipv6: hook up exception table to store dst cache")

cc'ed author of that patch.

> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>

> ---

>  net/ipv6/ip6_fib.c | 5 ++---

>  1 file changed, 2 insertions(+), 3 deletions(-)

> 

> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c

> index 605cdd3..f43e275 100644

> --- a/net/ipv6/ip6_fib.c

> +++ b/net/ipv6/ip6_fib.c

> @@ -1025,6 +1025,8 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,

>  {

>  	struct fib6_table *table = rt->fib6_table;

>  

> +	/* Flush all cached dst in exception table */

> +	rt6_flush_exceptions(rt);

>  	fib6_drop_pcpu_from(rt, table);

>  

>  	if (rt->nh && !list_empty(&rt->nh_list))

> @@ -1927,9 +1929,6 @@ static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn,

>  	net->ipv6.rt6_stats->fib_rt_entries--;

>  	net->ipv6.rt6_stats->fib_discarded_routes++;

>  

> -	/* Flush all cached dst in exception table */

> -	rt6_flush_exceptions(rt);

> -

>  	/* Reset round-robin state, if necessary */

>  	if (rcu_access_pointer(fn->rr_ptr) == rt)

>  		fn->rr_ptr = NULL;

> 


Reviewed-by: David Ahern <dsahern@kernel.org>
David Ahern Jan. 6, 2021, 5:03 p.m. UTC | #2
On 1/5/21 5:22 PM, Sean Tranchetti wrote:
> From: Sean Tranchetti <stranche@codeaurora.org>

> 

> Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change.

> 

> These tests explicitly test for a recently discovered problem in the

> IPv6 routing framework where PMTU exceptions were not properly released

> when replacing a route via "ip route change ...".

> 

> After creating PMTU exceptions, the route from the device A to R1 will be

> replaced with a new route, then device A will be deleted. If the PMTU

> exceptions were properly cleaned up by the kernel, this device deletion

> will succeed. Otherwise, the unregistration of the device will stall, and

> messages such as the following will be logged in dmesg:

> 

> unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4

> 

> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>

> ---

>  tools/testing/selftests/net/pmtu.sh | 71 +++++++++++++++++++++++++++++++++++--

>  1 file changed, 69 insertions(+), 2 deletions(-)

> 

> diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh

> index 464e31e..64cd2e2 100755

> --- a/tools/testing/selftests/net/pmtu.sh

> +++ b/tools/testing/selftests/net/pmtu.sh

> @@ -162,7 +162,15 @@

>  # - list_flush_ipv6_exception

>  #	Using the same topology as in pmtu_ipv6, create exceptions, and check

>  #	they are shown when listing exception caches, gone after flushing them

> -

> +#

> +# - pmtu_ipv4_route_change

> +#	Use the same topology as in pmtu_ipv4, but issue a route replacement

> +#	command and delete the corresponding device afterward. This tests for

> +#	proper cleanup of the PMTU exceptions by the route replacement path.

> +#	Device unregistration should complete successfully

> +#

> +# - pmtu_ipv6_route_change

> +#	Same as above but with IPv6

>  

>  # Kselftest framework requirement - SKIP code is 4.

>  ksft_skip=4

> @@ -224,7 +232,9 @@ tests="

>  	cleanup_ipv4_exception		ipv4: cleanup of cached exceptions	1

>  	cleanup_ipv6_exception		ipv6: cleanup of cached exceptions	1

>  	list_flush_ipv4_exception	ipv4: list and flush cached exceptions	1

> -	list_flush_ipv6_exception	ipv6: list and flush cached exceptions	1"

> +	list_flush_ipv6_exception	ipv6: list and flush cached exceptions	1

> +	pmtu_ipv4_route_change		ipv4: PMTU exception w/route replace	1

> +	pmtu_ipv6_route_change		ipv6: PMTU exception w/route replace	1"

>  

>  NS_A="ns-A"

>  NS_B="ns-B"

> @@ -1782,6 +1792,63 @@ test_list_flush_ipv6_exception() {

>  	return ${fail}

>  }

>  

> +test_pmtu_ipvX_route_change() {

> +	family=${1}

> +

> +	setup namespaces routing || return 2

> +	trace "${ns_a}"  veth_A-R1    "${ns_r1}" veth_R1-A \

> +	      "${ns_r1}" veth_R1-B    "${ns_b}"  veth_B-R1 \

> +	      "${ns_a}"  veth_A-R2    "${ns_r2}" veth_R2-A \

> +	      "${ns_r2}" veth_R2-B    "${ns_b}"  veth_B-R2

> +

> +	if [ ${family} -eq 4 ]; then

> +		ping=ping

> +		dst1="${prefix4}.${b_r1}.1"

> +		dst2="${prefix4}.${b_r2}.1"

> +		gw="${prefix4}.${a_r1}.2"

> +	else

> +		ping=${ping6}

> +		dst1="${prefix6}:${b_r1}::1"

> +		dst2="${prefix6}:${b_r2}::1"

> +		gw="${prefix6}:${a_r1}::2"

> +	fi

> +

> +	# Set up initial MTU values

> +	mtu "${ns_a}"  veth_A-R1 2000

> +	mtu "${ns_r1}" veth_R1-A 2000

> +	mtu "${ns_r1}" veth_R1-B 1400

> +	mtu "${ns_b}"  veth_B-R1 1400

> +

> +	mtu "${ns_a}"  veth_A-R2 2000

> +	mtu "${ns_r2}" veth_R2-A 2000

> +	mtu "${ns_r2}" veth_R2-B 1500

> +	mtu "${ns_b}"  veth_B-R2 1500

> +

> +	# Create route exceptions

> +	run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst1}

> +	run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst2}

> +

> +	# Check that exceptions have been created with the correct PMTU

> +	pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})"

> +	check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1

> +	pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})"

> +	check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1

> +

> +	# Replace the route from A to R1

> +	run_cmd ${ns_a} ip route change default via ${gw}

> +

> +	# Delete the device in A

> +	run_cmd ${ns_a} ip link del "veth_A-R1"

> +}

> +

> +test_pmtu_ipv4_route_change() {

> +	test_pmtu_ipvX_route_change 4

> +}

> +

> +test_pmtu_ipv6_route_change() {

> +	test_pmtu_ipvX_route_change 6

> +}

> +

>  usage() {

>  	echo

>  	echo "$0 [OPTIONS] [TEST]..."

> 


Thanks for adding the tests.
Reviewed-by: David Ahern <dsahern@kernel.org>
Jakub Kicinski Jan. 7, 2021, 8:04 p.m. UTC | #3
On Wed, 6 Jan 2021 09:55:09 -0700 David Ahern wrote:
> On 1/5/21 5:22 PM, Sean Tranchetti wrote:

> > From: Sean Tranchetti <stranche@codeaurora.org>

> > 

> > Route removal is handled by two code paths. The main removal path is via

> > fib6_del_route() which will handle purging any PMTU exceptions from the

> > cache, removing all per-cpu copies of the DST entry used by the route, and

> > releasing the fib6_info struct.

> > 

> > The second removal location is during fib6_add_rt2node() during a route

> > replacement operation. This path also calls fib6_purge_rt() to handle

> > cleaning up the per-cpu copies of the DST entries and releasing the

> > fib6_info associated with the older route, but it does not flush any PMTU

> > exceptions that the older route had. Since the older route is removed from

> > the tree during the replacement, we lose any way of accessing it again.

> > 

> > As these lingering DSTs and the fib6_info struct are holding references to

> > the underlying netdevice struct as well, unregistering that device from the

> > kernel can never complete.

> 

> I think the right fixes tag is:

> 

> Fixes: 2b760fcf5cfb3 ("ipv6: hook up exception table to store dst cache")

> 

> cc'ed author of that patch.

> 

> > Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>

>

> Reviewed-by: David Ahern <dsahern@kernel.org>


Applied, thanks!
diff mbox series

Patch

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 605cdd3..f43e275 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1025,6 +1025,8 @@  static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
 {
 	struct fib6_table *table = rt->fib6_table;
 
+	/* Flush all cached dst in exception table */
+	rt6_flush_exceptions(rt);
 	fib6_drop_pcpu_from(rt, table);
 
 	if (rt->nh && !list_empty(&rt->nh_list))
@@ -1927,9 +1929,6 @@  static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn,
 	net->ipv6.rt6_stats->fib_rt_entries--;
 	net->ipv6.rt6_stats->fib_discarded_routes++;
 
-	/* Flush all cached dst in exception table */
-	rt6_flush_exceptions(rt);
-
 	/* Reset round-robin state, if necessary */
 	if (rcu_access_pointer(fn->rr_ptr) == rt)
 		fn->rr_ptr = NULL;