[iproute2-next,0/6] ip: nexthop: Support resilient groups

Message ID cover.1615568866.git.petrm@nvidia.com

Message

Petr Machata March 12, 2021, 5:23 p.m. UTC
Support for resilient next-hop groups was recently accepted into the Linux
kernel [1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a contiguous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
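
The indirection described above can be sketched in a few lines of Python.
This is only an illustrative model, not the kernel's algorithm: the function
names, the round-robin bucket fill and the equal weights are all assumptions
made for the example. It shows that when a next hop is removed, only the
buckets that pointed at it are reassigned, so flows hashing to the other
buckets keep their next hop.

```python
# Illustrative model only; names and the round-robin fill are assumptions,
# not the kernel implementation.

def build_buckets(nexthops, num_buckets):
    """Spread buckets across next hops round-robin (equal weights assumed)."""
    return [nexthops[i % len(nexthops)] for i in range(num_buckets)]


def lookup(buckets, skb_hash):
    """The hash selects a bucket; the bucket names the next hop."""
    return buckets[skb_hash % len(buckets)]


def remove_nexthop(buckets, gone, remaining):
    """Reassign only the buckets that pointed at the removed next hop."""
    result = []
    k = 0
    for nh in buckets:
        if nh == gone:
            result.append(remaining[k % len(remaining)])
            k += 1
        else:
            result.append(nh)
    return result


before = build_buckets([1, 2, 3], 8)
after = remove_nexthop(before, gone=3, remaining=[1, 2])

# Indices of buckets whose next hop changed as a result of the removal.
moved = [i for i in range(len(before)) if before[i] != after[i]]
```

In this model, `moved` contains only the buckets that used to reference
next hop 3; every other bucket, and therefore every flow hashing to it, is
left alone.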

In this patch set, introduce support for resilient next-hop groups to
iproute2.

- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.

- Patches #2 and #3 add new helpers that will be useful later.

- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
  command line argument, and to dispatch based on the specified type.

- Patch #5 adds the support for resilient next-hop groups.

- Patch #6 adds the support for resilient next-hop group bucket interface.

To illustrate the usage, consider the following commands:

 # ip nexthop add id 1 via 192.0.2.2 dev dummy1
 # ip nexthop add id 2 via 192.0.2.3 dev dummy1
 # ip nexthop add id 10 group 1/2 type resilient \
	buckets 8 idle_timer 60 unbalanced_timer 300

The last command creates a resilient next-hop group with 8 buckets. A
bucket is considered idle when no traffic hits it for at least 60 seconds,
and if the table remains out of balance for 300 seconds, it is forcefully
brought into balance.
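
The interplay of the two timers can be summarized as a small predicate. This
is a sketch under assumptions: the function and parameter names are
hypothetical, and the logic follows only the description above, not the
kernel code.

```python
# Hypothetical helper; it models the description in this cover letter only.

def may_migrate(bucket_idle_time, idle_timer, unbalanced_for, unbalanced_timer):
    """Return True if a bucket may be handed to another next hop.

    All arguments are in seconds. A bucket is idle once it has seen no
    traffic for at least idle_timer. Independently, once the table has been
    out of balance for unbalanced_timer, migration is forced even for
    buckets that are still active.
    """
    if bucket_idle_time >= idle_timer:
        return True
    return unbalanced_for >= unbalanced_timer


# With the values from the example command (idle_timer 60, unbalanced_timer 300):
active_bucket = may_migrate(5.59, 60, 10, 300)     # recent traffic, table only briefly unbalanced
idle_bucket = may_migrate(61.0, 60, 10, 300)       # no traffic for over 60 seconds
forced_bucket = may_migrate(5.59, 60, 300, 300)    # active, but table unbalanced for 300 seconds
```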

And this is how the next-hop group bucket interface looks:

 # ip nexthop bucket show id 10
 id 10 index 0 idle_time 5.59 nhid 1
 id 10 index 1 idle_time 5.59 nhid 1
 id 10 index 2 idle_time 8.74 nhid 2
 id 10 index 3 idle_time 8.74 nhid 2
 id 10 index 4 idle_time 8.74 nhid 1
 id 10 index 5 idle_time 8.74 nhid 1
 id 10 index 6 idle_time 8.74 nhid 1
 id 10 index 7 idle_time 8.74 nhid 1
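
As a rough illustration of how this output could be consumed, the following
Python sketch parses the plain-text listing above and counts buckets per
next hop. The parsing approach (alternating key/value tokens) is an
assumption based solely on the sample shown here, and `parse_bucket_line` is
a hypothetical helper, not part of iproute2.

```python
from collections import Counter

# Sample copied from the "ip nexthop bucket show id 10" output above.
SAMPLE = """\
id 10 index 0 idle_time 5.59 nhid 1
id 10 index 1 idle_time 5.59 nhid 1
id 10 index 2 idle_time 8.74 nhid 2
id 10 index 3 idle_time 8.74 nhid 2
id 10 index 4 idle_time 8.74 nhid 1
id 10 index 5 idle_time 8.74 nhid 1
id 10 index 6 idle_time 8.74 nhid 1
id 10 index 7 idle_time 8.74 nhid 1
"""


def parse_bucket_line(line):
    """Parse one line, assuming it consists of alternating key/value tokens."""
    fields = line.split()
    pairs = dict(zip(fields[0::2], fields[1::2]))
    return {
        "id": int(pairs["id"]),
        "index": int(pairs["index"]),
        "idle_time": float(pairs["idle_time"]),
        "nhid": int(pairs["nhid"]),
    }


buckets = [parse_bucket_line(line) for line in SAMPLE.splitlines() if line.strip()]

# Number of buckets currently assigned to each next hop.
per_nexthop = Counter(b["nhid"] for b in buckets)
```

For the sample above, `per_nexthop` maps next hop 1 to six buckets and next
hop 2 to two buckets.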

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2a0186a37700b0d5b8cc40be202a62af44f02fa2

Ido Schimmel (4):
  nexthop: Synchronize uAPI files
  nexthop: Add ability to specify group type
  nexthop: Add support for resilient nexthop groups
  nexthop: Add support for nexthop buckets

Petr Machata (2):
  json_print: Add print_tv()
  nexthop: Extract a helper to parse a NH ID

 include/json_print.h           |   1 +
 include/libnetlink.h           |   3 +
 include/uapi/linux/nexthop.h   |  47 +++-
 include/uapi/linux/rtnetlink.h |   7 +
 ip/ip_common.h                 |   1 +
 ip/ipmonitor.c                 |   6 +
 ip/ipnexthop.c                 | 451 ++++++++++++++++++++++++++++++++-
 lib/json_print.c               |  13 +
 lib/libnetlink.c               |  26 ++
 man/man8/ip-nexthop.8          | 112 +++++++-
 10 files changed, 650 insertions(+), 17 deletions(-)

Comments

David Ahern March 14, 2021, 3:55 p.m. UTC | #1
On 3/12/21 10:23 AM, Petr Machata wrote:
> From: Petr Machata <me@pmachata.org>
> 
> From: Ido Schimmel <idosch@nvidia.com>

All of the patches have the above. If Ido is the author and you are
sending, AIUI you add your Signed-off-by below his.

> 
> Next patches are going to add a 'resilient' nexthop group type, so allow
> users to specify the type using the 'type' argument. Currently, only
> 'mpath' type is supported.
> 
> These two commands are equivalent:
> 
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  ip/ipnexthop.c        | 32 +++++++++++++++++++++++++++++++-
>  man/man8/ip-nexthop.8 | 18 ++++++++++++++++--
>  2 files changed, 47 insertions(+), 3 deletions(-)
> 


...

> diff --git a/man/man8/ip-nexthop.8 b/man/man8/ip-nexthop.8
> index 4d55f4dbcc75..f02e0555a000 100644
> --- a/man/man8/ip-nexthop.8
> +++ b/man/man8/ip-nexthop.8
> @@ -54,7 +54,9 @@ ip-nexthop \- nexthop object management
>  .BR fdb " ] | "
>  .B  group
>  .IR GROUP " [ "
> -.BR fdb " ] } "
> +.BR fdb " ] [ "
> +.B type
> +.IR TYPE " ] } "
>  
>  .ti -8
>  .IR ENCAP " := [ "
> @@ -71,6 +73,10 @@ ip-nexthop \- nexthop object management
>  .IR GROUP " := "
>  .BR id "[," weight "[/...]"
>  
> +.ti -8
> +.IR TYPE " := { "
> +.BR mpath " }"
> +
>  .SH DESCRIPTION
>  .B ip nexthop
>  is used to manipulate entries in the kernel's nexthop tables.
> @@ -122,9 +128,17 @@ is a set of encapsulation attributes specific to the
>  .in -2
>  
>  .TP
> -.BI group " GROUP"
> +.BI group " GROUP [ " type " TYPE ]"
>  create a nexthop group. Group specification is id with an optional
>  weight (id,weight) and a '/' as a separator between entries.
> +.sp
> +.I TYPE
> +is a string specifying the nexthop group type. Namely:
> +
> +.in +8
> +.BI mpath
> +- multipath nexthop group
> +


Add a comment that this is the default group type and refers to the
legacy hash-based multipath group.

The rest of the patches look ok to me.

Petr Machata March 15, 2021, 11:38 a.m. UTC | #2
David Ahern <dsahern@gmail.com> writes:

> On 3/12/21 10:23 AM, Petr Machata wrote:
>> From: Petr Machata <me@pmachata.org>
>> 
>> From: Ido Schimmel <idosch@nvidia.com>
>
> All of the patches have the above. If Ido is the author and you are
> sending, AIUI you add your Signed-off-by below his.


Sorry about that, that's a leftover from when I was sending the DCB
patches. I'll resend with the correct headers.

>> +.sp
>> +.I TYPE
>> +is a string specifying the nexthop group type. Namely:
>> +
>> +.in +8
>> +.BI mpath
>> +- multipath nexthop group
>> +
>
> Add a comment that this is the default group type and refers to the
> legacy hash-based multipath group.


OK.