Message ID | 20210126132311.3061388-1-idosch@idosch.org |
---|---|
Headers | show |
Series | Add notifications when route hardware flags change | expand |
On 1/26/21 6:23 AM, Ido Schimmel wrote: > @@ -889,22 +882,29 @@ static void nsim_nexthop_destroy(struct nsim_nexthop *nexthop) > static int nsim_nexthop_account(struct nsim_fib_data *data, u64 occ, > bool add, struct netlink_ext_ack *extack) > { > - int err = 0; > + int i, err = 0; > > if (add) { > - if (data->nexthops.num + occ <= data->nexthops.max) { > - data->nexthops.num += occ; > - } else { > - err = -ENOSPC; > - NL_SET_ERR_MSG_MOD(extack, "Exceeded number of supported nexthops"); > - } > + for (i = 0; i < occ; i++) > + if (!atomic64_add_unless(&data->nexthops.num, 1, > + data->nexthops.max)) { seems like this can be if (!atomic64_add_unless(&data->nexthops.num, occ, data->nexthops.max)) { and then the err_num_decrease is not needed > + err = -ENOSPC; > + NL_SET_ERR_MSG_MOD(extack, "Exceeded number of supported nexthops"); > + goto err_num_decrease; > + } > } else { > - if (WARN_ON(occ > data->nexthops.num)) > + if (WARN_ON(occ > atomic64_read(&data->nexthops.num))) > return -EINVAL; > - data->nexthops.num -= occ; > + atomic64_sub(occ, &data->nexthops.num); > } > > return err; > + > +err_num_decrease: > + for (i--; i >= 0; i--) > + atomic64_dec(&data->nexthops.num); and if this path is really needed, why not atomic64_sub here? > + return err; > + > } > > static int nsim_nexthop_add(struct nsim_fib_data *data, >
On 1/26/21 6:23 AM, Ido Schimmel wrote: > From: Amit Cohen <amcohen@nvidia.com> > > Currently, netdevsim implements dummy FIB offload and marks notified > routes with RTM_F_TRAP flag. netdevsim does not defer route notifications > to a work queue because it does not need to program any hardware. > > Given that netdevsim's purpose is to both give an example implementation > and allow developers to test their code, align netdevsim to a "real" > hardware device driver like mlxsw and have it also perform the route > "programming" in a non-atomic context. > > It will be used to test route flags notifications which will be added in > the next patches. > > The following changes are needed when route handling is performed in WQ: > - Handle the accounting in the main context, to be able to return an > error for adding route when all the routes are used. > For FIB_EVENT_ENTRY_REPLACE increase the counter before scheduling > the delayed work, and in case that this event replaces an existing route, > decrease the counter as part of the delayed work. > > - For IPv6, cannot use fen6_info->rt->fib6_siblings list because it > might be changed during handling the delayed work. > Save an array with the nexthops as part of fib6_event struct, and take > a reference for each nexthop to prevent them from being freed while > event is queued. > > - Change GFP_ATOMIC allocations to GFP_KERNEL. > > - Use single work item that is handling a list of ordered routes. > Handling routes must be processed in the order they were submitted to > avoid logical errors that could lead to unexpected failures. > > Signed-off-by: Amit Cohen <amcohen@nvidia.com> > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > drivers/net/netdevsim/fib.c | 467 +++++++++++++++++++++++++----------- > 1 file changed, 327 insertions(+), 140 deletions(-) > Acked-by: David Ahern <dsahern@kernel.org>
On 1/26/21 6:23 AM, Ido Schimmel wrote: > From: Amit Cohen <amcohen@nvidia.com> > > fib_dump_info() does not change 'fri', so pass it as 'const'. > It will later allow us to invoke fib_dump_info() from > fib_alias_hw_flags_set(). > > Signed-off-by: Amit Cohen <amcohen@nvidia.com> > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > net/ipv4/fib_lookup.h | 2 +- > net/ipv4/fib_semantics.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
On 1/26/21 6:23 AM, Ido Schimmel wrote: > From: Amit Cohen <amcohen@nvidia.com> > > After installing a route to the kernel, user space receives an > acknowledgment, which means the route was installed in the kernel, > but not necessarily in hardware. > > The asynchronous nature of route installation in hardware can lead to a > routing daemon advertising a route before it was actually installed in > hardware. This can result in packet loss or mis-routed packets until the > route is installed in hardware. > > It is also possible for a route already installed in hardware to change > its action and therefore its flags. For example, a host route that is > trapping packets can be "promoted" to perform decapsulation following > the installation of an IPinIP/VXLAN tunnel. > > Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags > are changed. The aim is to provide an indication to user-space > (e.g., routing daemons) about the state of the route in hardware. > > Introduce a sysctl that controls this behavior. > > Keep the default value at 0 (i.e., do not emit notifications) for several > reasons: > - Multiple RTM_NEWROUTE notification per-route might confuse existing > routing daemons. are you aware of any routing daemons that are affected? Seems like they should be able to handle redundant notifications > - Convergence reasons in routing daemons. > - The extra notifications will negatively impact the insertion rate. any numbers on the overhead? > - Not all users are interested in these notifications. > > Signed-off-by: Amit Cohen <amcohen@nvidia.com> > Acked-by: Roopa Prabhu <roopa@nvidia.com> > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > Documentation/networking/ip-sysctl.rst | 20 +++++++++++++++++++ > include/net/netns/ipv4.h | 2 ++ > net/ipv4/af_inet.c | 2 ++ > net/ipv4/fib_trie.c | 27 ++++++++++++++++++++++++++ > net/ipv4/sysctl_net_ipv4.c | 9 +++++++++ > 5 files changed, 60 insertions(+) >
On 1/26/21 6:23 AM, Ido Schimmel wrote: > From: Amit Cohen <amcohen@nvidia.com> > > The next patch will emit notification when hardware flags are changed, > in case that fib_notify_on_flag_change sysctl is set to 1. > > To know sysctl values, net struct is needed. > This change is consistent with the IPv4 version, which gets 'net' struct > as its first argument. > > Currently, the only callers of this function are mlxsw and netdevsim. > Patch the callers to pass net. > > Signed-off-by: Amit Cohen <amcohen@nvidia.com> > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 ++++--- > drivers/net/netdevsim/fib.c | 14 ++++++++------ > include/net/ip6_fib.h | 5 +++-- > 3 files changed, 15 insertions(+), 11 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
>-----Original Message----- >From: David Ahern <dsahern@gmail.com> >Sent: Wednesday, January 27, 2021 6:33 >To: Ido Schimmel <idosch@idosch.org>; netdev@vger.kernel.org >Cc: davem@davemloft.net; kuba@kernel.org; Amit Cohen <amcohen@nvidia.com>; Roopa Prabhu <roopa@nvidia.com>; Donald >Sharp <sharpd@nvidia.com>; Benjamin Poirier <bpoirier@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Ido Schimmel ><idosch@nvidia.com> >Subject: Re: [PATCH net-next 01/10] netdevsim: fib: Convert the current occupancy to an atomic variable > >On 1/26/21 6:23 AM, Ido Schimmel wrote: >> @@ -889,22 +882,29 @@ static void nsim_nexthop_destroy(struct >> nsim_nexthop *nexthop) static int nsim_nexthop_account(struct nsim_fib_data *data, u64 occ, >> bool add, struct netlink_ext_ack *extack) { >> - int err = 0; >> + int i, err = 0; >> >> if (add) { >> - if (data->nexthops.num + occ <= data->nexthops.max) { >> - data->nexthops.num += occ; >> - } else { >> - err = -ENOSPC; >> - NL_SET_ERR_MSG_MOD(extack, "Exceeded number of supported nexthops"); >> - } >> + for (i = 0; i < occ; i++) >> + if (!atomic64_add_unless(&data->nexthops.num, 1, >> + data->nexthops.max)) { > >seems like this can be > if (!atomic64_add_unless(&data->nexthops.num, occ, > data->nexthops.max)) { atomic64_add_unless(x, y, z) adds y to x if x was not already z. Which means that when for example num=2, occ=2, max=3: atomic64_add_unless(&data->nexthops.num, occ, data->nexthops.max) won't fail when it should. This situation is realistic and actually with atomic64_add_unless() selftests/drivers/net/netdevsim/nexthop.sh fails. > >and then the err_num_decrease is not needed > >> + err = -ENOSPC; >> + NL_SET_ERR_MSG_MOD(extack, "Exceeded number of supported nexthops"); >> + goto err_num_decrease; >> + } >> } else { >> - if (WARN_ON(occ > data->nexthops.num)) >> + if (WARN_ON(occ > atomic64_read(&data->nexthops.num))) >> return -EINVAL; >> - data->nexthops.num -= occ; >> + atomic64_sub(occ, &data->nexthops.num); >> } >> >> return err; >> + >> +err_num_decrease: >> + for (i--; i >= 0; i--) >> + atomic64_dec(&data->nexthops.num); > >and if this path is really needed, why not atomic64_sub here? > >> + return err; >> + >> } >> >> static int nsim_nexthop_add(struct nsim_fib_data *data, >>
>-----Original Message----- >From: David Ahern <dsahern@gmail.com> >Sent: Wednesday, January 27, 2021 7:03 >To: Ido Schimmel <idosch@idosch.org>; netdev@vger.kernel.org >Cc: davem@davemloft.net; kuba@kernel.org; Amit Cohen <amcohen@nvidia.com>; Roopa Prabhu <roopa@nvidia.com>; Donald >Sharp <sharpd@nvidia.com>; Benjamin Poirier <bpoirier@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Ido Schimmel ><idosch@nvidia.com> >Subject: Re: [PATCH net-next 05/10] net: ipv4: Emit notification when fib hardware flags are changed > >On 1/26/21 6:23 AM, Ido Schimmel wrote: >> From: Amit Cohen <amcohen@nvidia.com> >> >> After installing a route to the kernel, user space receives an >> acknowledgment, which means the route was installed in the kernel, but >> not necessarily in hardware. >> >> The asynchronous nature of route installation in hardware can lead to >> a routing daemon advertising a route before it was actually installed >> in hardware. This can result in packet loss or mis-routed packets >> until the route is installed in hardware. >> >> It is also possible for a route already installed in hardware to >> change its action and therefore its flags. For example, a host route >> that is trapping packets can be "promoted" to perform decapsulation >> following the installation of an IPinIP/VXLAN tunnel. >> >> Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP >> flags are changed. The aim is to provide an indication to user-space >> (e.g., routing daemons) about the state of the route in hardware. >> >> Introduce a sysctl that controls this behavior. >> >> Keep the default value at 0 (i.e., do not emit notifications) for >> several >> reasons: >> - Multiple RTM_NEWROUTE notification per-route might confuse existing >> routing daemons. > >are you aware of any routing daemons that are affected? Seems like they should be able to handle redundant notifications Actually no, we didn't check all the existing daemons, just assume that not everyone will want to activate the notifications at all. So there is no point in sending notifications for users which aren't interested in them. > >> - Convergence reasons in routing daemons. >> - The extra notifications will negatively impact the insertion rate. > >any numbers on the overhead? For addition of 256k routes in mlxsw, the overhead is 3.6% of the total time. > >> - Not all users are interested in these notifications. >> >> Signed-off-by: Amit Cohen <amcohen@nvidia.com> >> Acked-by: Roopa Prabhu <roopa@nvidia.com> >> Signed-off-by: Ido Schimmel <idosch@nvidia.com> >> --- >> Documentation/networking/ip-sysctl.rst | 20 +++++++++++++++++++ >> include/net/netns/ipv4.h | 2 ++ >> net/ipv4/af_inet.c | 2 ++ >> net/ipv4/fib_trie.c | 27 ++++++++++++++++++++++++++ >> net/ipv4/sysctl_net_ipv4.c | 9 +++++++++ >> 5 files changed, 60 insertions(+) >>
On 1/26/21 6:23 AM, Ido Schimmel wrote: > From: Amit Cohen <amcohen@nvidia.com> > > After installing a route to the kernel, user space receives an > acknowledgment, which means the route was installed in the kernel, > but not necessarily in hardware. > > The asynchronous nature of route installation in hardware can lead to a > routing daemon advertising a route before it was actually installed in > hardware. This can result in packet loss or mis-routed packets until the > route is installed in hardware. > > It is also possible for a route already installed in hardware to change > its action and therefore its flags. For example, a host route that is > trapping packets can be "promoted" to perform decapsulation following > the installation of an IPinIP/VXLAN tunnel. > > Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags > are changed. The aim is to provide an indication to user-space > (e.g., routing daemons) about the state of the route in hardware. > > Introduce a sysctl that controls this behavior. > > Keep the default value at 0 (i.e., do not emit notifications) for several > reasons: > - Multiple RTM_NEWROUTE notification per-route might confuse existing > routing daemons. > - Convergence reasons in routing daemons. > - The extra notifications will negatively impact the insertion rate. > - Not all users are interested in these notifications. > > Signed-off-by: Amit Cohen <amcohen@nvidia.com> > Acked-by: Roopa Prabhu <roopa@nvidia.com> > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > Documentation/networking/ip-sysctl.rst | 20 +++++++++++++++++++ > include/net/netns/ipv4.h | 2 ++ > net/ipv4/af_inet.c | 2 ++ > net/ipv4/fib_trie.c | 27 ++++++++++++++++++++++++++ > net/ipv4/sysctl_net_ipv4.c | 9 +++++++++ > 5 files changed, 60 insertions(+) > Reviewed-by: David Ahern <dsahern@kernel.org>
On 1/27/21 3:51 AM, Amit Cohen wrote: > > >> -----Original Message----- >> From: David Ahern <dsahern@gmail.com> >> Sent: Wednesday, January 27, 2021 6:33 >> To: Ido Schimmel <idosch@idosch.org>; netdev@vger.kernel.org >> Cc: davem@davemloft.net; kuba@kernel.org; Amit Cohen <amcohen@nvidia.com>; Roopa Prabhu <roopa@nvidia.com>; Donald >> Sharp <sharpd@nvidia.com>; Benjamin Poirier <bpoirier@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Ido Schimmel >> <idosch@nvidia.com> >> Subject: Re: [PATCH net-next 01/10] netdevsim: fib: Convert the current occupancy to an atomic variable >> >> On 1/26/21 6:23 AM, Ido Schimmel wrote: >>> @@ -889,22 +882,29 @@ static void nsim_nexthop_destroy(struct >>> nsim_nexthop *nexthop) static int nsim_nexthop_account(struct nsim_fib_data *data, u64 occ, >>> bool add, struct netlink_ext_ack *extack) { >>> - int err = 0; >>> + int i, err = 0; >>> >>> if (add) { >>> - if (data->nexthops.num + occ <= data->nexthops.max) { >>> - data->nexthops.num += occ; >>> - } else { >>> - err = -ENOSPC; >>> - NL_SET_ERR_MSG_MOD(extack, "Exceeded number of supported nexthops"); >>> - } >>> + for (i = 0; i < occ; i++) >>> + if (!atomic64_add_unless(&data->nexthops.num, 1, >>> + data->nexthops.max)) { >> >> seems like this can be >> if (!atomic64_add_unless(&data->nexthops.num, occ, >> data->nexthops.max)) { > > atomic64_add_unless(x, y, z) adds y to x if x was not already z. > Which means that when for example num=2, occ=2, max=3: > atomic64_add_unless(&data->nexthops.num, occ, data->nexthops.max) won't fail when it should. > ok, missed that in the description. I thought it was if the total would equal or be greater than z.
On Tue, 26 Jan 2021 15:23:06 +0200 Ido Schimmel wrote: > Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags > are changed. The aim is to provide an indication to user-space > (e.g., routing daemons) about the state of the route in hardware. What does the daemon in the user space do with it? The notification will only be generated for the _first_ ASIC which offloaded the object. Which may be fine for you today but as an uAPI it feels slightly lacking. If the user space just wants to make sure the devices are synced to notifications from certain stage, wouldn't it be more idiomatic to provide some "fence" operation? WDYT? David?
On 1/28/21 8:04 PM, Jakub Kicinski wrote: > On Tue, 26 Jan 2021 15:23:06 +0200 Ido Schimmel wrote: >> Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags >> are changed. The aim is to provide an indication to user-space >> (e.g., routing daemons) about the state of the route in hardware. > > What does the daemon in the user space do with it? You don't want FRR for example to advertise a route to a peer until it is really programmed in h/w. This notification gives routing daemons that information. > > The notification will only be generated for the _first_ ASIC which > offloaded the object. Which may be fine for you today but as an uAPI > it feels slightly lacking. > > If the user space just wants to make sure the devices are synced to > notifications from certain stage, wouldn't it be more idiomatic to > provide some "fence" operation? > > WDYT? David? > This feature was first discussed I think about 2 years ago - when I was still with Cumulus, so I already knew the intent and end goal. I think support for multiple ASICs / NICs doing this kind of offload will have a whole lot of challenges. I don't think this particular user notification is going to be a big problem - e.g., you could always delay the emit until all have indicated the offload.
On Thu, 28 Jan 2021 20:33:22 -0700 David Ahern wrote: > On 1/28/21 8:04 PM, Jakub Kicinski wrote: > > On Tue, 26 Jan 2021 15:23:06 +0200 Ido Schimmel wrote: > >> Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags > >> are changed. The aim is to provide an indication to user-space > >> (e.g., routing daemons) about the state of the route in hardware. > > > > What does the daemon in the user space do with it? > > You don't want FRR for example to advertise a route to a peer until it > is really programmed in h/w. This notification gives routing daemons > that information. I see. Hm. > > The notification will only be generated for the _first_ ASIC which > > offloaded the object. Which may be fine for you today but as an uAPI > > it feels slightly lacking. > > > > If the user space just wants to make sure the devices are synced to > > notifications from certain stage, wouldn't it be more idiomatic to > > provide some "fence" operation? > > > > WDYT? David? > > This feature was first discussed I think about 2 years ago - when I was > still with Cumulus, so I already knew the intent and end goal. > > I think support for multiple ASICs / NICs doing this kind of offload > will have a whole lot of challenges. I don't think this particular user > notification is going to be a big problem - e.g., you could always delay > the emit until all have indicated the offload. My impression from working on this problem in TC is that the definition of "all" becomes problematic especially if one takes into account drivers getting reloaded. But I think routing offload has stronger semantics than TC, so no objections. We need a respin for the somewhat embarrassing loop in patch 1, tho.
On Thu, Jan 28, 2021 at 08:33:22PM -0700, David Ahern wrote: > On 1/28/21 8:04 PM, Jakub Kicinski wrote: > > On Tue, 26 Jan 2021 15:23:06 +0200 Ido Schimmel wrote: > >> Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags > >> are changed. The aim is to provide an indication to user-space > >> (e.g., routing daemons) about the state of the route in hardware. > > > > What does the daemon in the user space do with it? > > You don't want FRR for example to advertise a route to a peer until it > is really programmed in h/w. This notification gives routing daemons > that information. Correct. It is in the cover letter: "These flags are of interest to routing daemons since they would like to delay advertisement of routes until they are installed in hardware." Amit is working on follow-up to emit notifications when route offload fails. This request also comes from the FRR team. Currently we have a policy inside mlxsw to abort route offload and install a default route that sends all the traffic to the CPU. It obviously kills the box and anyway the policy is something user space should decide, not the kernel. > > > > > The notification will only be generated for the _first_ ASIC which > > offloaded the object. Which may be fine for you today but as an uAPI > > it feels slightly lacking. > > > > If the user space just wants to make sure the devices are synced to > > notifications from certain stage, wouldn't it be more idiomatic to > > provide some "fence" operation? > > > > WDYT? David? > > > > This feature was first discussed I think about 2 years ago - when I was > still with Cumulus, so I already knew the intent and end goal. > > I think support for multiple ASICs / NICs doing this kind of offload > will have a whole lot of challenges. I don't think this particular user > notification is going to be a big problem - e.g., you could always delay > the emit until all have indicated the offload. I do not have experience with multi-ASIC systems, but my understanding is that each ASIC has its own copy of the networking stack and the ASICs are connected via front panel or backplane ports, like distinct leaf/spine switches. In Linux, such a system can be supported by registering a devlink instance for each ASIC and reloading each instance to a separate namespace. Thanks for reviewing, David
On Thu, Jan 28, 2021 at 08:15:45PM -0800, Jakub Kicinski wrote: > My impression from working on this problem in TC is that the definition > of "all" becomes problematic especially if one takes into account > drivers getting reloaded. But I think routing offload has stronger > semantics than TC, so no objections. During the teardown phase of the reload, all the routes using the driver's netdevs (or their uppers) will be deleted by the kernel because the netdevs will disappear. During the init phase of the reload, the driver will re-register its FIB notifier and ask for a dump of all the existing routes (usually only host routes). With this patchset, user space will receive a notification that these routes are now in hardware. # ip monitor route broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 broadcast 10.209.1.255 dev eth0 table local proto kernel scope link src 10.209.0.191 local 10.209.0.191 dev eth0 table local proto kernel scope host src 10.209.0.191 broadcast 10.209.0.0 dev eth0 table local proto kernel scope link src 10.209.0.191 10.209.0.1 dev eth0 proto dhcp scope link src 10.209.0.191 metric 1024 10.209.0.0/23 dev eth0 proto kernel scope link src 10.209.0.191 default via 10.209.0.1 dev eth0 proto dhcp src 10.209.0.191 metric 1024 << init phase starts here >> default via 10.209.0.1 dev eth0 proto dhcp src 10.209.0.191 metric 1024 rt_trap 10.209.0.0/23 dev eth0 proto kernel scope link src 10.209.0.191 rt_trap 10.209.0.1 dev eth0 proto dhcp scope link src 10.209.0.191 metric 1024 rt_trap broadcast 10.209.0.0 dev eth0 table local proto kernel scope link src 10.209.0.191 rt_trap local 10.209.0.191 dev eth0 table local proto kernel scope host src 10.209.0.191 rt_trap broadcast 10.209.1.255 dev eth0 table local proto kernel scope link src 10.209.0.191 rt_trap broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 rt_trap local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 rt_trap local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 rt_trap broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 rt_trap
From: Ido Schimmel <idosch@nvidia.com> Routes installed to the kernel can be programmed to capable devices, in which case they are marked with one of two flags. RTM_F_OFFLOAD for routes that offload traffic from the kernel and RTM_F_TRAP for routes that trap packets to the kernel for processing (e.g., host routes). These flags are of interest to routing daemons since they would like to delay advertisement of routes until they are installed in hardware. This allows them to avoid packet loss or misrouted packets. Currently, routing daemons do not receive any notifications when these flags are changed, requiring them to poll the kernel tables for changes which is inefficient. This series addresses the issue by having the kernel emit RTM_NEWROUTE notifications whenever these flags change. The behavior is controlled by two sysctls (net.ipv4.fib_notify_on_flag_change and net.ipv6.fib_notify_on_flag_change) that default to 0 (no notifications). Note that even if route installation in hardware is improved to be more synchronous, these notifications are still of interest. For example, a multipath route can change from RTM_F_OFFLOAD to RTM_F_TRAP if its neighbours become invalid. A routing daemon can choose to withdraw / replace the route in that case. In addition, the deletion of a route from the kernel can prompt the installation of an identical route (already in kernel, with an higher metric) to hardware. For testing purposes, netdevsim is aligned to simulate a "real" driver that programs routes to hardware. Series overview: Patches #1-#2 align netdevsim to perform route programming in a non-atomic context Patches #3-#5 add sysctl to control IPv4 notifications Patches #6-#8 add sysctl to control IPv6 notifications Patch #9 extends existing fib tests to set sysctls before running tests Patch #10 adds test for fib notifications over netdevsim Amit Cohen (10): netdevsim: fib: Convert the current occupancy to an atomic variable netdevsim: fib: Perform the route programming in a non-atomic context net: ipv4: Pass fib_rt_info as const to fib_dump_info() net: ipv4: Publish fib_nlmsg_size() net: ipv4: Emit notification when fib hardware flags are changed net: Pass 'net' struct as first argument to fib6_info_hw_flags_set() net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled net: ipv6: Emit notification when fib hardware flags are changed selftests: Extend fib tests to run with and without flags notifications selftests: netdevsim: Add fib_notifications test Documentation/networking/ip-sysctl.rst | 40 ++ .../ethernet/mellanox/mlxsw/spectrum_router.c | 23 +- drivers/net/netdevsim/fib.c | 535 ++++++++++++------ include/net/ip6_fib.h | 9 +- include/net/netns/ipv4.h | 2 + include/net/netns/ipv6.h | 1 + net/ipv4/af_inet.c | 2 + net/ipv4/fib_lookup.h | 3 +- net/ipv4/fib_semantics.c | 4 +- net/ipv4/fib_trie.c | 27 + net/ipv4/sysctl_net_ipv4.c | 9 + net/ipv6/af_inet6.c | 1 + net/ipv6/route.c | 44 ++ net/ipv6/sysctl_net_ipv6.c | 9 + .../selftests/drivers/net/mlxsw/fib.sh | 14 + .../selftests/drivers/net/netdevsim/fib.sh | 14 + .../net/netdevsim/fib_notifications.sh | 300 ++++++++++ 17 files changed, 855 insertions(+), 182 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/netdevsim/fib_notifications.sh