diff mbox series

neigh: Support filtering neighbours for L3 slave

Message ID 20210801090105.27595-1-lschlesinger@drivenets.com
State New
Headers show
Series neigh: Support filtering neighbours for L3 slave | expand

Commit Message

Lahav Schlesinger Aug. 1, 2021, 9:01 a.m. UTC
Currently there's support for filtering neighbours for interfaces which
are in a specific VRF (passing the VRF interface in 'NDA_MASTER'), but
there's not support for filtering interfaces which are not in an L3
domain (the "default VRF").

This means userspace is unable to show/flush neighbours in the default VRF
(in contrast to a "real" VRF - Using "ip neigh show vrf <vrf_dev>").

Therefore for userspace to be able to do so, it must manually iterate
over all the interfaces, check each one if it's in the default VRF, and
if so send the matching flush/show message.

This patch adds the ability to do so easily, by passing a dummy value as
the 'NDA_MASTER' ('NDV_NOT_L3_SLAVE').
Note that 'NDV_NOT_L3_SLAVE' is a negative number, meaning it is not a valid
ifindex, so it doesn't break existing programs.

I have a patch for iproute2 ready for adding this support in userspace.

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
---
 include/uapi/linux/neighbour.h | 2 ++
 net/core/neighbour.c           | 3 +++
 2 files changed, 5 insertions(+)

Comments

David Ahern Aug. 3, 2021, 3:51 a.m. UTC | #1
On 8/2/21 2:23 AM, Lahav Schlesinger wrote:
> On Sun, Aug 01, 2021 at 11:50:16AM -0600, David Ahern wrote:

>> On 8/1/21 3:01 AM, Lahav Schlesinger wrote:

>>> Currently there's support for filtering neighbours for interfaces which

>>> are in a specific VRF (passing the VRF interface in 'NDA_MASTER'), but

>>> there's not support for filtering interfaces which are not in an L3

>>> domain (the "default VRF").

>>>

>>> This means userspace is unable to show/flush neighbours in the default VRF

>>> (in contrast to a "real" VRF - Using "ip neigh show vrf <vrf_dev>").

>>>

>>> Therefore for userspace to be able to do so, it must manually iterate

>>> over all the interfaces, check each one if it's in the default VRF, and

>>> if so send the matching flush/show message.

>>>

>>> This patch adds the ability to do so easily, by passing a dummy value as

>>> the 'NDA_MASTER' ('NDV_NOT_L3_SLAVE').

>>> Note that 'NDV_NOT_L3_SLAVE' is a negative number, meaning it is not a valid

>>> ifindex, so it doesn't break existing programs.

>>>

>>> I have a patch for iproute2 ready for adding this support in userspace.

>>>

>>> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>

>>> Cc: David S. Miller <davem@davemloft.net>

>>> Cc: Jakub Kicinski <kuba@kernel.org>

>>> Cc: David Ahern <dsahern@kernel.org>

>>> ---

>>>  include/uapi/linux/neighbour.h | 2 ++

>>>  net/core/neighbour.c           | 3 +++

>>>  2 files changed, 5 insertions(+)

>>>

>>> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h

>>> index dc8b72201f6c..d4f4c2189c63 100644

>>> --- a/include/uapi/linux/neighbour.h

>>> +++ b/include/uapi/linux/neighbour.h

>>> @@ -196,4 +196,6 @@ enum {

>>>  };

>>>  #define NFEA_MAX (__NFEA_MAX - 1)

>>>

>>> +#define NDV_NOT_L3_SLAVE	(-10)

>>> +

>>>  #endif

>>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c

>>> index 53e85c70c6e5..b280103b6806 100644

>>> --- a/net/core/neighbour.c

>>> +++ b/net/core/neighbour.c

>>> @@ -2529,6 +2529,9 @@ static bool neigh_master_filtered(struct net_device *dev, int master_idx)

>>>  {

>>>  	struct net_device *master;

>>>

>>> +	if (master_idx == NDV_NOT_L3_SLAVE)

>>> +		return netif_is_l3_slave(dev);

>>> +

>>>  	if (!master_idx)

>>>  		return false;

>>>

>>>

>>

>> you can not special case VRFs like this, and such a feature should apply

>> to links and addresses as well.

> 

> Understandable, I'll change it.

> In this case though, how would you advice to efficiently filter

> neighbours for interfaces in the default VRF in userspace (without

> quering the master of every interface that is being dumped)?

> I reckoned that because there's support in iproute2 for filtering based

> on a specific VRF, filtering for the default VRF is a natural extension


iproute2 has support for a link database (ll_cache). You would basically
have to expand the cache to track any master device a link is associated
with and then fill the cache with a link dump first. It's expensive at
scale; the "no stats" filter helps a bit.

This is the reason for kernel side filtering on primary attributes
(coarse grain filtering at reasonable cost).

> 

>>

>> One idea is to pass "*_MASTER" as -1 (use "none" keyword for iproute2)

>> and then update kernel side to only return entries if the corresponding

>> device is not enslaved to another device. Unfortunately since I did not

>> check that _MASTER was non-zero in the current code, we can not use 0 as

>> a valid flag for "not enslaved". Be sure to document why -1 is used.

> 

> Do you mean the command will look like "ip link show master none"?

> If so, wouldn't this cause an ambiguity if an interface names "none" is present?

> 


You could always detect "none" as a valid device name and error out or
use "nomaster" as a way to say "show all devices not enslaved to a
bridge or VRF.
Lahav Schlesinger Aug. 3, 2021, 6:47 a.m. UTC | #2
On Mon, Aug 02, 2021 at 09:51:29PM -0600, David Ahern wrote:
> On 8/2/21 2:23 AM, Lahav Schlesinger wrote:

> > On Sun, Aug 01, 2021 at 11:50:16AM -0600, David Ahern wrote:

> >> On 8/1/21 3:01 AM, Lahav Schlesinger wrote:

> >>> Currently there's support for filtering neighbours for interfaces which

> >>> are in a specific VRF (passing the VRF interface in 'NDA_MASTER'), but

> >>> there's not support for filtering interfaces which are not in an L3

> >>> domain (the "default VRF").

> >>>

> >>> This means userspace is unable to show/flush neighbours in the default VRF

> >>> (in contrast to a "real" VRF - Using "ip neigh show vrf <vrf_dev>").

> >>>

> >>> Therefore for userspace to be able to do so, it must manually iterate

> >>> over all the interfaces, check each one if it's in the default VRF, and

> >>> if so send the matching flush/show message.

> >>>

> >>> This patch adds the ability to do so easily, by passing a dummy value as

> >>> the 'NDA_MASTER' ('NDV_NOT_L3_SLAVE').

> >>> Note that 'NDV_NOT_L3_SLAVE' is a negative number, meaning it is not a valid

> >>> ifindex, so it doesn't break existing programs.

> >>>

> >>> I have a patch for iproute2 ready for adding this support in userspace.

> >>>

> >>> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>

> >>> Cc: David S. Miller <davem@davemloft.net>

> >>> Cc: Jakub Kicinski <kuba@kernel.org>

> >>> Cc: David Ahern <dsahern@kernel.org>

> >>> ---

> >>>  include/uapi/linux/neighbour.h | 2 ++

> >>>  net/core/neighbour.c           | 3 +++

> >>>  2 files changed, 5 insertions(+)

> >>>

> >>> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h

> >>> index dc8b72201f6c..d4f4c2189c63 100644

> >>> --- a/include/uapi/linux/neighbour.h

> >>> +++ b/include/uapi/linux/neighbour.h

> >>> @@ -196,4 +196,6 @@ enum {

> >>>  };

> >>>  #define NFEA_MAX (__NFEA_MAX - 1)

> >>>

> >>> +#define NDV_NOT_L3_SLAVE	(-10)

> >>> +

> >>>  #endif

> >>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c

> >>> index 53e85c70c6e5..b280103b6806 100644

> >>> --- a/net/core/neighbour.c

> >>> +++ b/net/core/neighbour.c

> >>> @@ -2529,6 +2529,9 @@ static bool neigh_master_filtered(struct net_device *dev, int master_idx)

> >>>  {

> >>>  	struct net_device *master;

> >>>

> >>> +	if (master_idx == NDV_NOT_L3_SLAVE)

> >>> +		return netif_is_l3_slave(dev);

> >>> +

> >>>  	if (!master_idx)

> >>>  		return false;

> >>>

> >>>

> >>

> >> you can not special case VRFs like this, and such a feature should apply

> >> to links and addresses as well.

> >

> > Understandable, I'll change it.

> > In this case though, how would you advice to efficiently filter

> > neighbours for interfaces in the default VRF in userspace (without

> > quering the master of every interface that is being dumped)?

> > I reckoned that because there's support in iproute2 for filtering based

> > on a specific VRF, filtering for the default VRF is a natural extension

>

> iproute2 has support for a link database (ll_cache). You would basically

> have to expand the cache to track any master device a link is associated

> with and then fill the cache with a link dump first. It's expensive at

> scale; the "no stats" filter helps a bit.

>

> This is the reason for kernel side filtering on primary attributes

> (coarse grain filtering at reasonable cost).

>


Nice, didn't know about the ll_cache.
Does filtering based on being in the default VRF is something you think
can be merged into iproute2 (say with "novrf" keyword, following the "nomaster"
convention below - e.g. "ip link show novrf")?
If so I'll add it as a patch to iproute2.

> >

> >>

> >> One idea is to pass "*_MASTER" as -1 (use "none" keyword for iproute2)

> >> and then update kernel side to only return entries if the corresponding

> >> device is not enslaved to another device. Unfortunately since I did not

> >> check that _MASTER was non-zero in the current code, we can not use 0 as

> >> a valid flag for "not enslaved". Be sure to document why -1 is used.

> >

> > Do you mean the command will look like "ip link show master none"?

> > If so, wouldn't this cause an ambiguity if an interface names "none" is present?

> >

>

> You could always detect "none" as a valid device name and error out or

> use "nomaster" as a way to say "show all devices not enslaved to a

> bridge or VRF.


I think "nomaster" sounds reasonable, especially since it's already has
a meaning with "ip link set".
David Ahern Aug. 3, 2021, 2:42 p.m. UTC | #3
On 8/3/21 12:47 AM, Lahav Schlesinger wrote:
>>>> you can not special case VRFs like this, and such a feature should apply

>>>> to links and addresses as well.

>>>

>>> Understandable, I'll change it.

>>> In this case though, how would you advice to efficiently filter

>>> neighbours for interfaces in the default VRF in userspace (without

>>> quering the master of every interface that is being dumped)?

>>> I reckoned that because there's support in iproute2 for filtering based

>>> on a specific VRF, filtering for the default VRF is a natural extension

>>

>> iproute2 has support for a link database (ll_cache). You would basically

>> have to expand the cache to track any master device a link is associated

>> with and then fill the cache with a link dump first. It's expensive at

>> scale; the "no stats" filter helps a bit.

>>

>> This is the reason for kernel side filtering on primary attributes

>> (coarse grain filtering at reasonable cost).

>>

> 

> Nice, didn't know about the ll_cache.

> Does filtering based on being in the default VRF is something you think

> can be merged into iproute2 (say with "novrf" keyword, following the "nomaster"

> convention below - e.g. "ip link show novrf")?

> If so I'll add it as a patch to iproute2.


yes. There is also an outstanding request to show the vrf of neighbor
entries (e.g., add a new column at the end with vrf) when doing a full dump.
diff mbox series

Patch

diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
index dc8b72201f6c..d4f4c2189c63 100644
--- a/include/uapi/linux/neighbour.h
+++ b/include/uapi/linux/neighbour.h
@@ -196,4 +196,6 @@  enum {
 };
 #define NFEA_MAX (__NFEA_MAX - 1)
 
+#define NDV_NOT_L3_SLAVE	(-10)
+
 #endif
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 53e85c70c6e5..b280103b6806 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2529,6 +2529,9 @@  static bool neigh_master_filtered(struct net_device *dev, int master_idx)
 {
 	struct net_device *master;
 
+	if (master_idx == NDV_NOT_L3_SLAVE)
+		return netif_is_l3_slave(dev);
+
 	if (!master_idx)
 		return false;