diff mbox series

[bpf-next] bpf: using rcu_read_lock for bpf_sk_storage_map iterator

Message ID 20200914184630.1048718-1-yhs@fb.com
State New
Series [bpf-next] bpf: using rcu_read_lock for bpf_sk_storage_map iterator

Commit Message

Yonghong Song Sept. 14, 2020, 6:46 p.m. UTC
Currently, we use bucket_lock when traversing bpf_sk_storage_map
elements. Since bpf_iter programs cannot use the bpf_sk_storage_get()
and bpf_sk_storage_delete() helpers, which may also grab the bucket
lock, we do not have the deadlock issue that exists for hashmap when
using bucket_lock ([1]).

If a bucket contains many sockets, then while bpf_iter is traversing
that bucket, concurrent bpf_sk_storage_{get,delete}() calls may
experience undesirable delays. Using rcu_read_lock() is a reasonable
compromise here. Although the iterator may lose some precision, e.g.,
by accessing stale sockets, it will not hurt the performance of other
bpf programs.

[1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 net/core/bpf_sk_storage.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

Comments

Song Liu Sept. 14, 2020, 9:28 p.m. UTC | #1
On Mon, Sep 14, 2020 at 11:47 AM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, we use bucket_lock when traversing bpf_sk_storage_map
> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
> we do not have a deadlock issue which exists for hashmap when
> using bucket_lock ([1]).

The paragraph above describes why we can use bucket_lock, which is more
or less irrelevant to this change. Also, I am not sure why we refer to [1] here.

>
> If a bucket contains a lot of sockets, during bpf_iter traversing
> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
> some undesirable delays. Using rcu_read_lock() is a reasonable

It will be great to include some performance comparison.

> compromise here. Although it may lose some precision, e.g.,
> access stale sockets, but it will not hurt performance of other
> bpf programs.
>
> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Other than these,

Acked-by: Song Liu <songliubraving@fb.com>

[...]
Yonghong Song Sept. 15, 2020, 5:25 a.m. UTC | #2
On 9/14/20 2:28 PM, Song Liu wrote:
> On Mon, Sep 14, 2020 at 11:47 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> Currently, we use bucket_lock when traversing bpf_sk_storage_map
>> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
>> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
>> we do not have a deadlock issue which exists for hashmap when
>> using bucket_lock ([1]).
> 
> The paragraph above describes why we can use bucket_lock, which is more
> or less irrelevant to this change. Also, I am not sure why we refer to [1] here.

What I tried to clarify here is that, unlike [1], we do not have bugs
in the current code.
But I guess the absence of bugs is implicit, so I probably do not need
to say anything. Will skip the above paragraph in the next revision.

> 
>>
>> If a bucket contains a lot of sockets, during bpf_iter traversing
>> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
>> some undesirable delays. Using rcu_read_lock() is a reasonable
> 
> It will be great to include some performance comparison.

Sure. Let me try to collect some statistics.

> 
>> compromise here. Although it may lose some precision, e.g.,
>> access stale sockets, but it will not hurt performance of other
>> bpf programs.
>>
>> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>>
>> Cc: Martin KaFai Lau <kafai@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Other than these,
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> 
> [...]
>
Jakub Kicinski Sept. 15, 2020, 3:33 p.m. UTC | #3
On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
> Currently, we use bucket_lock when traversing bpf_sk_storage_map
> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
> we do not have a deadlock issue which exists for hashmap when
> using bucket_lock ([1]).
>
> If a bucket contains a lot of sockets, during bpf_iter traversing
> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
> some undesirable delays. Using rcu_read_lock() is a reasonable
> compromise here. Although it may lose some precision, e.g.,
> access stale sockets, but it will not hurt performance of other
> bpf programs.
>
> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Sparse is not happy about it. Could you add some annotations, perhaps?

include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
Yonghong Song Sept. 15, 2020, 5:35 p.m. UTC | #4
On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
>> Currently, we use bucket_lock when traversing bpf_sk_storage_map
>> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
>> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
>> we do not have a deadlock issue which exists for hashmap when
>> using bucket_lock ([1]).
>>
>> If a bucket contains a lot of sockets, during bpf_iter traversing
>> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
>> some undesirable delays. Using rcu_read_lock() is a reasonable
>> compromise here. Although it may lose some precision, e.g.,
>> access stale sockets, but it will not hurt performance of other
>> bpf programs.
>>
>> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>>
>> Cc: Martin KaFai Lau <kafai@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>
> Sparse is not happy about it. Could you add some annotations, perhaps?
>
> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock

Okay, I will try.

On my system, sparse is unhappy and core dumped....

/data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too 
many errors
/bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse 
-D__linux__ -Dlinux -D__STDC__ -Dunix
-D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute 
-D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
...
/data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
make[3]: *** [net/core/bpf_sk_storage.o] Error 139
make[3]: *** Deleting file `net/core/bpf_sk_storage.o'

-bash-4.4$ rpm -qf /bin/sparse
sparse-0.5.2-1.el7.x86_64
-bash-4.4$
Jakub Kicinski Sept. 15, 2020, 5:40 p.m. UTC | #5
On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> > On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:  
> >> Currently, we use bucket_lock when traversing bpf_sk_storage_map
> >> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
> >> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
> >> we do not have a deadlock issue which exists for hashmap when
> >> using bucket_lock ([1]).
> >>
> >> If a bucket contains a lot of sockets, during bpf_iter traversing
> >> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
> >> some undesirable delays. Using rcu_read_lock() is a reasonable
> >> compromise here. Although it may lose some precision, e.g.,
> >> access stale sockets, but it will not hurt performance of other
> >> bpf programs.
> >>
> >> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
> >>
> >> Cc: Martin KaFai Lau <kafai@fb.com>
> >> Signed-off-by: Yonghong Song <yhs@fb.com>  
> > 
> > Sparse is not happy about it. Could you add some annotations, perhaps?
> > 
> > include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> > include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock  
> 
> Okay, I will try.
> 
> On my system, sparse is unhappy and core dumped....
> 
> /data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too 
> many errors
> /bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse 
> -D__linux__ -Dlinux -D__STDC__ -Dunix
> -D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute 
> -D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
> p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
> ...
> /data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
> make[3]: *** [net/core/bpf_sk_storage.o] Error 139
> make[3]: *** Deleting file `net/core/bpf_sk_storage.o'
> 
> -bash-4.4$ rpm -qf /bin/sparse
> sparse-0.5.2-1.el7.x86_64
> -bash-4.4$

I think you need to build from source, sadly :(

https://git.kernel.org/pub/scm//devel/sparse/sparse.git
Yonghong Song Sept. 15, 2020, 6:56 p.m. UTC | #6
On 9/15/20 10:40 AM, Jakub Kicinski wrote:
> On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
>> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
>>> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
>>>> Currently, we use bucket_lock when traversing bpf_sk_storage_map
>>>> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
>>>> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
>>>> we do not have a deadlock issue which exists for hashmap when
>>>> using bucket_lock ([1]).
>>>>
>>>> If a bucket contains a lot of sockets, during bpf_iter traversing
>>>> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
>>>> some undesirable delays. Using rcu_read_lock() is a reasonable
>>>> compromise here. Although it may lose some precision, e.g.,
>>>> access stale sockets, but it will not hurt performance of other
>>>> bpf programs.
>>>>
>>>> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
>>>>
>>>> Cc: Martin KaFai Lau <kafai@fb.com>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>
>>> Sparse is not happy about it. Could you add some annotations, perhaps?
>>>
>>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
>>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
>>
>> Okay, I will try.
>>
>> On my system, sparse is unhappy and core dumped....
>>
>> /data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too
>> many errors
>> /bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse
>> -D__linux__ -Dlinux -D__STDC__ -Dunix
>> -D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute
>> -D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
>> p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
>> ...
>> /data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
>> make[3]: *** [net/core/bpf_sk_storage.o] Error 139
>> make[3]: *** Deleting file `net/core/bpf_sk_storage.o'
>>
>> -bash-4.4$ rpm -qf /bin/sparse
>> sparse-0.5.2-1.el7.x86_64
>> -bash-4.4$
>
> I think you need to build from source, sadly :(
>
> https://git.kernel.org/pub/scm//devel/sparse/sparse.git

Indeed, building sparse from source works. After adding some
__releases(RCU) and __acquires(RCU), I now have:

   context imbalance in 'bpf_sk_storage_map_seq_find_next' - different lock contexts for basic block

I may need to restructure code to please sparse...
Alexei Starovoitov Sept. 15, 2020, 7:03 p.m. UTC | #7
On Tue, Sep 15, 2020 at 11:56 AM Yonghong Song <yhs@fb.com> wrote:
> On 9/15/20 10:40 AM, Jakub Kicinski wrote:
> > On Tue, 15 Sep 2020 10:35:50 -0700 Yonghong Song wrote:
> >> On 9/15/20 8:33 AM, Jakub Kicinski wrote:
> >>> On Mon, 14 Sep 2020 11:46:30 -0700 Yonghong Song wrote:
> >>>> Currently, we use bucket_lock when traversing bpf_sk_storage_map
> >>>> elements. Since bpf_iter programs cannot use bpf_sk_storage_get()
> >>>> and bpf_sk_storage_delete() helpers which may also grab bucket lock,
> >>>> we do not have a deadlock issue which exists for hashmap when
> >>>> using bucket_lock ([1]).
> >>>>
> >>>> If a bucket contains a lot of sockets, during bpf_iter traversing
> >>>> a bucket, concurrent bpf_sk_storage_{get,delete}() may experience
> >>>> some undesirable delays. Using rcu_read_lock() is a reasonable
> >>>> compromise here. Although it may lose some precision, e.g.,
> >>>> access stale sockets, but it will not hurt performance of other
> >>>> bpf programs.
> >>>>
> >>>> [1] https://lore.kernel.org/bpf/20200902235341.2001534-1-yhs@fb.com
> >>>>
> >>>> Cc: Martin KaFai Lau <kafai@fb.com>
> >>>> Signed-off-by: Yonghong Song <yhs@fb.com>
> >>>
> >>> Sparse is not happy about it. Could you add some annotations, perhaps?
> >>>
> >>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_find_next' - unexpected unlock
> >>> include/linux/rcupdate.h:686:9: warning: context imbalance in 'bpf_sk_storage_map_seq_stop' - unexpected unlock
> >>
> >> Okay, I will try.
> >>
> >> On my system, sparse is unhappy and core dumped....
> >>
> >> /data/users/yhs/work/net-next/include/linux/string.h:12:38: error: too
> >> many errors
> >> /bin/sh: line 1: 2710132 Segmentation fault      (core dumped) sparse
> >> -D__linux__ -Dlinux -D__STDC__ -Dunix
> >> -D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute
> >> -D__x86_64__ --arch=x86 -mlittle-endian -m64 -W
> >> p,-MMD,net/core/.bpf_sk_storage.o.d -nostdinc -isystem
> >> ...
> >> /data/users/yhs/work/net-next/net/core/bpf_sk_storage.c
> >> make[3]: *** [net/core/bpf_sk_storage.o] Error 139
> >> make[3]: *** Deleting file `net/core/bpf_sk_storage.o'
> >>
> >> -bash-4.4$ rpm -qf /bin/sparse
> >> sparse-0.5.2-1.el7.x86_64
> >> -bash-4.4$
> >
> > I think you need to build from source, sadly :(
> >
> > https://git.kernel.org/pub/scm//devel/sparse/sparse.git
>
> Indeed, building sparse from source works. After adding some
> __releases(RCU) and __acquires(RCU), I now have:
>    context imbalance in 'bpf_sk_storage_map_seq_find_next' - different
> lock contexts for basic block
> I may need to restructure code to please sparse...

I don't think sparse can handle such things even with all annotations.
I wouldn't spend too much time on it.

Patch

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 4a86ea34f29e..a1db5e988d19 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -701,7 +701,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 		if (!selem) {
 			/* not found, unlock and go to the next bucket */
 			b = &smap->buckets[bucket_id++];
-			raw_spin_unlock_bh(&b->lock);
+			rcu_read_unlock();
 			skip_elems = 0;
 			break;
 		}
@@ -715,7 +715,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 
 	for (i = bucket_id; i < (1U << smap->bucket_log); i++) {
 		b = &smap->buckets[i];
-		raw_spin_lock_bh(&b->lock);
+		rcu_read_lock();
 		count = 0;
 		hlist_for_each_entry(selem, &b->list, map_node) {
 			sk_storage = rcu_dereference_raw(selem->local_storage);
@@ -726,7 +726,7 @@  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 			}
 			count++;
 		}
-		raw_spin_unlock_bh(&b->lock);
+		rcu_read_unlock();
 		skip_elems = 0;
 	}
 
@@ -806,13 +806,10 @@  static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
 	struct bpf_local_storage_map *smap;
 	struct bpf_local_storage_map_bucket *b;
 
-	if (!v) {
+	if (!v)
 		(void)__bpf_sk_storage_map_seq_show(seq, v);
-	} else {
-		smap = (struct bpf_local_storage_map *)info->map;
-		b = &smap->buckets[info->bucket_id];
-		raw_spin_unlock_bh(&b->lock);
-	}
+	else
+		rcu_read_unlock();
 }
 
 static int bpf_iter_init_sk_storage_map(void *priv_data,