Message ID | 20210709062943.101532-1-ilias.apalodimas@linaro.org |
---|---|
State | New |
Headers | show |
Series | [1/1,v2] skbuff: Fix a potential race while recycling page_pool packets | expand |
On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote: > > As Alexander points out, when we are trying to recycle a cloned/expanded > SKB we might trigger a race. The recycling code relies on the > pp_recycle bit to trigger, which we carry over to cloned SKBs. > If that cloned SKB gets expanded or if we get references to the frags, > call skbb_release_data() and overwrite skb->head, we are creating separate > instances accessing the same page frags. Since the skb_release_data() > will first try to recycle the frags, there's a potential race between > the original and cloned SKB, since both will have the pp_recycle bit set. > > Fix this by explicitly those SKBs not recyclable. > The atomic_sub_return effectively limits us to a single release case, > and when we are calling skb_release_data we are also releasing the > option to perform the recycling, or releasing the pages from the page pool. > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > --- > Changes since v1: > - Set the recycle bit to 0 during skb_release_data instead of the > individual fucntions triggering the issue, in order to catch all > cases > net/core/skbuff.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 12aabcda6db2..f91f09a824be 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > if (skb->cloned && > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > &shinfo->dataref)) > - return; > + goto exit; > > skb_zcopy_clear(skb, true); > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > kfree_skb_list(shinfo->frag_list); > > skb_free_head(skb); > +exit: > + skb->pp_recycle = 0; > } > > /* > -- > 2.32.0.rc0 > This is probably the cleanest approach with the least amount of change, but one thing I am concerned with in this approach is that we end up having to dirty a cacheline that I am not sure is otherwise touched during skb cleanup. I am not sure if that will be an issue or not. If it is then an alternative or follow-on patch could move the pp_recycle flag into the skb_shared_info flags itself and then make certain that we clear it around the same time we are setting shinfo->dataref to 1. Otherwise this looks good to me. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
On Fri, Jul 09, 2021 at 07:34:38AM -0700, Alexander Duyck wrote: > On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas > <ilias.apalodimas@linaro.org> wrote: > > > > As Alexander points out, when we are trying to recycle a cloned/expanded > > SKB we might trigger a race. The recycling code relies on the > > pp_recycle bit to trigger, which we carry over to cloned SKBs. > > If that cloned SKB gets expanded or if we get references to the frags, > > call skbb_release_data() and overwrite skb->head, we are creating separate > > instances accessing the same page frags. Since the skb_release_data() > > will first try to recycle the frags, there's a potential race between > > the original and cloned SKB, since both will have the pp_recycle bit set. > > > > Fix this by explicitly those SKBs not recyclable. > > The atomic_sub_return effectively limits us to a single release case, > > and when we are calling skb_release_data we are also releasing the > > option to perform the recycling, or releasing the pages from the page pool. > > > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > > --- > > Changes since v1: > > - Set the recycle bit to 0 during skb_release_data instead of the > > individual fucntions triggering the issue, in order to catch all > > cases > > net/core/skbuff.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index 12aabcda6db2..f91f09a824be 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > > if (skb->cloned && > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > &shinfo->dataref)) > > - return; > > + goto exit; > > > > skb_zcopy_clear(skb, true); > > > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > > kfree_skb_list(shinfo->frag_list); > > > > skb_free_head(skb); > > +exit: > > + skb->pp_recycle = 0; > > } > > > > /* > > -- > > 2.32.0.rc0 > > > > This is probably the cleanest approach with the least amount of > change, but one thing I am concerned with in this approach is that we > end up having to dirty a cacheline that I am not sure is otherwise > touched during skb cleanup. I am not sure if that will be an issue or > not. If it is then an alternative or follow-on patch could move the > pp_recycle flag into the skb_shared_info flags itself and then make > certain that we clear it around the same time we are setting > shinfo->dataref to 1. > Yep that's a viable alternative. Let's see if there's any measurable impact. > Otherwise this looks good to me. > > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Thanks Alexander!
On 09/07/2021 16.34, Alexander Duyck wrote: > On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas > <ilias.apalodimas@linaro.org> wrote: >> >> As Alexander points out, when we are trying to recycle a cloned/expanded >> SKB we might trigger a race. The recycling code relies on the >> pp_recycle bit to trigger, which we carry over to cloned SKBs. >> If that cloned SKB gets expanded or if we get references to the frags, >> call skbb_release_data() and overwrite skb->head, we are creating separate >> instances accessing the same page frags. Since the skb_release_data() >> will first try to recycle the frags, there's a potential race between >> the original and cloned SKB, since both will have the pp_recycle bit set. >> >> Fix this by explicitly those SKBs not recyclable. >> The atomic_sub_return effectively limits us to a single release case, >> and when we are calling skb_release_data we are also releasing the >> option to perform the recycling, or releasing the pages from the page pool. >> >> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") >> Reported-by: Alexander Duyck <alexanderduyck@fb.com> >> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> >> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> >> --- >> Changes since v1: >> - Set the recycle bit to 0 during skb_release_data instead of the >> individual fucntions triggering the issue, in order to catch all >> cases >> net/core/skbuff.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index 12aabcda6db2..f91f09a824be 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) >> if (skb->cloned && >> atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, >> &shinfo->dataref)) >> - return; >> + goto exit; >> >> skb_zcopy_clear(skb, true); >> >> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) >> kfree_skb_list(shinfo->frag_list); >> >> skb_free_head(skb); >> +exit: >> + skb->pp_recycle = 0; >> } >> >> /* >> -- >> 2.32.0.rc0 >> > > This is probably the cleanest approach with the least amount of > change, but one thing I am concerned with in this approach is that we > end up having to dirty a cacheline that I am not sure is otherwise > touched during skb cleanup. I am not sure if that will be an issue or > not. If it is then an alternative or follow-on patch could move the > pp_recycle flag into the skb_shared_info flags itself and then make > certain that we clear it around the same time we are setting > shinfo->dataref to 1. > The skb->cloned and skb->pp_recycle (bitfields) are on the same cache-line (incl. nohdr, destructor, active_extensions). Thus, we know this must be in CPUs cache, regardless of this change. I do acknowledge that it might be in cache coherency "Shared" state, and writing skb->pp_recycle=0 the CPU *might* have to change the cache coherency state, but I don't expect this to be a performance problem. > Otherwise this looks good to me. > > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> I've gone over the code-path, with Ilias on IRC and I've convinced myself that this fix is correct, thus ACK.
On 2021/7/9 14:29, Ilias Apalodimas wrote: > As Alexander points out, when we are trying to recycle a cloned/expanded > SKB we might trigger a race. The recycling code relies on the > pp_recycle bit to trigger, which we carry over to cloned SKBs. > If that cloned SKB gets expanded or if we get references to the frags, > call skbb_release_data() and overwrite skb->head, we are creating separate > instances accessing the same page frags. Since the skb_release_data() > will first try to recycle the frags, there's a potential race between > the original and cloned SKB, since both will have the pp_recycle bit set. > > Fix this by explicitly those SKBs not recyclable. > The atomic_sub_return effectively limits us to a single release case, > and when we are calling skb_release_data we are also releasing the > option to perform the recycling, or releasing the pages from the page pool. > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > --- > Changes since v1: > - Set the recycle bit to 0 during skb_release_data instead of the > individual fucntions triggering the issue, in order to catch all > cases > net/core/skbuff.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 12aabcda6db2..f91f09a824be 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > if (skb->cloned && > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > &shinfo->dataref)) > - return; > + goto exit; Is it possible this patch may break the head frag page for the original skb, supposing it's head frag page is from the page pool and below change clears the pp_recycle for original skb, causing a page leaking for the page pool? > > skb_zcopy_clear(skb, true); > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > kfree_skb_list(shinfo->frag_list); > > skb_free_head(skb); > +exit: > + skb->pp_recycle = 0; > } > > /* >
On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2021/7/9 14:29, Ilias Apalodimas wrote: > > As Alexander points out, when we are trying to recycle a cloned/expanded > > SKB we might trigger a race. The recycling code relies on the > > pp_recycle bit to trigger, which we carry over to cloned SKBs. > > If that cloned SKB gets expanded or if we get references to the frags, > > call skbb_release_data() and overwrite skb->head, we are creating separate > > instances accessing the same page frags. Since the skb_release_data() > > will first try to recycle the frags, there's a potential race between > > the original and cloned SKB, since both will have the pp_recycle bit set. > > > > Fix this by explicitly those SKBs not recyclable. > > The atomic_sub_return effectively limits us to a single release case, > > and when we are calling skb_release_data we are also releasing the > > option to perform the recycling, or releasing the pages from the page pool. > > > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > > --- > > Changes since v1: > > - Set the recycle bit to 0 during skb_release_data instead of the > > individual fucntions triggering the issue, in order to catch all > > cases > > net/core/skbuff.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index 12aabcda6db2..f91f09a824be 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > > if (skb->cloned && > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > &shinfo->dataref)) > > - return; > > + goto exit; > > Is it possible this patch may break the head frag page for the original skb, > supposing it's head frag page is from the page pool and below change clears > the pp_recycle for original skb, causing a page leaking for the page pool? > So this would leak eventually dma mapping if the skb is cloned (and the dataref is now +1) and we are freeing the original skb first? > > > > skb_zcopy_clear(skb, true); > > > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > > kfree_skb_list(shinfo->frag_list); > > > > skb_free_head(skb); > > +exit: > > + skb->pp_recycle = 0; > > } > > > > /* > >
On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote: > > On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote: > > > > On 2021/7/9 14:29, Ilias Apalodimas wrote: > > > As Alexander points out, when we are trying to recycle a cloned/expanded > > > SKB we might trigger a race. The recycling code relies on the > > > pp_recycle bit to trigger, which we carry over to cloned SKBs. > > > If that cloned SKB gets expanded or if we get references to the frags, > > > call skbb_release_data() and overwrite skb->head, we are creating separate > > > instances accessing the same page frags. Since the skb_release_data() > > > will first try to recycle the frags, there's a potential race between > > > the original and cloned SKB, since both will have the pp_recycle bit set. > > > > > > Fix this by explicitly those SKBs not recyclable. > > > The atomic_sub_return effectively limits us to a single release case, > > > and when we are calling skb_release_data we are also releasing the > > > option to perform the recycling, or releasing the pages from the page pool. > > > > > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > > > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > > > --- > > > Changes since v1: > > > - Set the recycle bit to 0 during skb_release_data instead of the > > > individual fucntions triggering the issue, in order to catch all > > > cases > > > net/core/skbuff.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index 12aabcda6db2..f91f09a824be 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > > > if (skb->cloned && > > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > > &shinfo->dataref)) > > > - return; > > > + goto exit; > > > > Is it possible this patch may break the head frag page for the original skb, > > supposing it's head frag page is from the page pool and below change clears > > the pp_recycle for original skb, causing a page leaking for the page pool? > > > > So this would leak eventually dma mapping if the skb is cloned (and > the dataref is now +1) and we are freeing the original skb first? > Apologies for the noise, my description was not complete. The case you are thinking is clone an SKB and then expand the original? thanks /Ilias > > > > > > skb_zcopy_clear(skb, true); > > > > > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > > > kfree_skb_list(shinfo->frag_list); > > > > > > skb_free_head(skb); > > > +exit: > > > + skb->pp_recycle = 0; > > > } > > > > > > /* > > >
On 2021/7/15 18:38, Ilias Apalodimas wrote: > On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas > <ilias.apalodimas@linaro.org> wrote: >> >> On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote: >>> >>> On 2021/7/9 14:29, Ilias Apalodimas wrote: >>>> As Alexander points out, when we are trying to recycle a cloned/expanded >>>> SKB we might trigger a race. The recycling code relies on the >>>> pp_recycle bit to trigger, which we carry over to cloned SKBs. >>>> If that cloned SKB gets expanded or if we get references to the frags, >>>> call skbb_release_data() and overwrite skb->head, we are creating separate >>>> instances accessing the same page frags. Since the skb_release_data() >>>> will first try to recycle the frags, there's a potential race between >>>> the original and cloned SKB, since both will have the pp_recycle bit set. >>>> >>>> Fix this by explicitly those SKBs not recyclable. >>>> The atomic_sub_return effectively limits us to a single release case, >>>> and when we are calling skb_release_data we are also releasing the >>>> option to perform the recycling, or releasing the pages from the page pool. >>>> >>>> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") >>>> Reported-by: Alexander Duyck <alexanderduyck@fb.com> >>>> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> >>>> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> >>>> --- >>>> Changes since v1: >>>> - Set the recycle bit to 0 during skb_release_data instead of the >>>> individual fucntions triggering the issue, in order to catch all >>>> cases >>>> net/core/skbuff.c | 4 +++- >>>> 1 file changed, 3 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>> index 12aabcda6db2..f91f09a824be 100644 >>>> --- a/net/core/skbuff.c >>>> +++ b/net/core/skbuff.c >>>> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) >>>> if (skb->cloned && >>>> atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, >>>> &shinfo->dataref)) >>>> - return; >>>> + goto exit; >>> >>> Is it possible this patch may break the head frag page for the original skb, >>> supposing it's head frag page is from the page pool and below change clears >>> the pp_recycle for original skb, causing a page leaking for the page pool? >>> >> >> So this would leak eventually dma mapping if the skb is cloned (and >> the dataref is now +1) and we are freeing the original skb first? >> > > Apologies for the noise, my description was not complete. > The case you are thinking is clone an SKB and then expand the original? Yes. It seems we might need different pp_recycle bit for head frag and data frag. > > thanks > /Ilias > > >>>> >>>> skb_zcopy_clear(skb, true); >>>> >>>> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) >>>> kfree_skb_list(shinfo->frag_list); >>>> >>>> skb_free_head(skb); >>>> +exit: >>>> + skb->pp_recycle = 0; >>>> } >>>> >>>> /* >>>> > . >
On Thu, 15 Jul 2021 at 13:48, Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2021/7/15 18:38, Ilias Apalodimas wrote: > > On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas > > <ilias.apalodimas@linaro.org> wrote: > >> > >> On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote: > >>> > >>> On 2021/7/9 14:29, Ilias Apalodimas wrote: > >>>> As Alexander points out, when we are trying to recycle a cloned/expanded > >>>> SKB we might trigger a race. The recycling code relies on the > >>>> pp_recycle bit to trigger, which we carry over to cloned SKBs. > >>>> If that cloned SKB gets expanded or if we get references to the frags, > >>>> call skbb_release_data() and overwrite skb->head, we are creating separate > >>>> instances accessing the same page frags. Since the skb_release_data() > >>>> will first try to recycle the frags, there's a potential race between > >>>> the original and cloned SKB, since both will have the pp_recycle bit set. > >>>> > >>>> Fix this by explicitly those SKBs not recyclable. > >>>> The atomic_sub_return effectively limits us to a single release case, > >>>> and when we are calling skb_release_data we are also releasing the > >>>> option to perform the recycling, or releasing the pages from the page pool. > >>>> > >>>> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > >>>> Reported-by: Alexander Duyck <alexanderduyck@fb.com> > >>>> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > >>>> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > >>>> --- > >>>> Changes since v1: > >>>> - Set the recycle bit to 0 during skb_release_data instead of the > >>>> individual fucntions triggering the issue, in order to catch all > >>>> cases > >>>> net/core/skbuff.c | 4 +++- > >>>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>>> index 12aabcda6db2..f91f09a824be 100644 > >>>> --- a/net/core/skbuff.c > >>>> +++ b/net/core/skbuff.c > >>>> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > >>>> if (skb->cloned && > >>>> atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > >>>> &shinfo->dataref)) > >>>> - return; > >>>> + goto exit; > >>> > >>> Is it possible this patch may break the head frag page for the original skb, > >>> supposing it's head frag page is from the page pool and below change clears > >>> the pp_recycle for original skb, causing a page leaking for the page pool? > >>> > >> > >> So this would leak eventually dma mapping if the skb is cloned (and > >> the dataref is now +1) and we are freeing the original skb first? > >> > > > > Apologies for the noise, my description was not complete. > > The case you are thinking is clone an SKB and then expand the original? > > Yes. > It seems we might need different pp_recycle bit for head frag and data frag. We could just reset the pp_recycle flag on pskb_carve_inside_header, pskb_expand_header and pskb_carve_inside_nonlinear which were the three functions that might trigger the race to begin with. The point on adding it on skb_release_data was to have a catch all for all future cases ... Let me stare at itt a bit more in case I can come up with something better Thanks /Ilias > > > > > thanks > > /Ilias > > > > > >>>> > >>>> skb_zcopy_clear(skb, true); > >>>> > >>>> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > >>>> kfree_skb_list(shinfo->frag_list); > >>>> > >>>> skb_free_head(skb); > >>>> +exit: > >>>> + skb->pp_recycle = 0; > >>>> } > >>>> > >>>> /* > >>>> > > . > >
On Wed, Jul 14, 2021 at 9:02 PM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2021/7/9 14:29, Ilias Apalodimas wrote: > > As Alexander points out, when we are trying to recycle a cloned/expanded > > SKB we might trigger a race. The recycling code relies on the > > pp_recycle bit to trigger, which we carry over to cloned SKBs. > > If that cloned SKB gets expanded or if we get references to the frags, > > call skbb_release_data() and overwrite skb->head, we are creating separate > > instances accessing the same page frags. Since the skb_release_data() > > will first try to recycle the frags, there's a potential race between > > the original and cloned SKB, since both will have the pp_recycle bit set. > > > > Fix this by explicitly those SKBs not recyclable. > > The atomic_sub_return effectively limits us to a single release case, > > and when we are calling skb_release_data we are also releasing the > > option to perform the recycling, or releasing the pages from the page pool. > > > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") > > Reported-by: Alexander Duyck <alexanderduyck@fb.com> > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com> > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> > > --- > > Changes since v1: > > - Set the recycle bit to 0 during skb_release_data instead of the > > individual fucntions triggering the issue, in order to catch all > > cases > > net/core/skbuff.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index 12aabcda6db2..f91f09a824be 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) > > if (skb->cloned && > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > &shinfo->dataref)) > > - return; > > + goto exit; > > Is it possible this patch may break the head frag page for the original skb, > supposing it's head frag page is from the page pool and below change clears > the pp_recycle for original skb, causing a page leaking for the page pool? I don't see how. The assumption here is that when atomic_sub_return gets down to 0 we will still have an skb with skb->pp_recycle set and it will flow down and encounter skb_free_head below. All we are doing is skipping those steps and clearing skb->pp_recycle for all but the last buffer and the last one to free it will trigger the recycling. > > > > skb_zcopy_clear(skb, true); > > > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > > kfree_skb_list(shinfo->frag_list); > > > > skb_free_head(skb); > > +exit: > > + skb->pp_recycle = 0; Note the path here. We don't clear skb->pp_recycle for the last buffer where "dataref == 0" until *AFTER* the head has been freed, and all clones will have skb->pp_recycle = 1 as long as they are a clone of the original skb that had it set.
> > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, [...] > > > &shinfo->dataref)) > > > - return; > > > + goto exit; > > > > Is it possible this patch may break the head frag page for the original skb, > > supposing it's head frag page is from the page pool and below change clears > > the pp_recycle for original skb, causing a page leaking for the page pool? > > I don't see how. The assumption here is that when atomic_sub_return > gets down to 0 we will still have an skb with skb->pp_recycle set and > it will flow down and encounter skb_free_head below. All we are doing > is skipping those steps and clearing skb->pp_recycle for all but the > last buffer and the last one to free it will trigger the recycling. I think the assumption here is that 1. We clone an skb 2. The original skb goes into pskb_expand_head() 3. skb_release_data() will be called for the original skb But with the dataref bumped, we'll skip the recycling for it but we'll also skip recycling or unmapping the current head (which is a page_pool mapped buffer) > > > > > > > skb_zcopy_clear(skb, true); > > > > > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) > > > kfree_skb_list(shinfo->frag_list); > > > > > > skb_free_head(skb); > > > +exit: > > > + skb->pp_recycle = 0; > > Note the path here. We don't clear skb->pp_recycle for the last buffer > where "dataref == 0" until *AFTER* the head has been freed, and all > clones will have skb->pp_recycle = 1 as long as they are a clone of > the original skb that had it set.
On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote: > > > > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > [...] > > > > > &shinfo->dataref)) > > > > - return; > > > > + goto exit; > > > > > > Is it possible this patch may break the head frag page for the original skb, > > > supposing it's head frag page is from the page pool and below change clears > > > the pp_recycle for original skb, causing a page leaking for the page pool? > > > > I don't see how. The assumption here is that when atomic_sub_return > > gets down to 0 we will still have an skb with skb->pp_recycle set and > > it will flow down and encounter skb_free_head below. All we are doing > > is skipping those steps and clearing skb->pp_recycle for all but the > > last buffer and the last one to free it will trigger the recycling. > > I think the assumption here is that > 1. We clone an skb > 2. The original skb goes into pskb_expand_head() > 3. skb_release_data() will be called for the original skb > > But with the dataref bumped, we'll skip the recycling for it but we'll also > skip recycling or unmapping the current head (which is a page_pool mapped > buffer) Right, but in that case it is the clone that is left holding the original head and the skb->pp_recycle flag is set on the clone as it was copied from the original when we cloned it. What we have essentially done is transferred the responsibility for freeing it from the original to the clone. If you think about it the result is the same as if step 2 was to go into kfree_skb. We would still be calling skb_release_data and the dataref would be decremented without the original freeing the page. We have to wait until all the clones are freed and dataref reaches 0 before the head can be recycled.
On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote: > On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas > <ilias.apalodimas@linaro.org> wrote: > > > > > > > atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > > > > [...] > > > > > > > &shinfo->dataref)) > > > > > - return; > > > > > + goto exit; > > > > > > > > Is it possible this patch may break the head frag page for the original skb, > > > > supposing it's head frag page is from the page pool and below change clears > > > > the pp_recycle for original skb, causing a page leaking for the page pool? > > > > > > I don't see how. The assumption here is that when atomic_sub_return > > > gets down to 0 we will still have an skb with skb->pp_recycle set and > > > it will flow down and encounter skb_free_head below. All we are doing > > > is skipping those steps and clearing skb->pp_recycle for all but the > > > last buffer and the last one to free it will trigger the recycling. > > > > I think the assumption here is that > > 1. We clone an skb > > 2. The original skb goes into pskb_expand_head() > > 3. skb_release_data() will be called for the original skb > > > > But with the dataref bumped, we'll skip the recycling for it but we'll also > > skip recycling or unmapping the current head (which is a page_pool mapped > > buffer) > > Right, but in that case it is the clone that is left holding the > original head and the skb->pp_recycle flag is set on the clone as it > was copied from the original when we cloned it. Ah yes, that's what I missed > What we have > essentially done is transferred the responsibility for freeing it from > the original to the clone. > > If you think about it the result is the same as if step 2 was to go > into kfree_skb. We would still be calling skb_release_data and the > dataref would be decremented without the original freeing the page. We > have to wait until all the clones are freed and dataref reaches 0 > before the head can be recycled. Yep sounds correct Thanks /Ilias
On 2021/7/15 23:02, Ilias Apalodimas wrote: > On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote: >> On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas >> <ilias.apalodimas@linaro.org> wrote: >>> >>>>>> atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, >>> >>> [...] >>> >>>>>> &shinfo->dataref)) >>>>>> - return; >>>>>> + goto exit; >>>>> >>>>> Is it possible this patch may break the head frag page for the original skb, >>>>> supposing it's head frag page is from the page pool and below change clears >>>>> the pp_recycle for original skb, causing a page leaking for the page pool? >>>> >>>> I don't see how. The assumption here is that when atomic_sub_return >>>> gets down to 0 we will still have an skb with skb->pp_recycle set and >>>> it will flow down and encounter skb_free_head below. All we are doing >>>> is skipping those steps and clearing skb->pp_recycle for all but the >>>> last buffer and the last one to free it will trigger the recycling. >>> >>> I think the assumption here is that >>> 1. We clone an skb >>> 2. The original skb goes into pskb_expand_head() >>> 3. skb_release_data() will be called for the original skb >>> >>> But with the dataref bumped, we'll skip the recycling for it but we'll also >>> skip recycling or unmapping the current head (which is a page_pool mapped >>> buffer) >> >> Right, but in that case it is the clone that is left holding the >> original head and the skb->pp_recycle flag is set on the clone as it >> was copied from the original when we cloned it. > > Ah yes, that's what I missed > >> What we have >> essentially done is transferred the responsibility for freeing it from >> the original to the clone. >> >> If you think about it the result is the same as if step 2 was to go >> into kfree_skb. We would still be calling skb_release_data and the >> dataref would be decremented without the original freeing the page. We >> have to wait until all the clones are freed and dataref reaches 0 >> before the head can be recycled. > > Yep sounds correct Ok, I suppose the fraglist skb is handled similar as the regular skb, right? Also, this patch might need respinning as the state of this patch is "Changes Requested" in patchwork. > > Thanks > /Ilias > . >
On Fri, 16 Jul 2021 at 05:30, Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2021/7/15 23:02, Ilias Apalodimas wrote: > > On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote: > >> On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas > >> <ilias.apalodimas@linaro.org> wrote: > >>> > >>>>>> atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, > >>> > >>> [...] > >>> > >>>>>> &shinfo->dataref)) > >>>>>> - return; > >>>>>> + goto exit; > >>>>> > >>>>> Is it possible this patch may break the head frag page for the original skb, > >>>>> supposing it's head frag page is from the page pool and below change clears > >>>>> the pp_recycle for original skb, causing a page leaking for the page pool? > >>>> > >>>> I don't see how. The assumption here is that when atomic_sub_return > >>>> gets down to 0 we will still have an skb with skb->pp_recycle set and > >>>> it will flow down and encounter skb_free_head below. All we are doing > >>>> is skipping those steps and clearing skb->pp_recycle for all but the > >>>> last buffer and the last one to free it will trigger the recycling. > >>> > >>> I think the assumption here is that > >>> 1. We clone an skb > >>> 2. The original skb goes into pskb_expand_head() > >>> 3. skb_release_data() will be called for the original skb > >>> > >>> But with the dataref bumped, we'll skip the recycling for it but we'll also > >>> skip recycling or unmapping the current head (which is a page_pool mapped > >>> buffer) > >> > >> Right, but in that case it is the clone that is left holding the > >> original head and the skb->pp_recycle flag is set on the clone as it > >> was copied from the original when we cloned it. > > > > Ah yes, that's what I missed > > > >> What we have > >> essentially done is transferred the responsibility for freeing it from > >> the original to the clone. > >> > >> If you think about it the result is the same as if step 2 was to go > >> into kfree_skb. We would still be calling skb_release_data and the > >> dataref would be decremented without the original freeing the page. We > >> have to wait until all the clones are freed and dataref reaches 0 > >> before the head can be recycled. > > > > Yep sounds correct > > Ok, I suppose the fraglist skb is handled similar as the regular skb, right? > Yes, even in the fragments case your cloned/expanded SBK will still have the recycle bit set, so it will try to recycle them or unmap them > Also, this patch might need respinning as the state of this patch is "Changes > Requested" in patchwork. Thanks, I'll respin it and add a comment explaining why > > > > > Thanks > > /Ilias > > . > >
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 12aabcda6db2..f91f09a824be 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb) if (skb->cloned && atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, &shinfo->dataref)) - return; + goto exit; skb_zcopy_clear(skb, true); @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb) kfree_skb_list(shinfo->frag_list); skb_free_head(skb); +exit: + skb->pp_recycle = 0; } /*
As Alexander points out, when we are trying to recycle a cloned/expanded SKB we might trigger a race. The recycling code relies on the pp_recycle bit to trigger, which we carry over to cloned SKBs. If that cloned SKB gets expanded or if we get references to the frags, call skbb_release_data() and overwrite skb->head, we are creating separate instances accessing the same page frags. Since the skb_release_data() will first try to recycle the frags, there's a potential race between the original and cloned SKB, since both will have the pp_recycle bit set. Fix this by explicitly those SKBs not recyclable. The atomic_sub_return effectively limits us to a single release case, and when we are calling skb_release_data we are also releasing the option to perform the recycling, or releasing the pages from the page pool. Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") Reported-by: Alexander Duyck <alexanderduyck@fb.com> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> --- Changes since v1: - Set the recycle bit to 0 during skb_release_data instead of the individual fucntions triggering the issue, in order to catch all cases net/core/skbuff.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- 2.32.0.rc0