mbox series

[bpf-next,v6,0/2] Introduce a new kfunc of bpf_task_under_cgroup

Message ID 20230505060818.60037-1-zhoufeng.zf@bytedance.com
Headers show
Series Introduce a new kfunc of bpf_task_under_cgroup | expand

Message

Feng Zhou May 5, 2023, 6:08 a.m. UTC
From: Feng Zhou <zhoufeng.zf@bytedance.com>

Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup.

Feng Zhou (2):
  bpf: Add bpf_task_under_cgroup() kfunc
  selftests/bpf: Add testcase for bpf_task_under_cgroup

Changelog:
v5->v6: Addressed comments from Yonghong Song
- Some code format modifications.
- Add ack-by
Details in here:
https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.com/

v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.com/

v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.com/

v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.com/

v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.com/

 kernel/bpf/helpers.c                          | 20 +++++++
 tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
 .../bpf/prog_tests/task_under_cgroup.c        | 53 +++++++++++++++++++
 .../bpf/progs/test_task_under_cgroup.c        | 51 ++++++++++++++++++
 4 files changed, 125 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c

Comments

Hao Luo May 5, 2023, 6:58 a.m. UTC | #1
On Thu, May 4, 2023 at 11:08 PM Feng zhou <zhoufeng.zf@bytedance.com> wrote:
>
<...>
> ---
>  kernel/bpf/helpers.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index bb6b4637ebf2..453cbd312366 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
>                 return NULL;
>         return cgrp;
>  }
> +
> +/**
> + * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a kfunc, test
> + * task's membership of cgroup ancestry.
> + * @task: the task to be tested
> + * @ancestor: possible ancestor of @task's cgroup
> + *
> + * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
> + * It follows all the same rules as cgroup_is_descendant, and only applies
> + * to the default hierarchy.
> + */
> +__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
> +                                      struct cgroup *ancestor)
> +{
> +       if (unlikely(!ancestor || !task))
> +               return -EINVAL;
> +
> +       return task_under_cgroup_hierarchy(task, ancestor);
> +}
>  #endif /* CONFIG_CGROUPS */
>

I wonder in what situation a null 'task' or 'ancestor' can be passed.
Please call out in the comment that the returned value can be a
negative error, so that writing if(bpf_task_under_cgroup()) may cause
surprising results.

Hao
Feng Zhou May 5, 2023, 7:18 a.m. UTC | #2
在 2023/5/5 14:58, Hao Luo 写道:
> On Thu, May 4, 2023 at 11:08 PM Feng zhou <zhoufeng.zf@bytedance.com> wrote:
>>
> <...>
>> ---
>>   kernel/bpf/helpers.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index bb6b4637ebf2..453cbd312366 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
>>                  return NULL;
>>          return cgrp;
>>   }
>> +
>> +/**
>> + * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a kfunc, test
>> + * task's membership of cgroup ancestry.
>> + * @task: the task to be tested
>> + * @ancestor: possible ancestor of @task's cgroup
>> + *
>> + * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
>> + * It follows all the same rules as cgroup_is_descendant, and only applies
>> + * to the default hierarchy.
>> + */
>> +__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
>> +                                      struct cgroup *ancestor)
>> +{
>> +       if (unlikely(!ancestor || !task))
>> +               return -EINVAL;
>> +
>> +       return task_under_cgroup_hierarchy(task, ancestor);
>> +}
>>   #endif /* CONFIG_CGROUPS */
>>
> 
> I wonder in what situation a null 'task' or 'ancestor' can be passed.
> Please call out in the comment that the returned value can be a
> negative error, so that writing if(bpf_task_under_cgroup()) may cause
> surprising results.
> 
> Hao

Hmm, you are right. As kfunc, the NULL value of the parameter is judged, 
and bpf verify will prompt the developer to add it. There is really no 
need to add this part of the judgment. See other people's opinions.
Yonghong Song May 5, 2023, 6:43 p.m. UTC | #3
On 5/5/23 12:18 AM, Feng Zhou wrote:
> 在 2023/5/5 14:58, Hao Luo 写道:
>> On Thu, May 4, 2023 at 11:08 PM Feng zhou <zhoufeng.zf@bytedance.com> 
>> wrote:
>>>
>> <...>
>>> ---
>>>   kernel/bpf/helpers.c | 20 ++++++++++++++++++++
>>>   1 file changed, 20 insertions(+)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index bb6b4637ebf2..453cbd312366 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup 
>>> *bpf_cgroup_from_id(u64 cgid)
>>>                  return NULL;
>>>          return cgrp;
>>>   }
>>> +
>>> +/**
>>> + * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a 
>>> kfunc, test
>>> + * task's membership of cgroup ancestry.
>>> + * @task: the task to be tested
>>> + * @ancestor: possible ancestor of @task's cgroup
>>> + *
>>> + * Tests whether @task's default cgroup hierarchy is a descendant of 
>>> @ancestor.
>>> + * It follows all the same rules as cgroup_is_descendant, and only 
>>> applies
>>> + * to the default hierarchy.
>>> + */
>>> +__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
>>> +                                      struct cgroup *ancestor)
>>> +{
>>> +       if (unlikely(!ancestor || !task))
>>> +               return -EINVAL;
>>> +
>>> +       return task_under_cgroup_hierarchy(task, ancestor);
>>> +}
>>>   #endif /* CONFIG_CGROUPS */
>>>
>>
>> I wonder in what situation a null 'task' or 'ancestor' can be passed.
>> Please call out in the comment that the returned value can be a
>> negative error, so that writing if(bpf_task_under_cgroup()) may cause
>> surprising results.
>>
>> Hao
> 
> Hmm, you are right. As kfunc, the NULL value of the parameter is judged, 
> and bpf verify will prompt the developer to add it. There is really no 
> need to add this part of the judgment. See other people's opinions.

Thanks for pointing out Hou.

Currently, bpf_task_under_cgroup() is marked as KF_RCU.

Per documentation:
2.4.7 KF_RCU flag
-----------------

The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs 
marked with
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier 
guarantees
that the objects are valid and there is no use-after-free. The pointers 
are not
NULL, but the object's refcount could have reached zero. The kfuncs need to
consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very 
likely
also be KF_RET_NULL.


The pointer cannot be NULL, so the following line of code can be removed:
 >>> +       if (unlikely(!ancestor || !task))
 >>> +               return -EINVAL;

I think we do not need to check refcnt != 0 case since ancestor and
task won't go away.

In the example of second patch, both arguments are TRUSTED arguments
which is stronger than RCU, so the test itself is okay.
I am considering whether we should enforce arguments of the kfunc
to be KF_TRUSTED_ARGS, but I think esp. in some cases, cgroup
might be RCU protected e.g., task->cgroup->dfl_cgrp. So leaving argument
requirement as KF_RCU should be better.

> 
>
Alexei Starovoitov May 5, 2023, 6:48 p.m. UTC | #4
On Fri, May 5, 2023 at 11:44 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 5/5/23 12:18 AM, Feng Zhou wrote:
> > 在 2023/5/5 14:58, Hao Luo 写道:
> >> On Thu, May 4, 2023 at 11:08 PM Feng zhou <zhoufeng.zf@bytedance.com>
> >> wrote:
> >>>
> >> <...>
> >>> ---
> >>>   kernel/bpf/helpers.c | 20 ++++++++++++++++++++
> >>>   1 file changed, 20 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index bb6b4637ebf2..453cbd312366 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup
> >>> *bpf_cgroup_from_id(u64 cgid)
> >>>                  return NULL;
> >>>          return cgrp;
> >>>   }
> >>> +
> >>> +/**
> >>> + * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a
> >>> kfunc, test
> >>> + * task's membership of cgroup ancestry.
> >>> + * @task: the task to be tested
> >>> + * @ancestor: possible ancestor of @task's cgroup
> >>> + *
> >>> + * Tests whether @task's default cgroup hierarchy is a descendant of
> >>> @ancestor.
> >>> + * It follows all the same rules as cgroup_is_descendant, and only
> >>> applies
> >>> + * to the default hierarchy.
> >>> + */
> >>> +__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
> >>> +                                      struct cgroup *ancestor)
> >>> +{
> >>> +       if (unlikely(!ancestor || !task))
> >>> +               return -EINVAL;
> >>> +
> >>> +       return task_under_cgroup_hierarchy(task, ancestor);
> >>> +}
> >>>   #endif /* CONFIG_CGROUPS */
> >>>
> >>
> >> I wonder in what situation a null 'task' or 'ancestor' can be passed.
> >> Please call out in the comment that the returned value can be a
> >> negative error, so that writing if(bpf_task_under_cgroup()) may cause
> >> surprising results.
> >>
> >> Hao
> >
> > Hmm, you are right. As kfunc, the NULL value of the parameter is judged,
> > and bpf verify will prompt the developer to add it. There is really no
> > need to add this part of the judgment. See other people's opinions.
>
> Thanks for pointing out Hou.
>
> Currently, bpf_task_under_cgroup() is marked as KF_RCU.
>
> Per documentation:
> 2.4.7 KF_RCU flag
> -----------------
>
> The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs
> marked with
> KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier
> guarantees
> that the objects are valid and there is no use-after-free. The pointers
> are not
> NULL, but the object's refcount could have reached zero. The kfuncs need to
> consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
> pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very
> likely
> also be KF_RET_NULL.
>
>
> The pointer cannot be NULL, so the following line of code can be removed:
>  >>> +       if (unlikely(!ancestor || !task))
>  >>> +               return -EINVAL;

Right. With KF_RCU the verifier guarantees != NULL.
Let's get rid of this check.
This will make the return value clean.

> I think we do not need to check refcnt != 0 case since ancestor and
> task won't go away.

correct.

> In the example of second patch, both arguments are TRUSTED arguments
> which is stronger than RCU, so the test itself is okay.
> I am considering whether we should enforce arguments of the kfunc
> to be KF_TRUSTED_ARGS, but I think esp. in some cases, cgroup
> might be RCU protected e.g., task->cgroup->dfl_cgrp. So leaving argument
> requirement as KF_RCU should be better.

+1