mbox series

[net,0/2] net_sched: fix a UAF in tcf_action_init()

Message ID 20200923035624.7307-1-xiyou.wangcong@gmail.com
Headers show
Series net_sched: fix a UAF in tcf_action_init() | expand

Message

Cong Wang Sept. 23, 2020, 3:56 a.m. UTC
This patchset fixes a use-after-free triggered by syzbot. Please
find more details in each patch description.

---

Cong Wang (2):
  net_sched: defer tcf_idr_insert() in tcf_action_init_1()
  net_sched: commit action insertions together

 include/net/act_api.h      |  2 --
 net/sched/act_api.c        | 52 +++++++++++++++++++++++++-------------
 net/sched/act_bpf.c        |  4 +--
 net/sched/act_connmark.c   |  1 -
 net/sched/act_csum.c       |  3 ---
 net/sched/act_ct.c         |  2 --
 net/sched/act_ctinfo.c     |  3 ---
 net/sched/act_gact.c       |  2 --
 net/sched/act_gate.c       |  3 ---
 net/sched/act_ife.c        |  3 ---
 net/sched/act_ipt.c        |  2 --
 net/sched/act_mirred.c     |  2 --
 net/sched/act_mpls.c       |  2 --
 net/sched/act_nat.c        |  3 ---
 net/sched/act_pedit.c      |  2 --
 net/sched/act_police.c     |  2 --
 net/sched/act_sample.c     |  2 --
 net/sched/act_simple.c     |  2 --
 net/sched/act_skbedit.c    |  2 --
 net/sched/act_skbmod.c     |  2 --
 net/sched/act_tunnel_key.c |  3 ---
 net/sched/act_vlan.c       |  2 --
 22 files changed, 35 insertions(+), 66 deletions(-)

Comments

David Miller Sept. 25, 2020, 2:47 a.m. UTC | #1
From: Cong Wang <xiyou.wangcong@gmail.com>

Date: Tue, 22 Sep 2020 20:56:22 -0700

> This patchset fixes a use-after-free triggered by syzbot. Please

> find more details in each patch description.


Series applied and queued up for -stable, thanks Cong.
Vlad Buslov Sept. 25, 2020, 3:24 p.m. UTC | #2
On Wed 23 Sep 2020 at 06:56, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> All TC actions call tcf_idr_insert() for new action at the end
> of their ->init(), so we can actually move it to a central place
> in tcf_action_init_1().
>
> And once the action is inserted into the global IDR, other parallel
> process could free it immediately as its refcnt is still 1, so we can
> not fail after this, we need to move it after the goto action
> validation to avoid handling the failure case after insertion.
>
> This is found during code review, is not directly triggered by syzbot.
> And this prepares for the next patch.
>
> Cc: Vlad Buslov <vladbu@mellanox.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---

Hi Cong,

Thanks for fixing this!

>  include/net/act_api.h      |  2 --
>  net/sched/act_api.c        | 38 ++++++++++++++++++++------------------
>  net/sched/act_bpf.c        |  4 +---
>  net/sched/act_connmark.c   |  1 -
>  net/sched/act_csum.c       |  3 ---
>  net/sched/act_ct.c         |  2 --
>  net/sched/act_ctinfo.c     |  3 ---
>  net/sched/act_gact.c       |  2 --
>  net/sched/act_gate.c       |  3 ---
>  net/sched/act_ife.c        |  3 ---
>  net/sched/act_ipt.c        |  2 --
>  net/sched/act_mirred.c     |  2 --
>  net/sched/act_mpls.c       |  2 --
>  net/sched/act_nat.c        |  3 ---
>  net/sched/act_pedit.c      |  2 --
>  net/sched/act_police.c     |  2 --
>  net/sched/act_sample.c     |  2 --
>  net/sched/act_simple.c     |  2 --
>  net/sched/act_skbedit.c    |  2 --
>  net/sched/act_skbmod.c     |  2 --
>  net/sched/act_tunnel_key.c |  3 ---
>  net/sched/act_vlan.c       |  2 --
>  22 files changed, 21 insertions(+), 66 deletions(-)
>
> diff --git a/include/net/act_api.h b/include/net/act_api.h
> index cb382a89ea58..87214927314a 100644
> --- a/include/net/act_api.h
> +++ b/include/net/act_api.h
> @@ -166,8 +166,6 @@ int tcf_idr_create_from_flags(struct tc_action_net *tn, u32 index,
>  			      struct nlattr *est, struct tc_action **a,
>  			      const struct tc_action_ops *ops, int bind,
>  			      u32 flags);
> -void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
> -
>  void tcf_idr_cleanup(struct tc_action_net *tn, u32 index);
>  int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>  			struct tc_action **a, int bind);
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 063d8aaf2900..0030f00234ee 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -467,17 +467,6 @@ int tcf_idr_create_from_flags(struct tc_action_net *tn, u32 index,
>  }
>  EXPORT_SYMBOL(tcf_idr_create_from_flags);
>  
> -void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
> -{
> -	struct tcf_idrinfo *idrinfo = tn->idrinfo;
> -
> -	mutex_lock(&idrinfo->lock);
> -	/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
> -	WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index)));
> -	mutex_unlock(&idrinfo->lock);
> -}
> -EXPORT_SYMBOL(tcf_idr_insert);
> -
>  /* Cleanup idr index that was allocated but not initialized. */
>  
>  void tcf_idr_cleanup(struct tc_action_net *tn, u32 index)
> @@ -902,6 +891,16 @@ static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = {
>  	[TCA_ACT_HW_STATS]	= NLA_POLICY_BITFIELD32(TCA_ACT_HW_STATS_ANY),
>  };
>  
> +static void tcf_idr_insert(struct tc_action *a)
> +{
> +	struct tcf_idrinfo *idrinfo = a->idrinfo;
> +
> +	mutex_lock(&idrinfo->lock);
> +	/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
> +	WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index)));
> +	mutex_unlock(&idrinfo->lock);
> +}
> +
>  struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  				    struct nlattr *nla, struct nlattr *est,
>  				    char *name, int ovr, int bind,
> @@ -989,6 +988,16 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	if (err < 0)
>  		goto err_mod;
>  
> +	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&
> +	    !rcu_access_pointer(a->goto_chain)) {
> +		tcf_action_destroy_1(a, bind);
> +		NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");
> +		return ERR_PTR(-EINVAL);
> +	}

I don't think calling tcf_action_destoy_1() is enough here. Since you
moved this block before assigning cookie and releasing the module, you
also need to release them manually in addition to destroying the action
instance.

> +
> +	if (err == ACT_P_CREATED)
> +		tcf_idr_insert(a);
> +
>  	if (!name && tb[TCA_ACT_COOKIE])
>  		tcf_set_action_cookie(&a->act_cookie, cookie);
>  
> @@ -1002,13 +1011,6 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>  	if (err != ACT_P_CREATED)
>  		module_put(a_o->owner);
>  
> -	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&
> -	    !rcu_access_pointer(a->goto_chain)) {
> -		tcf_action_destroy_1(a, bind);
> -		NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");
> -		return ERR_PTR(-EINVAL);
> -	}
> -
>  	return a;
>  
>  err_mod:

[...]
Cong Wang Sept. 25, 2020, 7:22 p.m. UTC | #3
On Fri, Sep 25, 2020 at 8:24 AM Vlad Buslov <vlad@buslov.dev> wrote:
> > +     if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&

> > +         !rcu_access_pointer(a->goto_chain)) {

> > +             tcf_action_destroy_1(a, bind);

> > +             NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");

> > +             return ERR_PTR(-EINVAL);

> > +     }

>

> I don't think calling tcf_action_destoy_1() is enough here. Since you

> moved this block before assigning cookie and releasing the module, you

> also need to release them manually in addition to destroying the action

> instance.

>


tcf_action_destoy_1() eventually calls free_tcf() which frees cookie and
tcf_action_destroy() which releases module refcnt.

What am I missing here?

Thanks.
Vlad Buslov Sept. 25, 2020, 7:45 p.m. UTC | #4
On Fri 25 Sep 2020 at 22:22, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Sep 25, 2020 at 8:24 AM Vlad Buslov <vlad@buslov.dev> wrote:
>> > +     if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&
>> > +         !rcu_access_pointer(a->goto_chain)) {
>> > +             tcf_action_destroy_1(a, bind);
>> > +             NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");
>> > +             return ERR_PTR(-EINVAL);
>> > +     }
>>
>> I don't think calling tcf_action_destoy_1() is enough here. Since you
>> moved this block before assigning cookie and releasing the module, you
>> also need to release them manually in addition to destroying the action
>> instance.
>>
>
> tcf_action_destoy_1() eventually calls free_tcf() which frees cookie and
> tcf_action_destroy() which releases module refcnt.
>
> What am I missing here?
>
> Thanks.

The memory referenced by the function local pointer "cookie" hasn't been
assigned yet to the a->act_cookie because in your patch you moved
goto_chain validation code before the cookie change. That means that if
user overwrites existing action, then action old a->act_cookie will be
freed by tcf_action_destroy_1() but new cookie that was allocated by
nla_memdup_cookie() will leak.

Regards,
Vlad
Davide Caratti Sept. 28, 2020, 10:14 a.m. UTC | #5
hello,

On Fri, 2020-09-25 at 22:45 +0300, Vlad Buslov wrote:
> On Fri 25 Sep 2020 at 22:22, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, Sep 25, 2020 at 8:24 AM Vlad Buslov <vlad@buslov.dev> wrote:
> > > > +     if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&
> > > > +         !rcu_access_pointer(a->goto_chain)) {
> > > > +             tcf_action_destroy_1(a, bind);
> > > > +             NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");
> > > > +             return ERR_PTR(-EINVAL);
> > > > +     }
> > > 
> > > I don't think calling tcf_action_destoy_1() is enough here. Since you
> > > moved this block before assigning cookie and releasing the module, you
> > > also need to release them manually in addition to destroying the action
> > > instance.
> > > 
> > 
> > tcf_action_destoy_1() eventually calls free_tcf() which frees cookie and
> > tcf_action_destroy() which releases module refcnt.
> > 
> > What am I missing here?
> > 
> > Thanks.
> 
> The memory referenced by the function local pointer "cookie" hasn't been
> assigned yet to the a->act_cookie because in your patch you moved
> goto_chain validation code before the cookie change. That means that if
> user overwrites existing action, then action old a->act_cookie will be
> freed by tcf_action_destroy_1() but new cookie that was allocated by
> nla_memdup_cookie() will leak.

maybe we can just delete this if (TC_ACT_EXT_CMP(...)) { ... }
statement, instead of moving it? Each TC action already does the check
for NULL "goto chains" with a_o->init() -> tcf_action_check_ctrlact(),
so this if () statement looks dead code to me _ I probably forgot to
remove it after all actions were converted to validate the control
action inside their .init() function.
Cong Wang Sept. 28, 2020, 5:38 p.m. UTC | #6
On Mon, Sep 28, 2020 at 3:14 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> hello,
>
> On Fri, 2020-09-25 at 22:45 +0300, Vlad Buslov wrote:
> > On Fri 25 Sep 2020 at 22:22, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > On Fri, Sep 25, 2020 at 8:24 AM Vlad Buslov <vlad@buslov.dev> wrote:
> > > > > +     if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) &&
> > > > > +         !rcu_access_pointer(a->goto_chain)) {
> > > > > +             tcf_action_destroy_1(a, bind);
> > > > > +             NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain");
> > > > > +             return ERR_PTR(-EINVAL);
> > > > > +     }
> > > >
> > > > I don't think calling tcf_action_destoy_1() is enough here. Since you
> > > > moved this block before assigning cookie and releasing the module, you
> > > > also need to release them manually in addition to destroying the action
> > > > instance.
> > > >
> > >
> > > tcf_action_destoy_1() eventually calls free_tcf() which frees cookie and
> > > tcf_action_destroy() which releases module refcnt.
> > >
> > > What am I missing here?
> > >
> > > Thanks.
> >
> > The memory referenced by the function local pointer "cookie" hasn't been
> > assigned yet to the a->act_cookie because in your patch you moved
> > goto_chain validation code before the cookie change. That means that if
> > user overwrites existing action, then action old a->act_cookie will be
> > freed by tcf_action_destroy_1() but new cookie that was allocated by
> > nla_memdup_cookie() will leak.

Yes, good catch!


>
> maybe we can just delete this if (TC_ACT_EXT_CMP(...)) { ... }
> statement, instead of moving it? Each TC action already does the check
> for NULL "goto chains" with a_o->init() -> tcf_action_check_ctrlact(),
> so this if () statement looks dead code to me _ I probably forgot to
> remove it after all actions were converted to validate the control
> action inside their .init() function.

Good point, I think you are right, I will send a patch to remove it.

Thanks!