Message ID | 20210407153604.1680079-3-vladbu@nvidia.com |
---|---|
State | New |
Series | Action initalization fixes |
On Thu 08 Apr 2021 at 02:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Apr 7, 2021 at 8:36 AM Vlad Buslov <vladbu@nvidia.com> wrote:
>>
>> Action init code increments reference counter when it changes an action.
>> This is the desired behavior for cls API which needs to obtain action
>> reference for every classifier that points to action. However, act API just
>> needs to change the action and releases the reference before returning.
>> This sequence breaks when the requested action doesn't exist, which causes
>> act API init code to create new action with specified index, but action is
>> still released before returning and is deleted (unless it was referenced
>> concurrently by cls API).
>>
>> Reproduction:
>>
>> $ sudo tc actions ls action gact
>> $ sudo tc actions change action gact drop index 1
>> $ sudo tc actions ls action gact
>>
>
> I didn't know 'change' could actually create an action when
> it does not exist. So it sets NLM_F_REPLACE, how could it
> replace a non-existing one? Is this the right behavior or is it too
> late to change even if it is not?

Origins of setting ovr based on NLM_F_REPLACE are lost since this code
goes back to Linus' Linux-2.6.12-rc2 commit. Jamal, do you know if this
is the expected behavior or just something unintended?

>
>> Extend tcf_action_init() to accept 'init_res' array and initialize it with
>> action->ops->init() result. In tcf_action_add() remove pointers to created
>> actions from actions array before passing it to tcf_action_put_many().
>
> In my last comments, I actually meant whether we can avoid this
> 'init_res[]' array. Since here you want to check whether an action
> returned by tcf_action_init_1() is a new one or an existing one, how
> about checking its refcnt? Something like:
>
>   act = tcf_action_init_1(...);
>   if (IS_ERR(act)) {
>     err = PTR_ERR(act);
>     goto err;
>   }
>   if (refcount_read(&act->tcfa_refcnt) == 1) {
>     // we know this is a newly allocated one
>   }
>
> Thanks.

Hmm, I don't think this would work in general case. Consider following
cases:

1. Action existed during init as filter action (refcnt=1), init overwrote
it setting refcnt=2, by the time we got to checking tcfa_refcnt filter
has been deleted (refcnt=1) so code will incorrectly assume that it has
created the action.

2. We need this check in tcf_action_add() to release the refcnt in case
of overwriting existing actions, but by that time actions are already
accessible through idr, so even in case when new action has been created
(refcnt=1) it could already have been referenced by concurrently created
filter (refcnt=2).

Regards,
Vlad
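
The two cases above can be replayed as a small sequential model. This is only a sketch: `struct model_action`, `looks_created()` and the two case functions are hypothetical names, not kernel code, and the concurrent events are written out in the order the reply describes rather than run on real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tc_action: only the reference count. */
struct model_action {
	int refcnt;
};

/* The heuristic under discussion: "refcnt == 1 means we just created it". */
static bool looks_created(const struct model_action *a)
{
	assert(a->refcnt > 0);	/* a live action always holds a reference */
	return a->refcnt == 1;
}

/*
 * Case 1: the action already exists, held by one filter. Init overwrites it
 * and takes a reference; the filter is deleted before the check runs.
 * Returns what the heuristic reports (ground truth: NOT created).
 */
static bool case1_heuristic_says_created(void)
{
	struct model_action a = { .refcnt = 1 };	/* held by a filter */

	a.refcnt++;	/* init overwrites the existing action: refcnt=2 */
	a.refcnt--;	/* filter deleted concurrently: refcnt=1 */
	return looks_created(&a);
}

/*
 * Case 2: init creates the action, which is already reachable through the
 * idr, and a concurrently created filter binds to it before the check.
 * Returns what the heuristic reports (ground truth: created).
 */
static bool case2_heuristic_says_created(void)
{
	struct model_action a = { .refcnt = 1 };	/* freshly created */

	a.refcnt++;	/* concurrent filter takes a reference: refcnt=2 */
	return looks_created(&a);
}
```

In this model the heuristic answers "created" in case 1 and "existing" in case 2, the opposite of the truth both times, which is why the created/existing result has to be reported by init itself.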
On 2021-04-07 7:50 p.m., Cong Wang wrote:
> On Wed, Apr 7, 2021 at 8:36 AM Vlad Buslov <vladbu@nvidia.com> wrote:
>>
>> Action init code increments reference counter when it changes an action.
>> This is the desired behavior for cls API which needs to obtain action
>> reference for every classifier that points to action. However, act API just
>> needs to change the action and releases the reference before returning.
>> This sequence breaks when the requested action doesn't exist, which causes
>> act API init code to create new action with specified index, but action is
>> still released before returning and is deleted (unless it was referenced
>> concurrently by cls API).
>>
>> Reproduction:
>>
>> $ sudo tc actions ls action gact
>> $ sudo tc actions change action gact drop index 1
>> $ sudo tc actions ls action gact
>>
>
> I didn't know 'change' could actually create an action when
> it does not exist. So it sets NLM_F_REPLACE, how could it
> replace a non-existing one? Is this the right behavior or is it too
> late to change even if it is not?

That's expected behavior for "change", essentially mapping
to classical "SET", i.e.
"create if it doesn't exist, replace if it exists",
i.e. NLM_F_CREATE | NLM_F_REPLACE

In retrospect, "replace" should probably have been just NLM_F_REPLACE:
"replace if it exists, error otherwise".
Currently there is no distinction between the two.

"Add" is classical "CREATE", i.e. "create if it doesn't exist, otherwise
error".

It may be feasible to fix "replace" but not sure how many scripts over
the years are now dependent on that behavior.

cheers,
jamal
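
The three verbs described above can be written down as a decision on two flags. This is a sketch of the semantics as described in the reply, not the kernel's implementation: `model_resolve()` and the `MODEL_*` names are hypothetical, and the constants only mirror the numeric values of `NLM_F_REPLACE` and `NLM_F_CREATE` so the example builds without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local mirrors of the netlink flag values, so the sketch builds anywhere. */
enum {
	MODEL_NLM_F_REPLACE = 0x100,
	MODEL_NLM_F_CREATE  = 0x400,
};

enum model_outcome { MODEL_REPLACED, MODEL_CREATED };

/*
 * Decide what a request does given its flags and whether the object exists.
 * Returns a model_outcome on success or a negative errno on failure.
 */
static int model_resolve(unsigned int flags, bool exists)
{
	/* This sketch models only the two flags named in the discussion. */
	assert((flags & ~(MODEL_NLM_F_REPLACE | MODEL_NLM_F_CREATE)) == 0);

	if (exists)
		return (flags & MODEL_NLM_F_REPLACE) ? MODEL_REPLACED : -EEXIST;
	return (flags & MODEL_NLM_F_CREATE) ? MODEL_CREATED : -ENOENT;
}
```

With both flags set ("SET"), the call succeeds whether or not the object exists; with only the replace flag it fails with -ENOENT on a missing object; with only the create flag ("Add") it fails with -EEXIST on an existing one.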
On 2021-04-08 3:50 a.m., Vlad Buslov wrote:
>
> On Thu 08 Apr 2021 at 02:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:

> Origins of setting ovr based on NLM_F_REPLACE are lost since this code
> goes back to Linus' Linux-2.6.12-rc2 commit. Jamal, do you know if this
> is the expected behavior or just something unintended?

Seems our emails crossed paths.
The problem with ovr is the ambiguity of whether we are saying
both CREATE and REPLACE or just one or the other.
We could improve the kernel side by just passing the flags to each
action. Note it is too late to fix iproute2 without some
backward compat flag; but it may not be too late for someone
writing a new application in user space.

cheers,
jamal
On Thu, Apr 8, 2021 at 12:50 AM Vlad Buslov <vladbu@nvidia.com> wrote:
>
>
> On Thu 08 Apr 2021 at 02:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > In my last comments, I actually meant whether we can avoid this
> > 'init_res[]' array. Since here you want to check whether an action
> > returned by tcf_action_init_1() is a new one or an existing one, how
> > about checking its refcnt? Something like:
> >
> >   act = tcf_action_init_1(...);
> >   if (IS_ERR(act)) {
> >     err = PTR_ERR(act);
> >     goto err;
> >   }
> >   if (refcount_read(&act->tcfa_refcnt) == 1) {
> >     // we know this is a newly allocated one
> >   }
> >
> > Thanks.
>
> Hmm, I don't think this would work in general case. Consider following
> cases:
>
> 1. Action existed during init as filter action (refcnt=1), init overwrote
> it setting refcnt=2, by the time we got to checking tcfa_refcnt filter
> has been deleted (refcnt=1) so code will incorrectly assume that it has
> created the action.
>
> 2. We need this check in tcf_action_add() to release the refcnt in case
> of overwriting existing actions, but by that time actions are already
> accessible through idr, so even in case when new action has been created
> (refcnt=1) it could already have been referenced by concurrently created
> filter (refcnt=2).

Hmm, I nearly forgot RTNL is lifted for some cases along TC filter and
action control paths...

It seems we have no better way to work around this.

Thanks.
On Thu, Apr 8, 2021 at 4:59 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2021-04-07 7:50 p.m., Cong Wang wrote:
> > On Wed, Apr 7, 2021 at 8:36 AM Vlad Buslov <vladbu@nvidia.com> wrote:
> >>
> >> Action init code increments reference counter when it changes an action.
> >> This is the desired behavior for cls API which needs to obtain action
> >> reference for every classifier that points to action. However, act API just
> >> needs to change the action and releases the reference before returning.
> >> This sequence breaks when the requested action doesn't exist, which causes
> >> act API init code to create new action with specified index, but action is
> >> still released before returning and is deleted (unless it was referenced
> >> concurrently by cls API).
> >>
> >> Reproduction:
> >>
> >> $ sudo tc actions ls action gact
> >> $ sudo tc actions change action gact drop index 1
> >> $ sudo tc actions ls action gact
> >>
> >
> > I didn't know 'change' could actually create an action when
> > it does not exist. So it sets NLM_F_REPLACE, how could it
> > replace a non-existing one? Is this the right behavior or is it too
> > late to change even if it is not?
>
> That's expected behavior for "change", essentially mapping
> to classical "SET", i.e.
> "create if it doesn't exist, replace if it exists",
> i.e. NLM_F_CREATE | NLM_F_REPLACE
>
> In retrospect, "replace" should probably have been just NLM_F_REPLACE:
> "replace if it exists, error otherwise".
> Currently there is no distinction between the two.

This is how I interpret "replace" too, but again it is probably too late
to change.

>
> "Add" is classical "CREATE", i.e. "create if it doesn't exist, otherwise
> error".
>
> It may be feasible to fix "replace" but not sure how many scripts over
> the years are now dependent on that behavior.

Right, we probably have to live with it forever.

Thanks.
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2bf3092ae7ec..312f0f6554a0 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -185,7 +185,7 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 		    int nr_actions, struct tcf_result *res);
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
-		    struct tc_action *actions[], size_t *attr_size,
+		    struct tc_action *actions[], int init_res[], size_t *attr_size,
 		    bool rtnl_held, struct netlink_ext_ack *extack);
 struct tc_action_ops *tc_action_load_ops(char *name, struct nlattr *nla,
 					 bool rtnl_held,
@@ -193,7 +193,8 @@ struct tc_action_ops *tc_action_load_ops(char *name, struct nlattr *nla,
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
-				    struct tc_action_ops *ops, bool rtnl_held,
+				    struct tc_action_ops *a_o, int *init_res,
+				    bool rtnl_held,
 				    struct netlink_ext_ack *extack);
 int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind,
 		    int ref, bool terse);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index b919826939e0..50854cfbfcdb 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -992,7 +992,8 @@ struct tc_action_ops *tc_action_load_ops(char *name, struct nlattr *nla,
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
-				    struct tc_action_ops *a_o, bool rtnl_held,
+				    struct tc_action_ops *a_o, int *init_res,
+				    bool rtnl_held,
 				    struct netlink_ext_ack *extack)
 {
 	struct nla_bitfield32 flags = { 0, 0 };
@@ -1028,6 +1029,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	}
 	if (err < 0)
 		goto err_out;
+	*init_res = err;
 
 	if (!name && tb[TCA_ACT_COOKIE])
 		tcf_set_action_cookie(&a->act_cookie, cookie);
@@ -1056,7 +1058,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
-		    struct tc_action *actions[], size_t *attr_size,
+		    struct tc_action *actions[], int init_res[], size_t *attr_size,
 		    bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tc_action_ops *ops[TCA_ACT_MAX_PRIO] = {};
@@ -1084,7 +1086,8 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 
 	for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
 		act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind,
-					ops[i - 1], rtnl_held, extack);
+					ops[i - 1], &init_res[i - 1], rtnl_held,
+					extack);
 		if (IS_ERR(act)) {
 			err = PTR_ERR(act);
 			goto err;
@@ -1497,12 +1500,13 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 			  struct netlink_ext_ack *extack)
 {
 	size_t attr_size = 0;
-	int loop, ret;
+	int loop, ret, i;
 	struct tc_action *actions[TCA_ACT_MAX_PRIO] = {};
+	int init_res[TCA_ACT_MAX_PRIO] = {};
 
 	for (loop = 0; loop < 10; loop++) {
 		ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0,
-				      actions, &attr_size, true, extack);
+				      actions, init_res, &attr_size, true, extack);
 		if (ret != -EAGAIN)
 			break;
 	}
@@ -1510,8 +1514,12 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 	if (ret < 0)
 		return ret;
 	ret = tcf_add_notify(net, n, actions, portid, attr_size, extack);
-	if (ovr)
-		tcf_action_put_many(actions);
+
+	/* only put existing actions */
+	for (i = 0; i < TCA_ACT_MAX_PRIO; i++)
+		if (init_res[i] == ACT_P_CREATED)
+			actions[i] = NULL;
+	tcf_action_put_many(actions);
 
 	return ret;
 }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9332ec6863e8..9ecb91ebf094 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3040,6 +3040,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 {
 #ifdef CONFIG_NET_CLS_ACT
 	{
+		int init_res[TCA_ACT_MAX_PRIO] = {};
 		struct tc_action *act;
 		size_t attr_size = 0;
 
@@ -3051,8 +3052,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 				return PTR_ERR(a_o);
 			act = tcf_action_init_1(net, tp, tb[exts->police],
 						rate_tlv, "police", ovr,
-						TCA_ACT_BIND, a_o, rtnl_held,
-						extack);
+						TCA_ACT_BIND, a_o, init_res,
+						rtnl_held, extack);
 			if (IS_ERR(act)) {
 				module_put(a_o->owner);
 				return PTR_ERR(act);
@@ -3067,8 +3068,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 
 			err = tcf_action_init(net, tp, tb[exts->action],
 					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
-					      exts->actions, &attr_size,
-					      rtnl_held, extack);
+					      exts->actions, init_res,
+					      &attr_size, rtnl_held, extack);
 			if (err < 0)
 				return err;
 			exts->nr_actions = err;
Action init code increments reference counter when it changes an action.
This is the desired behavior for cls API which needs to obtain action
reference for every classifier that points to action. However, act API just
needs to change the action and releases the reference before returning.
This sequence breaks when the requested action doesn't exist, which causes
act API init code to create new action with specified index, but action is
still released before returning and is deleted (unless it was referenced
concurrently by cls API).

Reproduction:

$ sudo tc actions ls action gact
$ sudo tc actions change action gact drop index 1
$ sudo tc actions ls action gact

Extend tcf_action_init() to accept 'init_res' array and initialize it with
action->ops->init() result. In tcf_action_add() remove pointers to created
actions from actions array before passing it to tcf_action_put_many().

Fixes: cae422f379f3 ("net: sched: use reference counting action init")
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---

Notes:
    Changes V1 -> V2:

    - Extend commit message with reproduction and fix details.

    - Don't extend tcf_action_put_many() with action filtering. Filter actions
    array in caller instead.

 include/net/act_api.h |  5 +++--
 net/sched/act_api.c   | 22 +++++++++++++++-------
 net/sched/cls_api.c   |  9 +++++----
 3 files changed, 23 insertions(+), 13 deletions(-)
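
The reference-count flow described in the commit message can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions, not the kernel code: `struct model_action`, `model_init()`, `model_put_many()` and `model_change()` are hypothetical names, `MODEL_MAX_PRIO` and `MODEL_CREATED` merely stand in for `TCA_ACT_MAX_PRIO` and `ACT_P_CREATED`, and a refcnt of 0 is used to mean "the action does not exist / was deleted". The `filter_created` argument switches between the old path (put everything, as with `ovr` set) and the patched path (put only actions that already existed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PRIO 4	/* hypothetical; stands in for TCA_ACT_MAX_PRIO */
#define MODEL_CREATED  1	/* hypothetical; stands in for ACT_P_CREATED */

struct model_action {
	int refcnt;	/* 0 means the action does not exist */
};

/* Create the action or overwrite an existing one; either way take a ref. */
static int model_init(struct model_action *a)
{
	int res = (a->refcnt == 0) ? MODEL_CREATED : 0;

	a->refcnt++;
	return res;
}

/* Drop one reference from every non-NULL entry; refcnt 0 models deletion. */
static void model_put_many(struct model_action *actions[])
{
	for (size_t i = 0; i < MODEL_MAX_PRIO; i++)
		if (actions[i]) {
			assert(actions[i]->refcnt > 0);
			actions[i]->refcnt--;
		}
}

/* Act API "change" on one action; returns the refcnt left afterwards. */
static int model_change(struct model_action *a, bool filter_created)
{
	struct model_action *actions[MODEL_MAX_PRIO] = { a };
	int init_res[MODEL_MAX_PRIO] = { 0 };

	init_res[0] = model_init(a);
	if (filter_created)	/* patched path: only put existing actions */
		for (size_t i = 0; i < MODEL_MAX_PRIO; i++)
			if (init_res[i] == MODEL_CREATED)
				actions[i] = NULL;
	model_put_many(actions);
	return a->refcnt;
}
```

In this model, calling `model_change()` on a non-existent action with `filter_created` false leaves refcnt 0 (the action is gone again, matching the reproduction), while the filtered path leaves refcnt 1. An action already held by one filter ends at refcnt 1 on the filtered path, so the temporary reference is still released.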