Message ID | 20210413025011.1251-1-kerneljasonxing@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net,v2] i40e: fix the panic when running bpf in xdpdrv mode | expand |
kerneljasonxing@gmail.com wrote: > From: Jason Xing <xingwanli@kuaishou.com> Hi Jason, Sorry, I missed this on the first time: Added intel-wired-lan, please include on any future submissions for Intel drivers. get-maintainers script might help here? > > Fix this panic by adding more rules to calculate the value of @rss_size_max > which could be used in allocating the queues when bpf is loaded, which, > however, could cause the failure and then trigger the NULL pointer of > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many > cpus are online and then allocates 256 queues on the machine with 32 cpus > online actually. > > Once the load of bpf begins, the log will go like this "failed to get > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI > failed". > > Thus, I attach the key information of the crash-log here. > > BUG: unable to handle kernel NULL pointer dereference at > 0000000000000000 > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e] > Call Trace: > [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e] > [2160294.717666] dev_xdp_install+0x4f/0x70 > [2160294.718036] dev_change_xdp_fd+0x11f/0x230 > [2160294.718380] ? dev_disable_lro+0xe0/0xe0 > [2160294.718705] do_setlink+0xac7/0xe70 > [2160294.719035] ? __nla_parse+0xed/0x120 > [2160294.719365] rtnl_newlink+0x73b/0x860 > > Fixes: 41c445ff0f48 ("i40e: main driver core") > This Fixes line should be connected to the Sign offs with no linefeeds between. > Signed-off-by: Jason Xing <xingwanli@kuaishou.com> > Signed-off-by: Shujin Li <lishujin@kuaishou.com> Did Shujin contribute to this patch? Why are they signing off? If they developed this patch with you, it should say: Co-developed-by: Shujin .... Signed-off-by: Shujin ... Signed-off-by: Jason ... Your signature should be last if you sent the patch. The sign-offs are like a chain of custody, please review https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by Thanks, Jesse
On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg <jesse.brandeburg@intel.com> wrote: > > kerneljasonxing@gmail.com wrote: > > > From: Jason Xing <xingwanli@kuaishou.com> > > Hi Jason, > > Sorry, I missed this on the first time: Added intel-wired-lan, > please include on any future submissions for Intel drivers. > get-maintainers script might help here? > Hi, Jesse In the first patch, I did send to intel-wired-lan but it told me that it is open for the member only, so I got rid of it in this patch v2. > > > > Fix this panic by adding more rules to calculate the value of @rss_size_max > > which could be used in allocating the queues when bpf is loaded, which, > > however, could cause the failure and then trigger the NULL pointer of > > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many > > cpus are online and then allocates 256 queues on the machine with 32 cpus > > online actually. > > > > Once the load of bpf begins, the log will go like this "failed to get > > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI > > failed". > > > > Thus, I attach the key information of the crash-log here. > > > > BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000000 > > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e] > > Call Trace: > > [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e] > > [2160294.717666] dev_xdp_install+0x4f/0x70 > > [2160294.718036] dev_change_xdp_fd+0x11f/0x230 > > [2160294.718380] ? dev_disable_lro+0xe0/0xe0 > > [2160294.718705] do_setlink+0xac7/0xe70 > > [2160294.719035] ? __nla_parse+0xed/0x120 > > [2160294.719365] rtnl_newlink+0x73b/0x860 > > > > Fixes: 41c445ff0f48 ("i40e: main driver core") > > > > This Fixes line should be connected to the Sign offs with > no linefeeds between. > > > Signed-off-by: Jason Xing <xingwanli@kuaishou.com> > > Signed-off-by: Shujin Li <lishujin@kuaishou.com> > > Did Shujin contribute to this patch? Why are they signing off? If > they developed this patch with you, it should say: > Co-developed-by: Shujin .... > Signed-off-by: Shujin ... > Signed-off-by: Jason ... > Well, yes, I will add a Co-developed-by label later. > Your signature should be last if you sent the patch. The sign-offs are > like a chain of custody, please review > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by > Thanks so much for your detailed instructions. I'm about to correct them all together and then resend the patch v3. Jason > Thanks, > Jesse
On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg <jesse.brandeburg@intel.com> wrote: > > kerneljasonxing@gmail.com wrote: > > > From: Jason Xing <xingwanli@kuaishou.com> > > Hi Jason, > > Sorry, I missed this on the first time: Added intel-wired-lan, > please include on any future submissions for Intel drivers. > get-maintainers script might help here? > Probably I got this wrong in the last email. Did you mean that I should add intel-wired-lan in the title not the cc list? It seems I should put this together on the next submission like this: [Intel-wired-lan] [PATCH net v4] Am I missing something? Thanks, Jason > > > > Fix this panic by adding more rules to calculate the value of @rss_size_max > > which could be used in allocating the queues when bpf is loaded, which, > > however, could cause the failure and then trigger the NULL pointer of > > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many > > cpus are online and then allocates 256 queues on the machine with 32 cpus > > online actually. > > > > Once the load of bpf begins, the log will go like this "failed to get > > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI > > failed". > > > > Thus, I attach the key information of the crash-log here. > > > > BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000000 > > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e] > > Call Trace: > > [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e] > > [2160294.717666] dev_xdp_install+0x4f/0x70 > > [2160294.718036] dev_change_xdp_fd+0x11f/0x230 > > [2160294.718380] ? dev_disable_lro+0xe0/0xe0 > > [2160294.718705] do_setlink+0xac7/0xe70 > > [2160294.719035] ? __nla_parse+0xed/0x120 > > [2160294.719365] rtnl_newlink+0x73b/0x860 > > > > Fixes: 41c445ff0f48 ("i40e: main driver core") > > > > This Fixes line should be connected to the Sign offs with > no linefeeds between. > > > Signed-off-by: Jason Xing <xingwanli@kuaishou.com> > > Signed-off-by: Shujin Li <lishujin@kuaishou.com> > > Did Shujin contribute to this patch? Why are they signing off? If > they developed this patch with you, it should say: > Co-developed-by: Shujin .... > Signed-off-by: Shujin ... > Signed-off-by: Jason ... > > Your signature should be last if you sent the patch. The sign-offs are > like a chain of custody, please review > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by > > Thanks, > Jesse
kerneljasonxing@gmail.com wrote: > From: Jason Xing <xingwanli@kuaishou.com> > > Fix this panic by adding more rules to calculate the value of @rss_size_max > which could be used in allocating the queues when bpf is loaded, which, > however, could cause the failure and then trigger the NULL pointer of > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many > cpus are online and then allocates 256 queues on the machine with 32 cpus > online actually. > > Once the load of bpf begins, the log will go like this "failed to get > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI > failed". > > Thus, I attach the key information of the crash-log here. > > BUG: unable to handle kernel NULL pointer dereference at > 0000000000000000 > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e] > Call Trace: > [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e] > [2160294.717666] dev_xdp_install+0x4f/0x70 > [2160294.718036] dev_change_xdp_fd+0x11f/0x230 > [2160294.718380] ? dev_disable_lro+0xe0/0xe0 > [2160294.718705] do_setlink+0xac7/0xe70 > [2160294.719035] ? __nla_parse+0xed/0x120 > [2160294.719365] rtnl_newlink+0x73b/0x860 > > Fixes: 41c445ff0f48 ("i40e: main driver core") > Co-developed-by: Shujin Li <lishujin@kuaishou.com> > Signed-off-by: Shujin Li <lishujin@kuaishou.com> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> @Jakub/@DaveM - feel free to apply this directly.
Jason Xing wrote: > On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg > <jesse.brandeburg@intel.com> wrote: > > > > kerneljasonxing@gmail.com wrote: > > > > > From: Jason Xing <xingwanli@kuaishou.com> > > > > Hi Jason, > > > > Sorry, I missed this on the first time: Added intel-wired-lan, > > please include on any future submissions for Intel drivers. > > get-maintainers script might help here? > > > > Probably I got this wrong in the last email. Did you mean that I should add > intel-wired-lan in the title not the cc list? It seems I should put > this together on > the next submission like this: > > [Intel-wired-lan] [PATCH net v4] Your v3 submittal was correct. My intent was to make sure intel-wired-lan was in CC: If Kuba or Dave wants us to take the fix in via intel-wired-lan trees, then we can do that, or they can apply it directly. I'll ack it on the v3.
On Thu, Apr 15, 2021 at 10:08 AM Jesse Brandeburg <jesse.brandeburg@intel.com> wrote: > > Jason Xing wrote: > > > On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg > > <jesse.brandeburg@intel.com> wrote: > > > > > > kerneljasonxing@gmail.com wrote: > > > > > > > From: Jason Xing <xingwanli@kuaishou.com> > > > > > > Hi Jason, > > > > > > Sorry, I missed this on the first time: Added intel-wired-lan, > > > please include on any future submissions for Intel drivers. > > > get-maintainers script might help here? > > > > > > > Probably I got this wrong in the last email. Did you mean that I should add > > intel-wired-lan in the title not the cc list? It seems I should put > > this together on > > the next submission like this: > > > > [Intel-wired-lan] [PATCH net v4] > > Your v3 submittal was correct. My intent was to make sure > intel-wired-lan was in CC: > Well, I get to know more about the whole thing. > If Kuba or Dave wants us to take the fix in via intel-wired-lan trees, > then we can do that, or they can apply it directly. I'll ack it on the > v3. Thanks, Jesse:) Jason >
On Wed, 14 Apr 2021 19:06:52 -0700 Jesse Brandeburg <jesse.brandeburg@intel.com> wrote: > kerneljasonxing@gmail.com wrote: > > > From: Jason Xing <xingwanli@kuaishou.com> > > > > Fix this panic by adding more rules to calculate the value of @rss_size_max > > which could be used in allocating the queues when bpf is loaded, which, > > however, could cause the failure and then trigger the NULL pointer of > > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many > > cpus are online and then allocates 256 queues on the machine with 32 cpus > > online actually. > > > > Once the load of bpf begins, the log will go like this "failed to get > > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI > > failed". > > > > Thus, I attach the key information of the crash-log here. > > > > BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000000 > > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e] > > Call Trace: > > [2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e] > > [2160294.717666] dev_xdp_install+0x4f/0x70 > > [2160294.718036] dev_change_xdp_fd+0x11f/0x230 > > [2160294.718380] ? dev_disable_lro+0xe0/0xe0 > > [2160294.718705] do_setlink+0xac7/0xe70 > > [2160294.719035] ? __nla_parse+0xed/0x120 > > [2160294.719365] rtnl_newlink+0x73b/0x860 > > > > Fixes: 41c445ff0f48 ("i40e: main driver core") > > Co-developed-by: Shujin Li <lishujin@kuaishou.com> > > Signed-off-by: Shujin Li <lishujin@kuaishou.com> > > Signed-off-by: Jason Xing <xingwanli@kuaishou.com> > > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> > > @Jakub/@DaveM - feel free to apply this directly. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> The crash/bug happens in this code: static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog, struct netlink_ext_ack *extack) { [...] for (i = 0; i < vsi->num_queue_pairs; i++) WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog); And this is a side effect of i40e_setup_pf_switch() failing with "setup of MAIN VSI failed". LGTM -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 521ea9d..4e9a247 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf) { int err = 0; int size; + u16 pow; /* Set default capability flags */ pf->flags = I40E_FLAG_RX_CSUM_ENABLED | @@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf) pf->rss_table_size = pf->hw.func_caps.rss_table_size; pf->rss_size_max = min_t(int, pf->rss_size_max, pf->hw.func_caps.num_tx_qp); + + /* find the next higher power-of-2 of num cpus */ + pow = roundup_pow_of_two(num_online_cpus()); + pf->rss_size_max = min_t(int, pf->rss_size_max, pow); + if (pf->hw.func_caps.rss) { pf->flags |= I40E_FLAG_RSS_ENABLED; pf->alloc_rss_size = min_t(int, pf->rss_size_max,