diff mbox series

sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues

Message ID 20210427210938.661700-1-ignat@cloudflare.com
State Accepted
Commit 99ba0ea616aabdc8e26259fd722503e012199a76
Headers show
Series sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues | expand

Commit Message

Ignat Korchagin April 27, 2021, 9:09 p.m. UTC
efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
we may end up not initializing all the array slots with real queues during
probing. This results, for example, in a NULL pointer dereference, when running
"# ethtool -S <iface>", similar to below

[2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8
[2570283.681283][T4126959] #PF: supervisor read access in kernel mode
[2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page
[2570283.710013][T4126959] PGD 0 P4D 0
[2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI
[2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G           O      5.10.20-cloudflare-2021.3.1 #1
[2570283.752641][T4126959] Hardware name: <redacted>
[2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc]
[2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b
[2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202
[2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018
[2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005
[2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f
[2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8
[2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c
[2570283.922705][T4126959] FS:  00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000
[2570283.938848][T4126959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0
[2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2570283.997308][T4126959] PKRU: 55555554
[2570284.007649][T4126959] Call Trace:
[2570284.017598][T4126959]  dev_ethtool+0x1832/0x2830

Fix this by adjusting efx->xdp_tx_queue_count after probing to reflect the true
value of initialized slots in efx->xdp_tx_queues.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues")
Cc: <stable@vger.kernel.org> # 5.12.x
---
 drivers/net/ethernet/sfc/efx_channels.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

patchwork-bot+netdevbpf@kernel.org April 27, 2021, 10:40 p.m. UTC | #1
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Tue, 27 Apr 2021 22:09:38 +0100 you wrote:
> efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
> later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
> we may end up not initializing all the array slots with real queues during
> probing. This results, for example, in a NULL pointer dereference, when running
> "# ethtool -S <iface>", similar to below
> 
> [2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8
> [2570283.681283][T4126959] #PF: supervisor read access in kernel mode
> [2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page
> [2570283.710013][T4126959] PGD 0 P4D 0
> [2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI
> [2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G           O      5.10.20-cloudflare-2021.3.1 #1
> [2570283.752641][T4126959] Hardware name: <redacted>
> [2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc]
> [2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b
> [2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202
> [2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018
> [2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005
> [2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f
> [2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8
> [2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c
> [2570283.922705][T4126959] FS:  00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000
> [2570283.938848][T4126959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0
> [2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2570283.997308][T4126959] PKRU: 55555554
> [2570284.007649][T4126959] Call Trace:
> [2570284.017598][T4126959]  dev_ethtool+0x1832/0x2830
> 
> [...]

Here is the summary with links:
  - sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
    https://git.kernel.org/netdev/net-next/c/99ba0ea616aa

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Edward Cree April 29, 2021, 2:22 p.m. UTC | #2
On 27/04/2021 22:09, Ignat Korchagin wrote:
> efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is

> later used to allocate and traverse efx->xdp_tx_queues lookup array. However,

> we may end up not initializing all the array slots with real queues during

> probing. This results, for example, in a NULL pointer dereference, when running

> "# ethtool -S <iface>", similar to below

...
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c

> index 1bfeee283ea9..a3ca406a3561 100644

> --- a/drivers/net/ethernet/sfc/efx_channels.c

> +++ b/drivers/net/ethernet/sfc/efx_channels.c

> @@ -914,6 +914,8 @@ int efx_set_channels(struct efx_nic *efx)

>  			}

>  		}

>  	}

> +	if (xdp_queue_number)

Wait, why is this guard condition needed?
What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up
 with no TXQs available for XDP at all (so xdp_queue_number == 0)?

-ed
> +		efx->xdp_tx_queue_count = xdp_queue_number;

>  

>  	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);

>  	if (rc)

>
Ignat Korchagin April 29, 2021, 2:49 p.m. UTC | #3
On Thu, Apr 29, 2021 at 3:22 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
>

> On 27/04/2021 22:09, Ignat Korchagin wrote:

> > efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is

> > later used to allocate and traverse efx->xdp_tx_queues lookup array. However,

> > we may end up not initializing all the array slots with real queues during

> > probing. This results, for example, in a NULL pointer dereference, when running

> > "# ethtool -S <iface>", similar to below

> ...

> > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c

> > index 1bfeee283ea9..a3ca406a3561 100644

> > --- a/drivers/net/ethernet/sfc/efx_channels.c

> > +++ b/drivers/net/ethernet/sfc/efx_channels.c

> > @@ -914,6 +914,8 @@ int efx_set_channels(struct efx_nic *efx)

> >                       }

> >               }

> >       }

> > +     if (xdp_queue_number)

> Wait, why is this guard condition needed?

> What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up

>  with no TXQs available for XDP at all (so xdp_queue_number == 0)?

>

> -ed


My thoughts were: efx->xdp_tx_queue_count is originally used to
allocate efx->xdp_tx_queues.
So, if xdp_queue_number ends up being 0, we should keep
efx->xdp_tx_queue_count positive not
to forget to release efx->xdp_tx_queues (because most checks are
efx->xdp_tx_queue_count && efx->xdp_tx_queues).

I'm not familiar enough with SFC internals to definitely say if it is
even possible to have
xdp_queue_number == 0 while having efx->xdp_tx_queue_count > 0, but my
understanding is that
it should not be possible due to the checks in the driver init path,
when we actually determine the number
of queues, channels, events per channel etc.

Ignat

> > +             efx->xdp_tx_queue_count = xdp_queue_number;

> >

> >       rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);

> >       if (rc)

> >

>
Edward Cree April 29, 2021, 3:04 p.m. UTC | #4
On 29/04/2021 15:49, Ignat Korchagin wrote:
> On Thu, Apr 29, 2021 at 3:22 PM Edward Cree <ecree.xilinx@gmail.com> wrote:

>>

>> On 27/04/2021 22:09, Ignat Korchagin wrote:

>>> +     if (xdp_queue_number)

>> Wait, why is this guard condition needed?

>> What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up

>>  with no TXQs available for XDP at all (so xdp_queue_number == 0)?

>>

>> -ed

> 

> My thoughts were: efx->xdp_tx_queue_count is originally used to

> allocate efx->xdp_tx_queues.

> So, if xdp_queue_number ends up being 0, we should keep

> efx->xdp_tx_queue_count positive not

> to forget to release efx->xdp_tx_queues (because most checks are

> efx->xdp_tx_queue_count && efx->xdp_tx_queues).

Well, we allocated it in this function, so could we not just free it
 (and NULL it) if we get here with xdp_queue_number == 0?
Assuming it even makes sense for those checks to be that conjunction,
 and not just efx->xdp_tx_queues.

> I'm not familiar enough with SFC internals to definitely say if it is

> even possible to have

> xdp_queue_number == 0 while having efx->xdp_tx_queue_count > 0

If it's possible for us to get xdp_queue_number != efx->xdp_tx_queue_count
 at all (which I can't remember exactly how it happens, but I think it's a
 case of not getting as many VIs back from firmware as we wanted, which
 happens after the initial determination of numbers of queues & channels),
 then it's possible that our number of available TXQs is reduced far
 enough that we don't have any left for XDP.
At least, I think so; this part of the driver confuses me too :S

-ed
diff mbox series

Patch

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index 1bfeee283ea9..a3ca406a3561 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -914,6 +914,8 @@  int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
+	if (xdp_queue_number)
+		efx->xdp_tx_queue_count = xdp_queue_number;
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)