diff mbox series

[net-next,03/21] ethtool, stats: introduce standard XDP statistics

Message ID 20210803163641.3743-4-alexandr.lobakin@intel.com
State New
Headers show
Series None | expand

Commit Message

Alexander Lobakin Aug. 3, 2021, 4:36 p.m. UTC
Most of the driver-side XDP enabled drivers provide some statistics
on XDP programs runs and different actions taken (number of passes,
drops, redirects etc.).
Regarding that it's almost pretty the same across all the drivers
(which is obvious), we can implement some sort of "standardized"
statistics using Ethtool standard stats infra to eliminate a lot
of code and stringsets duplication, different approaches to count
these stats and so on.
These new 12 fields provided by the standard XDP stats should cover
most, if not all, stats that might be interesting for collecting and
tracking.
Note that most NIC drivers keep XDP statistics on a per-channel
basis, so this also introduces a new callback for getting a number
of channels which a driver will provide stats for. If it's not
implemented or returns 0, it means stats are global/device-wide.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
 include/linux/ethtool.h              |  36 +++++++
 include/uapi/linux/ethtool.h         |   2 +
 include/uapi/linux/ethtool_netlink.h |  34 +++++++
 net/ethtool/netlink.h                |   1 +
 net/ethtool/stats.c                  | 134 ++++++++++++++++++++++++++-
 net/ethtool/strset.c                 |   5 +
 6 files changed, 211 insertions(+), 1 deletion(-)

Comments

Saeed Mahameed Aug. 3, 2021, 11:57 p.m. UTC | #1
On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote:
> On Tue,  3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:
> > Most of the driver-side XDP enabled drivers provide some statistics
> > on XDP programs runs and different actions taken (number of passes,
> > drops, redirects etc.).
> 
> Could you please share the statistics to back that statement up?
> Having uAPI for XDP stats is pretty much making the recommendation 
> that drivers should implement such stats. The recommendation from
> Alexei and others back in the day (IIRC) was that XDP programs should
> implement stats, not the drivers, to avoid duplication.
> 

There are stats "mainly errors*"  that are not even visible or reported
to the user prog, for that i had an idea in the past to attach an
exception_bpf_prog provided by the user, where driver/stack will report
errors to this special exception_prog.

> > Regarding that it's almost pretty the same across all the drivers
> > (which is obvious), we can implement some sort of "standardized"
> > statistics using Ethtool standard stats infra to eliminate a lot
> > of code and stringsets duplication, different approaches to count
> > these stats and so on.
> 
> I'm not 100% sold on the fact that these should be ethtool stats. 
> Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are 
> all pretty Ethernet specific, and all HW stats. Mixing HW and SW
> stats
> is what we're trying to get away from.
> 

XDP is going to always be eBPF based ! why not just report such stats
to a special BPF_MAP ? BPF stack can collect the stats from the driver
and report them to this special MAP upon user request.

> > These new 12 fields provided by the standard XDP stats should cover
> > most, if not all, stats that might be interesting for collecting
> > and
> > tracking.
> > Note that most NIC drivers keep XDP statistics on a per-channel
> > basis, so this also introduces a new callback for getting a number
> > of channels which a driver will provide stats for. If it's not
> > implemented or returns 0, it means stats are global/device-wide.
> 
> Per-channel stats via std ethtool stats are not a good idea. Per
> queue
> stats must be via the queue netlink interface we keep talking about
> for
> ever but which doesn't seem to materialize. When stats are reported
> via
> a different interface than objects they pertain to matching stats,
> objects and their lifetime becomes very murky.
Jakub Kicinski Aug. 4, 2021, 12:36 p.m. UTC | #2
On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote:
> On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote:

> > On Tue,  3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:  

> > > Most of the driver-side XDP enabled drivers provide some statistics

> > > on XDP programs runs and different actions taken (number of passes,

> > > drops, redirects etc.).  

> > 

> > Could you please share the statistics to back that statement up?

> > Having uAPI for XDP stats is pretty much making the recommendation 

> > that drivers should implement such stats. The recommendation from

> > Alexei and others back in the day (IIRC) was that XDP programs should

> > implement stats, not the drivers, to avoid duplication.

> 

> There are stats "mainly errors*"  that are not even visible or reported

> to the user prog, 


Fair point, exceptions should not be performance critical.

> for that i had an idea in the past to attach an

> exception_bpf_prog provided by the user, where driver/stack will report

> errors to this special exception_prog.


Or maybe we should turn trace_xdp_exception() into a call which
unconditionally collects exception stats? I think we can reasonably
expect the exception_bpf_prog to always be attached, right?

> > > Regarding that it's almost pretty the same across all the drivers

> > > (which is obvious), we can implement some sort of "standardized"

> > > statistics using Ethtool standard stats infra to eliminate a lot

> > > of code and stringsets duplication, different approaches to count

> > > these stats and so on.  

> > 

> > I'm not 100% sold on the fact that these should be ethtool stats. 

> > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are 

> > all pretty Ethernet specific, and all HW stats. Mixing HW and SW

> > stats

> > is what we're trying to get away from.

> 

> XDP is going to always be eBPF based ! why not just report such stats

> to a special BPF_MAP ? BPF stack can collect the stats from the driver

> and report them to this special MAP upon user request.


Do you mean replacing the ethtool-netlink / rtnetlink etc. with
a new BPF_MAP? I don't think adding another category of uAPI thru 
which netdevice stats are exposed would do much good :( Plus it 
doesn't address the "yet another cacheline" concern.

To my understanding the need for stats recognizes the fact that (in
large organizations) fleet monitoring is done by different teams than
XDP development. So XDP team may have all the stats they need, but the
team doing fleet monitoring has no idea how to get to them.

To bridge the two worlds we need a way for the infra team to ask the
XDP for well-defined stats. Maybe we should take a page from the BPF
iterators book and create a program type for bridging the two worlds?
Called by networking core when duping stats to extract from the
existing BPF maps all the relevant stats and render them into a well
known struct? Users' XDP design can still use a single per-cpu map with
all the stats if they so choose, but there's a way to implement more
optimal designs and still expose well-defined stats.

Maybe that's too complex, IDK.
Alexander Lobakin Aug. 4, 2021, 3:53 p.m. UTC | #3
From: Jakub Kicinski <kuba@kernel.org>

Date: Wed, 4 Aug 2021 05:36:50 -0700

> On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote:

> > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote:

> > > On Tue,  3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:  

> > > > Most of the driver-side XDP enabled drivers provide some statistics

> > > > on XDP programs runs and different actions taken (number of passes,

> > > > drops, redirects etc.).  

> > > 

> > > Could you please share the statistics to back that statement up?

> > > Having uAPI for XDP stats is pretty much making the recommendation 

> > > that drivers should implement such stats. The recommendation from

> > > Alexei and others back in the day (IIRC) was that XDP programs should

> > > implement stats, not the drivers, to avoid duplication.


Well, 20+ patches in the series with at least half of them is
drivers conversion. Plus mlx5. Plus we'll about to land XDP
statistics for all Intel drivers, just firstly need to get a
common infra for them (the purpose of this series).

Also, introducing IEEE and rmon stats didn't make a statement that
all drivers should really expose them, right?

> > There are stats "mainly errors*"  that are not even visible or reported

> > to the user prog, 


Not really. Many drivers like to count the number of redirects,
xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't
the same as something you can get from inside an XDP prog, right.

> Fair point, exceptions should not be performance critical.

> 

> > for that i had an idea in the past to attach an

> > exception_bpf_prog provided by the user, where driver/stack will report

> > errors to this special exception_prog.

> 

> Or maybe we should turn trace_xdp_exception() into a call which

> unconditionally collects exception stats? I think we can reasonably

> expect the exception_bpf_prog to always be attached, right?


trace_xdp_exception() is again a error path, and would restrict us
to have only "bad" statistics.

> > > > Regarding that it's almost pretty the same across all the drivers

> > > > (which is obvious), we can implement some sort of "standardized"

> > > > statistics using Ethtool standard stats infra to eliminate a lot

> > > > of code and stringsets duplication, different approaches to count

> > > > these stats and so on.  

> > > 

> > > I'm not 100% sold on the fact that these should be ethtool stats. 

> > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are 

> > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW

> > > stats

> > > is what we're trying to get away from.


I was trying to introduce as few functional changes as possible,
including that all the current drivers expose XDP stats through
Ethtool.
I don't say it's a 100% optimal way, but lots of different scripts
and monitoring tools are already based on this fact and there can
be some negative impact. There'll be for sure due to that std stats
is a bit different thing and different drivers count and name XDP
stats differently (breh).

BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there
won't be much cons like "don't touch our Ethtool stats", I would
prefer this one instead of Ethtool standard stats way.

> > XDP is going to always be eBPF based ! why not just report such stats

> > to a special BPF_MAP ? BPF stack can collect the stats from the driver

> > and report them to this special MAP upon user request.

> 

> Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> a new BPF_MAP? I don't think adding another category of uAPI thru 

> which netdevice stats are exposed would do much good :( Plus it 

> doesn't address the "yet another cacheline" concern.


+ this makes obtaining/tracking the statistics much harder. For now,
all you need is `ethtool -S devname` (mainline) or
`ethtool -S devname --groups xdp` (this series), and obtaining rtnl
xstats is just a different command to invoke. BPF_MAP-based stats
are a completely different story then.

> To my understanding the need for stats recognizes the fact that (in

> large organizations) fleet monitoring is done by different teams than

> XDP development. So XDP team may have all the stats they need, but the

> team doing fleet monitoring has no idea how to get to them.

> 

> To bridge the two worlds we need a way for the infra team to ask the

> XDP for well-defined stats. Maybe we should take a page from the BPF

> iterators book and create a program type for bridging the two worlds?

> Called by networking core when duping stats to extract from the

> existing BPF maps all the relevant stats and render them into a well

> known struct? Users' XDP design can still use a single per-cpu map with

> all the stats if they so choose, but there's a way to implement more

> optimal designs and still expose well-defined stats.

> 

> Maybe that's too complex, IDK.


Thanks,
Al
David Ahern Aug. 4, 2021, 4:17 p.m. UTC | #4
On 8/4/21 6:36 AM, Jakub Kicinski wrote:
>> XDP is going to always be eBPF based ! why not just report such stats

>> to a special BPF_MAP ? BPF stack can collect the stats from the driver

>> and report them to this special MAP upon user request.

> Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> a new BPF_MAP? I don't think adding another category of uAPI thru 

> which netdevice stats are exposed would do much good :( Plus it 

> doesn't address the "yet another cacheline" concern.

> 

> To my understanding the need for stats recognizes the fact that (in

> large organizations) fleet monitoring is done by different teams than

> XDP development. So XDP team may have all the stats they need, but the

> team doing fleet monitoring has no idea how to get to them.

> 

> To bridge the two worlds we need a way for the infra team to ask the

> XDP for well-defined stats. Maybe we should take a page from the BPF

> iterators book and create a program type for bridging the two worlds?

> Called by networking core when duping stats to extract from the

> existing BPF maps all the relevant stats and render them into a well

> known struct? Users' XDP design can still use a single per-cpu map with

> all the stats if they so choose, but there's a way to implement more

> optimal designs and still expose well-defined stats.

> 

> Maybe that's too complex, IDK.


I was just explaining to someone internally how to get stats at all of
the different points in the stack to track down reasons for dropped packets:

ethtool -S for h/w and driver
tc -s for drops by the qdisc
/proc/net/softnet_stat for drops at the backlog layer
netstat -s for network and transport layer

yet another command and API just adds to the nightmare of explaining and
understanding these stats.

There is real value in continuing to use ethtool API for XDP stats. Not
saying this reorg of the XDP stats is the right thing to do, only that
the existing API has real user benefits.

Does anyone have data that shows bumping a properly implemented counter
causes a noticeable performance degradation and if so by how much? You
mention 'yet another cacheline' but collecting stats on stack and
incrementing the driver structs at the end of the napi loop should not
have a huge impact versus the value the stats provide.
Jakub Kicinski Aug. 4, 2021, 4:44 p.m. UTC | #5
On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:
> On 8/4/21 6:36 AM, Jakub Kicinski wrote:

> >> XDP is going to always be eBPF based ! why not just report such stats

> >> to a special BPF_MAP ? BPF stack can collect the stats from the driver

> >> and report them to this special MAP upon user request.  

> > Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> > a new BPF_MAP? I don't think adding another category of uAPI thru 

> > which netdevice stats are exposed would do much good :( Plus it 

> > doesn't address the "yet another cacheline" concern.

> > 

> > To my understanding the need for stats recognizes the fact that (in

> > large organizations) fleet monitoring is done by different teams than

> > XDP development. So XDP team may have all the stats they need, but the

> > team doing fleet monitoring has no idea how to get to them.

> > 

> > To bridge the two worlds we need a way for the infra team to ask the

> > XDP for well-defined stats. Maybe we should take a page from the BPF

> > iterators book and create a program type for bridging the two worlds?

> > Called by networking core when duping stats to extract from the

> > existing BPF maps all the relevant stats and render them into a well

> > known struct? Users' XDP design can still use a single per-cpu map with

> > all the stats if they so choose, but there's a way to implement more

> > optimal designs and still expose well-defined stats.

> > 

> > Maybe that's too complex, IDK.  

> 

> I was just explaining to someone internally how to get stats at all of

> the different points in the stack to track down reasons for dropped packets:

> 

> ethtool -S for h/w and driver

> tc -s for drops by the qdisc

> /proc/net/softnet_stat for drops at the backlog layer

> netstat -s for network and transport layer

> 

> yet another command and API just adds to the nightmare of explaining and

> understanding these stats.


Are you referring to RTM_GETSTATS when you say "yet another command"?
RTM_GETSTATS exists and is used by offloads today.

I'd expect ip -s (-s) to be extended to run GETSTATS and display the xdp
stats. (Not sure why ip -s was left out of your list :))

> There is real value in continuing to use ethtool API for XDP stats. Not

> saying this reorg of the XDP stats is the right thing to do, only that

> the existing API has real user benefits.


RTM_GETSTATS is an existing API. New ethtool stats are intended to be HW
stats. I don't want to go back to ethtool being a dumping ground for all
stats because that's what the old interface encouraged.

> Does anyone have data that shows bumping a properly implemented counter

> causes a noticeable performance degradation and if so by how much? You

> mention 'yet another cacheline' but collecting stats on stack and

> incrementing the driver structs at the end of the napi loop should not

> have a huge impact versus the value the stats provide.


Not sure, maybe Jesper has some numbers. Maybe Intel folks do?

I'm just allergic to situations when there is a decision made and 
then months later patches are posted disregarding the decision, 
without analysis on why that decision was wrong. And while the
maintainer who made the decision is on vacation.
Jakub Kicinski Aug. 4, 2021, 4:57 p.m. UTC | #6
On Wed,  4 Aug 2021 17:53:27 +0200 Alexander Lobakin wrote:
> From: Jakub Kicinski <kuba@kernel.org>

> Date: Wed, 4 Aug 2021 05:36:50 -0700

> 

> > On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote:  

> > > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote:  

> > > > On Tue,  3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:    

> > > > > Most of the driver-side XDP enabled drivers provide some statistics

> > > > > on XDP programs runs and different actions taken (number of passes,

> > > > > drops, redirects etc.).    

> > > > 

> > > > Could you please share the statistics to back that statement up?

> > > > Having uAPI for XDP stats is pretty much making the recommendation 

> > > > that drivers should implement such stats. The recommendation from

> > > > Alexei and others back in the day (IIRC) was that XDP programs should

> > > > implement stats, not the drivers, to avoid duplication.  

> 

> Well, 20+ patches in the series with at least half of them is

> drivers conversion. Plus mlx5. Plus we'll about to land XDP

> statistics for all Intel drivers, just firstly need to get a

> common infra for them (the purpose of this series).


Great, do you have impact of the stats on Intel drivers?
(Preferably from realistic scenarios where CPU cache is actually 
under pressure, not { return XDP_PASS; }). Numbers win arguments.

> Also, introducing IEEE and rmon stats didn't make a statement that

> all drivers should really expose them, right?


That's not relevant. IEEE and RMON stats are read from HW, they have 
no impact on the SW fast path.

> > > There are stats "mainly errors*"  that are not even visible or reported

> > > to the user prog,   

> 

> Not really. Many drivers like to count the number of redirects,

> xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't

> the same as something you can get from inside an XDP prog, right.

> 

> > Fair point, exceptions should not be performance critical.

> >   

> > > for that i had an idea in the past to attach an

> > > exception_bpf_prog provided by the user, where driver/stack will report

> > > errors to this special exception_prog.  

> > 

> > Or maybe we should turn trace_xdp_exception() into a call which

> > unconditionally collects exception stats? I think we can reasonably

> > expect the exception_bpf_prog to always be attached, right?  

> 

> trace_xdp_exception() is again a error path, and would restrict us

> to have only "bad" statistics.

> 

> > > > > Regarding that it's almost pretty the same across all the drivers

> > > > > (which is obvious), we can implement some sort of "standardized"

> > > > > statistics using Ethtool standard stats infra to eliminate a lot

> > > > > of code and stringsets duplication, different approaches to count

> > > > > these stats and so on.    

> > > > 

> > > > I'm not 100% sold on the fact that these should be ethtool stats. 

> > > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are 

> > > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW

> > > > stats

> > > > is what we're trying to get away from.  

> 

> I was trying to introduce as few functional changes as possible,

> including that all the current drivers expose XDP stats through

> Ethtool.


You know this, but for the benefit of others - ethtool -S does not 
dump standard stats from the netlink API, and ethtool -S --goups does
not dump "old" stats. So users will need to use different commands
to get to the two, anyway.

> I don't say it's a 100% optimal way, but lots of different scripts

> and monitoring tools are already based on this fact and there can

> be some negative impact. There'll be for sure due to that std stats

> is a bit different thing and different drivers count and name XDP

> stats differently (breh).


That's concerning. I'd much rather you didn't convert all the drivers
than convert them before someone makes 100% sure the meaning of the
stats is equivalent.

> BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there

> won't be much cons like "don't touch our Ethtool stats", I would

> prefer this one instead of Ethtool standard stats way.


You'll have to leave the ethtool -S ones in place anyway, right?
New drivers would not include them but I don't think there's much
we can (or should) do for the existing ones.

> > > XDP is going to always be eBPF based ! why not just report such stats

> > > to a special BPF_MAP ? BPF stack can collect the stats from the driver

> > > and report them to this special MAP upon user request.  

> > 

> > Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> > a new BPF_MAP? I don't think adding another category of uAPI thru 

> > which netdevice stats are exposed would do much good :( Plus it 

> > doesn't address the "yet another cacheline" concern.  

> 

> + this makes obtaining/tracking the statistics much harder. For now,

> all you need is `ethtool -S devname` (mainline) or

> `ethtool -S devname --groups xdp` (this series), and obtaining rtnl

> xstats is just a different command to invoke. BPF_MAP-based stats

> are a completely different story then.

> 

> > To my understanding the need for stats recognizes the fact that (in

> > large organizations) fleet monitoring is done by different teams than

> > XDP development. So XDP team may have all the stats they need, but the

> > team doing fleet monitoring has no idea how to get to them.

> > 

> > To bridge the two worlds we need a way for the infra team to ask the

> > XDP for well-defined stats. Maybe we should take a page from the BPF

> > iterators book and create a program type for bridging the two worlds?

> > Called by networking core when duping stats to extract from the

> > existing BPF maps all the relevant stats and render them into a well

> > known struct? Users' XDP design can still use a single per-cpu map with

> > all the stats if they so choose, but there's a way to implement more

> > optimal designs and still expose well-defined stats.

> > 

> > Maybe that's too complex, IDK.  

> 

> Thanks,

> Al
David Ahern Aug. 4, 2021, 5:28 p.m. UTC | #7
On 8/4/21 10:44 AM, Jakub Kicinski wrote:
> On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:

>> On 8/4/21 6:36 AM, Jakub Kicinski wrote:

>>>> XDP is going to always be eBPF based ! why not just report such stats

>>>> to a special BPF_MAP ? BPF stack can collect the stats from the driver

>>>> and report them to this special MAP upon user request.  

>>> Do you mean replacing the ethtool-netlink / rtnetlink etc. with

>>> a new BPF_MAP? I don't think adding another category of uAPI thru 

>>> which netdevice stats are exposed would do much good :( Plus it 

>>> doesn't address the "yet another cacheline" concern.

>>>

>>> To my understanding the need for stats recognizes the fact that (in

>>> large organizations) fleet monitoring is done by different teams than

>>> XDP development. So XDP team may have all the stats they need, but the

>>> team doing fleet monitoring has no idea how to get to them.

>>>

>>> To bridge the two worlds we need a way for the infra team to ask the

>>> XDP for well-defined stats. Maybe we should take a page from the BPF

>>> iterators book and create a program type for bridging the two worlds?

>>> Called by networking core when duping stats to extract from the

>>> existing BPF maps all the relevant stats and render them into a well

>>> known struct? Users' XDP design can still use a single per-cpu map with

>>> all the stats if they so choose, but there's a way to implement more

>>> optimal designs and still expose well-defined stats.

>>>

>>> Maybe that's too complex, IDK.  

>>

>> I was just explaining to someone internally how to get stats at all of

>> the different points in the stack to track down reasons for dropped packets:

>>

>> ethtool -S for h/w and driver

>> tc -s for drops by the qdisc

>> /proc/net/softnet_stat for drops at the backlog layer

>> netstat -s for network and transport layer

>>

>> yet another command and API just adds to the nightmare of explaining and

>> understanding these stats.

> 

> Are you referring to RTM_GETSTATS when you say "yet another command"?

> RTM_GETSTATS exists and is used by offloads today.

> 

> I'd expect ip -s (-s) to be extended to run GETSTATS and display the xdp

> stats. (Not sure why ip -s was left out of your list :))


It's on my diagram, and yes, forgot to add it here.

> 

>> There is real value in continuing to use ethtool API for XDP stats. Not

>> saying this reorg of the XDP stats is the right thing to do, only that

>> the existing API has real user benefits.

> 

> RTM_GETSTATS is an existing API. New ethtool stats are intended to be HW

> stats. I don't want to go back to ethtool being a dumping ground for all

> stats because that's what the old interface encouraged.


driver stats are important too. e.g., mlx5's cache stats and per-queue
stats.

> 

>> Does anyone have data that shows bumping a properly implemented counter

>> causes a noticeable performance degradation and if so by how much? You

>> mention 'yet another cacheline' but collecting stats on stack and

>> incrementing the driver structs at the end of the napi loop should not

>> have a huge impact versus the value the stats provide.

> 

> Not sure, maybe Jesper has some numbers. Maybe Intel folks do?


I just ran some quick tests with my setup and measured about 1.2% worst
case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide
numbers for their high speed nics - e.g. ConnectX-6 and a saturated host.

> 

> I'm just allergic to situations when there is a decision made and 

> then months later patches are posted disregarding the decision, 

> without analysis on why that decision was wrong. And while the

> maintainer who made the decision is on vacation.

> 


stats is one of the many sensitive topics. I have been consistent in
defending the need to use existing APIs and tooling and not relying on
XDP program writers to add the relevant stats and then provide whatever
tool is needed to extract and print them. Standardization for
fundamental analysis tools.
Saeed Mahameed Aug. 4, 2021, 6:27 p.m. UTC | #8
On Wed, 2021-08-04 at 11:28 -0600, David Ahern wrote:
> On 8/4/21 10:44 AM, Jakub Kicinski wrote:

> > On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:

> > > On 8/4/21 6:36 AM, Jakub Kicinski wrote:

> > > > > XDP is going to always be eBPF based ! why not just report

> > > > > such stats

> > > > > to a special BPF_MAP ? BPF stack can collect the stats from

> > > > > the driver

> > > > > and report them to this special MAP upon user request.  

> > > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> > > > a new BPF_MAP? I don't think adding another category of uAPI

> > > > thru 

> > > > which netdevice stats are exposed would do much good :( Plus it

> > > > doesn't address the "yet another cacheline" concern.

> > > > 

> > > > To my understanding the need for stats recognizes the fact that

> > > > (in

> > > > large organizations) fleet monitoring is done by different

> > > > teams than

> > > > XDP development. So XDP team may have all the stats they need,

> > > > but the

> > > > team doing fleet monitoring has no idea how to get to them.

> > > > 

> > > > To bridge the two worlds we need a way for the infra team to

> > > > ask the

> > > > XDP for well-defined stats. Maybe we should take a page from

> > > > the BPF

> > > > iterators book and create a program type for bridging the two

> > > > worlds?

> > > > Called by networking core when duping stats to extract from the

> > > > existing BPF maps all the relevant stats and render them into a

> > > > well

> > > > known struct? Users' XDP design can still use a single per-cpu

> > > > map with

> > > > all the stats if they so choose, but there's a way to implement

> > > > more

> > > > optimal designs and still expose well-defined stats.

> > > > 

> > > > Maybe that's too complex, IDK.  

> > > 


The main question here, do we want the prog to count or driver ? 
and the answer will lead to more questions :) :

1) will the prog/user need to access driver for driver only stats ? or
driver shall report to a special program and all the collection and
reporting is done in XDP/BPF internally .. 
2) stats per prog/queue/cpu/interface ? 
3) how to eventually report to user ethtool/ip -s/bpftool ?

too complex, IDK too .. :D


> > > I was just explaining to someone internally how to get stats at

> > > all of

> > > the different points in the stack to track down reasons for

> > > dropped packets:

> > > 

> > > ethtool -S for h/w and driver

> > > tc -s for drops by the qdisc

> > > /proc/net/softnet_stat for drops at the backlog layer

> > > netstat -s for network and transport layer

> > > 

> > > yet another command and API just adds to the nightmare of

> > > explaining and

> > > understanding these stats.

> > 

> > Are you referring to RTM_GETSTATS when you say "yet another

> > command"?

> > RTM_GETSTATS exists and is used by offloads today.

> > 

> > I'd expect ip -s (-s) to be extended to run GETSTATS and display

> > the xdp

> > stats. (Not sure why ip -s was left out of your list :))

> 

> It's on my diagram, and yes, forgot to add it here.

> 


i think ip -s is a good place for "standard" driver based xdp stats.
but as Jakub already explained, adding such driver mechanism is like
making a statement that drivers must implement this.

> > 

> > > There is real value in continuing to use ethtool API for XDP

> > > stats. Not

> > > saying this reorg of the XDP stats is the right thing to do, only

> > > that

> > > the existing API has real user benefits.

> > 

> > RTM_GETSTATS is an existing API. New ethtool stats are intended to

> > be HW

> > stats. I don't want to go back to ethtool being a dumping ground

> > for all

> > stats because that's what the old interface encouraged.

> 

> driver stats are important too. e.g., mlx5's cache stats and per-

> queue

> stats.

> 


one could claim that mlx5 cache stats should move to page_pool and
per_queue stats should move to the stack.

> > 

> > > Does anyone have data that shows bumping a properly implemented

> > > counter

> > > causes a noticeable performance degradation and if so by how

> > > much? You

> > > mention 'yet another cacheline' but collecting stats on stack and

> > > incrementing the driver structs at the end of the napi loop

> > > should not

> > > have a huge impact versus the value the stats provide.

> > 

> > Not sure, maybe Jesper has some numbers. Maybe Intel folks do?

> 


A properly implemented counter that doesn't introduce new cache misses,
will hardly show any measurable difference, the only way to measure is
via instructions per packet.

usually the way we implement counters in mlx5 is that if this is the
fastest flow that we expect then we only increment the good counters
"packet++/drop++/redirect++" any slower path should include counters to
indicate the slower path and the effect of the new "slower" counters
will still be negligible as we already are at a higher instructions per
packet hence the slower path .. 

the only time you measure a difference is when you introduce new
counting on a counter-free flow, e.g page_pool ;)

> I just ran some quick tests with my setup and measured about 1.2%

> worst


1.2% is a lot ! what was the test ? what is the change ?

> case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide

> numbers for their high speed nics - e.g. ConnectX-6 and a saturated

> host.

> 


let's define what are we testing first, there are multiple places we
need to check, Tariq will be exploring transitioning mlx5 cache to
page_pool with all the counters, maybe it is a good place to measure.. 

> > 

> > I'm just allergic to situations when there is a decision made and 

> > then months later patches are posted disregarding the decision, 

> > without analysis on why that decision was wrong. And while the

> > maintainer who made the decision is on vacation.

> > 

> 

> stats is one of the many sensitive topics. I have been consistent in

> defending the need to use existing APIs and tooling and not relying

> on

> XDP program writers to add the relevant stats and then provide

> whatever

> tool is needed to extract and print them. Standardization for

> fundamental analysis tools.
David Ahern Aug. 5, 2021, 12:43 a.m. UTC | #9
On 8/4/21 12:27 PM, Saeed Mahameed wrote:
> 

>> I just ran some quick tests with my setup and measured about 1.2%

>> worst

> 

> 1.2% is a lot ! what was the test ? what is the change ?


I did say "quick test ... not exhaustive" and it was definitely
eyeballing a pps change over a small time window.

If multiple counters are bumped 20-25 million times a second (e.g. XDP
drop case), how measurable is it? I was just trying to ballpark the
overhead - 1%, 5%, more? If it is <~ 1% then there is no performance
argument in which case let's do the right thing for users - export via
existing APIs.
Saeed Mahameed Aug. 5, 2021, 2:14 a.m. UTC | #10
On Wed, 2021-08-04 at 18:43 -0600, David Ahern wrote:
> On 8/4/21 12:27 PM, Saeed Mahameed wrote:

> > 

> > > I just ran some quick tests with my setup and measured about 1.2%

> > > worst

> > 

> > 1.2% is a lot ! what was the test ? what is the change ?

> 

> I did say "quick test ... not exhaustive" and it was definitely

> eyeballing a pps change over a small time window.

> 

> If multiple counters are bumped 20-25 million times a second (e.g.

> XDP

> drop case), how measurable is it? I was just trying to ballpark the

> overhead - 1%, 5%, more? If it is <~ 1% then there is no performance

> argument in which case let's do the right thing for users - export

> via

> existing APIs.


from experience I don't believe it can be more than 1% and yes on
existing APIs !
Alexander Lobakin Aug. 5, 2021, 11:18 a.m. UTC | #11
From: Jakub Kicinski <kuba@kernel.org>

Date: Wed, 5 Aug 2021 09:57:16 -0700

> On Wed,  4 Aug 2021 17:53:27 +0200 Alexander Lobakin wrote:

> > From: Jakub Kicinski <kuba@kernel.org>

> > Date: Wed, 4 Aug 2021 05:36:50 -0700

> > 

> > > On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote:  

> > > > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote:  

> > > > > On Tue,  3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:    

> > > > > > Most of the driver-side XDP enabled drivers provide some statistics

> > > > > > on XDP programs runs and different actions taken (number of passes,

> > > > > > drops, redirects etc.).    

> > > > > 

> > > > > Could you please share the statistics to back that statement up?

> > > > > Having uAPI for XDP stats is pretty much making the recommendation 

> > > > > that drivers should implement such stats. The recommendation from

> > > > > Alexei and others back in the day (IIRC) was that XDP programs should

> > > > > implement stats, not the drivers, to avoid duplication.  

> > 

> > Well, 20+ patches in the series with at least half of them is

> > drivers conversion. Plus mlx5. Plus we'll about to land XDP

> > statistics for all Intel drivers, just firstly need to get a

> > common infra for them (the purpose of this series).

> 

> Great, do you have impact of the stats on Intel drivers?

> (Preferably from realistic scenarios where CPU cache is actually 

> under pressure, not { return XDP_PASS; }). Numbers win arguments.

> 

> > Also, introducing IEEE and rmon stats didn't make a statement that

> > all drivers should really expose them, right?

> 

> That's not relevant. IEEE and RMON stats are read from HW, they have 

> no impact on the SW fast path.


True, I thought about this, but after the mail was sent, sorry.

> > > > There are stats "mainly errors*"  that are not even visible or reported

> > > > to the user prog,   

> > 

> > Not really. Many drivers like to count the number of redirects,

> > xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't

> > the same as something you can get from inside an XDP prog, right.

> > 

> > > Fair point, exceptions should not be performance critical.

> > >   

> > > > for that i had an idea in the past to attach an

> > > > exception_bpf_prog provided by the user, where driver/stack will report

> > > > errors to this special exception_prog.  

> > > 

> > > Or maybe we should turn trace_xdp_exception() into a call which

> > > unconditionally collects exception stats? I think we can reasonably

> > > expect the exception_bpf_prog to always be attached, right?  

> > 

> > trace_xdp_exception() is again a error path, and would restrict us

> > to have only "bad" statistics.

> > 

> > > > > > Regarding that it's almost pretty the same across all the drivers

> > > > > > (which is obvious), we can implement some sort of "standardized"

> > > > > > statistics using Ethtool standard stats infra to eliminate a lot

> > > > > > of code and stringsets duplication, different approaches to count

> > > > > > these stats and so on.    

> > > > > 

> > > > > I'm not 100% sold on the fact that these should be ethtool stats. 

> > > > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are 

> > > > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW

> > > > > stats

> > > > > is what we're trying to get away from.  

> > 

> > I was trying to introduce as few functional changes as possible,

> > including that all the current drivers expose XDP stats through

> > Ethtool.

> 

> You know this, but for the benefit of others - ethtool -S does not 

> dump standard stats from the netlink API, and ethtool -S --goups does

> not dump "old" stats. So users will need to use different commands

> to get to the two, anyway.


Ditto.

> > I don't say it's a 100% optimal way, but lots of different scripts

> > and monitoring tools are already based on this fact and there can

> > be some negative impact. There'll be for sure due to that std stats

> > is a bit different thing and different drivers count and name XDP

> > stats differently (breh).

> 

> That's concerning. I'd much rather you didn't convert all the drivers

> than convert them before someone makes 100% sure the meaning of the

> stats is equivalent.

> 

> > BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there

> > won't be much cons like "don't touch our Ethtool stats", I would

> > prefer this one instead of Ethtool standard stats way.

> 

> You'll have to leave the ethtool -S ones in place anyway, right?

> New drivers would not include them but I don't think there's much

> we can (or should) do for the existing ones.


That's also a nice and fair one. So I leave a little conclusion
at the end of this message to clarify things.

> > > > XDP is going to always be eBPF based ! why not just report such stats

> > > > to a special BPF_MAP ? BPF stack can collect the stats from the driver

> > > > and report them to this special MAP upon user request.  

> > > 

> > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with

> > > a new BPF_MAP? I don't think adding another category of uAPI thru 

> > > which netdevice stats are exposed would do much good :( Plus it 

> > > doesn't address the "yet another cacheline" concern.  

> > 

> > + this makes obtaining/tracking the statistics much harder. For now,

> > all you need is `ethtool -S devname` (mainline) or

> > `ethtool -S devname --groups xdp` (this series), and obtaining rtnl

> > xstats is just a different command to invoke. BPF_MAP-based stats

> > are a completely different story then.

> > 

> > > To my understanding the need for stats recognizes the fact that (in

> > > large organizations) fleet monitoring is done by different teams than

> > > XDP development. So XDP team may have all the stats they need, but the

> > > team doing fleet monitoring has no idea how to get to them.

> > > 

> > > To bridge the two worlds we need a way for the infra team to ask the

> > > XDP for well-defined stats. Maybe we should take a page from the BPF

> > > iterators book and create a program type for bridging the two worlds?

> > > Called by networking core when duping stats to extract from the

> > > existing BPF maps all the relevant stats and render them into a well

> > > known struct? Users' XDP design can still use a single per-cpu map with

> > > all the stats if they so choose, but there's a way to implement more

> > > optimal designs and still expose well-defined stats.

> > > 

> > > Maybe that's too complex, IDK.  


[ the conclusion I talk about above ]

From the concerns, ideas and thoughts gone throughout this thread,
I see the next steps as:
 - don't use Ethtool standard stats infra as it's designed for HW
   stats only;
 - expose SW XDP stats via rtnl xstats which is currently used to
   query SW/CPU-port stats from switchdev-based drivers;
 - don't remove the existing Ethtool XDP stats provided by drivers,
   just wire them up with the new API too;
 - encourage driver developers to use this new API when they want to
   provide any XDP stats, not the Ethtool stats which are already
   overburdened and mix HW, SW and whatnot in most of the complex
   drivers.

That really makes lot more sense for me than the v1 approach.

Anyways, I don't think I'll queue v2 before going on a vacation
which will happen in two days, so we have two more weeks to
discuss all this before I'll start the rework. I'll still be on
the line here and keep on tracking the thread and replying when
needed.

> > Thanks,

> > Al


Thanks everyone for the feedback, lots of precious stuff,
Al
Jakub Kicinski Aug. 5, 2021, 1:31 p.m. UTC | #12
On Thu,  5 Aug 2021 13:18:26 +0200 Alexander Lobakin wrote:
>  - encourage driver developers to use this new API when they want to

>    provide any XDP stats, not the Ethtool stats which are already

>    overburdened and mix HW, SW and whatnot in most of the complex

>    drivers.


On the question of adding the stats in general I'd still ask for 
(a) performance analysis or (b) to start with just the exception 
stats which I'd think should be uncontroversial.
Jesper Dangaard Brouer Aug. 12, 2021, 12:19 p.m. UTC | #13
On 04/08/2021 18.44, Jakub Kicinski wrote:
> On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:

>> On 8/4/21 6:36 AM, Jakub Kicinski wrote:

> 

>> Does anyone have data that shows bumping a properly implemented counter

>> causes a noticeable performance degradation and if so by how much? You

>> mention 'yet another cacheline' but collecting stats on stack and

>> incrementing the driver structs at the end of the napi loop should not

>> have a huge impact versus the value the stats provide.

> 

> Not sure, maybe Jesper has some numbers. Maybe Intel folks do?


(sorry, behind on emails after vacation ... just partly answering inside 
this thread, not checking if you did a smart counter impl.).

I don't have exact numbers, but I hope Magnus (Intel) would be motivated 
to validate performance degradation from this patchset.  As I know Intel 
is hunting the DPDK numbers with AF_XDP-zc, where every last cycle *do* 
count.

My experience is that counters can easily hurt performance, without the 
developers noticing the small degradation's.  As Ahern sketch out above 
(stats on stack + end of napi loop update), I do believe that a smart 
counter implementation is possible to hide this overhead (hopefully 
completely in the CPUs pipeline slots).

I do highly appreciate the effort to standardize the XDP stats!
So, I do hope this can somehow move forward.

--Jesper
diff mbox series

Patch

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 4711b96dae0c..62d617ad9f50 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -380,6 +380,36 @@  struct ethtool_rmon_stats {
 	u64 hist_tx[ETHTOOL_RMON_HIST_MAX];
 };
 
+/**
+ * struct ethtool_xdp_stats - standard driver-side XDP statistics
+ * @packets: number of frames passed to bpf_prog_run_xdp().
+ * @errors: number of general XDP errors, if driver has one unified counter.
+ * @aborted: number of %XDP_ABORTED returns.
+ * @drop: number of %XDP_DROP returns.
+ * @invalid: number of returns of unallowed values (i.e. not XDP_*).
+ * @pass: number of %XDP_PASS returns.
+ * @redirect: number of successfully performed %XDP_REDIRECT requests.
+ * @redirect_errors: number of failed %XDP_REDIRECT requests.
+ * @tx: number of successfully performed %XDP_TX requests.
+ * @tx_errors: number of failed %XDP_TX requests.
+ * @xmit: number of xdp_frames successfully transmitted via .ndo_xdp_xmit().
+ * @xmit_drops: number of frames dropped from .ndo_xdp_xmit().
+ */
+struct ethtool_xdp_stats {
+	u64	packets;
+	u64	errors;
+	u64	aborted;
+	u64	drop;
+	u64	invalid;
+	u64	pass;
+	u64	redirect;
+	u64	redirect_errors;
+	u64	tx;
+	u64	tx_errors;
+	u64	xmit;
+	u64	xmit_drops;
+};
+
 #define ETH_MODULE_EEPROM_PAGE_LEN	128
 #define ETH_MODULE_MAX_I2C_ADDRESS	0x7f
 
@@ -570,6 +600,9 @@  struct ethtool_module_eeprom {
  * @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
  * @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
  *	Set %ranges to a pointer to zero-terminated array of byte ranges.
+ * @get_std_stats_channels: Get the number of channels which get_*_stats will
+ *	return statistics for.
+ * @get_xdp_stats: Query some XDP statistics.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -689,6 +722,9 @@  struct ethtool_ops {
 	void	(*get_rmon_stats)(struct net_device *dev,
 				  struct ethtool_rmon_stats *rmon_stats,
 				  const struct ethtool_rmon_hist_range **ranges);
+	int	(*get_std_stats_channels)(struct net_device *dev, u32 sset);
+	void	(*get_xdp_stats)(struct net_device *dev,
+				 struct ethtool_xdp_stats *xdp_stats);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 67aa7134b301..c3f1f3cde739 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -674,6 +674,7 @@  enum ethtool_link_ext_substate_cable_issue {
  * @ETH_SS_STATS_ETH_MAC: names of IEEE 802.3 MAC statistics
  * @ETH_SS_STATS_ETH_CTRL: names of IEEE 802.3 MAC Control statistics
  * @ETH_SS_STATS_RMON: names of RMON statistics
+ * @ETH_SS_STATS_XDP: names of XDP statistics
  *
  * @ETH_SS_COUNT: number of defined string sets
  */
@@ -699,6 +700,7 @@  enum ethtool_stringset {
 	ETH_SS_STATS_ETH_MAC,
 	ETH_SS_STATS_ETH_CTRL,
 	ETH_SS_STATS_RMON,
+	ETH_SS_STATS_XDP,
 
 	/* add new constants above here */
 	ETH_SS_COUNT
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index b3b93710eff7..f9d19cfa9397 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -716,6 +716,7 @@  enum {
 	ETHTOOL_STATS_ETH_MAC,
 	ETHTOOL_STATS_ETH_CTRL,
 	ETHTOOL_STATS_RMON,
+	ETHTOOL_STATS_XDP,
 
 	/* add new constants above here */
 	__ETHTOOL_STATS_CNT
@@ -737,6 +738,8 @@  enum {
 	ETHTOOL_A_STATS_GRP_HIST_BKT_HI,	/* u32 */
 	ETHTOOL_A_STATS_GRP_HIST_VAL,		/* u64 */
 
+	ETHTOOL_A_STATS_GRP_STAT_BLOCK,		/* nest */
+
 	/* add new constants above here */
 	__ETHTOOL_A_STATS_GRP_CNT,
 	ETHTOOL_A_STATS_GRP_MAX = (__ETHTOOL_A_STATS_CNT - 1)
@@ -831,6 +834,37 @@  enum {
 	ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1)
 };
 
+enum {
+	/* Number of frames passed to bpf_prog_run_xdp() */
+	ETHTOOL_A_STATS_XDP_PACKETS,
+	/* Number o general XDP errors if driver counts them together */
+	ETHTOOL_A_STATS_XDP_ERRORS,
+	/* Number of %XDP_ABORTED returns */
+	ETHTOOL_A_STATS_XDP_ABORTED,
+	/* Number of %XDP_DROP returns */
+	ETHTOOL_A_STATS_XDP_DROP,
+	/* Number of returns of unallowed values (i.e. not XDP_*) */
+	ETHTOOL_A_STATS_XDP_INVALID,
+	/* Number of %XDP_PASS returns */
+	ETHTOOL_A_STATS_XDP_PASS,
+	/* Number of successfully performed %XDP_REDIRECT requests */
+	ETHTOOL_A_STATS_XDP_REDIRECT,
+	/* Number of failed %XDP_REDIRECT requests */
+	ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS,
+	/* Number of successfully performed %XDP_TX requests */
+	ETHTOOL_A_STATS_XDP_TX,
+	/* Number of failed %XDP_TX requests */
+	ETHTOOL_A_STATS_XDP_TX_ERRORS,
+	/* Number of xdp_frames successfully transmitted via .ndo_xdp_xmit() */
+	ETHTOOL_A_STATS_XDP_XMIT,
+	/* Number of frames dropped from .ndo_xdp_xmit() */
+	ETHTOOL_A_STATS_XDP_XMIT_DROPS,
+
+	/* Add new constants above here */
+	__ETHTOOL_A_STATS_XDP_CNT,
+	ETHTOOL_A_STATS_XDP_MAX = (__ETHTOOL_A_STATS_XDP_CNT - 1)
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 077aac3929a8..c7983bba0b43 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -397,5 +397,6 @@  extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING
 extern const char stats_eth_mac_names[__ETHTOOL_A_STATS_ETH_MAC_CNT][ETH_GSTRING_LEN];
 extern const char stats_eth_ctrl_names[__ETHTOOL_A_STATS_ETH_CTRL_CNT][ETH_GSTRING_LEN];
 extern const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN];
+extern const char stats_xdp_names[__ETHTOOL_A_STATS_XDP_CNT][ETH_GSTRING_LEN];
 
 #endif /* _NET_ETHTOOL_NETLINK_H */
diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c
index 8b5c27e034f9..76f2a78e8d02 100644
--- a/net/ethtool/stats.c
+++ b/net/ethtool/stats.c
@@ -19,6 +19,8 @@  struct stats_reply_data {
 	struct ethtool_eth_ctrl_stats	ctrl_stats;
 	struct ethtool_rmon_stats	rmon_stats;
 	const struct ethtool_rmon_hist_range	*rmon_ranges;
+	u32				xdp_stats_channels;
+	struct ethtool_xdp_stats	*xdp_stats;
 };
 
 #define STATS_REPDATA(__reply_base) \
@@ -29,6 +31,7 @@  const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN] = {
 	[ETHTOOL_STATS_ETH_MAC]			= "eth-mac",
 	[ETHTOOL_STATS_ETH_CTRL]		= "eth-ctrl",
 	[ETHTOOL_STATS_RMON]			= "rmon",
+	[ETHTOOL_STATS_XDP]			= "xdp",
 };
 
 const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN] = {
@@ -73,6 +76,21 @@  const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN] = {
 	[ETHTOOL_A_STATS_RMON_JABBER]		= "etherStatsJabbers",
 };
 
+const char stats_xdp_names[__ETHTOOL_A_STATS_XDP_CNT][ETH_GSTRING_LEN] = {
+	[ETHTOOL_A_STATS_XDP_PACKETS]		= "packets",
+	[ETHTOOL_A_STATS_XDP_ERRORS]		= "errors",
+	[ETHTOOL_A_STATS_XDP_ABORTED]		= "aborted",
+	[ETHTOOL_A_STATS_XDP_DROP]		= "drop",
+	[ETHTOOL_A_STATS_XDP_INVALID]		= "invalid",
+	[ETHTOOL_A_STATS_XDP_PASS]		= "pass",
+	[ETHTOOL_A_STATS_XDP_REDIRECT]		= "redirect",
+	[ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS]	= "redirect-errors",
+	[ETHTOOL_A_STATS_XDP_TX]		= "tx",
+	[ETHTOOL_A_STATS_XDP_TX_ERRORS]		= "tx-errors",
+	[ETHTOOL_A_STATS_XDP_XMIT]		= "xmit",
+	[ETHTOOL_A_STATS_XDP_XMIT_DROPS]	= "xmit-drops",
+};
+
 const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1] = {
 	[ETHTOOL_A_STATS_HEADER]	=
 		NLA_POLICY_NESTED(ethnl_header_policy),
@@ -101,6 +119,37 @@  static int stats_parse_request(struct ethnl_req_info *req_base,
 	return 0;
 }
 
+static int stats_prepare_data_xdp(struct net_device *dev,
+				  struct stats_reply_data *data)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	size_t size;
+	int ret;
+
+	/* Zero means stats are global/device-wide */
+	data->xdp_stats_channels = 0;
+
+	if (ops->get_std_stats_channels) {
+		ret = ops->get_std_stats_channels(dev, ETH_SS_STATS_XDP);
+		if (ret > 0)
+			data->xdp_stats_channels = ret;
+	}
+
+	size = array_size(min_not_zero(data->xdp_stats_channels, 1U),
+			  sizeof(*data->xdp_stats));
+	if (unlikely(size == SIZE_MAX))
+		return -EOVERFLOW;
+
+	data->xdp_stats = kvmalloc(size, GFP_KERNEL);
+	if (!data->xdp_stats)
+		return -ENOMEM;
+
+	memset(data->xdp_stats, 0xff, size);
+	ops->get_xdp_stats(dev, data->xdp_stats);
+
+	return 0;
+}
+
 static int stats_prepare_data(const struct ethnl_req_info *req_base,
 			      struct ethnl_reply_data *reply_base,
 			      struct genl_info *info)
@@ -125,6 +174,8 @@  static int stats_prepare_data(const struct ethnl_req_info *req_base,
 		     __ETHTOOL_A_STATS_ETH_CTRL_CNT);
 	BUILD_BUG_ON(offsetof(typeof(data->rmon_stats), hist) / sizeof(u64) !=
 		     __ETHTOOL_A_STATS_RMON_CNT);
+	BUILD_BUG_ON(sizeof(*data->xdp_stats) / sizeof(u64) !=
+		     __ETHTOOL_A_STATS_XDP_CNT);
 
 	/* Mark all stats as unset (see ETHTOOL_STAT_NOT_SET) to prevent them
 	 * from being reported to user space in case driver did not set them.
@@ -146,15 +197,19 @@  static int stats_prepare_data(const struct ethnl_req_info *req_base,
 	if (test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask) &&
 	    ops->get_rmon_stats)
 		ops->get_rmon_stats(dev, &data->rmon_stats, &data->rmon_ranges);
+	if (test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask) &&
+	    ops->get_xdp_stats)
+		ret = stats_prepare_data_xdp(dev, data);
 
 	ethnl_ops_complete(dev);
-	return 0;
+	return ret;
 }
 
 static int stats_reply_size(const struct ethnl_req_info *req_base,
 			    const struct ethnl_reply_data *reply_base)
 {
 	const struct stats_req_info *req_info = STATS_REQINFO(req_base);
+	const struct stats_reply_data *data = STATS_REPDATA(reply_base);
 	unsigned int n_grps = 0, n_stats = 0;
 	int len = 0;
 
@@ -180,6 +235,14 @@  static int stats_reply_size(const struct ethnl_req_info *req_base,
 			nla_total_size(4)) *	/* _A_STATS_GRP_HIST_BKT_HI */
 			ETHTOOL_RMON_HIST_MAX * 2;
 	}
+	if (test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask)) {
+		n_stats += min_not_zero(data->xdp_stats_channels, 1U) *
+			   sizeof(*data->xdp_stats) / sizeof(u64);
+		n_grps++;
+
+		len += (nla_total_size(0) *	/* _A_STATS_GRP_STAT_BLOCK */
+			data->xdp_stats_channels);
+	}
 
 	len += n_grps * (nla_total_size(0) + /* _A_STATS_GRP */
 			 nla_total_size(4) + /* _A_STATS_GRP_ID */
@@ -356,6 +419,64 @@  static int stats_put_rmon_stats(struct sk_buff *skb,
 	return 0;
 }
 
+static int stats_put_xdp_stats_one(struct sk_buff *skb,
+				   const struct ethtool_xdp_stats *xdp_stats)
+{
+	if (stat_put(skb, ETHTOOL_A_STATS_XDP_PACKETS,
+		     xdp_stats->packets) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_ABORTED,
+		     xdp_stats->aborted) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_DROP,
+		     xdp_stats->drop) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_INVALID,
+		     xdp_stats->invalid) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_PASS,
+		     xdp_stats->pass) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_REDIRECT,
+		     xdp_stats->redirect) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS,
+		     xdp_stats->redirect_errors) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_TX,
+		     xdp_stats->tx) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_TX_ERRORS,
+		     xdp_stats->tx_errors) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_XMIT,
+		     xdp_stats->xmit) ||
+	    stat_put(skb, ETHTOOL_A_STATS_XDP_XMIT_DROPS,
+		     xdp_stats->xmit_drops))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int stats_put_xdp_stats(struct sk_buff *skb,
+			       const struct stats_reply_data *data)
+{
+	struct nlattr *nest;
+	int i;
+
+	if (!data->xdp_stats_channels)
+		return stats_put_xdp_stats_one(skb, data->xdp_stats);
+
+	for (i = 0; i < data->xdp_stats_channels; i++) {
+		nest = nla_nest_start(skb, ETHTOOL_A_STATS_GRP_STAT_BLOCK);
+		if (!nest)
+			return -EMSGSIZE;
+
+		if (stats_put_xdp_stats_one(skb, data->xdp_stats + i))
+			goto err_cancel_xdp;
+
+		nla_nest_end(skb, nest);
+	}
+
+	return 0;
+
+err_cancel_xdp:
+	nla_nest_cancel(skb, nest);
+
+	return -EMSGSIZE;
+}
+
 static int stats_put_stats(struct sk_buff *skb,
 			   const struct stats_reply_data *data,
 			   u32 id, u32 ss_id,
@@ -406,10 +527,20 @@  static int stats_fill_reply(struct sk_buff *skb,
 	if (!ret && test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask))
 		ret = stats_put_stats(skb, data, ETHTOOL_STATS_RMON,
 				      ETH_SS_STATS_RMON, stats_put_rmon_stats);
+	if (!ret && test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask))
+		ret = stats_put_stats(skb, data, ETHTOOL_STATS_XDP,
+				      ETH_SS_STATS_XDP, stats_put_xdp_stats);
 
 	return ret;
 }
 
+static void stats_cleanup_data(struct ethnl_reply_data *reply_data)
+{
+	struct stats_reply_data *data = STATS_REPDATA(reply_data);
+
+	kvfree(data->xdp_stats);
+}
+
 const struct ethnl_request_ops ethnl_stats_request_ops = {
 	.request_cmd		= ETHTOOL_MSG_STATS_GET,
 	.reply_cmd		= ETHTOOL_MSG_STATS_GET_REPLY,
@@ -421,4 +552,5 @@  const struct ethnl_request_ops ethnl_stats_request_ops = {
 	.prepare_data		= stats_prepare_data,
 	.reply_size		= stats_reply_size,
 	.fill_reply		= stats_fill_reply,
+	.cleanup_data		= stats_cleanup_data,
 };
diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
index 2d51b7ab4dc5..a149048f9c34 100644
--- a/net/ethtool/strset.c
+++ b/net/ethtool/strset.c
@@ -105,6 +105,11 @@  static const struct strset_info info_template[] = {
 		.count		= __ETHTOOL_A_STATS_RMON_CNT,
 		.strings	= stats_rmon_names,
 	},
+	[ETH_SS_STATS_XDP] = {
+		.per_dev	= false,
+		.count		= __ETHTOOL_A_STATS_XDP_CNT,
+		.strings	= stats_xdp_names,
+	},
 };
 
 struct strset_req_info {