[net-next,0/3] bnxt: add rx discards stats for oom and netpoll

Message ID 20210825231830.2748915-1-kuba@kernel.org

Message

Jakub Kicinski Aug. 25, 2021, 11:18 p.m. UTC
Drivers should avoid silently dropping frames. This set adds two
stats for previously unaccounted events to bnxt - packets dropped
due to allocation failures and packets dropped during emergency
ring polling.

Jakub Kicinski (3):
  bnxt: reorder logic in bnxt_get_stats64()
  bnxt: count packets discarded because of netpoll
  bnxt: count discards due to memory allocation errors

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 54 ++++++++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  3 ++
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  7 +++
 3 files changed, 51 insertions(+), 13 deletions(-)

Comments

Vladimir Oltean Aug. 26, 2021, 12:22 a.m. UTC | #1
Hi Jakub,

On Wed, Aug 25, 2021 at 04:18:30PM -0700, Jakub Kicinski wrote:
> Count packets dropped due to buffer or skb allocation errors.
> Report as part of rx_dropped, and per-queue in ethtool
> (retaining only the former across down/up cycles).
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 14 +++++++++++++-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  1 +
>  drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  1 +
>  3 files changed, 15 insertions(+), 1 deletion(-)
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index 25f1327aedb6..f8a28021389b 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -188,6 +188,7 @@ static const char * const bnxt_rx_sw_stats_str[] = {
>  	"rx_l4_csum_errors",
>  	"rx_resets",
>  	"rx_buf_errors",
> +	"rx_oom_discards",

'Could you consider adding "driver" stats under RTM_GETSTATS,
or a similar new structured interface over ethtool?

Looks like the statistic in question has pretty clear semantics,
and may be more broadly useful.'

https://patchwork.ozlabs.org/project/netdev/patch/20201017213611.2557565-2-vladimir.oltean@nxp.com/

>  };
>  
>  static const char * const bnxt_cmn_sw_stats_str[] = {
> -- 
> 2.31.1
>
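
The accounting described in the quoted commit message (count per queue,
report as part of rx_dropped, keep only the aggregate across down/up
cycles) might look roughly like the userspace sketch below. Only the
counter name rx_oom_discards comes from the patch; NUM_RINGS, struct
dev_stats, rx_dropped_saved and the helper functions are hypothetical
names chosen for illustration, and this is not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_RINGS 4 /* hypothetical ring count, for illustration only */

/* Per-ring software counter; only the name rx_oom_discards is from the patch. */
struct ring_sw_stats {
	uint64_t rx_oom_discards; /* frames dropped on buffer/skb allocation failure */
};

/* Hypothetical device-level container, not the driver's actual layout. */
struct dev_stats {
	struct ring_sw_stats ring[NUM_RINGS]; /* live counters, reset on down */
	uint64_t rx_dropped_saved;            /* total carried across down/up */
};

/* Called where the rx path gives up on a frame because allocation failed. */
static void count_oom_discard(struct dev_stats *d, unsigned int ring)
{
	assert(ring < NUM_RINGS);
	d->ring[ring].rx_oom_discards++;
}

/* Interface-level rx_dropped: saved total plus every live per-ring counter. */
static uint64_t get_rx_dropped(const struct dev_stats *d)
{
	uint64_t total = d->rx_dropped_saved;

	for (unsigned int i = 0; i < NUM_RINGS; i++)
		total += d->ring[i].rx_oom_discards;
	return total;
}

/* On down: fold the live counters into the saved total, then clear the rings. */
static void dev_down(struct dev_stats *d)
{
	d->rx_dropped_saved = get_rx_dropped(d);
	memset(d->ring, 0, sizeof(d->ring));
}
```

With this layout, two discards followed by a down/up cycle and one more
discard leave get_rx_dropped() at 3, while the per-ring counters show
only the single discard since the last up.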
Jakub Kicinski Aug. 26, 2021, 12:35 a.m. UTC | #2
On Thu, 26 Aug 2021 03:22:57 +0300 Vladimir Oltean wrote:
> 'Could you consider adding "driver" stats under RTM_GETSTATS,
> or a similar new structured interface over ethtool?
> 
> Looks like the statistic in question has pretty clear semantics,
> and may be more broadly useful.'


It's commonly reported per ring, I need to make a home for these
first by adding that damn netlink queue API. It's my next project.

I can drop the ethtool stat from this patch if you have a strong
preference.
Vladimir Oltean Aug. 26, 2021, 12:42 a.m. UTC | #3
On Wed, Aug 25, 2021 at 05:35:37PM -0700, Jakub Kicinski wrote:
> On Thu, 26 Aug 2021 03:22:57 +0300 Vladimir Oltean wrote:
> > 'Could you consider adding "driver" stats under RTM_GETSTATS,
> > or a similar new structured interface over ethtool?
> >
> > Looks like the statistic in question has pretty clear semantics,
> > and may be more broadly useful.'
>
> It's commonly reported per ring, I need to make a home for these
> first by adding that damn netlink queue API. It's my next project.
>
> I can drop the ethtool stat from this patch if you have a strong
> preference.


I don't have any strong preference, far from it. What would you do if
you were reviewing somebody else's patch which made the same change?
Jakub Kicinski Aug. 26, 2021, 1:44 a.m. UTC | #4
On Thu, 26 Aug 2021 03:42:08 +0300 Vladimir Oltean wrote:
> On Wed, Aug 25, 2021 at 05:35:37PM -0700, Jakub Kicinski wrote:
> > On Thu, 26 Aug 2021 03:22:57 +0300 Vladimir Oltean wrote:
> > > 'Could you consider adding "driver" stats under RTM_GETSTATS,
> > > or a similar new structured interface over ethtool?
> > >
> > > Looks like the statistic in question has pretty clear semantics,
> > > and may be more broadly useful.'
> >
> > It's commonly reported per ring, I need to make a home for these
> > first by adding that damn netlink queue API. It's my next project.
> >
> > I can drop the ethtool stat from this patch if you have a strong
> > preference.
> 
> I don't have any strong preference, far from it. What would you do if
> you were reviewing somebody else's patch which made the same change?


If someone else posted this patch I'd probably not complain; as I said,
there is no well-suited API, and my knee-jerk expectation was it should
be reported in the per-queue API which doesn't exist.

Where you'd see me complain is when drivers expose in -S stats which
have proper APIs or when higher layer/common code is trying to
piggyback on -S instead of creating its own structured interface.

I don't see value in tracking this particular statistic in production
settings, maybe that's also affecting my judgment here. But since
that's the case I'll just drop it.


If you have any feedback on my suggestions, reviews, comments etc.
please do share on- or off-list at any time. No need to wait a year
until I post a vaguely similar patch ;)
Vladimir Oltean Aug. 26, 2021, 12:46 p.m. UTC | #5
On Wed, Aug 25, 2021 at 06:44:51PM -0700, Jakub Kicinski wrote:
> On Thu, 26 Aug 2021 03:42:08 +0300 Vladimir Oltean wrote:
> > On Wed, Aug 25, 2021 at 05:35:37PM -0700, Jakub Kicinski wrote:
> > > On Thu, 26 Aug 2021 03:22:57 +0300 Vladimir Oltean wrote:
> > > > 'Could you consider adding "driver" stats under RTM_GETSTATS,
> > > > or a similar new structured interface over ethtool?
> > > >
> > > > Looks like the statistic in question has pretty clear semantics,
> > > > and may be more broadly useful.'
> > >
> > > It's commonly reported per ring, I need to make a home for these
> > > first by adding that damn netlink queue API. It's my next project.
> > >
> > > I can drop the ethtool stat from this patch if you have a strong
> > > preference.
> >
> > I don't have any strong preference, far from it. What would you do if
> > you were reviewing somebody else's patch which made the same change?
>
> If someone else posted this patch I'd probably not complain; as I said,
> there is no well-suited API, and my knee-jerk expectation was it should
> be reported in the per-queue API which doesn't exist.
>
> Where you'd see me complain is when drivers expose in -S stats which
> have proper APIs or when higher layer/common code is trying to
> piggyback on -S instead of creating its own structured interface.
>
> I don't see value in tracking this particular statistic in production
> settings, maybe that's also affecting my judgment here. But since
> that's the case I'll just drop it.
>
> If you have any feedback on my suggestions, reviews, comments etc.
> please do share on- or off-list at any time. No need to wait a year
> until I post a vaguely similar patch ;)


I don't know why you get the impression that "I waited a year until you
posted a vaguely similar patch". I am not following you, it just happens
that I was online and reading netdev when you posted this change now.
From the experience of threads that I directly participated in (and this
is why I dug up a DSA thread from a year ago, that was the one I could
find the quickest, again I am not watching your footsteps but
statistically speaking, it would be unlikely for the threads I
participated in to be the only ones where you've said this), you do seem
to tell people to try and use more "generic" and "structured" methods of
statistics reporting as opposed to putting everything in the plain
"ethtool -S", even if those methods don't exist or don't work for that
particular driver and would require major rework (like ndo_get_stats64
which is non-sleepable).

The 'driver stats under RTM_GETSTATS' was a direct quote exactly for
this reason. Now if this rx_oom_discards counter would be better expressed
as a generic 'driver counter' or a 'per-queue counter', none of which exist,
I don't know/don't care. I do wonder sometimes if you think about what
people's reaction is when you tell them that ethtool -S is not fine
and they should use a kernel interface which doesn't exist, and I was
just curious to see what yours would be.

Creating a new kernel interface for statistics would take not only the
vision, but also the passion and dedication to stick to those patches.
People will generally lack the desire to do that, because for better or
worse, "ethtool -S" is the central place to diagnose interface-level
problems. You've also expressed this more clearly than words can say by
sending a patch to extend an interface you don't like.

In fact, my message seems to have hit quite the wrong way. I did not
want you to drop the counter from ethtool -S, please keep it if you want
it, but to sway you towards a more relaxed attitude when reviewing
patches for new counters added through that interface. Heck, I would
even like to resubmit the ethtool -S realloc counters if they had any
chance of getting accepted, it's not as if I had any serious intention
of extending the statistics reporting interfaces for something that minor.