[v2,net-next,0/3] Add devlink and devlink health reporters to

Message ID 20201104122755.753241-1-george.cherian@marvell.com
Message

George Cherian Nov. 4, 2020, 12:27 p.m. UTC
Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA and NIX blocks.
These reporters report the error counts in their respective blocks.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

Change-log:
- Address Willem's comments on v1.
- Fixed the sparse issues, reported by Jakub.

George Cherian (3):
  octeontx2-af: Add devlink support to af driver
  octeontx2-af: Add devlink health reporters for NPA
  octeontx2-af: Add devlink health reporters for NIX

 .../net/ethernet/marvell/octeontx2/Kconfig    |   1 +
 .../ethernet/marvell/octeontx2/af/Makefile    |   3 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   4 +
 .../marvell/octeontx2/af/rvu_devlink.c        | 860 ++++++++++++++++++
 .../marvell/octeontx2/af/rvu_devlink.h        |  67 ++
 .../marvell/octeontx2/af/rvu_struct.h         |  33 +
 7 files changed, 975 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.h

Comments

Saeed Mahameed Nov. 5, 2020, 5:08 a.m. UTC | #1
On Wed, 2020-11-04 at 17:57 +0530, George Cherian wrote:
> Add health reporters for RVU NPA block.

                               ^^^ NIX ?

Cc: Jiri 

Anyway, could you please spare a few words on what NPA and NIX are?

Regarding the reporter names: all drivers register well-known generic
names such as (fw, hw, rx, tx), and I don't know if it is a good idea to
use vendor-specific names. If you are reporting for hw/fw units, then
just use "hw" or "fw" as the reporter name and append the unit (NPA/NIX)
to the counter/error names.
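
To illustrate the naming suggestion above, here is a minimal userspace
sketch. It is not the kernel devlink API: struct hw_counter,
hw_counter_name() and hw_reporter_dump() are hypothetical names invented
for this example. It only shows one generic reporter name carrying
counters whose names are prefixed with the hardware unit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct hw_counter {
	const char *unit;    /* hardware block, e.g. "NPA" or "NIX" */
	const char *error;   /* error description */
	unsigned long count; /* number of times the error was seen */
};

/* Build "<unit> <error>" in buf; returns what snprintf() returns. */
int hw_counter_name(const struct hw_counter *c, char *buf, size_t len)
{
	return snprintf(buf, len, "%s %s", c->unit, c->error);
}

/* Print every counter under a single generic reporter name. */
void hw_reporter_dump(const char *reporter,
		      const struct hw_counter *counters, size_t n)
{
	char name[128];
	size_t i;

	printf("reporter %s\n", reporter);
	for (i = 0; i < n; i++) {
		hw_counter_name(&counters[i], name, sizeof(name));
		printf("  %s: %lu\n", name, counters[i].count);
	}
}
```

Calling hw_reporter_dump("hw", ...) with NPA and NIX entries would list
both units under one "hw" reporter instead of registering separate "npa"
and "nix" reporters.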

> Only reporter dump is supported.
> 
> Output:
>  # ./devlink health
>  pci/0002:01:00.0:
>    reporter npa
>      state healthy error 0 recover 0
>    reporter nix
>      state healthy error 0 recover 0
>  # ./devlink  health dump show pci/0002:01:00.0 reporter nix
>   NIX_AF_GENERAL:
>          Memory Fault on NIX_AQ_INST_S read: 0
>          Memory Fault on NIX_AQ_RES_S write: 0
>          AQ Doorbell error: 0
>          Rx on unmapped PF_FUNC: 0
>          Rx multicast replication error: 0
>          Memory fault on NIX_RX_MCE_S read: 0
>          Memory fault on multicast WQE read: 0
>          Memory fault on mirror WQE read: 0
>          Memory fault on mirror pkt write: 0
>          Memory fault on multicast pkt write: 0
>    NIX_AF_RAS:
>          Poisoned data on NIX_AQ_INST_S read: 0
>          Poisoned data on NIX_AQ_RES_S write: 0
>          Poisoned data on HW context read: 0
>          Poisoned data on packet read from mirror buffer: 0
>          Poisoned data on packet read from mcast buffer: 0
>          Poisoned data on WQE read from mirror buffer: 0
>          Poisoned data on WQE read from multicast buffer: 0
>          Poisoned data on NIX_RX_MCE_S read: 0
>    NIX_AF_RVU:
>          Unmap Slot Error: 0
> 

Now I am a little bit skeptical here. The devlink health reporter
infrastructure was never meant to deal with the dump op only; its main
purpose is to diagnose/dump and recover.

Especially in your use case, where you only report counters, I don't
believe devlink health dump is a proper interface for this.
Many of these counters, if not most, are data-path packet based and
maybe they should belong to ethtool.
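
The difference between a dump-only reporter and one that can also
recover can be sketched with a small userspace model. This is a
simplified illustration, not the kernel's devlink health code: struct
reporter, struct reporter_ops, reporter_report() and the demo_*
callbacks are all hypothetical names, and the real infrastructure does
more (for example, it tracks dump state and recovery settings).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model of a health reporter. */
enum reporter_state { REPORTER_HEALTHY, REPORTER_ERROR };

struct reporter;

struct reporter_ops {
	const char *name;
	int (*diagnose)(struct reporter *r); /* optional */
	int (*dump)(struct reporter *r);     /* optional */
	int (*recover)(struct reporter *r);  /* optional */
};

struct reporter {
	const struct reporter_ops *ops;
	enum reporter_state state;
	unsigned int error_count;
	unsigned int recover_count;
};

/* Example callbacks used only for demonstration. */
int demo_dump(struct reporter *r)
{
	printf("%s: dumping state\n", r->ops->name);
	return 0;
}

int demo_recover(struct reporter *r)
{
	printf("%s: recovering\n", r->ops->name);
	return 0;
}

/*
 * Report an error: count it, dump if possible, then try to recover.
 * Returns 0 if the reporter is healthy again, -1 otherwise.
 */
int reporter_report(struct reporter *r)
{
	r->error_count++;
	r->state = REPORTER_ERROR;

	if (r->ops->dump)
		r->ops->dump(r);

	/* A dump-only reporter has no way back to the healthy state. */
	if (!r->ops->recover)
		return -1;
	if (r->ops->recover(r))
		return -1;

	r->recover_count++;
	r->state = REPORTER_HEALTHY;
	return 0;
}
```

In this model a reporter registered with only a dump callback stays in
the error state after the first report, which is one way to see why a
dump-only reporter uses just a fraction of what the interface is
designed for.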