Message ID | 20210414034454.1970967-1-kuba@kernel.org |
---|---|
Headers | show |
Series | ethtool: add standard FEC statistics | expand |
On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote: > ethtool_link_ksettings *); > + void (*get_fec_stats)(struct net_device *dev, > + struct ethtool_fec_stats > *fec_stats); why void ? some drivers need to access the FW and it could be an old FW/device where the fec stats are not supported. and sometimes e.g. in mlx5 case FW can fail for FW related businesses :)..
On Wed, 14 Apr 2021 23:25:43 -0700 Saeed Mahameed wrote: > On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote: > > ethtool_link_ksettings *); > > + void (*get_fec_stats)(struct net_device *dev, > > + struct ethtool_fec_stats *fec_stats); > > why void ? some drivers need to access the FW and it could be an old > FW/device where the fec stats are not supported. When stats are not supported just returning is fine. Stats are initialized to -1, core will not dump them into the netlink message if driver didn't assign anything. > and sometimes e.g. in mlx5 case FW can fail for FW related businesses > :).. Can do. I was wondering if the entity reading the stats (from user space) can do anything useful with the error, and didn't really come up with anything other than printing an error. Which the kernel can do as well. OTOH if there are multiple stats to read and one of them fails its probably better to return partial results than fail the entire op. Therefore I went for no error - if something fails - the stats will be missing. Does that make any sense? Or do you think errors are rare enough that it's okay if they are fatal? (with the caveat that -EOPNOTSUPP should be ignored).
On Thu, 2021-04-15 at 08:21 -0700, Jakub Kicinski wrote: > On Wed, 14 Apr 2021 23:25:43 -0700 Saeed Mahameed wrote: > > On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote: > > > ethtool_link_ksettings *); > > > + void (*get_fec_stats)(struct net_device *dev, > > > + struct ethtool_fec_stats > > > *fec_stats); > > > > why void ? some drivers need to access the FW and it could be an > > old > > FW/device where the fec stats are not supported. > > When stats are not supported just returning is fine. Stats are > initialized to -1, core will not dump them into the netlink message > if driver didn't assign anything. > > > and sometimes e.g. in mlx5 case FW can fail for FW related > > businesses > > :).. > > Can do. I was wondering if the entity reading the stats (from user > space) can do anything useful with the error, and didn't really come > up with anything other than printing an error. Which the kernel can > do as well. OTOH if there are multiple stats to read and one of them > fails its probably better to return partial results than fail > the entire op. Therefore I went for no error - if something fails - > the stats will be missing. > > Does that make any sense? Or do you think errors are rare enough that > it's okay if they are fatal? (with the caveat that -EOPNOTSUPP should > be ignored). Agreed, Thanks for the explanation but you still need to handle the error internally in the driver, otherwise the command returns garbage or 0 if you didn't check return status.
On 14/04/2021 04:44, Jakub Kicinski wrote: > Report what appears to be the standard block counts: > - 30.5.1.1.17 aFECCorrectedBlocks > - 30.5.1.1.18 aFECUncorrectableBlocks > > Don't report the per-lane symbol counts, if those really > count symbols they are not what the standard calls for > (even if symbols seem like the most useful thing to count.) > > Fingers crossed that fec_corrected_errors is not in symbols. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> I'm trying to find out whether these count the right thing or not. Some component documentation says that the per-lane _signal_ "asserts at maximum once per FEC coding block". I haven't yet been able to track down how the counters aggregate that, but it would seem to suggest that they in fact count blocks and not symbols, and are just misnamed. Investigation continues. -ed