mbox series

[RFC,V4,net-next,0/5] ethtool: Extend module EEPROM dump API

Message ID 1616433075-27051-1-git-send-email-moshe@nvidia.com
Headers show
Series ethtool: Extend module EEPROM dump API | expand

Message

Moshe Shemesh March 22, 2021, 5:11 p.m. UTC
Ethtool supports module EEPROM dumps via the `ethtool -m <dev>` command.
But in current state its functionality is limited - offset and length
parameters, which are used to specify a linear desired region of EEPROM
data to dump, is not enough, considering emergence of complex module
EEPROM layouts such as CMIS 4.0.
Moreover, CMIS 4.0 extends the amount of pages that may be accessible by
introducing another parameter for page addressing - banks.

Besides, currently module EEPROM is represented as a chunk of
concatenated pages, where lower 128 bytes of all pages, except page 00h,
are omitted. Offset and length are used to address parts of this fake
linear memory. But in practice drivers, which implement
get_module_info() and get_module_eeprom() ethtool ops still calculate
page number and set I2C address on their own.

This series tackles these issues by adding ethtool op, which allows to
pass page number, bank number and I2C address in addition to offset and
length parameters to the driver, adds corresponding netlink
infrastructure and implements the new interface in mlx5 driver.

This allows to extend userspace 'ethtool -m' CLI by adding new
parameters - page, bank and i2c. New command line format:
 ethtool -m <dev> [hex on|off] [raw on|off] [offset N] [length N] [page N] [bank N] [i2c N]

The consequence of this series is a possibility to dump arbitrary EEPROM
page at a time, in contrast to dumps of concatenated pages. Therefore,
offset and length change their semantics and may be used only to specify
a part of data within a page, which size is currently limited to 256
bytes.

As for backwards compatibility with get_module_info() and
get_module_eeprom() pair, the series addresses it as well by
implementing a fallback mechanism. As mentioned earlier, drivers derive
a page number from 'global' offset, so this can be done vice versa
without their involvement thanks to standardization. If kernel netlink
handler of 'ethtool -m' command detects that new ethtool op is not
supported by the driver, it calculates offset from given page number and
page offset and calls old ndos, if they are available.

Change log:
v3 -> v4:
- Renamed many identifiers to use 'eeprom' instead of 'eeprom_data'.
- Renamed netlink enums and defines to use 'MODULE_EEPROM' instead of
   'EEPROM_DATA'.
- Renamed struct ethtool_eeprom_data to ethtool_module_eeprom.
- Added MODULE_EEPROM_MAX_OFFSET (257 * 256) macro and check global offset
    against it to avoid overflow.
- Removed ndo pointer check from _parse_request().
- Removed separate length element from netlink response.
- Limited reads to 128 bytes without crossing half page bound.

v2 -> v3:
- Removed page number limitations
- Added length limit when page is present in fallback
- Changed page, bank and i2c_address type to u8 all over the patchset
- Added 0x51 I2C address usage increasing offset by 256 for SFP

v1 -> v2:
- Limited i2c_address values by 127
- Added page bound check for offset and length
- Added defines for these two points
- Added extack to ndo parameters
- Moved ethnl_ops_begin(dev) and set error path accordingly



Vladyslav Tarasiuk (5):
  ethtool: Allow network drivers to dump arbitrary EEPROM data
  net/mlx5: Refactor module EEPROM query
  net/mlx5: Implement get_module_eeprom_by_page()
  net/mlx5: Add support for DSFP module EEPROM dumps
  ethtool: Add fallback to get_module_eeprom from netlink command

 Documentation/networking/ethtool-netlink.rst  |  32 ++-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  44 ++++
 .../net/ethernet/mellanox/mlx5/core/port.c    | 101 +++++---
 include/linux/ethtool.h                       |  33 ++-
 include/linux/mlx5/port.h                     |  12 +
 include/uapi/linux/ethtool_netlink.h          |  19 ++
 net/ethtool/Makefile                          |   2 +-
 net/ethtool/eeprom.c                          | 231 ++++++++++++++++++
 net/ethtool/netlink.c                         |  10 +
 net/ethtool/netlink.h                         |   2 +
 10 files changed, 454 insertions(+), 32 deletions(-)
 create mode 100644 net/ethtool/eeprom.c

Comments

Andrew Lunn March 23, 2021, 12:27 a.m. UTC | #1
> > +#define ETH_MODULE_EEPROM_PAGE_LEN	256

> 

> Sorry to keep raising issues, but I think you want to make this constant

> 128.


Yes, i also think the KAPI should be limited to returning a maximum of
a 1/2 page per call.

> > +#define MODULE_EEPROM_MAX_OFFSET (257 *

> > ETH_MODULE_EEPROM_PAGE_LEN)

> 

> The device actually has 257 addressable chunks of 128 bytes each.  With

> ETH_MODULE_EEPROM_PAGE_LEN set to 256, your constant is 2X too big.

> 

> Note also, SFP devices (but not QSFP or CMIS) actually have another 256

> bytes available at 0x50, in addition to the full 257*128 at 0x51.  So SFP is

> actually 259*128 or (256 + 257 * 128).

> 

> Devices that don't support pages have much lower limits (256 bytes for

> QSFP/CMIS and 512 for SFP).  Some SFP only support 256 bytes.  Most devices

> will return nonsense for pages above 3.  So, this check is really only an

> absolute limit.  The SFP driver that takes this request will probably check

> against a more refined MAX length (eg modinfo->eeprom_len).

> 

> I suggest setting this constant to 259 * (ETH_MODULE_EEPROM_PAGE_LEN / 2).

> Let the driver refine it from there.


I don't even see a need for this. The offset should be within one 1/2
page, of one bank. So offset >= 0 and <= 127. Length is also > 0 and
<- 127. And offset+length is <= 127.

For the moment, please forget about backwards compatibility with the
IOCTL interface. Lets get a new clean KAPI and a new clean internal
API between the ethtool core and the drivers. Once we have that agreed
on, we can work on the various compatibility shims we need to work
between old and new APIs in various places.

      Andrew
Andrew Lunn March 23, 2021, 12:54 a.m. UTC | #2
> +static int eeprom_prepare_data(const struct ethnl_req_info *req_base,

> +			       struct ethnl_reply_data *reply_base,

> +			       struct genl_info *info)

> +{

> +	struct eeprom_reply_data *reply = MODULE_EEPROM_REPDATA(reply_base);

> +	struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_base);

> +	struct ethtool_module_eeprom page_data = {0};

> +	struct net_device *dev = reply_base->dev;

> +	int ret;

> +

> +	if (!dev->ethtool_ops->get_module_eeprom_by_page)

> +		return -EOPNOTSUPP;

> +

> +	/* Allow dumps either of low or high page without crossing half page boundary */

> +	if ((request->offset < ETH_MODULE_EEPROM_PAGE_LEN / 2 &&

> +	     request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN / 2) ||

> +	    request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN)

> +		return -EINVAL;


Please keep all the parameter validation together, in
eeprom_parse_request(). At some point, we may extend
eeprom_parse_request() to make use of extack, to indicate which
parameter is invalid. Just getting an -EINVAL can be hard to debug,
where as NL_SET_ERR_MSG_ATTR() can help the user.

      Andrew
Don Bollinger March 23, 2021, 1:23 a.m. UTC | #3
> Subject: Re: [RFC PATCH V4 net-next 1/5] ethtool: Allow network drivers to

> dump arbitrary EEPROM data

> 

> > > +#define ETH_MODULE_EEPROM_PAGE_LEN	256

> >

> > Sorry to keep raising issues, but I think you want to make this

> > constant 128.

> 

> Yes, i also think the KAPI should be limited to returning a maximum of a

1/2
> page per call.

> 

> > > +#define MODULE_EEPROM_MAX_OFFSET (257 *

> > > ETH_MODULE_EEPROM_PAGE_LEN)

> >

> > The device actually has 257 addressable chunks of 128 bytes each.

> > With ETH_MODULE_EEPROM_PAGE_LEN set to 256, your constant is 2X too

> big.

> >

> > Note also, SFP devices (but not QSFP or CMIS) actually have another

> > 256 bytes available at 0x50, in addition to the full 257*128 at 0x51.

> > So SFP is actually 259*128 or (256 + 257 * 128).

> >

> > Devices that don't support pages have much lower limits (256 bytes for

> > QSFP/CMIS and 512 for SFP).  Some SFP only support 256 bytes.  Most

> > devices will return nonsense for pages above 3.  So, this check is

> > really only an absolute limit.  The SFP driver that takes this request

> > will probably check against a more refined MAX length (eg modinfo-

> >eeprom_len).

> >

> > I suggest setting this constant to 259 * (ETH_MODULE_EEPROM_PAGE_LEN

> / 2).

> > Let the driver refine it from there.

> 

> I don't even see a need for this. The offset should be within one 1/2

page, of
> one bank. So offset >= 0 and <= 127. Length is also > 0 and

> <- 127. And offset+length is <= 127.


I like the clean approach, but...   How do you request low memory?  If all
of the pages start at offset 0, then page 0 is at page 0, offset 0.  That
has always referred to low memory.  (I have actually used 'page -1' for this
in some code I don't share with others.)  I believe all of this code
currently is consistent with the paged area starting at offset 128.  If that
changes to start paged memory at offset 0, there are several places that
will need to be adjusted.

Whatever choice is made, there should be documentation scattered around in
the code, man pages and Documentation directories to tell the
user/programmer whether the paged area starts at offset 0 or offset 128.  It
is constantly confusing, and there is no obvious answer.

> 

> For the moment, please forget about backwards compatibility with the IOCTL

> interface. Lets get a new clean KAPI and a new clean internal API between

> the ethtool core and the drivers. Once we have that agreed on, we can work

> on the various compatibility shims we need to work between old and new

> APIs in various places.

> 

>       Andrew


Don
Andrew Lunn March 23, 2021, 2:03 a.m. UTC | #4
> > I don't even see a need for this. The offset should be within one 1/2

> page, of

> > one bank. So offset >= 0 and <= 127. Length is also > 0 and

> > <- 127. And offset+length is <= 127.

> 

> I like the clean approach, but...   How do you request low memory?


Duh!

I got my conditions wrong. Too focused on 1/2 pages to think that two
of them makes one page!

Lets try again:

offset < 256
0 < len < 128

if (offset < 128)
   offset + len < 128
else
   offset + len < 256

Does that look better?

Reading bytes from the lower 1/2 of page 0 should give the same data
as reading data from the lower 1/2 of page 42. So we can allow that,
but don't be too surprised when an SFP gets it wrong and gives you
rubbish. I would suggest ethtool(1) never actually does read from the
lower 1/2 of any page other than 0.

And i agree about documentation. I would suggest a comment in
ethtool_netlink.h, and the RST documentation.

		   Andrew
Don Bollinger March 23, 2021, 5:47 p.m. UTC | #5
> > > I don't even see a need for this. The offset should be within one

> > > 1/2

> > page, of

> > > one bank. So offset >= 0 and <= 127. Length is also > 0 and

> > > <- 127. And offset+length is <= 127.

> >

> > I like the clean approach, but...   How do you request low memory?

> 

> Duh!

> 

> I got my conditions wrong. Too focused on 1/2 pages to think that two of

> them makes one page!

> 

> Lets try again:

> 

> offset < 256

> 0 < len < 128


Actually 0 < len <= 128.  Length of 128 is not only legal, but very common.
"Read the whole 1/2 page block".

> 

> if (offset < 128)

>    offset + len < 128


Again, offset + len <= 128

> else

>    offset + len < 256


offset + len <= 256

> 

> Does that look better?

> 

> Reading bytes from the lower 1/2 of page 0 should give the same data as

> reading data from the lower 1/2 of page 42. So we can allow that, but

don't
> be too surprised when an SFP gets it wrong and gives you rubbish. I would


The spec is clear that the lower half is the same for all pages.  If the SFP
gives you rubbish you should throw the device in the rubbish.

> suggest ethtool(1) never actually does read from the lower 1/2 of any page

> other than 0.


I agree, despite my previous comment.  While the spec is clear that should
work, I believe virtually all such instances are bugs not yet discovered.

And, note that the legacy API provides no way to access lower memory from
any page but 0.  There's just no syntax for it.  Not that we care about
legacy :-).

> 

> And i agree about documentation. I would suggest a comment in

> ethtool_netlink.h, and the RST documentation.

> 

> 		   Andrew


Don
Andrew Lunn March 23, 2021, 10:17 p.m. UTC | #6
> The spec is clear that the lower half is the same for all pages.  If the SFP

> gives you rubbish you should throw the device in the rubbish.


Sometimes you don't get the choice. You have to use the GPON SFP the
ISP sent you, if you want FTTH. And they tend to be cheap and broken
with respect to the spec.

    Andrew
Moshe Shemesh March 24, 2021, 10:03 a.m. UTC | #7
On 3/23/2021 2:27 AM, Andrew Lunn wrote:
>

>>> +#define ETH_MODULE_EEPROM_PAGE_LEN 256

>> Sorry to keep raising issues, but I think you want to make this constant

>> 128.

> Yes, i also think the KAPI should be limited to returning a maximum of

> a 1/2 page per call.

OK.
>>> +#define MODULE_EEPROM_MAX_OFFSET (257 *

>>> ETH_MODULE_EEPROM_PAGE_LEN)

>> The device actually has 257 addressable chunks of 128 bytes each.  With

>> ETH_MODULE_EEPROM_PAGE_LEN set to 256, your constant is 2X too big.

>>

>> Note also, SFP devices (but not QSFP or CMIS) actually have another 256

>> bytes available at 0x50, in addition to the full 257*128 at 0x51.  So SFP is

>> actually 259*128 or (256 + 257 * 128).

>>

>> Devices that don't support pages have much lower limits (256 bytes for

>> QSFP/CMIS and 512 for SFP).  Some SFP only support 256 bytes.  Most devices

>> will return nonsense for pages above 3.  So, this check is really only an

>> absolute limit.  The SFP driver that takes this request will probably check

>> against a more refined MAX length (eg modinfo->eeprom_len).

>>

>> I suggest setting this constant to 259 * (ETH_MODULE_EEPROM_PAGE_LEN / 2).

>> Let the driver refine it from there.

> I don't even see a need for this. The offset should be within one 1/2

> page, of one bank. So offset >= 0 and <= 127. Length is also > 0 and

> <- 127. And offset+length is <= 127.

>

> For the moment, please forget about backwards compatibility with the

> IOCTL interface. Lets get a new clean KAPI and a new clean internal

> API between the ethtool core and the drivers. Once we have that agreed

> on, we can work on the various compatibility shims we need to work

> between old and new APIs in various places.



OK, so following the comments here, I will ignore backward compatibility 
of having global offset and length.

That means I assume in this KAPI I am asked to get data from specific 
page. Should I also require user space to send page number to this KAPI 
(I mean make page number mandatory) ?

>        Andrew
Moshe Shemesh March 24, 2021, 10:05 a.m. UTC | #8
On 3/23/2021 2:54 AM, Andrew Lunn wrote:
>

>> +static int eeprom_prepare_data(const struct ethnl_req_info *req_base,

>> +                            struct ethnl_reply_data *reply_base,

>> +                            struct genl_info *info)

>> +{

>> +     struct eeprom_reply_data *reply = MODULE_EEPROM_REPDATA(reply_base);

>> +     struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_base);

>> +     struct ethtool_module_eeprom page_data = {0};

>> +     struct net_device *dev = reply_base->dev;

>> +     int ret;

>> +

>> +     if (!dev->ethtool_ops->get_module_eeprom_by_page)

>> +             return -EOPNOTSUPP;

>> +

>> +     /* Allow dumps either of low or high page without crossing half page boundary */

>> +     if ((request->offset < ETH_MODULE_EEPROM_PAGE_LEN / 2 &&

>> +          request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN / 2) ||

>> +         request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN)

>> +             return -EINVAL;

> Please keep all the parameter validation together, in

> eeprom_parse_request(). At some point, we may extend

> eeprom_parse_request() to make use of extack, to indicate which

> parameter is invalid. Just getting an -EINVAL can be hard to debug,

> where as NL_SET_ERR_MSG_ATTR() can help the user.

Sure, we can add that.
>        Andrew
Moshe Shemesh March 24, 2021, 10:14 a.m. UTC | #9
On 3/23/2021 7:47 PM, Don Bollinger wrote:
>>>> I don't even see a need for this. The offset should be within one

>>>> 1/2

>>> page, of

>>>> one bank. So offset >= 0 and <= 127. Length is also > 0 and

>>>> <- 127. And offset+length is <= 127.

>>> I like the clean approach, but...   How do you request low memory?

>> Duh!

>>

>> I got my conditions wrong. Too focused on 1/2 pages to think that two of

>> them makes one page!

>>

>> Lets try again:

>>

>> offset < 256

>> 0 < len < 128

> Actually 0 < len <= 128.  Length of 128 is not only legal, but very common.

> "Read the whole 1/2 page block".

Ack.
>> if (offset < 128)

>>     offset + len < 128

> Again, offset + len <= 128

>

>> else

>>     offset + len < 256

> offset + len <= 256

Ack.
>> Does that look better?

>>

>> Reading bytes from the lower 1/2 of page 0 should give the same data as

>> reading data from the lower 1/2 of page 42. So we can allow that, but

> don't

>> be too surprised when an SFP gets it wrong and gives you rubbish. I would

> The spec is clear that the lower half is the same for all pages.  If the SFP

> gives you rubbish you should throw the device in the rubbish.

>

>> suggest ethtool(1) never actually does read from the lower 1/2 of any page

>> other than 0.

> I agree, despite my previous comment.  While the spec is clear that should

> work, I believe virtually all such instances are bugs not yet discovered.



Agreed, so we will accept offset < 128 only on page zero.

> And, note that the legacy API provides no way to access lower memory from

> any page but 0.  There's just no syntax for it.  Not that we care about

> legacy :-).

>

>> And i agree about documentation. I would suggest a comment in

>> ethtool_netlink.h, and the RST documentation.



Ack, will comment there on limiting new KAPI only to half pages reading.

We may address reading cross pages by user space (implementing multiple 
calls to KAPI to get such data).

>>                   Andrew

> Don

>

>
Andrew Lunn March 24, 2021, 12:15 p.m. UTC | #10
> OK, so following the comments here, I will ignore backward compatibility of

> having global offset and length.


Yes, we can do that in userspace.

> That means I assume in this KAPI I am asked to get data from specific page.

> Should I also require user space to send page number to this KAPI (I mean

> make page number mandatory) ?


It makes the API more explicit. We always need the page, so if it is
not passed you need to default to something. As with addresses, you
have no way to pass back what page was actually read. So yes, make it
mandatory.

And i suppose the next question is, do we make bank mandatory? Once
you have a page > 0x10, you need the bank. Either we need to encode
the logic of when a bank is needed, and make it mandatory then, or it
is always mandatory.

Given the general pattern, we make it mandatory.

But sometime in the future when we get yet another SFP format, with
additional parameters, they will be optional, in order to keep
backwards compatibility.

	  Andrew