mbox series

[v2,0/3] net/ncsi: Add NCSI Intel OEM command to keep PHY link up

Message ID 20210708122754.555846-1-i.mikhaylov@yadro.com
Headers show
Series net/ncsi: Add NCSI Intel OEM command to keep PHY link up | expand

Message

Ivan Mikhaylov July 8, 2021, 12:27 p.m. UTC
Add NCSI Intel OEM command to keep PHY link up and prevents any channel
resets during the host load on i210. Also includes dummy response handler for
Intel manufacturer id.

Changes from v1:
 1. sparse fixes about casts
 2. put it after ncsi_dev_state_probe_cis instead of
    ncsi_dev_state_probe_channel because sometimes channel is not ready
    after it
 3. inl -> intel

Ivan Mikhaylov (3):
  net/ncsi: fix restricted cast warning of sparse
  net/ncsi: add NCSI Intel OEM command to keep PHY up
  net/ncsi: add dummy response handler for Intel boards

 net/ncsi/Kconfig       |  6 +++++
 net/ncsi/internal.h    |  5 +++++
 net/ncsi/ncsi-manage.c | 51 +++++++++++++++++++++++++++++++++++++++---
 net/ncsi/ncsi-rsp.c    | 11 +++++++--
 4 files changed, 68 insertions(+), 5 deletions(-)

Comments

Paul Fertser July 20, 2021, 9:53 a.m. UTC | #1
Hello,

On Thu, Jul 08, 2021 at 03:27:51PM +0300, Ivan Mikhaylov wrote:
> Add NCSI Intel OEM command to keep PHY link up and prevents any channel

> resets during the host load on i210.


There're multiple things to consider here and I have hesitations about
the way you propose to solve the issue.

While the host is booted up and fully functional it assumes it has
full proper control of network cards, and sometimes it really needs to
reset them to e.g. recover from crashed firmware. The PHY resets might
also make sense in certain cases, and so in general having this "link
up" bit set all the time might be breaking assumptions.

As far as I can tell the Intel developers assumed you would enable
this bit just prior to powering on the host and turn off after all the
POST codes are transferred and we can assume the host system is done
with the UEFI stage and the real OS took over.

OpenBMC seems to have all the necessary hooks to do it that way, and
you have a netlink command to send whatever you need for that from the
userspace, e.g. with the "C version" ncsi-netlink command to set this
bit just run:

ncsi-netlink -l 3 -c 0 -p 0 -o 0x50 0x00 0x00 0x01 0x57 0x20 0x00 0x01

https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-networkd/+/36592
would provide an OpenBMC-specific way too.


There's another related thing to consider here: by default I210 has
power-saving modes enabled and so when BMC is booting the link is
established only in 100BASE-T mode. With this configuration and this
bit always set you'd be always stuck to that, never getting Gigabit
speeds.

For server motherboards I propose to configure I210 with this:
./eeupdate64e /all /ww 0x13 0x0081 # disable Low Power Link Up
./eeupdate64e /all /ww 0x20 0x2004 # enable 1000 in non-D0a
(it's a one-time operation that's best to be performed along with the
initial I210 flashing)


Ivan, so far I have an impression that the user-space solution would
be much easier, flexible and manageable and that there's no need for
this command to be in Linux at all.

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav@gmail.com
Ivan Mikhaylov July 20, 2021, 2 p.m. UTC | #2
On Tue, 2021-07-20 at 12:53 +0300, Paul Fertser wrote:
> Hello,

> 

> On Thu, Jul 08, 2021 at 03:27:51PM +0300, Ivan Mikhaylov wrote:

> > Add NCSI Intel OEM command to keep PHY link up and prevents any channel

> > resets during the host load on i210.

> 

> There're multiple things to consider here and I have hesitations about

> the way you propose to solve the issue.

> 

> While the host is booted up and fully functional it assumes it has

> full proper control of network cards, and sometimes it really needs to

> reset them to e.g. recover from crashed firmware. The PHY resets might

> also make sense in certain cases, and so in general having this "link

> up" bit set all the time might be breaking assumptions.


Paul, what kind of assumption it would break? You know that you set that option
in your kernel, anyways you can look at /proc/config.gz if you have hesitations.
In other ways, if you're saying about possible runtime control, there is ncsi-
netlink control and solution from phosphor-networkd which is on review stage.
Joel proposed it as DTS option which may help at runtime. Some of those commands
should be applied after channel probe as I think including phy reset control.

> As far as I can tell the Intel developers assumed you would enable

> this bit just prior to powering on the host and turn off after all the

> POST codes are transferred and we can assume the host system is done

> with the UEFI stage and the real OS took over.

> 

> OpenBMC seems to have all the necessary hooks to do it that way, and

> you have a netlink command to send whatever you need for that from the

> userspace, e.g. with the "C version" ncsi-netlink command to set this

> bit just run:

> 

> ncsi-netlink -l 3 -c 0 -p 0 -o 0x50 0x00 0x00 0x01 0x57 0x20 0x00 0x01

> 

> https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-networkd/+/36592

> would provide an OpenBMC-specific way too.


I know about it, Eddie posted that link before.

> There's another related thing to consider here: by default I210 has

> power-saving modes enabled and so when BMC is booting the link is

> established only in 100BASE-T mode. With this configuration and this

> bit always set you'd be always stuck to that, never getting Gigabit

> speeds.

> 

> For server motherboards I propose to configure I210 with this:

> ./eeupdate64e /all /ww 0x13 0x0081 # disable Low Power Link Up

> ./eeupdate64e /all /ww 0x20 0x2004 # enable 1000 in non-D0a

> (it's a one-time operation that's best to be performed along with the

> initial I210 flashing)


Good to know, thanks.

> Ivan, so far I have an impression that the user-space solution would

> be much easier, flexible and manageable and that there's no need for

> this command to be in Linux at all.


You may not have such things on your image with suitable env which you can rely
on. There is smaf for mellanox which is done in the same way for example.

Thanks.
Paul Fertser July 20, 2021, 2:16 p.m. UTC | #3
On Tue, Jul 20, 2021 at 05:00:40PM +0300, Ivan Mikhaylov wrote:
> > While the host is booted up and fully functional it assumes it has

> > full proper control of network cards, and sometimes it really needs to

> > reset them to e.g. recover from crashed firmware. The PHY resets might

> > also make sense in certain cases, and so in general having this "link

> > up" bit set all the time might be breaking assumptions.

> 

> Paul, what kind of assumption it would break?


The host OS drivers assume they can fully control PCIe network
cards. Doing anything (including inhibiting PHY resets) behind its
back might break assumptions the driver authors had. This bit in
question certainly makes the card behave in an unusual way, so no
wonder Intel didn't enable it by default.

I do not claim I know for a fact it's problematic but it doesn't feel
like "the right thing" so some edge cases might expose issues.

> Joel proposed it as DTS option which may help at runtime.


Sorry, I'm not following. If BMC is fully booted it's able to
configure NC-SI appropriately by a userspace action coordinated with
other BMC tasks. If BMC is not yet ready then we can't communicate
with it via Ethernet anyway. So I can't see when exactly is it going
to be helpful.

> Some of those commands should be applied after channel probe as I

> think including phy reset control.


Do you have any other commands in mind? So far I assumed we're
discussing just the one to mask PHY resets.

> > Ivan, so far I have an impression that the user-space solution would

> > be much easier, flexible and manageable and that there's no need for

> > this command to be in Linux at all.

> 

> You may not have such things on your image with suitable env which you can rely

> on. There is smaf for mellanox which is done in the same way for example.


I can hardly imagine why an OS running on BMC would be using this code
in question and appropriate DT configuration but not having the right
means in userspace to control it. What would be the usecase?

If the network subsystem maintainers think this is a good idea, all
things considered, I'm fine with it. I210 losing link exactly at the
time when you need it (to enter the UEFI interactive menu) is
super-annoying, so probably any fix is better than none :)

Thank you for discussion.

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav@gmail.com