mbox series

[net-next,0/3] bonding: 3ad: support for 200G/400G ports and more verbose warning

Message ID 20210209103209.482770-1-razor@blackwall.org
Headers show
Series bonding: 3ad: support for 200G/400G ports and more verbose warning | expand

Message

Nikolay Aleksandrov Feb. 9, 2021, 10:32 a.m. UTC
From: Nikolay Aleksandrov <nikolay@nvidia.com>

Hi,
We'd like to have proper 200G and 400G support with 3ad bond mode, so we
need to add new definitions for them in order to have separate oper keys,
aggregated bandwidth and proper operation (patches 01 and 02). In
patch 03 Ido changes the code to use WARN_ONCE() instead of
pr_warn_once which would help future detection of unsupported speeds.
These warnings are usually detected by automated tools and regression
tests.

Thanks,
 Nik

Ido Schimmel (1):
  bonding: 3ad: Use a more verbose warning for unknown speeds

Nikolay Aleksandrov (2):
  bonding: 3ad: add support for 200G speed
  bonding: 3ad: add support for 400G speed

 drivers/net/bonding/bond_3ad.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

Comments

Alexander Duyck Feb. 10, 2021, 7:44 p.m. UTC | #1
On Tue, Feb 9, 2021 at 2:42 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>

> From: Ido Schimmel <idosch@nvidia.com>

>

> The bond driver needs to be patched to support new ethtool speeds.

> Currently it emits a single warning [1] when it encounters an unknown

> speed. As evident by the two previous patches, this is not explicit

> enough. Instead, use WARN_ONCE() to get a more verbose warning [2].

>

> [1]

> bond10: (slave swp1): unknown ethtool speed (200000) for port 1 (set it to 0)

>

> [2]

> bond20: (slave swp2): unknown ethtool speed (400000) for port 1 (set it to 0)

> WARNING: CPU: 5 PID: 96 at drivers/net/bonding/bond_3ad.c:317 __get_link_speed.isra.0+0x110/0x120

> Modules linked in:

> CPU: 5 PID: 96 Comm: kworker/u16:5 Not tainted 5.11.0-rc6-custom-02818-g69a767ec7302 #3243

> Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 01/06/2019

> Workqueue: bond20 bond_mii_monitor

> RIP: 0010:__get_link_speed.isra.0+0x110/0x120

> Code: 5b ff ff ff 52 4c 8b 4e 08 44 0f b7 c7 48 c7 c7 18 46 4a b8 48 8b 16 c6 05 d9 76 41 01 01 49 8b 31 89 44 24 04 e8 a2 8a 3f 00 <0f> 0b 8b 44 24 04 59 c3 0

> f 1f 84 00 00 00 00 00 48 85 ff 74 3b 53

> RSP: 0018:ffffb683c03afde0 EFLAGS: 00010282

> RAX: 0000000000000000 RBX: ffff96bd3f2a9a38 RCX: 0000000000000000

> RDX: ffff96c06fd67560 RSI: ffff96c06fd57850 RDI: ffff96c06fd57850

> RBP: 0000000000000000 R08: ffffffffb8b49888 R09: 0000000000009ffb

> R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000

> R13: ffff96bd3f2a9a38 R14: ffff96bd49c56400 R15: ffff96bd49c564f0

> FS:  0000000000000000(0000) GS:ffff96c06fd40000(0000) knlGS:0000000000000000

> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

> CR2: 00007f327ad804b0 CR3: 0000000142ad5006 CR4: 00000000003706e0

> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

> Call Trace:

>  ad_update_actor_keys+0x36/0xc0

>  bond_3ad_handle_link_change+0x5d/0xf0

>  bond_mii_monitor.cold+0x1c2/0x1e8

>  process_one_work+0x1c9/0x360

>  worker_thread+0x48/0x3c0

>  kthread+0x113/0x130

>  ret_from_fork+0x1f/0x30

>

> Signed-off-by: Ido Schimmel <idosch@nvidia.com>


I'm not really sure making the warning consume more text is really
going to solve the problem. I was actually much happier with just the
first error as I don't need a stack trace. Just having the line is
enough information for me to search and find the cause for the issue.
Adding a backtrace is just overkill.

If we really think this is something that is important maybe we should
move this up to an error instead of a warning. For example why not
make this use pr_err_once, instead of pr_warn_once? It should make it
more likely to be highlighted in the system log.
Ido Schimmel Feb. 10, 2021, 8:11 p.m. UTC | #2
On Wed, Feb 10, 2021 at 11:44:31AM -0800, Alexander Duyck wrote:
> I'm not really sure making the warning consume more text is really

> going to solve the problem. I was actually much happier with just the

> first error as I don't need a stack trace. Just having the line is

> enough information for me to search and find the cause for the issue.

> Adding a backtrace is just overkill.

> 

> If we really think this is something that is important maybe we should

> move this up to an error instead of a warning. For example why not

> make this use pr_err_once, instead of pr_warn_once? It should make it

> more likely to be highlighted in the system log.


Yea, I expected this comment.

We are currently looking for patterns such as 'BUG', 'WARNING', 'BUG
kmalloc', 'UBSAN' etc in regression. Mostly based on what syzkaller is
doing [1] (which we are also running). We can instead promote this
warning to pr_err_once() and start looking at errors as well. It might
uncover more issues / false positives.

[1] https://github.com/google/syzkaller/blob/42b90a7c596c2b7d8f8d034dff7d8c635631de5a/pkg/report/linux.go#L952