Message ID | 20210212025641.323844-2-saeed@kernel.org |
---|---|
State | New |
Headers | show |
Series | mlx5 fixes 2021-02-11 | expand |
On Fri, Feb 12, 2021 at 3:59 AM Saeed Mahameed <saeed@kernel.org> wrote: > > From: Parav Pandit <parav@nvidia.com> > > rate_bytes_ps is a 64-bit field. It passed as 32-bit field to > apply_police_params(). Due to this when police rate is higher > than 4Gbps, 32-bit calculation ignores the carry. This results > in incorrect rate configurationn the device. > > Fix it by performing 64-bit calculation. I just stumbled over this commit while looking at an unrelated problem. > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > index dd0bfbacad47..717fbaa6ce73 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > @@ -5040,7 +5040,7 @@ static int apply_police_params(struct mlx5e_priv *priv, u64 rate, > */ > if (rate) { > rate = (rate * BITS_PER_BYTE) + 500000; > - rate_mbps = max_t(u32, do_div(rate, 1000000), 1); > + rate_mbps = max_t(u64, do_div(rate, 1000000), 1); I think there are still multiple issues with this line: - Before commit 1fe3e3166b35 ("net/mlx5e: E-switch, Fix rate calculation for overflow"), it was trying to calculate rate divided by 1000000, but now it uses the remainder of the division rather than the quotient. I assume this was meant to use div_u64() instead of do_div(). - Both div_u64() and do_div() return a 32-bit number, and '1' is a constant that also comfortably fits into a 32-bit number, so changing the max_t to return a 64-bit type has no effect on the result - The maximum of an arbitrary unsigned integer and '1' is either one or zero, so there doesn't need to be an expensive division here at all. From the comment it sounds like the intention was to use 'min_t()' instead of 'max_t()'. It has however used 'max_t' since the code was first introduced. Arnd
On Sat, 2021-02-27 at 13:14 +0100, Arnd Bergmann wrote: > On Fri, Feb 12, 2021 at 3:59 AM Saeed Mahameed <saeed@kernel.org> > wrote: > > > > From: Parav Pandit <parav@nvidia.com> > > > > rate_bytes_ps is a 64-bit field. It passed as 32-bit field to > > apply_police_params(). Due to this when police rate is higher > > than 4Gbps, 32-bit calculation ignores the carry. This results > > in incorrect rate configurationn the device. > > > > Fix it by performing 64-bit calculation. > > I just stumbled over this commit while looking at an unrelated > problem. > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > index dd0bfbacad47..717fbaa6ce73 100644 > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > @@ -5040,7 +5040,7 @@ static int apply_police_params(struct > > mlx5e_priv *priv, u64 rate, > > */ > > if (rate) { > > rate = (rate * BITS_PER_BYTE) + 500000; > > - rate_mbps = max_t(u32, do_div(rate, 1000000), 1); > > + rate_mbps = max_t(u64, do_div(rate, 1000000), 1); > > I think there are still multiple issues with this line: > > - Before commit 1fe3e3166b35 ("net/mlx5e: E-switch, Fix rate > calculation for > overflow"), it was trying to calculate rate divided by 1000000, but > now > it uses the remainder of the division rather than the quotient. I > assume > this was meant to use div_u64() instead of do_div(). > Yes, I already have a patch lined up to fix this issue. Thanks for spotting this. > - Both div_u64() and do_div() return a 32-bit number, and '1' is a > constant > that also comfortably fits into a 32-bit number, so changing the > max_t > to return a 64-bit type has no effect on the result > as of the above comment, we shouldn't be using the return value of do_div(). > - The maximum of an arbitrary unsigned integer and '1' is either one > or zero, > so there doesn't need to be an expensive division here at all. > From the > comment it sounds like the intention was to use 'min_t()' instead > of 'max_t()'. > It has however used 'max_t' since the code was first introduced. > if the input rate is less that 1mbps then the quotient will be 0, otherwise we want the quotient, and we don't allow 0, so max_t(rate, 1) should be used, what am I missing ?
On Tue, Mar 2, 2021 at 1:52 AM Saeed Mahameed <saeed@kernel.org> wrote: > On Sat, 2021-02-27 at 13:14 +0100, Arnd Bergmann wrote: > > On Fri, Feb 12, 2021 at 3:59 AM Saeed Mahameed <saeed@kernel.org> > > wrote: > > > > > > From: Parav Pandit <parav@nvidia.com> > > > > > > rate_bytes_ps is a 64-bit field. It passed as 32-bit field to > > > apply_police_params(). Due to this when police rate is higher > > > than 4Gbps, 32-bit calculation ignores the carry. This results > > > in incorrect rate configurationn the device. > > > > > > Fix it by performing 64-bit calculation. > > > > I just stumbled over this commit while looking at an unrelated > > problem. > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > index dd0bfbacad47..717fbaa6ce73 100644 > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > @@ -5040,7 +5040,7 @@ static int apply_police_params(struct > > > mlx5e_priv *priv, u64 rate, > > > */ > > > if (rate) { > > > rate = (rate * BITS_PER_BYTE) + 500000; > > > - rate_mbps = max_t(u32, do_div(rate, 1000000), 1); > > > + rate_mbps = max_t(u64, do_div(rate, 1000000), 1); > > > > I think there are still multiple issues with this line: > > > > - Before commit 1fe3e3166b35 ("net/mlx5e: E-switch, Fix rate > > calculation for > > overflow"), it was trying to calculate rate divided by 1000000, but > > now > > it uses the remainder of the division rather than the quotient. I > > assume > > this was meant to use div_u64() instead of do_div(). > > > > Yes, I already have a patch lined up to fix this issue. ok > > - Both div_u64() and do_div() return a 32-bit number, and '1' is a > > constant > > that also comfortably fits into a 32-bit number, so changing the > > max_t > > to return a 64-bit type has no effect on the result > > > > as of the above comment, we shouldn't be using the return value of > do_div(). Ok, I was confused here because do_div() returns a 32-bit type, and is called by div_u64(). Of course that was nonsense because do_div() returns the 32-bit remainder, while the division result remains 64-bit. > > - The maximum of an arbitrary unsigned integer and '1' is either one > > or zero, > > so there doesn't need to be an expensive division here at all. > > From the > > comment it sounds like the intention was to use 'min_t()' instead > > of 'max_t()'. > > It has however used 'max_t' since the code was first introduced. > > > > if the input rate is less that 1mbps then the quotient will be 0, > otherwise we want the quotient, and we don't allow 0, so max_t(rate, 1) > should be used, what am I missing ? And I have no idea what I was thinking here, of course you are right and there is no other bug. Arnd
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index dd0bfbacad47..717fbaa6ce73 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -5040,7 +5040,7 @@ static int apply_police_params(struct mlx5e_priv *priv, u64 rate, */ if (rate) { rate = (rate * BITS_PER_BYTE) + 500000; - rate_mbps = max_t(u32, do_div(rate, 1000000), 1); + rate_mbps = max_t(u64, do_div(rate, 1000000), 1); } err = mlx5_esw_modify_vport_rate(esw, vport_num, rate_mbps);