[net-next,0/8,pull,request] 100GbE Intel Wired LAN Driver Updates 2021-06-11

Message ID	20210611162000.2438023-1-anthony.l.nguyen@intel.com
Headers	show Return-Path: <netdev-owner@kernel.org> IronPort-SDR: GymGEwo8BrAWlY9WzIGO6BiHZIkDJ7n9h6/+UVcPtxdpLrC/kCuk+h6HZM08Lc6gJo2TXVXysO 4LUTxYIBhoeg== IronPort-SDR: oyqQnuwLKypu4fzUGxfg/FQE7QfPICa6J/Su48ujQrAAMWyH5/IVsU+BEJvQOvrunhSxCjzTZ1 N55vgm2P9XaA== From: Tony Nguyen <anthony.l.nguyen@intel.com> To: davem@davemloft.net, kuba@kernel.org Cc: Tony Nguyen <anthony.l.nguyen@intel.com>, netdev@vger.kernel.org, sassmann@redhat.com, richardcochran@gmail.com, jacob.e.keller@intel.com Subject: [PATCH net-next 0/8][pull request] 100GbE Intel Wired LAN Driver Updates 2021-06-11 Date: Fri, 11 Jun 2021 09:19:52 -0700 Message-Id: <20210611162000.2438023-1-anthony.l.nguyen@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	100GbE Intel Wired LAN Driver Updates 2021-06-11 \| expand [net-next,0/8,pull,request] 100GbE Intel Wired LAN Driver Updates 2021-06-11 [net-next,4/8] ice: add low level PTP clock access functions [net-next,7/8] ice: enable receive hardware timestamping [net-next,8/8] ice: enable transmit timestamps for E810 devices

Tony Nguyen June 11, 2021, 4:19 p.m. UTC

Jake Keller says:

Extend the ice driver to support basic PTP clock functionality for E810
devices.

This includes some tangential work required to setup the sideband queue and
driver shared parameters as well.

This series only supports E810-based devices. This is because other devices
based on the E822 MAC use a different and more complex PHY.

The low level device functionality is kept within ice_ptp_hw.c and is
designed to be extensible for supporting E822 devices in a future series.

This series also only supports very basic functionality including the
ptp_clock device and timestamping. Support for configuring periodic outputs
and external input timestamps will be implemented in a future series.

There are a couple of potential "what? why?" bits in this series I want to
point out:

1) the PTP hardware functionality is shared between multiple functions. This
means that the same clock registers are shared across multiple PFs. In order
to avoid contention or clashing between PFs, firmware assigns "ownership" to
one PF, while other PFs are merely "associated" with the timer. Because we
share the hardware resource, only the clock owner will allocate and register
a PTP clock device. Other PFs determine the appropriate PTP clock index to
report by using a firmware interface to read a shared parameter that is set
by the owning PF.

2) the ice driver uses its own kthread instead of using do_aux_work. This is
because the periodic and asynchronous tasks are necessary for all PFs, but
only one PF will allocate the clock.

The series is broken up into functional pieces to allow easy review.

The following are changes since commit 76cf404c40ae8efcf8c6405535a3f6f69e6ba2a5:
  Merge branch 'ipa-mem-2'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE

Jacob Keller (8):
  ice: add support for sideband messages
  ice: process 1588 PTP capabilities during initialization
  ice: add support for set/get of driver-stored firmware parameters
  ice: add low level PTP clock access functions
  ice: register 1588 PTP clock device object for E810 devices
  ice: report the PTP clock index in ethtool .get_ts_info
  ice: enable receive hardware timestamping
  ice: enable transmit timestamps for E810 devices

 drivers/net/ethernet/intel/Kconfig            |    1 +
 drivers/net/ethernet/intel/ice/Makefile       |    1 +
 drivers/net/ethernet/intel/ice/ice.h          |    8 +-
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   41 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   14 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |  243 ++++
 drivers/net/ethernet/intel/ice/ice_common.h   |   10 +
 drivers/net/ethernet/intel/ice/ice_controlq.c |   62 +
 drivers/net/ethernet/intel/ice/ice_controlq.h |    2 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |   27 +-
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   69 +
 drivers/net/ethernet/intel/ice/ice_lib.c      |   20 +-
 drivers/net/ethernet/intel/ice/ice_lib.h      |    3 +-
 drivers/net/ethernet/intel/ice/ice_main.c     |   95 ++
 drivers/net/ethernet/intel/ice/ice_ptp.c      | 1269 +++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ptp.h      |  161 +++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   |  653 +++++++++
 drivers/net/ethernet/intel/ice/ice_ptp_hw.h   |   79 +
 drivers/net/ethernet/intel/ice/ice_sbq_cmd.h  |   92 ++
 drivers/net/ethernet/intel/ice/ice_txrx.c     |   37 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |    5 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |    3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   62 +
 include/linux/kernel.h                        |   12 +
 24 files changed, 2962 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_ptp.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_ptp.h
 create mode 100644 drivers/net/ethernet/intel/ice/ice_ptp_hw.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_ptp_hw.h
 create mode 100644 drivers/net/ethernet/intel/ice/ice_sbq_cmd.h

patchwork-bot+netdevbpf@kernel.org June 11, 2021, 9 p.m. UTC | #1

Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Fri, 11 Jun 2021 09:19:52 -0700 you wrote:
> Jake Keller says:
> 
> Extend the ice driver to support basic PTP clock functionality for E810
> devices.
> 
> This includes some tangential work required to setup the sideband queue and
> driver shared parameters as well.
> 
> [...]

Here is the summary with links:
  - [net-next,1/8] ice: add support for sideband messages
    https://git.kernel.org/netdev/net-next/c/8f5ee3c477a8
  - [net-next,2/8] ice: process 1588 PTP capabilities during initialization
    https://git.kernel.org/netdev/net-next/c/9733cc94c523
  - [net-next,3/8] ice: add support for set/get of driver-stored firmware parameters
    https://git.kernel.org/netdev/net-next/c/7f9ab54d3144
  - [net-next,4/8] ice: add low level PTP clock access functions
    https://git.kernel.org/netdev/net-next/c/03cb4473be92
  - [net-next,5/8] ice: register 1588 PTP clock device object for E810 devices
    https://git.kernel.org/netdev/net-next/c/06c16d89d2cb
  - [net-next,6/8] ice: report the PTP clock index in ethtool .get_ts_info
    https://git.kernel.org/netdev/net-next/c/67569a7f9401
  - [net-next,7/8] ice: enable receive hardware timestamping
    https://git.kernel.org/netdev/net-next/c/77a781155a65
  - [net-next,8/8] ice: enable transmit timestamps for E810 devices
    https://git.kernel.org/netdev/net-next/c/ea9b847cda64

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Jacob Keller June 14, 2021, 4:43 p.m. UTC | #2

On 6/11/2021 2:18 PM, Jakub Kicinski wrote:
> On Fri, 11 Jun 2021 09:19:57 -0700 Tony Nguyen wrote:

>> +static u64

>> +ice_ptp_read_src_clk_reg(struct ice_pf *pf, struct ptp_system_timestamp *sts)

>> +{

>> +	struct ice_hw *hw = &pf->hw;

>> +	u32 hi, lo, lo2;

>> +	u8 tmr_idx;

>> +

>> +	tmr_idx = ice_get_ptp_src_clock_index(hw);

>> +	/* Read the system timestamp pre PHC read */

>> +	if (sts)

>> +		ptp_read_system_prets(sts);

>> +

>> +	lo = rd32(hw, GLTSYN_TIME_L(tmr_idx));

>> +

>> +	/* Read the system timestamp post PHC read */

>> +	if (sts)

>> +		ptp_read_system_postts(sts);

>> +

>> +	hi = rd32(hw, GLTSYN_TIME_H(tmr_idx));

>> +	lo2 = rd32(hw, GLTSYN_TIME_L(tmr_idx));

>> +

>> +	if (lo2 < lo) {

>> +		/* if TIME_L rolled over read TIME_L again and update

>> +		 * system timestamps

>> +		 */

>> +		if (sts)

>> +			ptp_read_system_prets(sts);

>> +		lo = rd32(hw, GLTSYN_TIME_L(tmr_idx));

>> +		if (sts)

>> +			ptp_read_system_postts(sts);

> 

> ptp_read_system* helpers already check for NULL sts.

>


Hah. Yep, I knew that... and of course I forgot about it.

> 

>> +static int ice_ptp_adjfine(struct ptp_clock_info *info, long scaled_ppm)

>> +{

>> +	struct ice_pf *pf = ptp_info_to_pf(info);

>> +	u64 freq, divisor = 1000000ULL;

>> +	struct ice_hw *hw = &pf->hw;

>> +	s64 incval, diff;

>> +	int neg_adj = 0;

>> +	int err;

>> +

>> +	incval = ICE_PTP_NOMINAL_INCVAL_E810;

>> +

>> +	if (scaled_ppm < 0) {

>> +		neg_adj = 1;

>> +		scaled_ppm = -scaled_ppm;

>> +	}

>> +

>> +	while ((u64)scaled_ppm > div_u64(U64_MAX, incval)) {

>> +		/* handle overflow by scaling down the scaled_ppm and

>> +		 * the divisor, losing some precision

>> +		 */

>> +		scaled_ppm >>= 2;

>> +		divisor >>= 2;

>> +	}

> 

> I have a question regarding ppm overflows.

> 

> We have the max_adj field in struct ptp_clock_info which is checked

> against ppb, but ppb is a signed 32 bit and scaled_ppm is a long,

> meaning values larger than S32_MAX << 16 / 1000 will overflow 

> the ppb calculation, and therefore the check.

> 


Hmmm.. I thought ppb was a s64, not an s32.

In general, I believe max_adj is usually capped at 1 billion anyways,
since it doesn't make sense to slow a clock by more than 1billioln ppb,
and increasing it more than that isn't really useful either.

> Are we okay with that? Is my math off? Did I miss some part 

> of the kernel which filters crazy high scaled_ppm/freq?

> 

> Since dialed_freq is updated regardless of return value of .adjfine 

> the driver has no clear way to reject bad scaled_ppm>


I'm not sure. +Richard?

>> +	freq = (incval * (u64)scaled_ppm) >> 16;

>> +	diff = div_u64(freq, divisor);

Jakub Kicinski June 14, 2021, 6:08 p.m. UTC | #3

On Mon, 14 Jun 2021 09:43:17 -0700 Jacob Keller wrote:
> >> +static int ice_ptp_adjfine(struct ptp_clock_info *info, long scaled_ppm)

> >> +{

> >> +	struct ice_pf *pf = ptp_info_to_pf(info);

> >> +	u64 freq, divisor = 1000000ULL;

> >> +	struct ice_hw *hw = &pf->hw;

> >> +	s64 incval, diff;

> >> +	int neg_adj = 0;

> >> +	int err;

> >> +

> >> +	incval = ICE_PTP_NOMINAL_INCVAL_E810;

> >> +

> >> +	if (scaled_ppm < 0) {

> >> +		neg_adj = 1;

> >> +		scaled_ppm = -scaled_ppm;

> >> +	}

> >> +

> >> +	while ((u64)scaled_ppm > div_u64(U64_MAX, incval)) {

> >> +		/* handle overflow by scaling down the scaled_ppm and

> >> +		 * the divisor, losing some precision

> >> +		 */

> >> +		scaled_ppm >>= 2;

> >> +		divisor >>= 2;

> >> +	}  

> > 

> > I have a question regarding ppm overflows.

> > 

> > We have the max_adj field in struct ptp_clock_info which is checked

> > against ppb, but ppb is a signed 32 bit and scaled_ppm is a long,

> > meaning values larger than S32_MAX << 16 / 1000 will overflow 

> > the ppb calculation, and therefore the check.

> 

> Hmmm.. I thought ppb was a s64, not an s32.

> 

> In general, I believe max_adj is usually capped at 1 billion anyways,

> since it doesn't make sense to slow a clock by more than 1billioln ppb,

> and increasing it more than that isn't really useful either.


Do you mean it's capped somewhere in the code to 1B?

I'm no time expert but this is not probability where 1 is a magic
value, adjusting clock by 1 - 1ppb vs 1 + 1ppb makes little difference,
no? Both mean something is super fishy with the nominal or expected
frequency, but the hardware can do that and more.

Flipping the question, if adjusting by large ppb values is not correct,
why not cap the adjustment at the value which would prevent the u64
overflow?

I don't really have a preferences here, I'm mostly disturbed by 
the overflow in the ppb vs max_adj check.

> > Are we okay with that? Is my math off? Did I miss some part 

> > of the kernel which filters crazy high scaled_ppm/freq?

> > 

> > Since dialed_freq is updated regardless of return value of .adjfine 

> > the driver has no clear way to reject bad scaled_ppm>  

> 

> I'm not sure. +Richard?

Richard Cochran June 14, 2021, 6:12 p.m. UTC | #4

On Mon, Jun 14, 2021 at 09:43:17AM -0700, Jacob Keller wrote:
> > Since dialed_freq is updated regardless of return value of .adjfine 

> > the driver has no clear way to reject bad scaled_ppm>

> 

> I'm not sure. +Richard?


The driver advertises "max_adj".  The PHC layer checks user space inputs:

ptp_clock.c line 140:
	} else if (tx->modes & ADJ_FREQUENCY) {
		s32 ppb = scaled_ppm_to_ppb(tx->freq);
		if (ppb > ops->max_adj || ppb < -ops->max_adj)
			return -ERANGE;

So, if the max_adj is correct for the driver/HW, then all is well.

Thanks,
Richard

Jakub Kicinski June 14, 2021, 6:50 p.m. UTC | #5

On Mon, 14 Jun 2021 11:12:20 -0700 Richard Cochran wrote:
> On Mon, Jun 14, 2021 at 09:43:17AM -0700, Jacob Keller wrote:

> > > Since dialed_freq is updated regardless of return value of .adjfine 

> > > the driver has no clear way to reject bad scaled_ppm>  

> > 

> > I'm not sure. +Richard?  

> 

> The driver advertises "max_adj".  The PHC layer checks user space inputs:

> 

> ptp_clock.c line 140:

> 	} else if (tx->modes & ADJ_FREQUENCY) {

> 		s32 ppb = scaled_ppm_to_ppb(tx->freq);

> 		if (ppb > ops->max_adj || ppb < -ops->max_adj)

> 			return -ERANGE;

> 

> So, if the max_adj is correct for the driver/HW, then all is well.


tx->freq is a long, and the conversion to ppb can overflow the s32 type.
E.g. 281474976645 will become -2 AFAICT. I hacked up phc_ctl to not do
range checking and kernel happily accepted that value. Shall we do this?

--->8----

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 03a246e60fd9..ed32fc98d9d8 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -63,7 +63,7 @@ static void enqueue_external_timestamp(struct timestamp_event_queue *queue,
        spin_unlock_irqrestore(&queue->lock, flags);
 }
 
-s32 scaled_ppm_to_ppb(long ppm)
+s64 scaled_ppm_to_ppb(long ppm)
 {
        /*
         * The 'freq' field in the 'struct timex' is in parts per
@@ -80,7 +80,7 @@ s32 scaled_ppm_to_ppb(long ppm)
        s64 ppb = 1 + ppm;
        ppb *= 125;
        ppb >>= 13;
-       return (s32) ppb;
+       return ppb;
 }
 EXPORT_SYMBOL(scaled_ppm_to_ppb);
 
@@ -138,7 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct __kernel_timex *tx)
                delta = ktime_to_ns(kt);
                err = ops->adjtime(ops, delta);
        } else if (tx->modes & ADJ_FREQUENCY) {
-               s32 ppb = scaled_ppm_to_ppb(tx->freq);
+               s64 ppb = scaled_ppm_to_ppb(tx->freq);
                if (ppb > ops->max_adj || ppb < -ops->max_adj)
                        return -ERANGE;
                if (ops->adjfine)

Jacob Keller June 14, 2021, 7:48 p.m. UTC | #6

> -----Original Message-----

> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Monday, June 14, 2021 11:51 AM

> To: Richard Cochran <richardcochran@gmail.com>

> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Nguyen, Anthony L

> <anthony.l.nguyen@intel.com>; davem@davemloft.net;

> netdev@vger.kernel.org; sassmann@redhat.com; Brelinski, TonyX

> <tonyx.brelinski@intel.com>

> Subject: Re: [PATCH net-next 5/8] ice: register 1588 PTP clock device object for

> E810 devices

> 

> On Mon, 14 Jun 2021 11:12:20 -0700 Richard Cochran wrote:

> > On Mon, Jun 14, 2021 at 09:43:17AM -0700, Jacob Keller wrote:

> > > > Since dialed_freq is updated regardless of return value of .adjfine

> > > > the driver has no clear way to reject bad scaled_ppm>

> > >

> > > I'm not sure. +Richard?

> >

> > The driver advertises "max_adj".  The PHC layer checks user space inputs:

> >

> > ptp_clock.c line 140:

> > 	} else if (tx->modes & ADJ_FREQUENCY) {

> > 		s32 ppb = scaled_ppm_to_ppb(tx->freq);

> > 		if (ppb > ops->max_adj || ppb < -ops->max_adj)

> > 			return -ERANGE;

> >

> > So, if the max_adj is correct for the driver/HW, then all is well.

> 

> tx->freq is a long, and the conversion to ppb can overflow the s32 type.

> E.g. 281474976645 will become -2 AFAICT. I hacked up phc_ctl to not do

> range checking and kernel happily accepted that value. Shall we do this?

> 

> --->8----

> 

> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c

> index 03a246e60fd9..ed32fc98d9d8 100644

> --- a/drivers/ptp/ptp_clock.c

> +++ b/drivers/ptp/ptp_clock.c

> @@ -63,7 +63,7 @@ static void enqueue_external_timestamp(struct

> timestamp_event_queue *queue,

>         spin_unlock_irqrestore(&queue->lock, flags);

>  }

> 

> -s32 scaled_ppm_to_ppb(long ppm)

> +s64 scaled_ppm_to_ppb(long ppm)

>  {

>         /*

>          * The 'freq' field in the 'struct timex' is in parts per

> @@ -80,7 +80,7 @@ s32 scaled_ppm_to_ppb(long ppm)

>         s64 ppb = 1 + ppm;

>         ppb *= 125;

>         ppb >>= 13;

> -       return (s32) ppb;

> +       return ppb;

>  }

>  EXPORT_SYMBOL(scaled_ppm_to_ppb);

> 

> @@ -138,7 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct

> __kernel_timex *tx)

>                 delta = ktime_to_ns(kt);

>                 err = ops->adjtime(ops, delta);

>         } else if (tx->modes & ADJ_FREQUENCY) {

> -               s32 ppb = scaled_ppm_to_ppb(tx->freq);

> +               s64 ppb = scaled_ppm_to_ppb(tx->freq);

>                 if (ppb > ops->max_adj || ppb < -ops->max_adj)

>                         return -ERANGE;

>                 if (ops->adjfine)


This looks correct to me.

Jacob Keller June 14, 2021, 7:50 p.m. UTC | #7

> -----Original Message-----

> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Monday, June 14, 2021 11:09 AM

> To: Keller, Jacob E <jacob.e.keller@intel.com>

> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Richard Cochran

> <richardcochran@gmail.com>; davem@davemloft.net; netdev@vger.kernel.org;

> sassmann@redhat.com; Brelinski, TonyX <tonyx.brelinski@intel.com>

> Subject: Re: [PATCH net-next 5/8] ice: register 1588 PTP clock device object for

> E810 devices

> 

> On Mon, 14 Jun 2021 09:43:17 -0700 Jacob Keller wrote:

> > >> +static int ice_ptp_adjfine(struct ptp_clock_info *info, long scaled_ppm)

> > >> +{

> > >> +	struct ice_pf *pf = ptp_info_to_pf(info);

> > >> +	u64 freq, divisor = 1000000ULL;

> > >> +	struct ice_hw *hw = &pf->hw;

> > >> +	s64 incval, diff;

> > >> +	int neg_adj = 0;

> > >> +	int err;

> > >> +

> > >> +	incval = ICE_PTP_NOMINAL_INCVAL_E810;

> > >> +

> > >> +	if (scaled_ppm < 0) {

> > >> +		neg_adj = 1;

> > >> +		scaled_ppm = -scaled_ppm;

> > >> +	}

> > >> +

> > >> +	while ((u64)scaled_ppm > div_u64(U64_MAX, incval)) {

> > >> +		/* handle overflow by scaling down the scaled_ppm and

> > >> +		 * the divisor, losing some precision

> > >> +		 */

> > >> +		scaled_ppm >>= 2;

> > >> +		divisor >>= 2;

> > >> +	}

> > >

> > > I have a question regarding ppm overflows.

> > >

> > > We have the max_adj field in struct ptp_clock_info which is checked

> > > against ppb, but ppb is a signed 32 bit and scaled_ppm is a long,

> > > meaning values larger than S32_MAX << 16 / 1000 will overflow

> > > the ppb calculation, and therefore the check.

> >

> > Hmmm.. I thought ppb was a s64, not an s32.

> >

> > In general, I believe max_adj is usually capped at 1 billion anyways,

> > since it doesn't make sense to slow a clock by more than 1billioln ppb,

> > and increasing it more than that isn't really useful either.

> 

> Do you mean it's capped somewhere in the code to 1B?

> 

> I'm no time expert but this is not probability where 1 is a magic

> value, adjusting clock by 1 - 1ppb vs 1 + 1ppb makes little difference,

> no? Both mean something is super fishy with the nominal or expected

> frequency, but the hardware can do that and more.

> 

> Flipping the question, if adjusting by large ppb values is not correct,

> why not cap the adjustment at the value which would prevent the u64

> overflow?


Large ppb values are sometimes used when you want to slew a clock to bring it in sync when its a few milliseconds to seconds off, without performing a time jump (so that you maintain monotonic increasing time).

That being said, we are supposed to be checking max_adj, except that you're right the conversion to ppb could overflow, and there's no check prior to the conversion from scaled_ppm to ppb.

> 

> I don't really have a preferences here, I'm mostly disturbed by

> the overflow in the ppb vs max_adj check.

> 

> > > Are we okay with that? Is my math off? Did I miss some part

> > > of the kernel which filters crazy high scaled_ppm/freq?

> > >

> > > Since dialed_freq is updated regardless of return value of .adjfine

> > > the driver has no clear way to reject bad scaled_ppm>

> >

> > I'm not sure. +Richard?

Jakub Kicinski June 14, 2021, 8:48 p.m. UTC | #8

On Mon, 14 Jun 2021 19:50:23 +0000 Keller, Jacob E wrote:
> > > Hmmm.. I thought ppb was a s64, not an s32.

> > >

> > > In general, I believe max_adj is usually capped at 1 billion anyways,

> > > since it doesn't make sense to slow a clock by more than 1billioln ppb,

> > > and increasing it more than that isn't really useful either.  

> > 

> > Do you mean it's capped somewhere in the code to 1B?

> > 

> > I'm no time expert but this is not probability where 1 is a magic

> > value, adjusting clock by 1 - 1ppb vs 1 + 1ppb makes little difference,

> > no? Both mean something is super fishy with the nominal or expected

> > frequency, but the hardware can do that and more.

> > 

> > Flipping the question, if adjusting by large ppb values is not correct,

> > why not cap the adjustment at the value which would prevent the u64

> > overflow?  

> 

> Large ppb values are sometimes used when you want to slew a clock to

> bring it in sync when its a few milliseconds to seconds off, without

> performing a time jump (so that you maintain monotonic increasing

> time).


Ah, you're right, ptp4l will explicitly cap the freq adjustments
based on max_adj from sysfs, so setting max_adj too low could impact
the convergence time in strange scenarios.

> That being said, we are supposed to be checking max_adj, except that

> you're right the conversion to ppb could overflow, and there's no

> check prior to the conversion from scaled_ppm to ppb.

Jacob Keller June 14, 2021, 8:51 p.m. UTC | #9

On 6/14/2021 1:48 PM, Jakub Kicinski wrote:
> On Mon, 14 Jun 2021 19:50:23 +0000 Keller, Jacob E wrote:

>>>> Hmmm.. I thought ppb was a s64, not an s32.

>>>>

>>>> In general, I believe max_adj is usually capped at 1 billion anyways,

>>>> since it doesn't make sense to slow a clock by more than 1billioln ppb,

>>>> and increasing it more than that isn't really useful either.  

>>>

>>> Do you mean it's capped somewhere in the code to 1B?

>>>

>>> I'm no time expert but this is not probability where 1 is a magic

>>> value, adjusting clock by 1 - 1ppb vs 1 + 1ppb makes little difference,

>>> no? Both mean something is super fishy with the nominal or expected

>>> frequency, but the hardware can do that and more.

>>>

>>> Flipping the question, if adjusting by large ppb values is not correct,

>>> why not cap the adjustment at the value which would prevent the u64

>>> overflow?  

>>

>> Large ppb values are sometimes used when you want to slew a clock to

>> bring it in sync when its a few milliseconds to seconds off, without

>> performing a time jump (so that you maintain monotonic increasing

>> time).

> 

> Ah, you're right, ptp4l will explicitly cap the freq adjustments

> based on max_adj from sysfs, so setting max_adj too low could impact

> the convergence time in strange scenarios.

> 


Your patch to fix it so that the conversion from scaled_ppm to ppb can't
overflow is the correct approach, here. The scaled_ppm function didn't
account for the fact that the provided adjustment could overflow the s32.

Increasing that to s64 ensures it won't overflow and prevents invalid
bogus frequencies from passing that check.

Jakub Kicinski June 14, 2021, 10:25 p.m. UTC | #10

On Mon, 14 Jun 2021 13:51:57 -0700 Jacob Keller wrote:
> > Ah, you're right, ptp4l will explicitly cap the freq adjustments

> > based on max_adj from sysfs, so setting max_adj too low could impact

> > the convergence time in strange scenarios.

> 

> Your patch to fix it so that the conversion from scaled_ppm to ppb can't

> overflow is the correct approach, here. The scaled_ppm function didn't

> account for the fact that the provided adjustment could overflow the s32.

> 

> Increasing that to s64 ensures it won't overflow and prevents invalid

> bogus frequencies from passing that check.


On second though I went with long for ppb IOW the same as scaled_ppm,
presumably the problem does not exist on 32bit, so no point using 64b
everywhere.

Richard Cochran June 15, 2021, 5:08 a.m. UTC | #11

On Mon, Jun 14, 2021 at 11:50:43AM -0700, Jakub Kicinski wrote:
> tx->freq is a long, and the conversion to ppb can overflow the s32 type.

> E.g. 281474976645 will become -2 AFAICT. I hacked up phc_ctl to not do

> range checking and kernel happily accepted that value. Shall we do this?


Yes, you are right.  The range check has a bug, and your fix is good.

Thanks,
Richard

[net-next,0/8,pull,request] 100GbE Intel Wired LAN Driver Updates 2021-06-11

Message

Comments