Message ID | 20201019172435.4416-1-ceggers@arri.de |
---|---|
Headers | show |
Series | net: dsa: microchip: PTP support for KSZ956x | expand |
Hi Richard, On Thursday, 22 October 2020, 04:42:01 CEST, Richard Cochran wrote: > I'm just catching up with this. > > Really. Truly. Please -- Include the maintainer on CC for such patches! sorry for missing you on the recipients list. I blindly trusted the output of get_maintainer.pl. I recently sent two other patches which may also be of interest for you. They related to handling of SO_TIMESTAMPING on 32 bit platforms with newer C libraries: https://patchwork.ozlabs.org/project/netdev/patch/20201012093542.15504-1-ceggers@arri.de/ https://patchwork.ozlabs.org/project/netdev/patch/20201012093542.15504-2-ceggers@arri.de/ > On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > > On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > > > The PTP hardware performs internal detection of PTP frames (likely > > > similar as ptp_classify_raw() and ptp_parse_header()). As these filters > > > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode > > > (master/slave) must be configured via sysfs attributes. > > This is a complete no-go. NAK. I didn't design the hardware nor do I have access to adequate documentation. I will try to figure out what functionality is concretely affected by these two settings. If I am correct, the KSZ hardware consists of two main building blocks: 1. A TC on the switch path. 2. An OC on the DSA host port.
On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > Hi Vladimir, > > On Thursday, 22 October 2020, 12:50:14 CEST, Vladimir Oltean wrote: > after applying the RX timestamp correctly to the correction field (shifting > the nanoseconds by 16), That modification should have been done anyway, since the unit of measurement for correctionField is scaled ppb (48 bits nanoseconds, 16 bits scaled nanoseconds), and not nanoseconds. > it seems that "moving" the timestamp back to the tail tag on TX is not > required anymore. Keeping the RX timestamp simply in the correction > field (negative value), works fine now. So this halves the effort in > the tag_ksz driver. Ok, this makes sense. Depending on what Richard responds, it now looks like the cleanest approach would be to move your implementation that is currently in ksz9477_update_ptp_correction_field() into a generic function called static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff *skb, unsigned int ptp_type, struct ptp_header *ptp_header, ktime_t t2) You should then clearly document that this function is needed for hardware capable of one-step P2P that does not already modify the correctionField of Pdelay_Req event messages on ingress.
On Thu, Oct 22, 2020 at 02:32:43PM +0300, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > > > it seems that "moving" the timestamp back to the tail tag on TX is not > > required anymore. Keeping the RX timestamp simply in the correction > > field (negative value), works fine now. So this halves the effort in > > the tag_ksz driver. > > Ok, this makes sense. > Depending on what Richard responds, it now looks like the cleanest > approach would be to move your implementation that is currently in > ksz9477_update_ptp_correction_field() into a generic function called +1
Hi Vladimir, On Friday, 30 October 2020, 19:24:47 CET, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > > I tried to study the effect of setting the ocmode bit on the KSZ either to > > master or to slave. The main visible change is, that some PTP message > > types > > are be filtered out on RX: > > - in "master" mode, "Sync" messages from other nodes will not be received > > (but everything else like "Announce" seem to work) > > - in "slave" mode, "Delay_Req" messages from other nodes will not be > > received > Could you dump the contents of your REG_PTP_MSG_CONF2 register? runtime register value is 0x1004 (matches default value from the data sheet). The Linux driver doesn't touch this register. Below is a dump of all PTP related (global) registers. regards Christian KSZ9563 (Ethernet switch) Global IEEE 1588 PTP CLKCTRL 0002 SWFA enabled CLKSADJ NOP PTPSD subtract CLKREAD NOP CLKWRITE NOP CLKCADJ disabled EN enabled RESET normal RTCCP 0002 PHASE 16ns RTCNS 17D72FF0 NANOSECONDS 399978480 RTCS 00000023 SECONDS 35 RTCSUBNS 00000000 RATEDIR subtract TEMPADJ permanent SUBNS 0 RTCTMPADJ 00000000 CYCLES 0 MSGCFG1 007D MODEEN enabled IEEE802.3 enabled UDPv4 enabled UDPv6 enabled TCMODE P2P OCMODE slave MSGCFG2 1004 UNICASTEN both ALTMASTER disabled PRIOTX event only CHKSYNFU disabled CHKDLY disabled CHKPDLY disabled DROP disabled CHKDOM disabled IPv4CHKSUM calc DOMVER 0200 VERSION 2 DOMAIN 0 UNITIDX 00000000 GPIO_IDX GPIO_1 TS_IDX Unit 0 TRIG_IDX Unit 0
On Sun, Nov 01, 2020 at 10:35:01AM +0100, Christian Eggers wrote: > Hi Vladimir, > > On Friday, 30 October 2020, 19:24:47 CET, Vladimir Oltean wrote: > > On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > > > I tried to study the effect of setting the ocmode bit on the KSZ either to > > > master or to slave. The main visible change is, that some PTP message > > > types > > > are be filtered out on RX: > > > - in "master" mode, "Sync" messages from other nodes will not be received > > > (but everything else like "Announce" seem to work) > > > - in "slave" mode, "Delay_Req" messages from other nodes will not be > > > received > > Could you dump the contents of your REG_PTP_MSG_CONF2 register? > runtime register value is 0x1004 (matches default value from the data sheet). > The Linux driver doesn't touch this register. Below is a dump of all PTP > related (global) registers. So the bit 5 ("Enable Dropping of Sync/Follow_Up and Delay_Req PTP Messages") is not set. When the PTP messages are dropped, do you know which error counter in ethtool -S is increasing?
On Mon, Nov 02, 2020 at 01:41:49AM +0200, Vladimir Oltean wrote: > In principle I don't see any reason why this switch would not be able > to operate as a one-step peer delay BC. What I meant to say was "one-step E2E BC", since I was talking about having to receive both Sync and Delay_Req at the same time, of course.
On Mon, Nov 02, 2020 at 11:35:00AM +0100, Christian Eggers wrote: > Maybe my mail from October, 22 was ambiguous. I meant that despite of the > presence of filtering, a BCMA algorithm should be about to work (as no > Announce messages are filtered out). > > Additionally I said, that switching between "master" and "slave" mode could > not be done automatically by the driver, as the driver could at most detect > the presence of Sync messages (indication for master mode), but would do hard > to detect a transition to slave mode. > > I see a chance that user space (ptp4l) could configure the appropriate > "hardware filter setup" for master/slave mode. The concept that you want from user space is hard to define. You want ptp4l to make the driver aware of the port state, which is something that happens at runtime and is not part of the "hardware filter setup" as you say. Then, even if ptp4l notifies the driver of port state, E2E BC setups would still be broken because that's how the hardware works and no user space assistance can help with that. Also, what abstraction do you plan using for programming the PTP port state into the kernel. Maybe you should optimize for what you plan to use. If you need to use an E2E profile, maybe it would be worth the effort to find an appropriate abstraction for this port state thing. If you only use it for testing, then maybe it would be a good idea to keep the sysfs in your tree. > > Why am I mentioning this? Because the setting that's causing trouble for > > us is 'port state of the host port OC', which in the context of what I > > said above is nonsense. There _is_ no host port OC. There are 2 switch > > ports which can act as individual OCs, or as a BC, or as a TC. > But the switch has only one clock at all. I assume that the switch cannot be a > boundary clock, only TC seems possible. Why would a P2P BC not work? It does not require more than one clock. > As said above, having "filter setups" for E2E/P2P and for MASTER/SLAVE would > probably fit well for this kind of hardware. For E2E/P2P is one thing, for master/slave is a completely different thing.
Hi Vladimir, On Thursday, 22 October 2020, 13:32:43 CET, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > > On Thursday, 22 October 2020, 12:50:14 CEST, Vladimir Oltean wrote: > > after applying the RX timestamp correctly to the correction field > > (shifting > > the nanoseconds by 16), > > That modification should have been done anyway, since the unit of > measurement for correctionField is scaled ppb (48 bits nanoseconds, 16 > bits scaled nanoseconds), and not nanoseconds. > > > it seems that "moving" the timestamp back to the tail tag on TX is not > > required anymore. Keeping the RX timestamp simply in the correction > > field (negative value), works fine now. So this halves the effort in > > the tag_ksz driver. unfortunately I made a mistake when testing. Actually the timestamp *must* be moved from the correction field (negative) to the egress tail tag. > Ok, this makes sense. > Depending on what Richard responds, it now looks like the cleanest > approach would be to move your implementation that is currently in > ksz9477_update_ptp_correction_field() into a generic function called > > static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff > *skb, unsigned int ptp_type, > struct ptp_header *ptp_header, > ktime_t t2) I have implemented this in ptp_classify.h. Passing t2 instead of the correction field itself is fine for rx, but as this function is now still required for transmit, it looks a little bit misused there (see below). Shall I keep it as below, or revert it to passing value of the correction field itself? regards Christian static void ksz9477_xmit_timestamp(struct sk_buff *skb) { struct sk_buff *clone = DSA_SKB_CB(skb)->clone; struct ptp_header *ptp_hdr; u32 tstamp_raw = 0; u64 correction; if (!clone) goto out_put_tag; /* Use cached PTP header and type from ksz9477_ptp_should_tstamp(). Note * that KSZ9477_SKB_CB(clone)->ptp_header != NULL implies that this is a * Pdelay_resp message. */ ptp_hdr = KSZ9477_SKB_CB(clone)->ptp_header; if (!ptp_hdr) goto out_put_tag; correction = get_unaligned_be64(&ptp_hdr->correction); /* For PDelay_Resp messages we will likely have a negative value in the * correction field (see ksz9477_rcv()). The switch hardware cannot * correctly update such values, so it must be moved to the time stamp * field in the tail tag. */ if ((s64)correction < 0) { unsigned int ptp_type = KSZ9477_SKB_CB(clone)->ptp_type; struct timespec64 ts; u64 ns; /* Move ingress time stamp from PTP header's correction field to * tail tag. Format of the correction filed is 48 bit ns + 16 * bit fractional ns. Avoid shifting negative numbers. */ ns = -((s64)correction) >> 16; ts = ns_to_timespec64(ns); tstamp_raw = ((ts.tv_sec & 3) << 30) | ts.tv_nsec; >>> /* Set correction field to 0 (by subtracting the negative value) >>> * and update UDP checksum. >>> */ >>> ptp_onestep_p2p_move_t2_to_correction(skb, ptp_type, ptp_hdr, ns_to_ktime(-ns)); } out_put_tag: put_unaligned_be32(tstamp_raw, skb_put(skb, KSZ9477_PTP_TAG_LEN)); } Addtionally ptp_onestep_p2p_move_t2_to_correction() must be able to handle negative values: static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff *skb, unsigned int type, struct ptp_header *hdr, ktime_t t2) { u8 *ptr = skb_mac_header(skb); struct udphdr *uhdr = NULL; s64 ns = ktime_to_ns(t2); __be64 correction_old; s64 correction; /* previous correction value is required for checksum update. */ memcpy(&correction_old, &hdr->correction, sizeof(correction_old)); correction = (s64)be64_to_cpu(correction_old); /* PTP correction field consists of 32 bit nanoseconds and 16 bit * fractional nanoseconds. Avoid shifting negative numbers. */ >>> if (ns >= 0) >>> correction -= ns << 16; >>> else >>> correction += -ns << 16; /* write new correction value */ put_unaligned_be64((u64)correction, &hdr->correction); ... }
Sorry for getting back late to you. It did not compute when I read your email the first time around, then I let it sit for a while. On Thu, Nov 05, 2020 at 09:18:04PM +0100, Christian Eggers wrote: > unfortunately I made a mistake when testing. Actually the timestamp *must* be > moved from the correction field (negative) to the egress tail tag. That doesn't sound very good at all. I have a simple question. Can your driver, when operating as a PTP master clock, distribute a time in 2020 into your network? (you can check with "phc_ctl /dev/ptp0 get")
On Tue, Nov 10, 2020 at 03:36:02PM +0100, Christian Eggers wrote: > On Tuesday, 10 November 2020, 02:42:34 CET, Vladimir Oltean wrote: > > Sorry for getting back late to you. It did not compute when I read your > > email the first time around, then I let it sit for a while. > > > > On Thu, Nov 05, 2020 at 09:18:04PM +0100, Christian Eggers wrote: > > > unfortunately I made a mistake when testing. Actually the timestamp *must* > > > be moved from the correction field (negative) to the egress tail tag. > > That doesn't sound very good at all. > I think that is no drawback. It is implemented and works. It works, but how? The reason why I'm asking you is this. I can guess that this hardware plays by a very simple song when performing one-step TX timestamping. correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) It does this because, as far as the hardware is concerned, it needs to make no difference between event messages, be they Sync, Pdelay_Req or Pdelay_Resp messages. First, let's see if we agree what happens for a Sync. | | t1 |------\ Sync | | ------\ | | ----->| t2 Master | | Slave Clock | | Clock | | When receiving the one-step Sync message, the Slave Clock must add the correctionField to the originTimestamp in order to figure out what is the t1 it should use. Usually, the correctionField would only be updated by transparent clocks. But when using one-step TX timestamping, it is not mandatory for master clocks to send the correctionField as zero (something which is mandatory for two-step). So they don't. Here's what the IEEE 1588 standard says: -----------------------------[cut here]----------------------------- 9.5.9.3 One-step clocks ----------------------- The originTimestamp field of the Sync message shall be an estimate no worse than ±1 s of the <syncEventEgressTimestamp> excluding any fractional nanoseconds. -----------------------------[cut here]----------------------------- In practice, the Master Clock will probably set the originTimestamp to something more or less arbitrary by reading the current PTP time of the NIC, then it will let the MAC rewrite the correctionField with the precise t_TX (hardware TX timestamp) minus the value from the originTimestamp field*. *Actually parsing the packet at that rate, while also rewriting and timestamping it, might be too tricky and too late, so the MAC, instead of reading the originTimestamp from the actual Sync message, will actually require you, the driver writer, to pass them the value of the originTimestamp twice, this time also through the tail tag (or through whatever other buffer metadata that hardware might use). So this formula would do that job: correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) where t_TX is the MAC TX timestamp and t_Tail_Tag should be the approximate originTimestamp of the Sync message. I am fairly confident that this is how your hardware works, because that's also how peer delay wants to be timestamped, it seems. It's just that in the case of Pdelay_Resp, t_Tail_Tag must be set to t2 (the precise timestamp of the Pdelay_Req), in order for the correctionField for that Pdelay_Resp message to contain (t3 - t2): | | t1 |------\ Pdelay_Req | | ------\ | | ----->| t2 Clock A | | Clock B | Pdelay_Resp /------| t3 | ------ | t4 |<-----/ | | | You don't do any of that here. So what must be happening in your case is that the originTimestamp for Sync messages is always zero, and the correctionField provides the rest of t1, in its entirety. That's because in your tagger, you also leave the t_Tail_Tag as zero for Sync messages (you only touch it for peer delay). Is that ok? Well, on paper yes, because 0 + t1 = t1 + 0 = t1 (either the correctionField or the originTimestamp could be zero, and they would add up to the same number), but in practice, the bit width of the originTimestamp is 64 bits, whereas the correctionField only has 48 bits for nanoseconds, the lower 16 bits being for fixed point. So if you're going to keep your entire t1 into the correctionField, then in practice your master clock can only advertise a time that holds 48 bits worth of nanoseconds, i.e. 3 1/4 days worth of TAI starting with Jan 1st 1970. Then it would wrap around. So how come this is not what happens in your case? I don't see you update the originTimestamp nor the t_Tail_Tag for Sync messages. The arithmetic that is done on the correctionField should be simple 48-bit two's complement. It should not matter if the correctionField is positive or negative. The only case I believe it would matter that the correctionField is negative is if the arithmetic is not actually on 48 bit two's complement, but on the partial timestamps directly (i.e. 32 bit two's complement). What I'm saying is that this formula is probably calculated in hardware in two steps: correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) Step 1: t_tmp = t_TX (32 bits) - t_Tail_Tag (32 bits), using 32-bit two's complement arithmetic Step 2: correctionField_New (48 bits) = correctionField_Old (48 bits) + t_tmp (32 bits), using 48-bit two's complement arithmetic, and where t_tmp is probably treated as an unsigned integer that is zero-padded up to 48 bits I am getting the feeling that there are still some things we're missing in this series. I would not hurry to send a v2 until they're clear. I hope that you're not testing PTP just between yourself and yourself. It's really easy for bugs to cancel out.
On Tue, Nov 10, 2020 at 06:40:45PM +0200, Vladimir Oltean wrote: > I am fairly confident that this is how your hardware works, because > that's also how peer delay wants to be timestamped, it seems. So I was confident and also wrong, it appears. My KSZ9477 datasheet says: In the host-to-switch direction, the 4-byte timestamp field is always present when PTP is enabled, as shown in Figure 4-6. This is true for all packets sent by the host, including IBA packets. The host uses this field to insert the receive timestamp from PTP Pdelay_Req messages into the Pdelay_Resp messages. For all other traffic and PTP message types, the host should populate the timestamp field with zeros. Hm. Does that mean that the switch updates the originTimestamp field of the Sync frames by itself? Ok... Very interesting that they decided to introduce a field in the tail tag for a single type of PTP message. But something is still wrong if you need to special-case the negative correctionField, it looks like the arithmetic is not done on the correct number of bits, either by the driver or by the hardware. And zeroing out the correctionField of the Pdelay_Resp on transmission, to put that value into t_Tail_Tag? How can you squeeze a 48-bit value into a 32-bit value without truncation?
On Thursday, 12 November 2020, 16:28:44 CET, Christian Eggers wrote: > Hi Vladimir, > > On Tuesday, 10 November 2020, 20:32:45 CET, Vladimir Oltean wrote: > > But something is still wrong if you need to special-case the negative > > correctionField, it looks like the arithmetic is not done on the correct > > number of bits, either by the driver or by the hardware. > > I got it! I got it not! While keeping the (negative) correction field works perfect when using PTP over L2 (what I did the last weeks), this causes an unwanted side effect when using UDP: ... > User Datagram Protocol, Src Port: 319, Dst Port: 319 > > Source Port: 319 > Destination Port: 319 > Length: 62 > > Checksum: 0x2285 incorrect, should be 0x2286 (maybe caused by "UDP checksum offload"?) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > [Checksum Status: Bad] > [Stream index: 0] > [Timestamps] > > Precision Time Protocol (IEEE1588) > > 0000 .... = transportSpecific: 0x0 > .... 0011 = messageId: Peer_Delay_Resp Message (0x3) > 0000 .... = Reserved: 0 > .... 0010 = versionPTP: 2 > messageLength: 54 > subdomainNumber: 0 > Reserved: 0 > flags: 0x0000 > correction: 5579788,000000 nanoseconds > Reserved: 0 > ClockIdentity: 0x849000fffe0980f7 > SourcePortID: 1 > sequenceId: 785 > control: Other Message (5) > logMessagePeriod: 127 > requestreceiptTimestamp (seconds): 0 > requestreceiptTimestamp (nanoseconds): 0 > requestingSourcePortIdentity: 0x849000fffe0980f6 > requestingSourcePortId: 2 While correction field is ok (residential delay ~5ms, using one printk...), the UDP checksum is off by one in all PDelay_Resp messages. The KSZ device offers on option to set the UDP checksum to zero, but this also didn't help and additionally wouldn't work for IPv6. It seems that I should return to "moving T2 from the correction field to the tail tag" on tx. regards Christian