Message ID | cover.1624384990.git.lucien.xin@gmail.com |
---|---|
Series | sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport |
On Tue, Jun 22, 2021 at 6:13 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Xin Long
> > Sent: 22 June 2021 19:05
> >
> > Overview(From RFC8899):
> >
> > In contrast to PMTUD, Packetization Layer Path MTU Discovery
> > (PLPMTUD) [RFC4821] introduces a method that does not rely upon
> > reception and validation of PTB messages. It is therefore more
> > robust than Classical PMTUD. This has become the recommended
> > approach for implementing discovery of the PMTU [BCP145].
> >
> > It uses a general strategy in which the PL sends probe packets to
> > search for the largest size of unfragmented datagram that can be sent
> > over a network path. Probe packets are sent to explore using a
> > larger packet size. If a probe packet is successfully delivered (as
> > determined by the PL), then the PLPMTU is raised to the size of the
> > successful probe. If a black hole is detected (e.g., where packets
> > of size PLPMTU are consistently not received), the method reduces the
> > PLPMTU.
>
> This seems to take a long time (probably well over a minute)
> to determine the mtu.

I just noticed this is a misread of RFC8899: the next probe packet should
be sent immediately once the ACK of the last probe is received, instead of
waiting for the timeout, which should only apply to a missing probe.

I will fix this with:

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d29b579da904..f3aca1acf93a 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1275,6 +1275,8 @@ enum sctp_disposition sctp_sf_backbeat_8_3(struct net *net,
 			return SCTP_DISPOSITION_DISCARD;

 		sctp_transport_pl_recv(link);
+		sctp_add_cmd_sf(commands, SCTP_CMD_PROBE_TIMER_UPDATE,
+				SCTP_TRANSPORT(link));
 		return SCTP_DISPOSITION_CONSUME;
 	}

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index f27b856ea8ce..88815b98d9d0 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -215,6 +215,11 @@ void sctp_transport_reset_probe_timer(struct sctp_transport *transport)
 {
 	int scale = 1;

+	if (transport->pl.probe_count == 0) {
+		if (!mod_timer(&transport->probe_timer, jiffies + transport->rto))
+			sctp_transport_hold(transport);
+		return;
+	}
 	if (timer_pending(&transport->probe_timer))
 		return;
 	if (transport->pl.state == SCTP_PL_COMPLETE &&

Thanks for the comment.

>
> What is used for the actual mtu while this is in progress?
>
> Does packet loss and packet retransmission cause the mtu
> to be reduced as well?

No, the data packet is not a probe in this implementation.

>
> I can imagine that there is an expectation (from the application)
> that the mtu is that of an ethernet link - perhaps less a PPPoE
> header.
> Starting with an mtu of 1200 will break this assumption and may
> have odd side effects.

The search starts from an mtu of 1200, but the real pmtu is only updated
when the search is done and the optimal mtu is found. So at the beginning,
it still uses the dst mtu as before.

> For TCP/UDP the ICMP segmentation required error is immediate
> and gets used for the retransmissions.
> This code seems to be looking at separate timeouts - so a lot of
> packets could get discarded and application timers expire before
> it determines the correct mtu.

This patchset also processes ICMP error messages and gets the 'mtu' size
from them, but before using that size it verifies (probes) it first. See patch:

  sctp: do state transition when receiving an icmp TOOBIG packet

>
> Maybe I missed something about this only being done on inactive
> paths?
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
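
As a rough illustration of the timer change in the diff above, the decision made at the top of sctp_transport_reset_probe_timer() can be modelled in userspace C. This is only a sketch: `struct probe_timer_model`, `next_probe_delay_ms()`, `KEEP_PENDING` and the `probe_interval_ms` field are hypothetical names, and it assumes that `probe_count` is zero right after a probe has been acknowledged, which is how the diff appears to use it. The part of the kernel function not shown in the diff is reduced to "use the regular probe interval".

```c
#include <stdbool.h>

/* Hypothetical sentinel: leave the already-armed timer untouched. */
#define KEEP_PENDING (-1L)

/* Userspace stand-in for the few transport fields the diff looks at. */
struct probe_timer_model {
	int  probe_count;       /* assumed 0 right after a probe was ACKed */
	bool timer_pending;     /* mirrors timer_pending(&probe_timer) */
	long rto_ms;            /* mirrors transport->rto, in ms here */
	long probe_interval_ms; /* hypothetical: regular probe interval */
};

/* Returns the delay before the next probe, or KEEP_PENDING. */
long next_probe_delay_ms(const struct probe_timer_model *t)
{
	/* New branch from the diff: last probe was ACKed, so re-arm
	 * the timer one RTO from now instead of waiting a full interval. */
	if (t->probe_count == 0)
		return t->rto_ms;

	/* Existing behaviour: a probe is still outstanding and its
	 * timer is running, so do not touch it. */
	if (t->timer_pending)
		return KEEP_PENDING;

	/* Simplification of the rest of the kernel function, which the
	 * diff does not show in full. */
	return t->probe_interval_ms;
}
```

With an RTO of 200 ms and an interval of 5000 ms, an acknowledged probe yields a delay of 200 ms, while an unanswered probe keeps its pending timer.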
On Wed, Jun 23, 2021 at 5:50 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Xin Long
> > Sent: 23 June 2021 04:49
> ...
> > [103] pl_send: PLPMTUD: state: 1, size: 1200, high: 0 <--[a]
> > [103] pl_recv: PLPMTUD: state: 1, size: 1200, high: 0
> ...
> > [103] pl_send: PLPMTUD: state: 2, size: 1456, high: 0
> > [103] pl_recv: PLPMTUD: state: 2, size: 1456, high: 0 <--[b]
> > [103] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
> > [108] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
> > [113] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
> > [118] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
> > [118] pl_recv: PLPMTUD: state: 2, size: 1456, high: 1488 <---[c]
> > [118] pl_send: PLPMTUD: state: 2, size: 1460, high: 1488
> > [118] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1488 <--- [d]
> > [118] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
> > [124] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
> > [129] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
> > [134] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
> > [134] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1464 <-- around
> > 30s "search complete from 1200 bytes"
> > [287] pl_send: PLPMTUD: state: 3, size: 1460, high: 0
> > [287] pl_recv: PLPMTUD: state: 3, size: 1460, high: 0
> > [287] pl_send: PLPMTUD: state: 2, size: 1464, high: 0 <-- [aa]
> > [292] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
> > [298] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
> > [303] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
> > [303] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1464 <--[bb] <--
> > around 15s "re-search complete from current pmtu"
> >
> > So since no interval to send the next probe when the ACK is received
> > for the last one,
> > it won't take much time from [a] to [b], and [c] to [d],
> > and there are at most 2 failures to find the right pmtu, each failure
> > takes 5s * 3 = 15s.
> >
> > when it goes back to search from search complete after a long timeout,
> > it will take only 1 failure to get the right pmtu from [aa] to [bb].
>
> What mtu is being used during the 'failures' ?
> I hope it is the last working one.

Yes, it's always the working one, which was set in search complete state.
More specifically, it changes in 4 cases:
a) at the beginning, it's using the dst->mtu;
b) it is set to the optimal one in search complete state after searching
   is done;
c) when a 'black hole' is found, it is set to 1200, and probing starts
   from 1200;
d) if it still fails with 1200, the mtu is set to MIN_PLPMTU, but it
   still probes with 1200 until that succeeds.

>
> Also, what actually happen if the network route changes from
> one that supports 1460 bytes to one that only supports 1200
> and where ICMP errors are not generated?

Then it's cases c) and d) above. For "black hole" detection, I'd like to
probe with the current mtu 3 times (15s in total).

>
> The first protocol retry is (probably) after 2 seconds.
> But it will use the 1460 byte mtu and fail again.

Yeah, but 15s is the time we have to wait to confirm this is really
caused by an mtu change.

>
> Notwithstanding the standards, what pmtu actually exist
> 'in the wild' for normal networks?

PLPMTUD is trying to probe the max payload of SCTP, not the real pmtu.

> Are there actually any others apart from 'full sized ethernet'
> and 'PPPoE'?

sctp over UDP, if that's what you mean. With the probe, we don't really
care about the outer header, because what we get with the probe is the
real available size for sctp payload. For example, if we probe with 1400
and it gets ACKed by the peer, a payload of 1400 will be able to go
through the path.

> So would it actually better to send two probes one for
> each of those two sizes and see which ones respond?

We could, but only by disregarding RFC8899:

"To avoid excessive load, the interval between individual probe packets
MUST be at least one RTT..."

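
The cases a) to d) listed earlier in this reply can be put into a small lookup, purely as an illustration. The names `enum pl_phase`, `struct pl_sizes` and `pl_sizes_for()` are hypothetical and not taken from the patchset. The value of MIN_PLPMTU is passed in as a parameter because it is not stated in this thread; 1200 is the base size mentioned above. The probe size for case b) follows the state 3 lines in the log, where the confirmed size is probed again.

```c
/* Base probe size mentioned in the thread. */
#define BASE_PLPMTU 1200

/* Hypothetical labels for the four cases a) to d). */
enum pl_phase {
	PL_START,           /* a) nothing confirmed yet */
	PL_SEARCH_COMPLETE, /* b) search finished, optimal size found */
	PL_BLACK_HOLE,      /* c) current size stopped working */
	PL_BASE_FAILED      /* d) even 1200 is not getting through */
};

struct pl_sizes {
	int mtu_in_use; /* what data packets are sized against */
	int probe_size; /* what the next probe is sent with */
};

struct pl_sizes pl_sizes_for(enum pl_phase phase, int dst_mtu,
			     int optimal, int min_plpmtu)
{
	struct pl_sizes s;

	switch (phase) {
	case PL_SEARCH_COMPLETE:
		s.mtu_in_use = optimal;
		s.probe_size = optimal; /* confirmed size is re-probed */
		break;
	case PL_BLACK_HOLE:
		s.mtu_in_use = BASE_PLPMTU;
		s.probe_size = BASE_PLPMTU;
		break;
	case PL_BASE_FAILED:
		s.mtu_in_use = min_plpmtu;
		s.probe_size = BASE_PLPMTU; /* keep probing with 1200 */
		break;
	case PL_START:
	default:
		s.mtu_in_use = dst_mtu;     /* unchanged until search is done */
		s.probe_size = BASE_PLPMTU; /* search starts from 1200 */
		break;
	}
	return s;
}
```

The point of the sketch is that probing and sending are decoupled: data keeps using the last size known to work while probes explore separately.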
>
> (I'm not sure we ever manage to send full length packets.
> Our data is M3UA (mostly SMS) and sent with Nagle disabled.
> So even the customers sending 1000s of SMS/sec are unlikely
> to fill packets.)
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
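
To make the "around 30s" and "around 15s" figures from the log easier to check, here is a small userspace simulation of the search. It is a sketch under stated assumptions, not the kernel code: the step sizes 32 and 4 are inferred from the log (1456 -> 1488 and 1456 -> 1460), the 5s retry spacing and 3 tries per size come from the "5s * 3 = 15s" estimate, successful probes are treated as taking no time, and `MAX_PROBE_SIZE`, `probe_ok()`, `plpmtud_search()` and `search_seconds()` are hypothetical names.

```c
#include <assert.h>

#define BIG_STEP        32    /* inferred from the log: 1456 -> 1488 */
#define MIN_STEP        4     /* inferred from the log: 1456 -> 1460 */
#define PROBE_TRIES     3     /* tries per size before it counts as failed */
#define PROBE_SPACING_S 5     /* seconds between tries of an unanswered probe */
#define MAX_PROBE_SIZE  65535 /* hypothetical cap so the loop always ends */

/* Hypothetical path model: a probe is ACKed iff it fits the path. */
int probe_ok(int size, int path_limit)
{
	return size <= path_limit;
}

/*
 * Search upwards from a size known to work, using 'step' as the first
 * increment. Returns the largest confirmed size and stores the number
 * of sizes that failed in *failures.
 */
int plpmtud_search(int start, int step, int path_limit, int *failures)
{
	int good = start; /* last size that was ACKed */
	int high = 0;     /* smallest size known to fail, 0 if none yet */

	assert(start > 0 && step > 0 && failures);
	*failures = 0;

	for (;;) {
		int next = good + step;

		/* Nothing left to try below the known-bad size. */
		if ((high && next >= high) || next > MAX_PROBE_SIZE)
			return good;

		if (probe_ok(next, path_limit)) {
			good = next;
		} else {
			high = next;     /* remember the failing size */
			step = MIN_STEP; /* refine with the small step */
			(*failures)++;
		}
	}
}

/* Time spent waiting on failed sizes; ACKed probes cost ~nothing here. */
int search_seconds(int failures)
{
	return failures * PROBE_TRIES * PROBE_SPACING_S;
}
```

For a path that carries 1460 bytes, `plpmtud_search(1200, BIG_STEP, 1460, &f)` fails at 1488 and 1464 and settles on 1460, so two failures or about 30s. Restarting from the confirmed size with `plpmtud_search(1460, MIN_STEP, 1460, &f)` fails only at 1464, so about 15s, which matches [aa] to [bb] in the log.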