Message ID | 7af3da33a7aa540f7878cfcbf5076dcf61d201ef.1612393368.git.andreas.a.roeseler@gmail.com |
---|---|
State | New |
Headers | show |
Series | add support for RFC 8335 PROBE | expand |
On Wed, Feb 3, 2021 at 6:30 PM Andreas Roeseler <andreas.a.roeseler@gmail.com> wrote: > > Modify the icmp_rcv function to check for PROBE messages and call > icmp_echo if a PROBE request is detected. > > Modify the existing icmp_echo function to respond to both ping and PROBE > requests. > > This was tested using a custom modification of the iputils package and > wireshark. It supports IPV4 probing by name, ifindex, and probing by both IPV4 and IPV6 > addresses. It currently does not support responding to probes off the proxy node > (See RFC 8335 Section 2). > > > Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> > --- > Changes since v1: > - Reorder variable declarations to follow coding style > - Switch to functions such as dev_get_by_name and ip_dev_find to lookup > net devices > --- > net/ipv4/icmp.c | 98 ++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 88 insertions(+), 10 deletions(-) > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index 396b492c804f..18f9a2a3bf59 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -983,21 +983,85 @@ static bool icmp_redirect(struct sk_buff *skb) > > static bool icmp_echo(struct sk_buff *skb) > { > + struct icmp_bxm icmp_param; > struct net *net; > + struct net_device *dev; > + struct icmp_extobj_hdr *extobj_hdr; > + struct icmp_ext_ctype3_hdr *ctype3_hdr; > + __u8 status; nit: please maintain reverse christmas tree variable ordering > > net = dev_net(skb_dst(skb)->dev); > - if (!net->ipv4.sysctl_icmp_echo_ignore_all) { > - struct icmp_bxm icmp_param; > + /* should there be an ICMP stat for ignored echos? */ > + if (net->ipv4.sysctl_icmp_echo_ignore_all) > + return true; > + > + icmp_param.data.icmph = *icmp_hdr(skb); > + icmp_param.skb = skb; > + icmp_param.offset = 0; > + icmp_param.data_len = skb->len; > + icmp_param.head_len = sizeof(struct icmphdr); > > - icmp_param.data.icmph = *icmp_hdr(skb); > + if (icmp_param.data.icmph.type == ICMP_ECHO) { > icmp_param.data.icmph.type = ICMP_ECHOREPLY; > - icmp_param.skb = skb; > - icmp_param.offset = 0; > - icmp_param.data_len = skb->len; > - icmp_param.head_len = sizeof(struct icmphdr); > - icmp_reply(&icmp_param, skb); > + goto send_reply; > } > - /* should there be an ICMP stat for ignored echos? */ > + if (!net->ipv4.sysctl_icmp_echo_enable_probe) > + return true; > + /* We currently do not support probing off the proxy node */ > + if (!(ntohs(icmp_param.data.icmph.un.echo.sequence) & 1)) > + return true; What does this comment mean? And why does the sequence number need to be even? > + > + icmp_param.data.icmph.type = ICMP_EXT_ECHOREPLY; > + icmp_param.data.icmph.un.echo.sequence &= htons(0xFF00); Why this mask? > + extobj_hdr = (struct icmp_extobj_hdr *)(skb->data + sizeof(struct icmp_ext_hdr)); > + ctype3_hdr = (struct icmp_ext_ctype3_hdr *)(extobj_hdr + 1); It is not safe to trust the contents of unverified packets. We cannot just cast to a string and call dev_get_by_name. Need to verify packet length and data format. Also below code just casts to the expected data type at some offset. Can that be defined more formally as header structs? Like ctype3_hdr, but for other headers, as well. > + status = 0; > + switch (extobj_hdr->class_type) { > + case CTYPE_NAME: > + dev = dev_get_by_name(net, (char *)(extobj_hdr + 1)); > + break; > + case CTYPE_INDEX: > + dev = dev_get_by_index(net, ntohl(*((uint32_t *)(extobj_hdr + 1)))); > + break; > + case CTYPE_ADDR: > + switch (ntohs(ctype3_hdr->afi)) { > + case AFI_IP: > + dev = ip_dev_find(net, *(__be32 *)(ctype3_hdr + 1)); > + break; > + case AFI_IP6: > + dev = ipv6_dev_find(net, (struct in6_addr *)(ctype3_hdr + 1), dev); > + if(dev) dev_hold(dev); > + break; > + default: > + icmp_param.data.icmph.code = ICMP_EXT_MAL_QUERY; > + goto send_reply; > + } > + break; > + default: > + icmp_param.data.icmph.code = ICMP_EXT_MAL_QUERY; > + goto send_reply; > + } > + if(!dev) { > + icmp_param.data.icmph.code = ICMP_EXT_NO_IF; > + goto send_reply; > + } > + /* RFC 8335: 3 the last 8 bits of the Extended Echo Reply Message > + * are laid out as follows: > + * +-+-+-+-+-+-+-+-+ > + * |State|Res|A|4|6| > + * +-+-+-+-+-+-+-+-+ > + */ > + if (dev->flags & IFF_UP) > + status |= EXT_ECHOREPLY_ACTIVE; > + if (dev->ip_ptr->ifa_list) > + status |= EXT_ECHOREPLY_IPV4; > + if (!list_empty(&dev->ip6_ptr->addr_list)) > + status |= EXT_ECHOREPLY_IPV6; > + dev_put(dev); > + icmp_param.data.icmph.un.echo.sequence |= htons(status); > + > +send_reply: > + icmp_reply(&icmp_param, skb); > return true; > }
On Wed, 3 Feb 2021 15:24:55 -0800 Andreas Roeseler wrote: > Modify the icmp_rcv function to check for PROBE messages and call > icmp_echo if a PROBE request is detected. > > Modify the existing icmp_echo function to respond to both ping and PROBE > requests. > > This was tested using a custom modification of the iputils package and > wireshark. It supports IPV4 probing by name, ifindex, and probing by both IPV4 and IPV6 > addresses. It currently does not support responding to probes off the proxy node > (See RFC 8335 Section 2). > > > Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> make allmodconfig && make W=1 C=1 says: ../net/ipv4/icmp.c: note: in included file (through ../include/linux/spinlock.h, ../include/linux/mmzone.h, ../include/linux/gfp.h, ../include/linux/umh.h, ../include/linux/kmod.h, ../include/linux/module.h): ../include/linux/bottom_half.h:32:30: warning: context imbalance in 'icmp_reply' - different lock contexts for basic block ../include/linux/bottom_half.h:32:30: warning: context imbalance in '__icmp_send' - different lock contexts for basic block ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1024:45: warning: cast to restricted __be32 ../net/ipv4/icmp.c:1027:25: warning: cast to restricted __be16 ../net/ipv4/icmp.c:1027:25: warning: cast to restricted __be16 ../net/ipv4/icmp.c:1027:25: warning: cast to restricted __be16 ../net/ipv4/icmp.c:1027:25: warning: cast to restricted __be16 ../net/ipv4/icmp.c:1058:29: warning: incorrect type in argument 1 (different address spaces) ../net/ipv4/icmp.c:1058:29: expected struct list_head const *head ../net/ipv4/icmp.c:1058:29: got struct list_head [noderef] __rcu * ../net/ipv4/icmp.c: note: in included file (through ../include/linux/spinlock.h, ../include/linux/mmzone.h, ../include/linux/gfp.h, ../include/linux/umh.h, ../include/linux/kmod.h, ../include/linux/module.h): ../include/linux/bottom_half.h:32:30: warning: context imbalance in 'icmp_reply' - different lock contexts for basic block ../include/linux/bottom_half.h:32:30: warning: context imbalance in '__icmp_send' - different lock contexts for basic block ../net/ipv4/icmp.c:1056:16: warning: dereference of noderef expression net/ipv4/icmp.o: In function `icmp_echo': icmp.c:(.text+0x123a): undefined reference to `ipv6_dev_find' make[1]: *** [vmlinux] Error 1 make: *** [__sub-make] Error 2 New errors added
On Thu, 2021-02-04 at 14:52 -0500, Willem de Bruijn wrote: > On Wed, Feb 3, 2021 at 6:30 PM Andreas Roeseler > <andreas.a.roeseler@gmail.com> wrote: > > > > Modify the icmp_rcv function to check for PROBE messages and call > > icmp_echo if a PROBE request is detected. > > > > Modify the existing icmp_echo function to respond to both ping and > > PROBE > > requests. > > > > This was tested using a custom modification of the iputils package > > and > > wireshark. It supports IPV4 probing by name, ifindex, and probing > > by both IPV4 and IPV6 > > addresses. It currently does not support responding to probes off > > the proxy node > > (See RFC 8335 Section 2). > > > > > > Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> > > --- > > Changes since v1: > > - Reorder variable declarations to follow coding style > > - Switch to functions such as dev_get_by_name and ip_dev_find to > > lookup > > net devices > > --- > > net/ipv4/icmp.c | 98 ++++++++++++++++++++++++++++++++++++++++++++- > > ---- > > 1 file changed, 88 insertions(+), 10 deletions(-) > > > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > > index 396b492c804f..18f9a2a3bf59 100644 > > --- a/net/ipv4/icmp.c > > +++ b/net/ipv4/icmp.c > > @@ -983,21 +983,85 @@ static bool icmp_redirect(struct sk_buff > > *skb) > > > > static bool icmp_echo(struct sk_buff *skb) > > { > > + struct icmp_bxm icmp_param; > > struct net *net; > > + struct net_device *dev; > > + struct icmp_extobj_hdr *extobj_hdr; > > + struct icmp_ext_ctype3_hdr *ctype3_hdr; > > + __u8 status; > > nit: please maintain reverse christmas tree variable ordering > > > > > net = dev_net(skb_dst(skb)->dev); > > - if (!net->ipv4.sysctl_icmp_echo_ignore_all) { > > - struct icmp_bxm icmp_param; > > + /* should there be an ICMP stat for ignored echos? */ > > + if (net->ipv4.sysctl_icmp_echo_ignore_all) > > + return true; > > + > > + icmp_param.data.icmph = *icmp_hdr(skb); > > + icmp_param.skb = skb; > > + icmp_param.offset = 0; > > + icmp_param.data_len = skb->len; > > + icmp_param.head_len = sizeof(struct icmphdr); > > > > - icmp_param.data.icmph = *icmp_hdr(skb); > > + if (icmp_param.data.icmph.type == ICMP_ECHO) { > > icmp_param.data.icmph.type = ICMP_ECHOREPLY; > > - icmp_param.skb = skb; > > - icmp_param.offset = 0; > > - icmp_param.data_len = skb->len; > > - icmp_param.head_len = sizeof(struct > > icmphdr); > > - icmp_reply(&icmp_param, skb); > > + goto send_reply; > > } > > - /* should there be an ICMP stat for ignored echos? */ > > + if (!net->ipv4.sysctl_icmp_echo_enable_probe) > > + return true; > > + /* We currently do not support probing off the proxy node > > */ > > + if (!(ntohs(icmp_param.data.icmph.un.echo.sequence) & 1)) > > + return true; > > What does this comment mean? > > And why does the sequence number need to be even? RFC 8335 Section 2 specifies that the ICMP Extended Echo Request messages use a modified sequence number field, as follows. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type | Code | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Identifier |Sequence Number| Reserved |L| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The L-bit is set if the probed interface resides on the proxy node. We only support probing interfaces that lie on the proxy node since it simplifies the response handler and, if we wanted to know the status of an interface on another node, we could simply probe that node instead. This line ensures that the L-bit is set to 1. I'll clarify the wording of the comment. > > > + > > + icmp_param.data.icmph.type = ICMP_EXT_ECHOREPLY; > > + icmp_param.data.icmph.un.echo.sequence &= htons(0xFF00); > > Why this mask? This clears the status fields of the reply message to avoid sending old data back to the probing host. > > > + extobj_hdr = (struct icmp_extobj_hdr *)(skb->data + > > sizeof(struct icmp_ext_hdr)); > > + ctype3_hdr = (struct icmp_ext_ctype3_hdr *)(extobj_hdr + > > 1); > > It is not safe to trust the contents of unverified packets. We cannot > just cast to a string and call dev_get_by_name. Need to verify packet > length and data format. > Great catch, I will work on adding this into the next version. > Also below code just casts to the expected data type at some offset. > Can that be defined more formally as header structs? Like ctype3_hdr, > but for other headers, as well. > Will do. > > + status = 0; > > + switch (extobj_hdr->class_type) { > > + case CTYPE_NAME: > > + dev = dev_get_by_name(net, (char *)(extobj_hdr + > > 1)); > > + break; > > + case CTYPE_INDEX: > > + dev = dev_get_by_index(net, ntohl(*((uint32_t > > *)(extobj_hdr + 1)))); > > + break; > > + case CTYPE_ADDR: > > + switch (ntohs(ctype3_hdr->afi)) { > > + case AFI_IP: > > + dev = ip_dev_find(net, *(__be32 > > *)(ctype3_hdr + 1)); > > + break; > > + case AFI_IP6: > > + dev = ipv6_dev_find(net, (struct in6_addr > > *)(ctype3_hdr + 1), dev); > > + if(dev) dev_hold(dev); > > + break; > > + default: > > + icmp_param.data.icmph.code = > > ICMP_EXT_MAL_QUERY; > > + goto send_reply; > > + } > > + break; > > + default: > > + icmp_param.data.icmph.code = ICMP_EXT_MAL_QUERY; > > + goto send_reply; > > + } > > + if(!dev) { > > + icmp_param.data.icmph.code = ICMP_EXT_NO_IF; > > + goto send_reply; > > + } > > + /* RFC 8335: 3 the last 8 bits of the Extended Echo Reply > > Message > > + * are laid out as follows: > > + * +-+-+-+-+-+-+-+-+ > > + * |State|Res|A|4|6| > > + * +-+-+-+-+-+-+-+-+ > > + */ > > + if (dev->flags & IFF_UP) > > + status |= EXT_ECHOREPLY_ACTIVE; > > + if (dev->ip_ptr->ifa_list) > > + status |= EXT_ECHOREPLY_IPV4; > > + if (!list_empty(&dev->ip6_ptr->addr_list)) > > + status |= EXT_ECHOREPLY_IPV6; > > + dev_put(dev); > > + icmp_param.data.icmph.un.echo.sequence |= htons(status); > > + > > +send_reply: > > + icmp_reply(&icmp_param, skb); > > return true; > > }
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 396b492c804f..18f9a2a3bf59 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -983,21 +983,85 @@ static bool icmp_redirect(struct sk_buff *skb) static bool icmp_echo(struct sk_buff *skb) { + struct icmp_bxm icmp_param; struct net *net; + struct net_device *dev; + struct icmp_extobj_hdr *extobj_hdr; + struct icmp_ext_ctype3_hdr *ctype3_hdr; + __u8 status; net = dev_net(skb_dst(skb)->dev); - if (!net->ipv4.sysctl_icmp_echo_ignore_all) { - struct icmp_bxm icmp_param; + /* should there be an ICMP stat for ignored echos? */ + if (net->ipv4.sysctl_icmp_echo_ignore_all) + return true; + + icmp_param.data.icmph = *icmp_hdr(skb); + icmp_param.skb = skb; + icmp_param.offset = 0; + icmp_param.data_len = skb->len; + icmp_param.head_len = sizeof(struct icmphdr); - icmp_param.data.icmph = *icmp_hdr(skb); + if (icmp_param.data.icmph.type == ICMP_ECHO) { icmp_param.data.icmph.type = ICMP_ECHOREPLY; - icmp_param.skb = skb; - icmp_param.offset = 0; - icmp_param.data_len = skb->len; - icmp_param.head_len = sizeof(struct icmphdr); - icmp_reply(&icmp_param, skb); + goto send_reply; } - /* should there be an ICMP stat for ignored echos? */ + if (!net->ipv4.sysctl_icmp_echo_enable_probe) + return true; + /* We currently do not support probing off the proxy node */ + if (!(ntohs(icmp_param.data.icmph.un.echo.sequence) & 1)) + return true; + + icmp_param.data.icmph.type = ICMP_EXT_ECHOREPLY; + icmp_param.data.icmph.un.echo.sequence &= htons(0xFF00); + extobj_hdr = (struct icmp_extobj_hdr *)(skb->data + sizeof(struct icmp_ext_hdr)); + ctype3_hdr = (struct icmp_ext_ctype3_hdr *)(extobj_hdr + 1); + status = 0; + switch (extobj_hdr->class_type) { + case CTYPE_NAME: + dev = dev_get_by_name(net, (char *)(extobj_hdr + 1)); + break; + case CTYPE_INDEX: + dev = dev_get_by_index(net, ntohl(*((uint32_t *)(extobj_hdr + 1)))); + break; + case CTYPE_ADDR: + switch (ntohs(ctype3_hdr->afi)) { + case AFI_IP: + dev = ip_dev_find(net, *(__be32 *)(ctype3_hdr + 1)); + break; + case AFI_IP6: + dev = ipv6_dev_find(net, (struct in6_addr *)(ctype3_hdr + 1), dev); + if(dev) dev_hold(dev); + break; + default: + icmp_param.data.icmph.code = ICMP_EXT_MAL_QUERY; + goto send_reply; + } + break; + default: + icmp_param.data.icmph.code = ICMP_EXT_MAL_QUERY; + goto send_reply; + } + if(!dev) { + icmp_param.data.icmph.code = ICMP_EXT_NO_IF; + goto send_reply; + } + /* RFC 8335: 3 the last 8 bits of the Extended Echo Reply Message + * are laid out as follows: + * +-+-+-+-+-+-+-+-+ + * |State|Res|A|4|6| + * +-+-+-+-+-+-+-+-+ + */ + if (dev->flags & IFF_UP) + status |= EXT_ECHOREPLY_ACTIVE; + if (dev->ip_ptr->ifa_list) + status |= EXT_ECHOREPLY_IPV4; + if (!list_empty(&dev->ip6_ptr->addr_list)) + status |= EXT_ECHOREPLY_IPV6; + dev_put(dev); + icmp_param.data.icmph.un.echo.sequence |= htons(status); + +send_reply: + icmp_reply(&icmp_param, skb); return true; } @@ -1087,6 +1151,13 @@ int icmp_rcv(struct sk_buff *skb) icmph = icmp_hdr(skb); ICMPMSGIN_INC_STATS(net, icmph->type); + + /* + * Check for ICMP Extended Echo (PROBE) messages + */ + if (icmph->type == ICMP_EXT_ECHO || icmph->type == ICMPV6_EXT_ECHO_REQUEST) + goto probe; + /* * 18 is the highest 'known' ICMP type. Anything else is a mystery * @@ -1096,7 +1167,6 @@ int icmp_rcv(struct sk_buff *skb) if (icmph->type > NR_ICMP_TYPES) goto error; - /* * Parse the ICMP message */ @@ -1123,6 +1193,7 @@ int icmp_rcv(struct sk_buff *skb) success = icmp_pointers[icmph->type].handler(skb); +success_check: if (success) { consume_skb(skb); return NET_RX_SUCCESS; @@ -1136,6 +1207,13 @@ int icmp_rcv(struct sk_buff *skb) error: __ICMP_INC_STATS(net, ICMP_MIB_INERRORS); goto drop; +probe: + /* + * We can't use icmp_pointers[].handler() because the codes for PROBE + * messages are 42 or 160 + */ + success = icmp_echo(skb); + goto success_check; } static bool ip_icmp_error_rfc4884_validate(const struct sk_buff *skb, int off)
Modify the icmp_rcv function to check for PROBE messages and call icmp_echo if a PROBE request is detected. Modify the existing icmp_echo function to respond to both ping and PROBE requests. This was tested using a custom modification of the iputils package and wireshark. It supports IPV4 probing by name, ifindex, and probing by both IPV4 and IPV6 addresses. It currently does not support responding to probes off the proxy node (See RFC 8335 Section 2). Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> --- Changes since v1: - Reorder variable declarations to follow coding style - Switch to functions such as dev_get_by_name and ip_dev_find to lookup net devices --- net/ipv4/icmp.c | 98 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 88 insertions(+), 10 deletions(-)