Message ID | 20241029-b4-ovpn-v11-0-de4698c73a25@openvpn.net |
---|---|
Headers | show |
Series | Introducing OpenVPN Data Channel Offload | expand |
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: > +static void ovpn_peer_release(struct ovpn_peer *peer) > +{ > + ovpn_bind_reset(peer, NULL); > + > + dst_cache_destroy(&peer->dst_cache); Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately: void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); } (probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?) > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); > + kfree_rcu(peer, rcu); > +} [...] > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, > + enum ovpn_del_peer_reason reason) > + __must_hold(&peer->ovpn->lock) > +{ > + struct ovpn_peer *tmp; > + > + tmp = rcu_dereference_protected(peer->ovpn->peer, > + lockdep_is_held(&peer->ovpn->lock)); > + if (tmp != peer) { > + DEBUG_NET_WARN_ON_ONCE(1); > + if (tmp) > + ovpn_peer_put(tmp); Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put? > + > + return -ENOENT; > + } > + > + tmp->delete_reason = reason; > + RCU_INIT_POINTER(peer->ovpn->peer, NULL); > + ovpn_peer_put(tmp); > + > + return 0; > +}
On 30/10/2024 17:37, Sabrina Dubroca wrote: > 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: >> +static void ovpn_peer_release(struct ovpn_peer *peer) >> +{ >> + ovpn_bind_reset(peer, NULL); >> + >> + dst_cache_destroy(&peer->dst_cache); > > Is it safe to destroy the cache at this time? In the same function, we > use rcu to free the peer, but AFAICT the dst_cache will be freed > immediately: > > void dst_cache_destroy(struct dst_cache *dst_cache) > { > [...] > free_percpu(dst_cache->cache); > } > > (probably no real issue because ovpn_udp_send_skb gets called while we > hold a reference to the peer?) Right. That was my assumption: release happens on refcnt = 0 only, therefore no field should be in use anymore. Anything that may still be in use will have its own refcounter. > >> + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); >> + kfree_rcu(peer, rcu); >> +} > > > [...] >> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, >> + enum ovpn_del_peer_reason reason) >> + __must_hold(&peer->ovpn->lock) >> +{ >> + struct ovpn_peer *tmp; >> + >> + tmp = rcu_dereference_protected(peer->ovpn->peer, >> + lockdep_is_held(&peer->ovpn->lock)); >> + if (tmp != peer) { >> + DEBUG_NET_WARN_ON_ONCE(1); >> + if (tmp) >> + ovpn_peer_put(tmp); > > Does peer->ovpn->peer need to be set to NULL here as well? Or is it > going to survive this _put? First of all consider that this is truly something that we don't expect to happen (hence the WARN_ON). If this is happening it's because we are trying to delete a peer that is not the one we are connected to (unexplainable scenario in p2p mode). Still, should we hit this case (I truly can't see how), I'd say "leave everything as is - maybe this call was just a mistake". Cheers, > >> + >> + return -ENOENT; >> + } >> + >> + tmp->delete_reason = reason; >> + RCU_INIT_POINTER(peer->ovpn->peer, NULL); >> + ovpn_peer_put(tmp); >> + >> + return 0; >> +} >
It seems some little changes to ovpn.yaml were not reflected in the generated files I committed. Specifically I changed some U32 to BE32 (IPv4 addresses) and files were not regenerated before committing. (I saw the failure in patchwork about this) It seems I'll have to send v12 after all. Cheers, On 29/10/2024 11:47, Antonio Quartulli wrote: > Notable changes from v10: > * extended commit message of 23/23 with brief description of the output > * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net > > Please note that some patches were already reviewed by Andre Lunn, > Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag > since no major code modification has happened since the review. > > The latest code can also be found at: > > https://github.com/OpenVPN/linux-kernel-ovpn > > Thanks a lot! > Best Regards, > > Antonio Quartulli > OpenVPN Inc. > > --- > Antonio Quartulli (23): > netlink: add NLA_POLICY_MAX_LEN macro > net: introduce OpenVPN Data Channel Offload (ovpn) > ovpn: add basic netlink support > ovpn: add basic interface creation/destruction/management routines > ovpn: keep carrier always on > ovpn: introduce the ovpn_peer object > ovpn: introduce the ovpn_socket object > ovpn: implement basic TX path (UDP) > ovpn: implement basic RX path (UDP) > ovpn: implement packet processing > ovpn: store tunnel and transport statistics > ovpn: implement TCP transport > ovpn: implement multi-peer support > ovpn: implement peer lookup logic > ovpn: implement keepalive mechanism > ovpn: add support for updating local UDP endpoint > ovpn: add support for peer floating > ovpn: implement peer add/get/dump/delete via netlink > ovpn: implement key add/get/del/swap via netlink > ovpn: kill key and notify userspace in case of IV exhaustion > ovpn: notify userspace when a peer is deleted > ovpn: add basic ethtool support > testing/selftests: add test tool and scripts for ovpn module > > Documentation/netlink/specs/ovpn.yaml | 362 +++ > MAINTAINERS | 11 + > drivers/net/Kconfig | 14 + > drivers/net/Makefile | 1 + > drivers/net/ovpn/Makefile | 22 + > drivers/net/ovpn/bind.c | 54 + > drivers/net/ovpn/bind.h | 117 + > drivers/net/ovpn/crypto.c | 214 ++ > drivers/net/ovpn/crypto.h | 145 ++ > drivers/net/ovpn/crypto_aead.c | 386 ++++ > drivers/net/ovpn/crypto_aead.h | 33 + > drivers/net/ovpn/io.c | 462 ++++ > drivers/net/ovpn/io.h | 25 + > drivers/net/ovpn/main.c | 337 +++ > drivers/net/ovpn/main.h | 24 + > drivers/net/ovpn/netlink-gen.c | 212 ++ > drivers/net/ovpn/netlink-gen.h | 41 + > drivers/net/ovpn/netlink.c | 1135 ++++++++++ > drivers/net/ovpn/netlink.h | 18 + > drivers/net/ovpn/ovpnstruct.h | 61 + > drivers/net/ovpn/packet.h | 40 + > drivers/net/ovpn/peer.c | 1201 ++++++++++ > drivers/net/ovpn/peer.h | 165 ++ > drivers/net/ovpn/pktid.c | 130 ++ > drivers/net/ovpn/pktid.h | 87 + > drivers/net/ovpn/proto.h | 104 + > drivers/net/ovpn/skb.h | 56 + > drivers/net/ovpn/socket.c | 178 ++ > drivers/net/ovpn/socket.h | 55 + > drivers/net/ovpn/stats.c | 21 + > drivers/net/ovpn/stats.h | 47 + > drivers/net/ovpn/tcp.c | 506 +++++ > drivers/net/ovpn/tcp.h | 44 + > drivers/net/ovpn/udp.c | 406 ++++ > drivers/net/ovpn/udp.h | 26 + > include/net/netlink.h | 1 + > include/uapi/linux/if_link.h | 15 + > include/uapi/linux/ovpn.h | 109 + > include/uapi/linux/udp.h | 1 + > tools/net/ynl/ynl-gen-c.py | 4 +- > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/net/ovpn/.gitignore | 2 + > tools/testing/selftests/net/ovpn/Makefile | 17 + > tools/testing/selftests/net/ovpn/config | 10 + > tools/testing/selftests/net/ovpn/data64.key | 5 + > tools/testing/selftests/net/ovpn/ovpn-cli.c | 2370 ++++++++++++++++++++ > tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 + > .../testing/selftests/net/ovpn/test-chachapoly.sh | 9 + > tools/testing/selftests/net/ovpn/test-float.sh | 9 + > tools/testing/selftests/net/ovpn/test-tcp.sh | 9 + > tools/testing/selftests/net/ovpn/test.sh | 183 ++ > tools/testing/selftests/net/ovpn/udp_peers.txt | 5 + > 52 files changed, 9494 insertions(+), 1 deletion(-) > --- > base-commit: ab101c553bc1f76a839163d1dc0d1e715ad6bb4e > change-id: 20241002-b4-ovpn-eeee35c694a2 > > Best regards,
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: > +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > +{ [...] > + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); > + if (unlikely(opcode != OVPN_DATA_V2)) { > + /* DATA_V1 is not supported */ > + if (opcode == OVPN_DATA_V1) The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well? > + goto drop; > + > + /* unknown or control packet: let it bubble up to userspace */ > + return 1; > + } > + > + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); > + /* some OpenVPN server implementations send data packets with the > + * peer-id set to undef. In this case we skip the peer lookup by peer-id > + * and we try with the transport address > + */ > + if (peer_id != OVPN_PEER_ID_UNDEF) { > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) { > + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", > + __func__, peer_id); > + goto drop; > + } > + } > + > + if (!peer) { nit: that could be an "else" combined with the previous case? > + /* data packet with undef peer-id */ > + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", > + __func__); > + goto drop; > + } > + }
On 31/10/2024 12:29, Sabrina Dubroca wrote: > 2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: >> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >> +{ > [...] >> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >> + if (unlikely(opcode != OVPN_DATA_V2)) { >> + /* DATA_V1 is not supported */ >> + if (opcode == OVPN_DATA_V1) > > The TCP encap code passes everything that's not V2 to userspace. Why > not do that with UDP as well? If that's the case, then this is a bug in the TCP code. DATA_Vx packets are part of the data channel and userspace can't do anything with them (userspace handles the control channel only when the ovpn module is in use). I'll go check the TCP code then, because sending DATA_V1 to userspace is not expected. Thanks for noticing this discrepancy. > >> + goto drop; >> + >> + /* unknown or control packet: let it bubble up to userspace */ >> + return 1; >> + } >> + >> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >> + /* some OpenVPN server implementations send data packets with the >> + * peer-id set to undef. In this case we skip the peer lookup by peer-id >> + * and we try with the transport address >> + */ >> + if (peer_id != OVPN_PEER_ID_UNDEF) { >> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >> + if (!peer) { >> + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", >> + __func__, peer_id); >> + goto drop; >> + } >> + } >> + >> + if (!peer) { > > nit: that could be an "else" combined with the previous case? mhh that's true. Then I can combine the two "if (!peer)" in one block only. Thanks! Regards, > >> + /* data packet with undef peer-id */ >> + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); >> + if (unlikely(!peer)) { >> + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", >> + __func__); >> + goto drop; >> + } >> + } >
Hello: This series was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 29 Oct 2024 11:47:13 +0100 you wrote: > Notable changes from v10: > * extended commit message of 23/23 with brief description of the output > * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net > > Please note that some patches were already reviewed by Andre Lunn, > Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag > since no major code modification has happened since the review. > > [...] Here is the summary with links: - [net-next,v11,01/23] netlink: add NLA_POLICY_MAX_LEN macro https://git.kernel.org/netdev/net-next/c/4138e9ec0093 - [net-next,v11,02/23] net: introduce OpenVPN Data Channel Offload (ovpn) (no matching commit) - [net-next,v11,03/23] ovpn: add basic netlink support (no matching commit) - [net-next,v11,04/23] ovpn: add basic interface creation/destruction/management routines (no matching commit) - [net-next,v11,05/23] ovpn: keep carrier always on (no matching commit) - [net-next,v11,06/23] ovpn: introduce the ovpn_peer object (no matching commit) - [net-next,v11,07/23] ovpn: introduce the ovpn_socket object (no matching commit) - [net-next,v11,08/23] ovpn: implement basic TX path (UDP) (no matching commit) - [net-next,v11,09/23] ovpn: implement basic RX path (UDP) (no matching commit) - [net-next,v11,10/23] ovpn: implement packet processing (no matching commit) - [net-next,v11,11/23] ovpn: store tunnel and transport statistics (no matching commit) - [net-next,v11,12/23] ovpn: implement TCP transport (no matching commit) - [net-next,v11,13/23] ovpn: implement multi-peer support (no matching commit) - [net-next,v11,14/23] ovpn: implement peer lookup logic (no matching commit) - [net-next,v11,15/23] ovpn: implement keepalive mechanism (no matching commit) - [net-next,v11,16/23] ovpn: add support for updating local UDP endpoint (no matching commit) - [net-next,v11,17/23] ovpn: add support for peer floating (no matching commit) - [net-next,v11,18/23] ovpn: implement peer add/get/dump/delete via netlink (no matching commit) - [net-next,v11,19/23] ovpn: implement key add/get/del/swap via netlink (no matching commit) - [net-next,v11,20/23] ovpn: kill key and notify userspace in case of IV exhaustion (no matching commit) - [net-next,v11,21/23] ovpn: notify userspace when a peer is deleted (no matching commit) - [net-next,v11,22/23] ovpn: add basic ethtool support (no matching commit) - [net-next,v11,23/23] testing/selftests: add test tool and scripts for ovpn module (no matching commit) You are awesome, thank you!
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: > +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, > + const struct sockaddr_storage *ss, > + const u8 *local_ip) > + __must_hold(&peer->lock) > +{ > + struct ovpn_bind *bind; > + size_t ip_len; > + > + /* create new ovpn_bind object */ > + bind = ovpn_bind_from_sockaddr(ss); > + if (IS_ERR(bind)) > + return PTR_ERR(bind); > + > + if (local_ip) { > + if (ss->ss_family == AF_INET) { > + ip_len = sizeof(struct in_addr); > + } else if (ss->ss_family == AF_INET6) { > + ip_len = sizeof(struct in6_addr); > + } else { > + netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n", > + __func__); ratelimited since that can be triggered from packet processing? [...] > +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) > +{ [...] > + > + switch (family) { > + case AF_INET: > + sa = (struct sockaddr_in *)&ss; > + sa->sin_family = AF_INET; > + sa->sin_addr.s_addr = ip_hdr(skb)->saddr; > + sa->sin_port = udp_hdr(skb)->source; > + salen = sizeof(*sa); > + break; > + case AF_INET6: > + sa6 = (struct sockaddr_in6 *)&ss; > + sa6->sin6_family = AF_INET6; > + sa6->sin6_addr = ipv6_hdr(skb)->saddr; > + sa6->sin6_port = udp_hdr(skb)->source; > + sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr, > + skb->skb_iif); > + salen = sizeof(*sa6); > + break; > + default: > + goto unlock; > + } > + > + netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__, %u for peer->id? and ratelimited too, probably. (also in ovpn_peer_update_local_endpoint in the previous patch) > + peer->id, &ss); > + ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss, > + local_ip); skip the rehash if this fails? peer->bind will still be the old one so moving it to the new hash chain won't help (the lookup will fail).
2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote: > struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, > struct sk_buff *skb) > { > - struct ovpn_peer *peer = NULL; > + struct ovpn_peer *tmp, *peer = NULL; > struct sockaddr_storage ss = { 0 }; > + struct hlist_nulls_head *nhead; > + struct hlist_nulls_node *ntmp; > + size_t sa_len; > > if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) > return NULL; > > if (ovpn->mode == OVPN_MODE_P2P) > - peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); > + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); > + > + switch (ss.ss_family) { > + case AF_INET: > + sa_len = sizeof(struct sockaddr_in); > + break; > + case AF_INET6: > + sa_len = sizeof(struct sockaddr_in6); > + break; > + default: > + return NULL; > + } You could get rid of that switch by having ovpn_peer_skb_to_sockaddr also set sa_len (or return 0/the size). > + > + nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len); > + > + rcu_read_lock(); > + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, > + hash_entry_transp_addr) { I think that's missing the retry in case we ended up in the wrong bucket due to a peer rehash? > + if (!ovpn_peer_transp_match(tmp, &ss)) > + continue; > + > + if (!ovpn_peer_hold(tmp)) > + continue; > + > + peer = tmp; > + break; > + } > + rcu_read_unlock(); > > return peer; > }
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, > + struct genl_info *info, > + struct nlattr **attrs) > +{ > + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, > + OVPN_A_PEER_ID)) > + return -EINVAL; > + > + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { > + NL_SET_ERR_MSG_MOD(info->extack, > + "cannot specify both remote IPv4 or IPv6 address"); > + return -EINVAL; > + } > + > + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && > + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { > + NL_SET_ERR_MSG_MOD(info->extack, > + "cannot specify remote port without IP address"); > + return -EINVAL; > + } > + > + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && > + attrs[OVPN_A_PEER_LOCAL_IPV4]) { > + NL_SET_ERR_MSG_MOD(info->extack, > + "cannot specify local IPv4 address without remote"); > + return -EINVAL; > + } > + > + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && > + attrs[OVPN_A_PEER_LOCAL_IPV6]) { I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that. > + NL_SET_ERR_MSG_MOD(info->extack, > + "cannot specify local IPV6 address without remote"); > + return -EINVAL; > + } [...] > int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) > { [...] > + ret = ovpn_nl_peer_modify(peer, info, attrs); > + if (ret < 0) { > + ovpn_peer_put(peer); > + return ret; > + } > + > + /* ret == 1 means that VPN IPv4/6 has been modified and rehashing > + * is required > + */ > + if (ret > 0) { && mode == MP ? I don't see ovpn_nl_peer_modify checking that before returning 1, and in P2P mode ovpn->peers will be NULL. > + spin_lock_bh(&ovpn->peers->lock); > + ovpn_peer_hash_vpn_ip(peer); > + spin_unlock_bh(&ovpn->peers->lock); > + } > + > + ovpn_peer_put(peer); > + > + return 0; > +} > int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) > { [...] > + } else { > + rcu_read_lock(); > + hash_for_each_rcu(ovpn->peers->by_id, bkt, peer, > + hash_entry_id) { > + /* skip already dumped peers that were dumped by > + * previous invocations > + */ > + if (last_idx > 0) { > + last_idx--; > + continue; > + } If a peer that was dumped during a previous invocation is removed in between, we'll miss one that's still present in the overall dump. I don't know how much it matters (I guses it depends on how the results of this dump are used by userspace), so I'll let you decide if this needs to be fixed immediately or if it can be ignored for now. > + > + if (ovpn_nl_send_peer(skb, info, peer, > + NETLINK_CB(cb->skb).portid, > + cb->nlh->nlmsg_seq, > + NLM_F_MULTI) < 0) > + break; > + > + /* count peers being dumped during this invocation */ > + dumped++; > + } > + rcu_read_unlock(); > + } > + > +out: > + netdev_put(ovpn->dev, &ovpn->dev_tracker); > + > + /* sum up peers dumped in this message, so that at the next invocation > + * we can continue from where we left > + */ > + cb->args[1] += dumped; > + return skb->len; > } > > int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) > { > - return -EOPNOTSUPP; > + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; > + struct ovpn_struct *ovpn = info->user_ptr[0]; > + struct ovpn_peer *peer; > + u32 peer_id; > + int ret; > + > + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) > + return -EINVAL; > + > + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], > + ovpn_peer_nl_policy, info->extack); > + if (ret) > + return ret; > + > + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, > + OVPN_A_PEER_ID)) > + return -EINVAL; > + > + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); > + > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) maybe c/p the extack from ovpn_nl_peer_get_doit? > + return -ENOENT; > + > + netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id); > + ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE); > + ovpn_peer_put(peer); > + > + return ret; > }
2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote: > On 30/10/2024 17:37, Sabrina Dubroca wrote: > > 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: > > > +static void ovpn_peer_release(struct ovpn_peer *peer) > > > +{ > > > + ovpn_bind_reset(peer, NULL); > > > + > > > + dst_cache_destroy(&peer->dst_cache); > > > > Is it safe to destroy the cache at this time? In the same function, we > > use rcu to free the peer, but AFAICT the dst_cache will be freed > > immediately: > > > > void dst_cache_destroy(struct dst_cache *dst_cache) > > { > > [...] > > free_percpu(dst_cache->cache); > > } > > > > (probably no real issue because ovpn_udp_send_skb gets called while we > > hold a reference to the peer?) > > Right. > That was my assumption: release happens on refcnt = 0 only, therefore no > field should be in use anymore. > Anything that may still be in use will have its own refcounter. My worry is that code changes over time, assumptions are forgotten, and we end up with code that was a bit odd but safe not being safe anymore. > > > > > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); > > > + kfree_rcu(peer, rcu); > > > +} > > > > > > [...] > > > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, > > > + enum ovpn_del_peer_reason reason) > > > + __must_hold(&peer->ovpn->lock) > > > +{ > > > + struct ovpn_peer *tmp; > > > + > > > + tmp = rcu_dereference_protected(peer->ovpn->peer, > > > + lockdep_is_held(&peer->ovpn->lock)); > > > + if (tmp != peer) { > > > + DEBUG_NET_WARN_ON_ONCE(1); > > > + if (tmp) > > > + ovpn_peer_put(tmp); > > > > Does peer->ovpn->peer need to be set to NULL here as well? Or is it > > going to survive this _put? > > First of all consider that this is truly something that we don't expect to > happen (hence the WARN_ON). > If this is happening it's because we are trying to delete a peer that is not > the one we are connected to (unexplainable scenario in p2p mode). > > Still, should we hit this case (I truly can't see how), I'd say "leave > everything as is - maybe this call was just a mistake". Yeah, true, let's leave it. Thanks.
Hi Antonio, On 29.10.2024 12:47, Antonio Quartulli wrote: > Notable changes from v10: > * extended commit message of 23/23 with brief description of the output > * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net > > Please note that some patches were already reviewed by Andre Lunn, > Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag > since no major code modification has happened since the review. > > The latest code can also be found at: > > https://github.com/OpenVPN/linux-kernel-ovpn As I promised many months ago I am starting publishing some nit picks regarding the series. The review was started when series was V3 "rebasing" the review to every next version to publish it at once. But I lost this race to the new version releasing velocity :) So, I am going to publish it patch-by-patch. Anyway you and all participants have done a great progress toward making accelerator part of the kernel. Most of considerable things already resolved so do not wait me please to finish picking every nit. Regarding "big" topics I have only two concerns: link creation using RTNL and a switch statement usage. In the corresponding thread, I asked Jiri to clarify that "should" regarding .newlink implementation. Hope he will have a chance to find a time to reply. For the 'switch' statement, I see a repeating pattern of handling mode-or family-specific cases like this: int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) { switch (ovpn->mode) { case OVPN_MODE_MP: return ovpn_peer_add_mp(ovpn, peer); case OVPN_MODE_P2P: return ovpn_peer_add_p2p(ovpn, peer); default: return -EOPNOTSUPP; } } or void ovpn_encrypt_post(void *data, int ret) { ... switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP: ovpn_udp_send_skb(peer->ovpn, peer, skb); break; case IPPROTO_TCP: ovpn_tcp_send_skb(peer, skb); break; default: /* no transport configured yet */ goto err; } ... } or void ovpn_peer_keepalive_work(...) { ... switch (ovpn->mode) { case OVPN_MODE_MP: next_run = ovpn_peer_keepalive_work_mp(ovpn, now); break; case OVPN_MODE_P2P: next_run = ovpn_peer_keepalive_work_p2p(ovpn, now); break; } ... } Did you consider to implement mode specific operations as a set of operations like this: ovpn_ops { int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer); int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason); void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb); time64_t (*keepalive_work)(...); }; Initialize them during the interface creation and invoke these operations indirectly. E.g. int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) { return ovpn->ops->peer_add(ovpn, peer); } void ovpn_encrypt_post(void *data, int ret) { ... ovpn->ops->send_skb(peer, skb); ... } void ovpn_peer_keepalive_work(...) { ... next_run = ovpn->ops->keepalive_work(ovpn, now); ... } Anyway the module has all these option values in advance during the network interface creation phase and I believe replacing 'switch' statements with indirect calls can make code easy to read. -- Sergey
On 29.10.2024 12:47, Antonio Quartulli wrote: > Add basic infrastructure for handling ovpn interfaces. > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > --- > drivers/net/ovpn/main.c | 115 ++++++++++++++++++++++++++++++++++++++++-- > drivers/net/ovpn/main.h | 7 +++ > drivers/net/ovpn/ovpnstruct.h | 8 +++ > drivers/net/ovpn/packet.h | 40 +++++++++++++++ > include/uapi/linux/if_link.h | 15 ++++++ > 5 files changed, 180 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c > index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 > --- a/drivers/net/ovpn/main.c > +++ b/drivers/net/ovpn/main.c > @@ -10,18 +10,52 @@ > #include <linux/genetlink.h> > #include <linux/module.h> > #include <linux/netdevice.h> > +#include <linux/inetdevice.h> > +#include <net/ip.h> > #include <net/rtnetlink.h> > -#include <uapi/linux/ovpn.h> > +#include <uapi/linux/if_arp.h> > > #include "ovpnstruct.h" > #include "main.h" > #include "netlink.h" > #include "io.h" > +#include "packet.h" > > /* Driver info */ > #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" > #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." > > +static void ovpn_struct_free(struct net_device *net) > +{ > +} nit: since this handler is not mandatory, its introduction can be moved to the later patch, which actually fills it with meaningful operations. > +static int ovpn_net_open(struct net_device *dev) > +{ > + netif_tx_start_all_queues(dev); > + return 0; > +} > + > +static int ovpn_net_stop(struct net_device *dev) > +{ > + netif_tx_stop_all_queues(dev); > + return 0; > +} > + > +static const struct net_device_ops ovpn_netdev_ops = { > + .ndo_open = ovpn_net_open, > + .ndo_stop = ovpn_net_stop, > + .ndo_start_xmit = ovpn_net_xmit, > +}; > + > +static const struct device_type ovpn_type = { > + .name = OVPN_FAMILY_NAME, nit: same question here regarding name deriviation. Are you sure that the device type name is the same as the GENL family name? > +}; > + > +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = { > + [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P, > + OVPN_MODE_MP), > +}; > + > /** > * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' > * @dev: the interface to check > @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) > return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; > } > > +static void ovpn_setup(struct net_device *dev) > +{ > + /* compute the overhead considering AEAD encryption */ > + const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 + Where are these magic sizeof(u32) and '16' came from? > + sizeof(struct udphdr) + > + max(sizeof(struct ipv6hdr), sizeof(struct iphdr)); > + > + netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM | > + NETIF_F_GSO | NETIF_F_GSO_SOFTWARE | > + NETIF_F_HIGHDMA; > + > + dev->needs_free_netdev = true; > + > + dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS; > + > + dev->netdev_ops = &ovpn_netdev_ops; > + > + dev->priv_destructor = ovpn_struct_free; > + > + dev->hard_header_len = 0; > + dev->addr_len = 0; > + dev->mtu = ETH_DATA_LEN - overhead; > + dev->min_mtu = IPV4_MIN_MTU; > + dev->max_mtu = IP_MAX_MTU - overhead; > + > + dev->type = ARPHRD_NONE; > + dev->flags = IFF_POINTOPOINT | IFF_NOARP; > + dev->priv_flags |= IFF_NO_QUEUE; > + > + dev->lltx = true; > + dev->features |= feat; > + dev->hw_features |= feat; > + dev->hw_enc_features |= feat; > + > + dev->needed_headroom = OVPN_HEAD_ROOM; > + dev->needed_tailroom = OVPN_MAX_PADDING; > + > + SET_NETDEV_DEVTYPE(dev, &ovpn_type); > +} > + > static int ovpn_newlink(struct net *src_net, struct net_device *dev, > struct nlattr *tb[], struct nlattr *data[], > struct netlink_ext_ack *extack) > { > - return -EOPNOTSUPP; > + struct ovpn_struct *ovpn = netdev_priv(dev); > + enum ovpn_mode mode = OVPN_MODE_P2P; > + > + if (data && data[IFLA_OVPN_MODE]) { > + mode = nla_get_u8(data[IFLA_OVPN_MODE]); > + netdev_dbg(dev, "setting device mode: %u\n", mode); > + } > + > + ovpn->dev = dev; > + ovpn->mode = mode; > + > + /* turn carrier explicitly off after registration, this way state is > + * clearly defined > + */ > + netif_carrier_off(dev); > + > + return register_netdevice(dev); > } > > static struct rtnl_link_ops ovpn_link_ops = { > .kind = OVPN_FAMILY_NAME, > .netns_refund = false, > + .priv_size = sizeof(struct ovpn_struct), > + .setup = ovpn_setup, > + .policy = ovpn_policy, > + .maxtype = IFLA_OVPN_MAX, > .newlink = ovpn_newlink, > .dellink = unregister_netdevice_queue, > }; > @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, > unsigned long state, void *ptr) > { > struct net_device *dev = netdev_notifier_info_to_dev(ptr); > + struct ovpn_struct *ovpn; > > if (!ovpn_dev_is_valid(dev)) > return NOTIFY_DONE; > > + ovpn = netdev_priv(dev); nit: netdev_priv() returns only a pointer, it is safe to fetch the pointer in advance, but do not dereference it until we are sure that an event references the desired interface type. Takin this into consideration, the assignment of private data pointer can be moved above to the variable declaration. Just to make code couple of lines shorter. > + > switch (state) { > case NETDEV_REGISTER: > - /* add device to internal list for later destruction upon > - * unregistration > - */ > + ovpn->registered = true; > break; > case NETDEV_UNREGISTER: > + /* twiddle thumbs on netns device moves */ > + if (dev->reg_state != NETREG_UNREGISTERING) > + break; > + > /* can be delivered multiple times, so check registered flag, > * then destroy the interface > */ > + if (!ovpn->registered) > + return NOTIFY_DONE; > + > + netif_carrier_off(dev); > + ovpn->registered = false; > break; > case NETDEV_POST_INIT: > case NETDEV_GOING_DOWN: > case NETDEV_DOWN: > case NETDEV_UP: > case NETDEV_PRE_UP: > + break; > default: > return NOTIFY_DONE; > } > diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h > index a3215316c49bfcdf2496590bac878f145b8b27fd..0740a05070a817e0daea7b63a1f4fcebd274eb37 100644 > --- a/drivers/net/ovpn/main.h > +++ b/drivers/net/ovpn/main.h > @@ -12,4 +12,11 @@ > > bool ovpn_dev_is_valid(const struct net_device *dev); > > +#define SKB_HEADER_LEN \ > + (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \ > + sizeof(struct udphdr) + NET_SKB_PAD) > + > +#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) Where is this magic '16' came from? > +#define OVPN_MAX_PADDING 16 > + > #endif /* _NET_OVPN_MAIN_H_ */ > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h > index e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644 > --- a/drivers/net/ovpn/ovpnstruct.h > +++ b/drivers/net/ovpn/ovpnstruct.h > @@ -11,15 +11,23 @@ > #define _NET_OVPN_OVPNSTRUCT_H_ > > #include <net/net_trackers.h> > +#include <uapi/linux/if_link.h> > +#include <uapi/linux/ovpn.h> > > /** > * struct ovpn_struct - per ovpn interface state > * @dev: the actual netdev representing the tunnel > * @dev_tracker: reference tracker for associated dev > + * @registered: whether dev is still registered with netdev or not > + * @mode: device operation mode (i.e. p2p, mp, ..) > + * @dev_list: entry for the module wide device list > */ > struct ovpn_struct { > struct net_device *dev; > netdevice_tracker dev_tracker; > + bool registered; > + enum ovpn_mode mode; > + struct list_head dev_list; dev_list is no more used and should be deleted. > }; > > #endif /* _NET_OVPN_OVPNSTRUCT_H_ */ > diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h > new file mode 100644 > index 0000000000000000000000000000000000000000..7ed146f5932a25f448af6da58738a7eae81007fe > --- /dev/null > +++ b/drivers/net/ovpn/packet.h > @@ -0,0 +1,40 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2020-2024 OpenVPN, Inc. > + * > + * Author: Antonio Quartulli <antonio@openvpn.net> > + * James Yonan <james@openvpn.net> > + */ > + > +#ifndef _NET_OVPN_PACKET_H_ > +#define _NET_OVPN_PACKET_H_ > + > +/* When the OpenVPN protocol is ran in AEAD mode, use > + * the OpenVPN packet ID as the AEAD nonce: > + * > + * 00000005 521c3b01 4308c041 > + * [seq # ] [ nonce_tail ] > + * [ 12-byte full IV ] -> NONCE_SIZE > + * [4-bytes -> NONCE_WIRE_SIZE > + * on wire] > + */ Nice diagram! Can we go futher and define the OpenVPN packet header as a stucture? Referencing the structure instead of using magic sizes and offsets can greatly improve the code readability. Especially when it comes to header construction/parsing in the encryption/decryption code. E.g. define a structures like this: struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed)); struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed)); > + > +/* OpenVPN nonce size */ > +#define NONCE_SIZE 12 nit: is using the common 'OVPN_' prefix here and for other constants any good idea? E.g. OVPN_NONCE_SIZE. It can give some hints where it comes from for a code reader. And another one question. Could you clarify in the comment to this constant where it came from? AFAIU, these 12 bytes is the expectation of the nonce size of AEAD crypto protocol, rigth? > + > +/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the > + * size of the AEAD Associated Data (AD) sent over the wire > + * and is normally the head of the IV > + */ > +#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail)) If the headers and IV are defined as structures, we no more need this constant since the header construction will be done by a compiler according to the structure layout. > +/* Last 8 bytes of AEAD nonce > + * Provided by userspace and usually derived from > + * key material generated during TLS handshake > + */ > +struct ovpn_nonce_tail { > + u8 u8[OVPN_NONCE_TAIL_SIZE]; > +}; Why do you need a dadicated structure for this array? Can we declare the corresponding fields like this: u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE]; u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE]; > +#endif /* _NET_OVPN_PACKET_H_ */ > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -1975,4 +1975,19 @@ enum { > > #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) > > +/* OVPN section */ > + > +enum ovpn_mode { > + OVPN_MODE_P2P, > + OVPN_MODE_MP, > +}; Mode min/max values can be defined here and the netlink policy can reference these values: enum ovpn_mode { OVPN_MODE_P2P, OVPN_MODE_MP, __OVPN_MODE_MAX }; #define OVPN_MODE_MIN OVPN_MODE_P2P #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1) ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX) > + > +enum { > + IFLA_OVPN_UNSPEC, > + IFLA_OVPN_MODE, > + __IFLA_OVPN_MAX, > +}; > + > +#define IFLA_OVPN_MAX (__IFLA_OVPN_MAX - 1) > + > #endif /* _UAPI_LINUX_IF_LINK_H */ > -- Sergey
On 29.10.2024 12:47, Antonio Quartulli wrote: > An ovpn interface will keep carrier always on and let the user > decide when an interface should be considered disconnected. > > This way, even if an ovpn interface is not connected to any peer, > it can still retain all IPs and routes and thus prevent any data > leak. > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > Reviewed-by: Andrew Lunn <andrew@lunn.ch> > --- > drivers/net/ovpn/main.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c > index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 > --- a/drivers/net/ovpn/main.c > +++ b/drivers/net/ovpn/main.c > @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) > > static int ovpn_net_open(struct net_device *dev) > { > + /* ovpn keeps the carrier always on to avoid losing IP or route > + * configuration upon disconnection. This way it can prevent leaks > + * of traffic outside of the VPN tunnel. > + * The user may override this behaviour by tearing down the interface > + * manually. > + */ > + netif_carrier_on(dev); If a user cares about the traffic leaking, then he can create a blackhole route with huge metric: # ip route add blackhole default metric 10000 Why the network interface should implicitly provide this functionality? And on another hand, how a routing daemon can learn a topology change without indication from the interface? > netif_tx_start_all_queues(dev); > return 0; > } >
On 29.10.2024 12:47, Antonio Quartulli wrote: > An ovpn_peer object holds the whole status of a remote peer > (regardless whether it is a server or a client). > > This includes status for crypto, tx/rx buffers, napi, etc. > > Only support for one peer is introduced (P2P mode). > Multi peer support is introduced with a later patch. Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well? Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown. > Along with the ovpn_peer, also the ovpn_bind object is introcued > as the two are strictly related. > An ovpn_bind object wraps a sockaddr representing the local > coordinates being used to talk to a specific peer. > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > --- > drivers/net/ovpn/Makefile | 2 + > drivers/net/ovpn/bind.c | 58 +++++++ > drivers/net/ovpn/bind.h | 117 ++++++++++++++ Why do we need these bind.c/bind.h files? They contains a minimum of code and still anyway references the peer object. Can we merge these definitions and code into peer.c/peer.h? > drivers/net/ovpn/main.c | 11 ++ > drivers/net/ovpn/main.h | 2 + > drivers/net/ovpn/ovpnstruct.h | 4 + > drivers/net/ovpn/peer.c | 354 ++++++++++++++++++++++++++++++++++++++++++ > drivers/net/ovpn/peer.h | 79 ++++++++++ > 8 files changed, 627 insertions(+) [...] > diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c > new file mode 100644 > index 0000000000000000000000000000000000000000..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a > --- /dev/null > +++ b/drivers/net/ovpn/bind.c > @@ -0,0 +1,58 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2012-2024 OpenVPN, Inc. > + * > + * Author: James Yonan <james@openvpn.net> > + * Antonio Quartulli <antonio@openvpn.net> > + */ > + > +#include <linux/netdevice.h> > +#include <linux/socket.h> > + > +#include "ovpnstruct.h" > +#include "bind.h" > +#include "peer.h" > + > +/** > + * ovpn_bind_from_sockaddr - retrieve binding matching sockaddr > + * @ss: the sockaddr to match > + * > + * Return: the bind matching the passed sockaddr if found, NULL otherwise The function returns ERR_PTR() in case of error, the comment should be updated. > + */ > +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) > +{ > + struct ovpn_bind *bind; > + size_t sa_len; > + > + if (ss->ss_family == AF_INET) > + sa_len = sizeof(struct sockaddr_in); > + else if (ss->ss_family == AF_INET6) > + sa_len = sizeof(struct sockaddr_in6); > + else > + return ERR_PTR(-EAFNOSUPPORT); > + > + bind = kzalloc(sizeof(*bind), GFP_ATOMIC); > + if (unlikely(!bind)) > + return ERR_PTR(-ENOMEM); > + > + memcpy(&bind->remote, ss, sa_len); > + > + return bind; > +} > + > +/** > + * ovpn_bind_reset - assign new binding to peer > + * @peer: the peer whose binding has to be replaced > + * @new: the new bind to assign > + */ > +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) > +{ > + struct ovpn_bind *old; > + > + spin_lock_bh(&peer->lock); > + old = rcu_replace_pointer(peer->bind, new, true); > + spin_unlock_bh(&peer->lock); Locking will be removed from this function in the subsequent patch. Should we move the peer->lock usage to ovpn_peer_release() now? > + > + kfree_rcu(old, rcu); > +} > diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h > new file mode 100644 > index 0000000000000000000000000000000000000000..859213d5040deb36c416eafcf5c6ab31c4d52c7a > --- /dev/null > +++ b/drivers/net/ovpn/bind.h > @@ -0,0 +1,117 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2012-2024 OpenVPN, Inc. > + * > + * Author: James Yonan <james@openvpn.net> > + * Antonio Quartulli <antonio@openvpn.net> > + */ > + > +#ifndef _NET_OVPN_OVPNBIND_H_ > +#define _NET_OVPN_OVPNBIND_H_ > + > +#include <net/ip.h> > +#include <linux/in.h> > +#include <linux/in6.h> > +#include <linux/rcupdate.h> > +#include <linux/skbuff.h> > +#include <linux/spinlock.h> > + > +struct ovpn_peer; > + > +/** > + * union ovpn_sockaddr - basic transport layer address Why do we need this dedicated named union? Can we merge this union into the ovpn_bind struct as already done for the local address? > + * @in4: IPv4 address > + * @in6: IPv6 address > + */ > +union ovpn_sockaddr { Family type can be putted here as a dedicated element to make address type check simple: unsigned short int sa_family; > + struct sockaddr_in in4; > + struct sockaddr_in6 in6; > +};> + > +/** > + * struct ovpn_bind - remote peer binding > + * @remote: the remote peer sockaddress > + * @local: local endpoint used to talk to the peer > + * @local.ipv4: local IPv4 used to talk to the peer > + * @local.ipv6: local IPv6 used to talk to the peer > + * @rcu: used to schedule RCU cleanup job > + */ > +struct ovpn_bind { > + union ovpn_sockaddr remote; /* remote sockaddr */ > + > + union { > + struct in_addr ipv4; > + struct in6_addr ipv6; > + } local; > + > + struct rcu_head rcu; > +}; > + > +/** > + * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6 > + * @skb: the packet sk_buff to inspect > + * > + * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol > + */ > +static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb) The function called outside the peer.c file only in ovpn_decrypt_post() and that call is debatable. Considering this, I belive, the skb_protocol_to_family() should be moved inside peer.c as static non-inlined function. > +{ > + switch (skb->protocol) { > + case htons(ETH_P_IP): > + return AF_INET; > + case htons(ETH_P_IPV6): > + return AF_INET6; > + default: > + return 0; > + } > +} > + > +/** > + * ovpn_bind_skb_src_match - match packet source with binding > + * @bind: the binding to match > + * @skb: the packet to match > + * > + * Return: true if the packet source matches the remote peer sockaddr > + * in the binding > + */ > +static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind, > + const struct sk_buff *skb) The function is called only from ovpn_peer_float() and probably should be moved into peer.c and un-inlined. > +{ > + const unsigned short family = skb_protocol_to_family(skb); > + const union ovpn_sockaddr *remote; > + > + if (unlikely(!bind)) > + return false; The caller ovpn_peer_float() function already verified bind object pointer, why we should redo the same check here? > + > + remote = &bind->remote; > + > + if (unlikely(remote->in4.sin_family != family)) > + return false; > + > + switch (family) { > + case AF_INET: > + if (unlikely(remote->in4.sin_addr.s_addr != ip_hdr(skb)->saddr)) > + return false; > + > + if (unlikely(remote->in4.sin_port != udp_hdr(skb)->source)) > + return false; > + break; > + case AF_INET6: > + if (unlikely(!ipv6_addr_equal(&remote->in6.sin6_addr, > + &ipv6_hdr(skb)->saddr))) > + return false; > + > + if (unlikely(remote->in6.sin6_port != udp_hdr(skb)->source)) > + return false; > + break; > + default: > + return false; > + } > + > + return true; > +} > + > +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *sa); > +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *bind); > + > +#endif /* _NET_OVPN_OVPNBIND_H_ */ > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c > index eaa83a8662e4ac2c758201008268f9633643c0b6..5492ce07751d135c1484fe1ed8227c646df94969 100644 > --- a/drivers/net/ovpn/main.c > +++ b/drivers/net/ovpn/main.c > @@ -20,6 +20,7 @@ > #include "netlink.h" > #include "io.h" > #include "packet.h" > +#include "peer.h" > > /* Driver info */ > #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" > @@ -29,6 +30,11 @@ static void ovpn_struct_free(struct net_device *net) > { > } > > +static int ovpn_net_init(struct net_device *dev) > +{ > + return 0; > +} The function is not required. Can we move its introduction to 'implement basic RX path (UDP)' patch, where its content will be added and where counterpart ovpn_net_uninit() function will be introduced? > + > static int ovpn_net_open(struct net_device *dev) > { > /* ovpn keeps the carrier always on to avoid losing IP or route > @@ -49,6 +55,7 @@ static int ovpn_net_stop(struct net_device *dev) > } > > static const struct net_device_ops ovpn_netdev_ops = { > + .ndo_init = ovpn_net_init, > .ndo_open = ovpn_net_open, > .ndo_stop = ovpn_net_stop, > .ndo_start_xmit = ovpn_net_xmit, > @@ -128,6 +135,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, > > ovpn->dev = dev; > ovpn->mode = mode; > + spin_lock_init(&ovpn->lock); > > /* turn carrier explicitly off after registration, this way state is > * clearly defined > @@ -176,6 +184,9 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, > > netif_carrier_off(dev); > ovpn->registered = false; > + > + if (ovpn->mode == OVPN_MODE_P2P) > + ovpn_peer_release_p2p(ovpn); > break; > case NETDEV_POST_INIT: > case NETDEV_GOING_DOWN: > diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h > index 0740a05070a817e0daea7b63a1f4fcebd274eb37..28e5c44816e110974333a7a6a9cf18bd15ae84e6 100644 > --- a/drivers/net/ovpn/main.h > +++ b/drivers/net/ovpn/main.h > @@ -19,4 +19,6 @@ bool ovpn_dev_is_valid(const struct net_device *dev); > #define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) > #define OVPN_MAX_PADDING 16 > > +#define OVPN_QUEUE_LEN 1024 This macro is unused, should we drop it? > + > #endif /* _NET_OVPN_MAIN_H_ */ > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h > index 211df871538d34fdff90d182f21a0b0fb11b28ad..a22c5083381c131db01a28c0f51e661d690d4998 100644 > --- a/drivers/net/ovpn/ovpnstruct.h > +++ b/drivers/net/ovpn/ovpnstruct.h > @@ -20,6 +20,8 @@ > * @dev_tracker: reference tracker for associated dev > * @registered: whether dev is still registered with netdev or not > * @mode: device operation mode (i.e. p2p, mp, ..) > + * @lock: protect this object > + * @peer: in P2P mode, this is the only remote peer > * @dev_list: entry for the module wide device list > */ > struct ovpn_struct { > @@ -27,6 +29,8 @@ struct ovpn_struct { > netdevice_tracker dev_tracker; > bool registered; > enum ovpn_mode mode; > + spinlock_t lock; /* protect writing to the ovpn_struct object */ > + struct ovpn_peer __rcu *peer; > struct list_head dev_list; > }; > > diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c > new file mode 100644 > index 0000000000000000000000000000000000000000..d9788a0cc99b5839c466c35d1b2266cc6b95fb72 > --- /dev/null > +++ b/drivers/net/ovpn/peer.c > @@ -0,0 +1,354 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2020-2024 OpenVPN, Inc. > + * > + * Author: James Yonan <james@openvpn.net> > + * Antonio Quartulli <antonio@openvpn.net> > + */ > + > +#include <linux/skbuff.h> > +#include <linux/list.h> > + > +#include "ovpnstruct.h" > +#include "bind.h" > +#include "io.h" > +#include "main.h" > +#include "netlink.h" > +#include "peer.h" > + > +/** > + * ovpn_peer_new - allocate and initialize a new peer object > + * @ovpn: the openvpn instance inside which the peer should be created > + * @id: the ID assigned to this peer > + * > + * Return: a pointer to the new peer on success or an error code otherwise > + */ > +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) > +{ > + struct ovpn_peer *peer; > + int ret; > + > + /* alloc and init peer object */ > + peer = kzalloc(sizeof(*peer), GFP_KERNEL); > + if (!peer) > + return ERR_PTR(-ENOMEM); > + > + peer->id = id; > + peer->halt = false; > + peer->ovpn = ovpn; > + > + peer->vpn_addrs.ipv4.s_addr = htonl(INADDR_ANY); > + peer->vpn_addrs.ipv6 = in6addr_any; > + > + RCU_INIT_POINTER(peer->bind, NULL); > + spin_lock_init(&peer->lock); > + kref_init(&peer->refcount); > + > + ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL); > + if (ret < 0) { > + netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n", > + __func__); > + kfree(peer); > + return ERR_PTR(ret); > + } > + > + netdev_hold(ovpn->dev, &ovpn->dev_tracker, GFP_KERNEL); > + > + return peer; > +} > + > +/** > + * ovpn_peer_release - release peer private members > + * @peer: the peer to release > + */ > +static void ovpn_peer_release(struct ovpn_peer *peer) > +{ > + ovpn_bind_reset(peer, NULL); > + > + dst_cache_destroy(&peer->dst_cache); > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); > + kfree_rcu(peer, rcu); > +} > + > +/** > + * ovpn_peer_release_kref - callback for kref_put > + * @kref: the kref object belonging to the peer > + */ > +void ovpn_peer_release_kref(struct kref *kref) > +{ > + struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount); > + > + ovpn_peer_release(peer); > +} > + > +/** > + * ovpn_peer_skb_to_sockaddr - fill sockaddr with skb source address > + * @skb: the packet to extract data from > + * @ss: the sockaddr to fill > + * > + * Return: true on success or false otherwise > + */ > +static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb, > + struct sockaddr_storage *ss) > +{ > + struct sockaddr_in6 *sa6; > + struct sockaddr_in *sa4; > + > + ss->ss_family = skb_protocol_to_family(skb); Why do we need skb_protocol_to_family() call? Can we directly use skb->protocol and ETH_P_IP/ETH_P_IPV6 in the switch? > + switch (ss->ss_family) { > + case AF_INET: > + sa4 = (struct sockaddr_in *)ss; > + sa4->sin_family = AF_INET; > + sa4->sin_addr.s_addr = ip_hdr(skb)->saddr; > + sa4->sin_port = udp_hdr(skb)->source; > + break; > + case AF_INET6: > + sa6 = (struct sockaddr_in6 *)ss; > + sa6->sin6_family = AF_INET6; > + sa6->sin6_addr = ipv6_hdr(skb)->saddr; > + sa6->sin6_port = udp_hdr(skb)->source; > + break; > + default: > + return false; > + } > + > + return true; > +} > + > +/** > + * ovpn_peer_transp_match - check if sockaddr and peer binding match > + * @peer: the peer to get the binding from > + * @ss: the sockaddr to match > + * > + * Return: true if sockaddr and binding match or false otherwise > + */ > +static bool ovpn_peer_transp_match(const struct ovpn_peer *peer, > + const struct sockaddr_storage *ss) > +{ > + struct ovpn_bind *bind = rcu_dereference(peer->bind); > + struct sockaddr_in6 *sa6; > + struct sockaddr_in *sa4; > + > + if (unlikely(!bind)) > + return false; > + > + if (ss->ss_family != bind->remote.in4.sin_family) nit: if the dedicated 'sa_family' element is added into the union, the check can be more straighforward (without 'in4' access): if (ss->ss_family != bind->remote.sa_family) > + return false; > + > + switch (ss->ss_family) { > + case AF_INET: > + sa4 = (struct sockaddr_in *)ss; > + if (sa4->sin_addr.s_addr != bind->remote.in4.sin_addr.s_addr) > + return false; > + if (sa4->sin_port != bind->remote.in4.sin_port) > + return false; > + break; > + case AF_INET6: > + sa6 = (struct sockaddr_in6 *)ss; > + if (!ipv6_addr_equal(&sa6->sin6_addr, > + &bind->remote.in6.sin6_addr)) > + return false; > + if (sa6->sin6_port != bind->remote.in6.sin6_port) > + return false; > + break; > + default: > + return false; > + } > + > + return true; > +} > + > +/** > + * ovpn_peer_get_by_transp_addr_p2p - get peer by transport address in a P2P > + * instance > + * @ovpn: the openvpn instance to search > + * @ss: the transport socket address > + * > + * Return: the peer if found or NULL otherwise > + */ > +static struct ovpn_peer * > +ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn, > + struct sockaddr_storage *ss) > +{ > + struct ovpn_peer *tmp, *peer = NULL; > + > + rcu_read_lock(); > + tmp = rcu_dereference(ovpn->peer); > + if (likely(tmp && ovpn_peer_transp_match(tmp, ss) && > + ovpn_peer_hold(tmp))) > + peer = tmp; > + rcu_read_unlock(); > + > + return peer; > +} > + > +/** > + * ovpn_peer_get_by_transp_addr - retrieve peer by transport address > + * @ovpn: the openvpn instance to search > + * @skb: the skb to retrieve the source transport address from > + * > + * Return: a pointer to the peer if found or NULL otherwise > + */ > +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, > + struct sk_buff *skb) > +{ > + struct ovpn_peer *peer = NULL; > + struct sockaddr_storage ss = { 0 }; nit: reverse x-mas tree order, please. > + > + if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) > + return NULL; > + > + if (ovpn->mode == OVPN_MODE_P2P) > + peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); > + > + return peer; > +} > + > +/** > + * ovpn_peer_get_by_id_p2p - get peer by ID in a P2P instance > + * @ovpn: the openvpn instance to search > + * @peer_id: the ID of the peer to find > + * > + * Return: the peer if found or NULL otherwise > + */ > +static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn, > + u32 peer_id) > +{ > + struct ovpn_peer *tmp, *peer = NULL; > + > + rcu_read_lock(); > + tmp = rcu_dereference(ovpn->peer); > + if (likely(tmp && tmp->id == peer_id && ovpn_peer_hold(tmp))) > + peer = tmp; > + rcu_read_unlock(); > + > + return peer; > +} > + > +/** > + * ovpn_peer_get_by_id - retrieve peer by ID > + * @ovpn: the openvpn instance to search > + * @peer_id: the unique peer identifier to match > + * > + * Return: a pointer to the peer if found or NULL otherwise > + */ > +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) > +{ > + struct ovpn_peer *peer = NULL; > + > + if (ovpn->mode == OVPN_MODE_P2P) > + peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id); > + > + return peer; > +} > + > +/** > + * ovpn_peer_add_p2p - add peer to related tables in a P2P instance > + * @ovpn: the instance to add the peer to > + * @peer: the peer to add > + * > + * Return: 0 on success or a negative error code otherwise > + */ > +static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer) > +{ > + struct ovpn_peer *tmp; > + > + spin_lock_bh(&ovpn->lock); > + /* in p2p mode it is possible to have a single peer only, therefore the > + * old one is released and substituted by the new one > + */ > + tmp = rcu_dereference_protected(ovpn->peer, > + lockdep_is_held(&ovpn->lock)); > + if (tmp) { > + tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN; > + ovpn_peer_put(tmp); > + } > + > + rcu_assign_pointer(ovpn->peer, peer); nit: the rcu_dereference_protected() + rcu_assign_pointer() calls can be replaced with a single rcu_replace_pointer() call. > + spin_unlock_bh(&ovpn->lock); > + > + return 0; > +} > + > +/** > + * ovpn_peer_add - add peer to the related tables > + * @ovpn: the openvpn instance the peer belongs to > + * @peer: the peer object to add > + * > + * Assume refcounter was increased by caller > + * > + * Return: 0 on success or a negative error code otherwise > + */ > +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) > +{ > + switch (ovpn->mode) { > + case OVPN_MODE_P2P: > + return ovpn_peer_add_p2p(ovpn, peer); > + default: > + return -EOPNOTSUPP; > + } > +} > + > +/** > + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance > + * @peer: the peer to delete > + * @reason: reason why the peer was deleted (sent to userspace) > + * > + * Return: 0 on success or a negative error code otherwise > + */ > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, > + enum ovpn_del_peer_reason reason) > + __must_hold(&peer->ovpn->lock) > +{ > + struct ovpn_peer *tmp; > + > + tmp = rcu_dereference_protected(peer->ovpn->peer, > + lockdep_is_held(&peer->ovpn->lock)); > + if (tmp != peer) { > + DEBUG_NET_WARN_ON_ONCE(1); nit: the above two lines can be simplified to: if (DEBUG_NET_WARN_ON_ONCE(tmp != peer)) { > + if (tmp) > + ovpn_peer_put(tmp); > + > + return -ENOENT; > + } > + > + tmp->delete_reason = reason; > + RCU_INIT_POINTER(peer->ovpn->peer, NULL); > + ovpn_peer_put(tmp); > + > + return 0; > +} > + > +/** > + * ovpn_peer_release_p2p - release peer upon P2P device teardown > + * @ovpn: the instance being torn down > + */ > +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn) > +{ > + struct ovpn_peer *tmp; > + > + spin_lock_bh(&ovpn->lock); > + tmp = rcu_dereference_protected(ovpn->peer, > + lockdep_is_held(&ovpn->lock)); > + if (tmp) > + ovpn_peer_del_p2p(tmp, OVPN_DEL_PEER_REASON_TEARDOWN); > + spin_unlock_bh(&ovpn->lock); > +} > + > +/** > + * ovpn_peer_del - delete peer from related tables > + * @peer: the peer object to delete > + * @reason: reason for deleting peer (will be sent to userspace) > + * > + * Return: 0 on success or a negative error code otherwise > + */ > +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason) > +{ > + switch (peer->ovpn->mode) { > + case OVPN_MODE_P2P: > + return ovpn_peer_del_p2p(peer, reason); > + default: > + return -EOPNOTSUPP; > + } > +} > diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h > new file mode 100644 > index 0000000000000000000000000000000000000000..6e0c6b14559de886d0677117f5a7ae029214e1f8 > --- /dev/null > +++ b/drivers/net/ovpn/peer.h > @@ -0,0 +1,79 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2020-2024 OpenVPN, Inc. > + * > + * Author: James Yonan <james@openvpn.net> > + * Antonio Quartulli <antonio@openvpn.net> > + */ > + > +#ifndef _NET_OVPN_OVPNPEER_H_ > +#define _NET_OVPN_OVPNPEER_H_ > + > +#include <net/dst_cache.h> > + > +/** > + * struct ovpn_peer - the main remote peer object > + * @ovpn: main openvpn instance this peer belongs to > + * @id: unique identifier > + * @vpn_addrs: IP addresses assigned over the tunnel > + * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel > + * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel > + * @dst_cache: cache for dst_entry used to send to peer > + * @bind: remote peer binding > + * @halt: true if ovpn_peer_mark_delete was called > + * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..) > + * @lock: protects binding to peer (bind) > + * @refcount: reference counter > + * @rcu: used to free peer in an RCU safe way > + * @delete_work: deferred cleanup work, used to notify userspace > + */ > +struct ovpn_peer { > + struct ovpn_struct *ovpn; > + u32 id; > + struct { > + struct in_addr ipv4; > + struct in6_addr ipv6; > + } vpn_addrs; > + struct dst_cache dst_cache; > + struct ovpn_bind __rcu *bind; > + bool halt; > + enum ovpn_del_peer_reason delete_reason; > + spinlock_t lock; /* protects bind */ > + struct kref refcount; > + struct rcu_head rcu; > + struct work_struct delete_work; > +}; > + > +/** > + * ovpn_peer_hold - increase reference counter > + * @peer: the peer whose counter should be increased > + * > + * Return: true if the counter was increased or false if it was zero already > + */ > +static inline bool ovpn_peer_hold(struct ovpn_peer *peer) > +{ > + return kref_get_unless_zero(&peer->refcount); > +} > + > +void ovpn_peer_release_kref(struct kref *kref); > + > +/** > + * ovpn_peer_put - decrease reference counter > + * @peer: the peer whose counter should be decreased > + */ > +static inline void ovpn_peer_put(struct ovpn_peer *peer) > +{ > + kref_put(&peer->refcount, ovpn_peer_release_kref); > +} > + > +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id); > +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer); > +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason); > +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn); > + > +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, > + struct sk_buff *skb); > +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); > + > +#endif /* _NET_OVPN_OVPNPEER_H_ */ >
On 29.10.2024 12:47, Antonio Quartulli wrote: [...] > +static void ovpn_peer_release(struct ovpn_peer *peer) > +{ > + ovpn_bind_reset(peer, NULL); > + nit: this empty line after ovpn_bind_reset() is removed in the 'implement basic TX path (UDP)' patch. What tricks git and it produces a sensless diff with 'ovpn_bind_reset(...)' line beeing removed and then introduced again. If you do not like this empty line then remove it here, please :) > + dst_cache_destroy(&peer->dst_cache); > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); > + kfree_rcu(peer, rcu); > +} -- Sergey
Missed the most essential note regarding this patch :) On 29.10.2024 12:47, Antonio Quartulli wrote: > +static int ovpn_net_open(struct net_device *dev) > +{ > + netif_tx_start_all_queues(dev); > + return 0; > +} > + > +static int ovpn_net_stop(struct net_device *dev) > +{ > + netif_tx_stop_all_queues(dev); Here we stop a user generated traffic in downlink. Shall we take care about other kinds of traffic: keepalive, uplink? I believe we should remove all the peers here or at least stop the keepalive generation. But peers removing is better since administratively down is administratively down, meaning user expected full traffic stop in any direction. And even if we only stop the keepalive generation then peer(s) anyway will destroy the tunnel on their side. This way we even should not care about peers removing on the device unregistering. What do you think? > + return 0; > +}
On 29.10.2024 12:47, Antonio Quartulli wrote: > Packets received over the socket are forwarded to the user device. > > Implementation is UDP only. TCP will be added by a later patch. > > Note: no decryption/decapsulation exists yet, packets are forwarded as > they arrive without much processing. > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > --- > drivers/net/ovpn/io.c | 66 ++++++++++++++++++++++++++- > drivers/net/ovpn/io.h | 2 + > drivers/net/ovpn/main.c | 13 +++++- > drivers/net/ovpn/ovpnstruct.h | 3 ++ > drivers/net/ovpn/proto.h | 75 ++++++++++++++++++++++++++++++ > drivers/net/ovpn/socket.c | 24 ++++++++++ > drivers/net/ovpn/udp.c | 104 +++++++++++++++++++++++++++++++++++++++++- > drivers/net/ovpn/udp.h | 3 +- > 8 files changed, 286 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c > index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 > --- a/drivers/net/ovpn/io.c > +++ b/drivers/net/ovpn/io.c > @@ -9,15 +9,79 @@ > > #include <linux/netdevice.h> > #include <linux/skbuff.h> > +#include <net/gro_cells.h> > #include <net/gso.h> > > -#include "io.h" > #include "ovpnstruct.h" > #include "peer.h" > +#include "io.h" > +#include "netlink.h" > +#include "proto.h" > #include "udp.h" > #include "skb.h" > #include "socket.h" > > +/* Called after decrypt to write the IP packet to the device. > + * This method is expected to manage/free the skb. > + */ > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + unsigned int pkt_len; > + > + /* we can't guarantee the packet wasn't corrupted before entering the > + * VPN, therefore we give other layers a chance to check that > + */ > + skb->ip_summed = CHECKSUM_NONE; > + > + /* skb hash for transport packet no longer valid after decapsulation */ > + skb_clear_hash(skb); > + > + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the > + * interface, based on __skb_tunnel_rx() in dst.h > + */ > + skb->dev = peer->ovpn->dev; > + skb_set_queue_mapping(skb, 0); > + skb_scrub_packet(skb, true); > + The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, why do not touch the protocol field here? > + skb_reset_network_header(skb); ovpn_decrypt_post() already reseted the network header. Why do we need it here again? > + skb_reset_transport_header(skb); > + skb_probe_transport_header(skb); > + skb_reset_inner_headers(skb); > + > + memset(skb->cb, 0, sizeof(skb->cb)); Why do we need to zero the control buffer here? > + /* cause packet to be "received" by the interface */ > + pkt_len = skb->len; > + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, > + skb) == NET_RX_SUCCESS)) > + /* update RX stats with the size of decrypted packet */ > + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); > +} > + > +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) > +{ > + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; > + > + if (unlikely(ret < 0)) > + goto drop; > + > + ovpn_netdev_write(peer, skb); > + /* skb is passed to upper layer - don't free it */ > + skb = NULL; > +drop: > + if (unlikely(skb)) > + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); > + ovpn_peer_put(peer); > + kfree_skb(skb); > +} > + > +/* pick next packet from RX queue, decrypt and forward it to the device */ The function now receives packets from externel callers. Should we update the above comment? > +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + ovpn_skb_cb(skb)->peer = peer; > + ovpn_decrypt_post(skb, 0); > +} > + > static void ovpn_encrypt_post(struct sk_buff *skb, int ret) > { > struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; > diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h > index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 > --- a/drivers/net/ovpn/io.h > +++ b/drivers/net/ovpn/io.h > @@ -12,4 +12,6 @@ > > netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); > > +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); > + > #endif /* _NET_OVPN_OVPN_H_ */ > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c > index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 > --- a/drivers/net/ovpn/main.c > +++ b/drivers/net/ovpn/main.c > @@ -11,6 +11,7 @@ > #include <linux/module.h> > #include <linux/netdevice.h> > #include <linux/inetdevice.h> > +#include <net/gro_cells.h> > #include <net/ip.h> > #include <net/rtnetlink.h> > #include <uapi/linux/if_arp.h> > @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net) > > static int ovpn_net_init(struct net_device *dev) > { > - return 0; > + struct ovpn_struct *ovpn = netdev_priv(dev); > + > + return gro_cells_init(&ovpn->gro_cells, dev); > +} > + > +static void ovpn_net_uninit(struct net_device *dev) > +{ > + struct ovpn_struct *ovpn = netdev_priv(dev); > + > + gro_cells_destroy(&ovpn->gro_cells); > } > > static int ovpn_net_open(struct net_device *dev) > @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev) > > static const struct net_device_ops ovpn_netdev_ops = { > .ndo_init = ovpn_net_init, > + .ndo_uninit = ovpn_net_uninit, > .ndo_open = ovpn_net_open, > .ndo_stop = ovpn_net_stop, > .ndo_start_xmit = ovpn_net_xmit, > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h > index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644 > --- a/drivers/net/ovpn/ovpnstruct.h > +++ b/drivers/net/ovpn/ovpnstruct.h > @@ -10,6 +10,7 @@ > #ifndef _NET_OVPN_OVPNSTRUCT_H_ > #define _NET_OVPN_OVPNSTRUCT_H_ > > +#include <net/gro_cells.h> > #include <net/net_trackers.h> > #include <uapi/linux/if_link.h> > #include <uapi/linux/ovpn.h> > @@ -23,6 +24,7 @@ > * @lock: protect this object > * @peer: in P2P mode, this is the only remote peer > * @dev_list: entry for the module wide device list > + * @gro_cells: pointer to the Generic Receive Offload cell > */ > struct ovpn_struct { > struct net_device *dev; > @@ -32,6 +34,7 @@ struct ovpn_struct { > spinlock_t lock; /* protect writing to the ovpn_struct object */ > struct ovpn_peer __rcu *peer; > struct list_head dev_list; > + struct gro_cells gro_cells; > }; > > #endif /* _NET_OVPN_OVPNSTRUCT_H_ */ > diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h > new file mode 100644 > index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de > --- /dev/null > +++ b/drivers/net/ovpn/proto.h > @@ -0,0 +1,75 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* OpenVPN data channel offload > + * > + * Copyright (C) 2020-2024 OpenVPN, Inc. > + * > + * Author: Antonio Quartulli <antonio@openvpn.net> > + * James Yonan <james@openvpn.net> > + */ > + > +#ifndef _NET_OVPN_OVPNPROTO_H_ > +#define _NET_OVPN_OVPNPROTO_H_ > + > +#include "main.h" > + > +#include <linux/skbuff.h> > + > +/* Methods for operating on the initial command > + * byte of the OpenVPN protocol. > + */ > + > +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in > + * one byte > + */ > +#define OVPN_KEY_ID_MASK 0x07 > +#define OVPN_OPCODE_SHIFT 3 > +#define OVPN_OPCODE_MASK 0x1F Instead of defining mask(s) and shift(s), we can define only masks and use bitfield API (see below). > +/* upper bounds on opcode and key ID */ > +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) > +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) > +/* packet opcodes of interest to us */ > +#define OVPN_DATA_V1 6 /* data channel V1 packet */ > +#define OVPN_DATA_V2 9 /* data channel V2 packet */ > +/* size of initial packet opcode */ > +#define OVPN_OP_SIZE_V1 1 > +#define OVPN_OP_SIZE_V2 4 > +#define OVPN_PEER_ID_MASK 0x00FFFFFF > +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF > +/* first byte of keepalive message */ > +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a > +/* first byte of exit message */ > +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be removed. > +/** > + * ovpn_opcode_from_skb - extract OP code from skb at specified offset > + * @skb: the packet to extract the OP code from > + * @offset: the offset in the data buffer where the OP code is located > + * > + * Note: this function assumes that the skb head was pulled enough > + * to access the first byte. > + * > + * Return: the OP code > + */ > +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) > +{ > + u8 byte = *(skb->data + offset); > + > + return byte >> OVPN_OPCODE_SHIFT; For example here, the shift can be replaced with bitfield macro: #define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000 #define OVPN_OPCODE_KEYID_MSK 0x07000000 #define OVPN_OPCODE_PEERID_MSK 0x00ffffff static inline u8 ovpn_opcode_from_skb(...) { u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset)); return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode); } And the upcoming ovpn_opcode_compose() can be implemented like this: static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id) { return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) | FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) | FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id); } And with this size can be even embedded into ovpn_aead_encrypt() to make the header composing more clear. > +} > + > +/** > + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset > + * @skb: the packet to extract the OP code from > + * @offset: the offset in the data buffer where the OP code is located > + * > + * Note: this function assumes that the skb head was pulled enough > + * to access the first 4 bytes. > + * > + * Return: the peer ID. > + */ > +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) > +{ > + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; > +} > + > +#endif /* _NET_OVPN_OVPNPROTO_H_ */ > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c > index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 > --- a/drivers/net/ovpn/socket.c > +++ b/drivers/net/ovpn/socket.c > @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) > if (!sock) > return; > > + if (sock->sk->sk_protocol == IPPROTO_UDP) > + ovpn_udp_socket_detach(sock); > + > sockfd_put(sock); > } > > @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) > return ret; > } > > +/* Retrieve the corresponding ovpn object from a UDP socket > + * rcu_read_lock must be held on entry > + */ > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) > +{ > + struct ovpn_socket *ovpn_sock; > + > + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) > + return NULL; > + > + ovpn_sock = rcu_dereference_sk_user_data(sk); > + if (unlikely(!ovpn_sock)) > + return NULL; > + > + /* make sure that sk matches our stored transport socket */ > + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) > + return NULL; > + > + return ovpn_sock->ovpn; Now, returning of this pointer is safe. But the following TCP transport support calls the socket release via a scheduled work. What extends socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is it correct assumption? If the above is right then shall we set ->ovpn = NULL before scheduling the socket releasing work or somehow else mark the socket as half-destroyed? > +} > + > /** > * ovpn_socket_new - create a new socket and initialize it > * @sock: the kernel socket to embed > diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c > index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 > --- a/drivers/net/ovpn/udp.c > +++ b/drivers/net/ovpn/udp.c > @@ -21,9 +21,95 @@ > #include "bind.h" > #include "io.h" > #include "peer.h" > +#include "proto.h" > #include "socket.h" > #include "udp.h" > > +/** > + * ovpn_udp_encap_recv - Start processing a received UDP packet. > + * @sk: socket over which the packet was received > + * @skb: the received packet > + * > + * If the first byte of the payload is DATA_V2, the packet is further processed, > + * otherwise it is forwarded to the UDP stack for delivery to user space. > + * > + * Return: > + * 0 if skb was consumed or dropped > + * >0 if skb should be passed up to userspace as UDP (packet not consumed) > + * <0 if skb should be resubmitted as proto -N (packet not consumed) > + */ > +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > +{ > + struct ovpn_peer *peer = NULL; > + struct ovpn_struct *ovpn; > + u32 peer_id; > + u8 opcode; > + > + ovpn = ovpn_from_udp_sock(sk); > + if (unlikely(!ovpn)) { > + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", > + __func__); Probably we should zero ovpn pointer in the ovpn_sock to survive scheduled socket release (see comment in ovpn_from_udp_sock). So, this print should be removed to avoid printing misguiding errors. > + goto drop_noovpn; > + } > + > + /* Make sure the first 4 bytes of the skb data buffer after the UDP > + * header are accessible. > + * They are required to fetch the OP code, the key ID and the peer ID. > + */ > + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + > + OVPN_OP_SIZE_V2))) { > + net_dbg_ratelimited("%s: packet too small\n", __func__); > + goto drop; > + } > + > + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); > + if (unlikely(opcode != OVPN_DATA_V2)) { > + /* DATA_V1 is not supported */ > + if (opcode == OVPN_DATA_V1) > + goto drop; This packet dropping makes protocol accelerator, intendent to speed up the data packets processing, a protocol enforcement entity, isn't it? Shall we follow the principle of beeing liberal in what we accept and just forward everything besides data packets upstream to a userspace application? > + > + /* unknown or control packet: let it bubble up to userspace */ > + return 1; > + } > + > + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); > + /* some OpenVPN server implementations send data packets with the > + * peer-id set to undef. In this case we skip the peer lookup by peer-id > + * and we try with the transport address > + */ > + if (peer_id != OVPN_PEER_ID_UNDEF) { > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) { > + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", > + __func__, peer_id); Why do we consider a peer sending us garbage our problem? Meaning, this peer miss can be not our fault but a malformed packet from a 3rd party side. E.g. nowdays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall remove this message or at least make it debug to avoid bothering users with garbage traveling Internet? Anyway we can not do anything regarding incoming traffic. > + goto drop; > + } > + } > + > + if (!peer) { AFAIU, this condition can true only in case of peer_id beeing equal to OVPN_PEER_ID_UNDEF, right? In this case the condition check can be replaced by simple 'else' statement. And to make code more corresponding to the above comment regarding implementations that send undefined peer-id, can we swap sides of the lookup method selection? E.g. /* Comment about fancy implementations sending undefined peer-id */ if (peer_id == OVPN_PEER_ID_UNDEF) { /* Do transport address based loockup */ } else { /* Do peer-id based loockup */ } > + /* data packet with undef peer-id */ > + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", > + __func__); > + goto drop; > + } > + } > + > + /* pop off outer UDP header */ > + __skb_pull(skb, sizeof(struct udphdr)); > + ovpn_recv(peer, skb); > + return 0; > + > +drop: > + if (peer) > + ovpn_peer_put(peer); AFAIU, the peer is alway NULL here. Shall we remove the above check? > + dev_core_stats_rx_dropped_inc(ovpn->dev); > +drop_noovpn: > + kfree_skb(skb); > + return 0; > +} > + > /** > * ovpn_udp4_output - send IPv4 packet over udp socket > * @ovpn: the openvpn instance > @@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, > */ > int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > { > + struct udp_tunnel_sock_cfg cfg = { > + .encap_type = UDP_ENCAP_OVPNINUDP, > + .encap_rcv = ovpn_udp_encap_recv, > + }; > struct ovpn_socket *old_data; > - int ret = 0; > + int ret; > > /* sanity check */ > if (sock->sk->sk_protocol != IPPROTO_UDP) { > @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > if (!old_data) { > /* socket is currently unused - we can take it */ > rcu_read_unlock(); > + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); > return 0; > } > > @@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) > > return ret; > } > + > +/** > + * ovpn_udp_socket_detach - clean udp-tunnel status for this socket > + * @sock: the socket to clean > + */ > +void ovpn_udp_socket_detach(struct socket *sock) > +{ > + struct udp_tunnel_sock_cfg cfg = { }; > + > + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); > +} > diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h > index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644 > --- a/drivers/net/ovpn/udp.h > +++ b/drivers/net/ovpn/udp.h > @@ -18,8 +18,9 @@ struct sk_buff; > struct socket; > > int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); > - > +void ovpn_udp_socket_detach(struct socket *sock); > void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, > struct sk_buff *skb); > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk); > > #endif /* _NET_OVPN_UDP_H_ */ >
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > +static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info, > + struct nlattr **attrs) > +{ > + struct sockaddr_storage ss = {}; > + u32 sockfd, interv, timeout; > + struct socket *sock = NULL; > + u8 *local_ip = NULL; > + bool rehash = false; > + int ret; > + > + if (attrs[OVPN_A_PEER_SOCKET]) { > + /* lookup the fd in the kernel table and extract the socket > + * object > + */ > + sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]); > + /* sockfd_lookup() increases sock's refcounter */ > + sock = sockfd_lookup(sockfd, &ret); > + if (!sock) { > + NL_SET_ERR_MSG_FMT_MOD(info->extack, > + "cannot lookup peer socket (fd=%u): %d", > + sockfd, ret); > + return -ENOTSOCK; > + } > + > + /* Only when using UDP as transport protocol the remote endpoint > + * can be configured so that ovpn knows where to send packets > + * to. > + * > + * In case of TCP, the socket is connected to the peer and ovpn > + * will just send bytes over it, without the need to specify a > + * destination. > + */ > + if (sock->sk->sk_protocol != IPPROTO_UDP && > + (attrs[OVPN_A_PEER_REMOTE_IPV4] || > + attrs[OVPN_A_PEER_REMOTE_IPV6])) { > + NL_SET_ERR_MSG_FMT_MOD(info->extack, > + "unexpected remote IP address for non UDP socket"); > + sockfd_put(sock); > + return -EINVAL; > + } > + > + if (peer->sock) > + ovpn_socket_put(peer->sock); > + > + peer->sock = ovpn_socket_new(sock, peer); I don't see anything preventing concurrent updates of peer->sock. I think peer->lock should be taken from the start of ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and peer->keepalive_* are also not prevented with the current code. > + if (IS_ERR(peer->sock)) { > + NL_SET_ERR_MSG_FMT_MOD(info->extack, > + "cannot encapsulate socket: %ld", > + PTR_ERR(peer->sock)); > + sockfd_put(sock); > + peer->sock = NULL; > + return -ENOTSOCK; > + } > + } > + > + if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) { > + /* we carry the local IP in a generic container. > + * ovpn_peer_reset_sockaddr() will properly interpret it > + * based on ss.ss_family > + */ > + local_ip = ovpn_nl_attr_local_ip(attrs); > + > + spin_lock_bh(&peer->lock); > + /* set peer sockaddr */ > + ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip); > + if (ret < 0) { > + NL_SET_ERR_MSG_FMT_MOD(info->extack, > + "cannot set peer sockaddr: %d", > + ret); > + spin_unlock_bh(&peer->lock); > + return ret; > + } > + spin_unlock_bh(&peer->lock); > + } > + > + if (attrs[OVPN_A_PEER_VPN_IPV4]) { > + rehash = true; > + peer->vpn_addrs.ipv4.s_addr = > + nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]); > + } > + > + if (attrs[OVPN_A_PEER_VPN_IPV6]) { > + rehash = true; > + peer->vpn_addrs.ipv6 = > + nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]); > + } > + > + /* when setting the keepalive, both parameters have to be configured */ > + if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] && > + attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) { > + interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]); > + timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]); > + ovpn_peer_keepalive_set(peer, interv, timeout); > + } > + > + netdev_dbg(peer->ovpn->dev, > + "%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n", > + __func__, peer->id, &ss, > + peer->sock->sock->sk->sk_prot_creator->name, > + &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6); > + > + return rehash ? 1 : 0; > +} > + [...] > +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) > + __must_hold(&peer->ovpn->peers->lock) Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that). > +{ > + struct hlist_nulls_head *nhead; > + > + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { > + /* remove potential old hashing */ > + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); > + > + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, > + &peer->vpn_addrs.ipv4, > + sizeof(peer->vpn_addrs.ipv4)); > + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); > + } > + > + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { > + /* remove potential old hashing */ > + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); > + > + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, > + &peer->vpn_addrs.ipv6, > + sizeof(peer->vpn_addrs.ipv6)); > + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); > + } > +}
On 29.10.2024 12:47, Antonio Quartulli wrote: > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + unsigned int pkt_len; > + > + /* we can't guarantee the packet wasn't corrupted before entering the > + * VPN, therefore we give other layers a chance to check that > + */ > + skb->ip_summed = CHECKSUM_NONE; > + > + /* skb hash for transport packet no longer valid after decapsulation */ > + skb_clear_hash(skb); > + > + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the > + * interface, based on __skb_tunnel_rx() in dst.h > + */ > + skb->dev = peer->ovpn->dev; > + skb_set_queue_mapping(skb, 0); > + skb_scrub_packet(skb, true); > + > + skb_reset_network_header(skb); > + skb_reset_transport_header(skb); > + skb_probe_transport_header(skb); > + skb_reset_inner_headers(skb); > + > + memset(skb->cb, 0, sizeof(skb->cb)); > + > + /* cause packet to be "received" by the interface */ > + pkt_len = skb->len; > + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, > + skb) == NET_RX_SUCCESS)) nit: to improve readability, the packet delivery call can be composed like this: pkt_len = skb->len; res = gro_cells_receive(&peer->ovpn->gro_cells, skb); if (likely(res == NET_RX_SUCCESS)) > + /* update RX stats with the size of decrypted packet */ > + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); > +}
On 04.11.2024 13:26, Sabrina Dubroca wrote: > 2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote: >> struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, >> struct sk_buff *skb) >> { >> - struct ovpn_peer *peer = NULL; >> + struct ovpn_peer *tmp, *peer = NULL; >> struct sockaddr_storage ss = { 0 }; >> + struct hlist_nulls_head *nhead; >> + struct hlist_nulls_node *ntmp; >> + size_t sa_len; >> >> if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) >> return NULL; >> >> if (ovpn->mode == OVPN_MODE_P2P) >> - peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); >> + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); >> + >> + switch (ss.ss_family) { >> + case AF_INET: >> + sa_len = sizeof(struct sockaddr_in); >> + break; >> + case AF_INET6: >> + sa_len = sizeof(struct sockaddr_in6); >> + break; >> + default: >> + return NULL; >> + } > > You could get rid of that switch by having ovpn_peer_skb_to_sockaddr > also set sa_len (or return 0/the size). > >> + >> + nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len); >> + >> + rcu_read_lock(); >> + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, >> + hash_entry_transp_addr) { > > I think that's missing the retry in case we ended up in the wrong > bucket due to a peer rehash? Nice catch! I am also wondering why the 'nulls' variant was selected, but there are no nulls value verification with the search respin. Since we started discussing the list API, why the 'nulls' variant is used for address hash tables and the normal variant is used for the peer-id lookup? > >> + if (!ovpn_peer_transp_match(tmp, &ss)) >> + continue; >> + >> + if (!ovpn_peer_hold(tmp)) >> + continue; >> + >> + peer = tmp; >> + break; >> + } >> + rcu_read_unlock(); >> >> return peer; >> } -- Sergey
On 05/11/2024 14:12, Sabrina Dubroca wrote: > 2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote: >> On 30/10/2024 17:37, Sabrina Dubroca wrote: >>> 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: >>>> +static void ovpn_peer_release(struct ovpn_peer *peer) >>>> +{ >>>> + ovpn_bind_reset(peer, NULL); >>>> + >>>> + dst_cache_destroy(&peer->dst_cache); >>> >>> Is it safe to destroy the cache at this time? In the same function, we >>> use rcu to free the peer, but AFAICT the dst_cache will be freed >>> immediately: >>> >>> void dst_cache_destroy(struct dst_cache *dst_cache) >>> { >>> [...] >>> free_percpu(dst_cache->cache); >>> } >>> >>> (probably no real issue because ovpn_udp_send_skb gets called while we >>> hold a reference to the peer?) >> >> Right. >> That was my assumption: release happens on refcnt = 0 only, therefore no >> field should be in use anymore. >> Anything that may still be in use will have its own refcounter. > > My worry is that code changes over time, assumptions are forgotten, > and we end up with code that was a bit odd but safe not being safe > anymore. Yeah, makes sense. I'll move the call to dst_cache_destroy() and to kfree(peer) in a RCU callback. Thanks! > >>> >>>> + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); >>>> + kfree_rcu(peer, rcu); >>>> +} >>> >>> >>> [...] >>>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, >>>> + enum ovpn_del_peer_reason reason) >>>> + __must_hold(&peer->ovpn->lock) >>>> +{ >>>> + struct ovpn_peer *tmp; >>>> + >>>> + tmp = rcu_dereference_protected(peer->ovpn->peer, >>>> + lockdep_is_held(&peer->ovpn->lock)); >>>> + if (tmp != peer) { >>>> + DEBUG_NET_WARN_ON_ONCE(1); >>>> + if (tmp) >>>> + ovpn_peer_put(tmp); >>> >>> Does peer->ovpn->peer need to be set to NULL here as well? Or is it >>> going to survive this _put? >> >> First of all consider that this is truly something that we don't expect to >> happen (hence the WARN_ON). >> If this is happening it's because we are trying to delete a peer that is not >> the one we are connected to (unexplainable scenario in p2p mode). >> >> Still, should we hit this case (I truly can't see how), I'd say "leave >> everything as is - maybe this call was just a mistake". > > Yeah, true, let's leave it. Thanks. >
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c > index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 > --- a/drivers/net/ovpn/io.c > +++ b/drivers/net/ovpn/io.c > @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) > /* keep track of last received authenticated packet for keepalive */ > peer->last_recv = ktime_get_real_seconds(); > > + if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) { What prevents peer->sock from being replaced and released concurrently? Or possibly reading the error value that ovpn_socket_new can return before peer->sock is reset to NULL, just noticed this in ovpn_nl_peer_modify: if (attrs[OVPN_A_PEER_SOCKET]) { // ... peer->sock = ovpn_socket_new(sock, peer); if (IS_ERR(peer->sock)) { // ... peer->sock = NULL; (ovpn_encrypt_post has a similar check on peer->sock->sock->sk->sk_protocol that I don't think is safe either) > + /* check if this peer changed it's IP address and update > + * state > + */ > + ovpn_peer_float(peer, skb); > + /* update source endpoint for this peer */ > + ovpn_peer_update_local_endpoint(peer, skb); Why not do both in the same function? They're not called anywhere else (at least in this version of the series). They both modify peer->bind depending on skb_protocol_to_family(skb), and operate under peer->lock. > +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) > +{ > + struct hlist_nulls_head *nhead; > + struct sockaddr_storage ss; > + const u8 *local_ip = NULL; > + struct sockaddr_in6 *sa6; > + struct sockaddr_in *sa; > + struct ovpn_bind *bind; > + sa_family_t family; > + size_t salen; > + > + rcu_read_lock(); > + bind = rcu_dereference(peer->bind); > + if (unlikely(!bind)) { > + rcu_read_unlock(); > + return; > + } > + > + spin_lock_bh(&peer->lock); You could take the lock from the start, instead of using rcu_read_lock to get peer->bind. It would guarantee that the bind we got isn't already being replaced just as we wait to update it. And same in ovpn_peer_update_local_endpoint, it would make sure we're updating the local IP for the active bind. (sorry I didn't think about that last time we discussed this) > + if (likely(ovpn_bind_skb_src_match(bind, skb))) > + goto unlock; > + > + family = skb_protocol_to_family(skb); > +
On 12/11/2024 02:18, Sergey Ryazanov wrote: > On 04.11.2024 13:26, Sabrina Dubroca wrote: >> 2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote: >>> struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct >>> *ovpn, >>> struct sk_buff *skb) >>> { >>> - struct ovpn_peer *peer = NULL; >>> + struct ovpn_peer *tmp, *peer = NULL; >>> struct sockaddr_storage ss = { 0 }; >>> + struct hlist_nulls_head *nhead; >>> + struct hlist_nulls_node *ntmp; >>> + size_t sa_len; >>> if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) >>> return NULL; >>> if (ovpn->mode == OVPN_MODE_P2P) >>> - peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); >>> + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); >>> + >>> + switch (ss.ss_family) { >>> + case AF_INET: >>> + sa_len = sizeof(struct sockaddr_in); >>> + break; >>> + case AF_INET6: >>> + sa_len = sizeof(struct sockaddr_in6); >>> + break; >>> + default: >>> + return NULL; >>> + } >> >> You could get rid of that switch by having ovpn_peer_skb_to_sockaddr >> also set sa_len (or return 0/the size). Yeah, makes sense. Thanks! >> >>> + >>> + nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, >>> sa_len); >>> + >>> + rcu_read_lock(); >>> + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, >>> + hash_entry_transp_addr) { >> >> I think that's missing the retry in case we ended up in the wrong >> bucket due to a peer rehash? Oh, for some reason I convinced myself that this is handled behind the scene, but indeed the lookup must be explicitly restarted. will fix it, thanks for pointing this out! > > Nice catch! I am also wondering why the 'nulls' variant was selected, > but there are no nulls value verification with the search respin. > > Since we started discussing the list API, why the 'nulls' variant is > used for address hash tables and the normal variant is used for the > peer-id lookup? Because the nulls variant is used only for tables where a re-hash can happen. The peer-id table does not expect its objected to be re-used or re-hashed since the ID of a peer cannot change throughout its lifetime. Regards, > >> >>> + if (!ovpn_peer_transp_match(tmp, &ss)) >>> + continue; >>> + >>> + if (!ovpn_peer_hold(tmp)) >>> + continue; >>> + >>> + peer = tmp; >>> + break; >>> + } >>> + rcu_read_unlock(); >>> return peer; >>> } > > -- > Sergey >
On 04/11/2024 12:24, Sabrina Dubroca wrote: > 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: >> +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, >> + const struct sockaddr_storage *ss, >> + const u8 *local_ip) >> + __must_hold(&peer->lock) >> +{ >> + struct ovpn_bind *bind; >> + size_t ip_len; >> + >> + /* create new ovpn_bind object */ >> + bind = ovpn_bind_from_sockaddr(ss); >> + if (IS_ERR(bind)) >> + return PTR_ERR(bind); >> + >> + if (local_ip) { >> + if (ss->ss_family == AF_INET) { >> + ip_len = sizeof(struct in_addr); >> + } else if (ss->ss_family == AF_INET6) { >> + ip_len = sizeof(struct in6_addr); >> + } else { >> + netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n", >> + __func__); > > ratelimited since that can be triggered from packet processing? ACK > > > [...] >> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) >> +{ > [...] >> + >> + switch (family) { >> + case AF_INET: >> + sa = (struct sockaddr_in *)&ss; >> + sa->sin_family = AF_INET; >> + sa->sin_addr.s_addr = ip_hdr(skb)->saddr; >> + sa->sin_port = udp_hdr(skb)->source; >> + salen = sizeof(*sa); >> + break; >> + case AF_INET6: >> + sa6 = (struct sockaddr_in6 *)&ss; >> + sa6->sin6_family = AF_INET6; >> + sa6->sin6_addr = ipv6_hdr(skb)->saddr; >> + sa6->sin6_port = udp_hdr(skb)->source; >> + sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr, >> + skb->skb_iif); >> + salen = sizeof(*sa6); >> + break; >> + default: >> + goto unlock; >> + } >> + >> + netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__, > > %u for peer->id? > > and ratelimited too, probably. > > (also in ovpn_peer_update_local_endpoint in the previous patch) Technically we don't expect that frequent float/endpoint updates, but should they happen..better to be protected. ACK > >> + peer->id, &ss); >> + ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss, >> + local_ip); > > skip the rehash if this fails? peer->bind will still be the old one so > moving it to the new hash chain won't help (the lookup will fail). Yeah, it makes sense. Thanks a lot. Regards,
On 12/11/2024 11:56, Sabrina Dubroca wrote: > 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: >> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c >> index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 >> --- a/drivers/net/ovpn/io.c >> +++ b/drivers/net/ovpn/io.c >> @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) >> /* keep track of last received authenticated packet for keepalive */ >> peer->last_recv = ktime_get_real_seconds(); >> >> + if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) { > > What prevents peer->sock from being replaced and released > concurrently? Technically nothing. Userspace currently does not even support updating a peer socket at runtime, but I wanted ovpn to be flexible enough from the beginning. One approach might be to go back to peer->sock being unmutable and forget about this. OTOH, if we want to keep this flexibility (which I think is nice), I think I should make peer->sock an RCU pointer and access it accordingly. Does it make sense? > > Or possibly reading the error value that ovpn_socket_new can return > before peer->sock is reset to NULL, just noticed this in > ovpn_nl_peer_modify: > > if (attrs[OVPN_A_PEER_SOCKET]) { > // ... > peer->sock = ovpn_socket_new(sock, peer); > if (IS_ERR(peer->sock)) { > // ... > peer->sock = NULL; > > > (ovpn_encrypt_post has a similar check on > peer->sock->sock->sk->sk_protocol that I don't think is safe either) Yap, agreed. > > >> + /* check if this peer changed it's IP address and update >> + * state >> + */ >> + ovpn_peer_float(peer, skb); >> + /* update source endpoint for this peer */ >> + ovpn_peer_update_local_endpoint(peer, skb); > > Why not do both in the same function? They're not called anywhere else > (at least in this version of the series). They both modify peer->bind > depending on skb_protocol_to_family(skb), and operate under > peer->lock. I never considered to do so as I just always assumed the two to be two separate features/routines. I think it's a good idea and I would get rid of a few common instructions (along with acquiring the lock twice). Thanks! > > >> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) >> +{ >> + struct hlist_nulls_head *nhead; >> + struct sockaddr_storage ss; >> + const u8 *local_ip = NULL; >> + struct sockaddr_in6 *sa6; >> + struct sockaddr_in *sa; >> + struct ovpn_bind *bind; >> + sa_family_t family; >> + size_t salen; >> + >> + rcu_read_lock(); >> + bind = rcu_dereference(peer->bind); >> + if (unlikely(!bind)) { >> + rcu_read_unlock(); >> + return; >> + } >> + >> + spin_lock_bh(&peer->lock); > > You could take the lock from the start, instead of using rcu_read_lock > to get peer->bind. It would guarantee that the bind we got isn't > already being replaced just as we wait to update it. And same in > ovpn_peer_update_local_endpoint, it would make sure we're updating the > local IP for the active bind. > > (sorry I didn't think about that last time we discussed this) no worries :) and I like the idea. will do that, thanks. > >> + if (likely(ovpn_bind_skb_src_match(bind, skb))) >> + goto unlock; >> + >> + family = skb_protocol_to_family(skb); >> + >
On 04/11/2024 16:14, Sabrina Dubroca wrote: > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >> +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, >> + struct genl_info *info, >> + struct nlattr **attrs) >> +{ >> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, >> + OVPN_A_PEER_ID)) >> + return -EINVAL; >> + >> + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { >> + NL_SET_ERR_MSG_MOD(info->extack, >> + "cannot specify both remote IPv4 or IPv6 address"); >> + return -EINVAL; >> + } >> + >> + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && >> + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { >> + NL_SET_ERR_MSG_MOD(info->extack, >> + "cannot specify remote port without IP address"); >> + return -EINVAL; >> + } >> + >> + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && >> + attrs[OVPN_A_PEER_LOCAL_IPV4]) { >> + NL_SET_ERR_MSG_MOD(info->extack, >> + "cannot specify local IPv4 address without remote"); >> + return -EINVAL; >> + } >> + >> + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && >> + attrs[OVPN_A_PEER_LOCAL_IPV6]) { > > I think these consistency checks should account for v4mapped > addresses. With remote=v4mapped and local=v6 we'll end up with an > incorrect ipv4 "local" address (taken out of the ipv6 address's first > 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, > we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to > ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) > out of that. Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases. > >> + NL_SET_ERR_MSG_MOD(info->extack, >> + "cannot specify local IPV6 address without remote"); >> + return -EINVAL; >> + } > > > [...] >> int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) >> { > [...] >> + ret = ovpn_nl_peer_modify(peer, info, attrs); >> + if (ret < 0) { >> + ovpn_peer_put(peer); >> + return ret; >> + } >> + >> + /* ret == 1 means that VPN IPv4/6 has been modified and rehashing >> + * is required >> + */ >> + if (ret > 0) { > > && mode == MP ? > > I don't see ovpn_nl_peer_modify checking that before returning 1, and > in P2P mode ovpn->peers will be NULL. Right. I was wondering if it's better to add the check on the return statement of ovpn_nl_peer_modify...but I think it's more functional to add it here, as per your suggestion. > >> + spin_lock_bh(&ovpn->peers->lock); >> + ovpn_peer_hash_vpn_ip(peer); >> + spin_unlock_bh(&ovpn->peers->lock); >> + } >> + >> + ovpn_peer_put(peer); >> + >> + return 0; >> +} > >> int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) >> { > [...] >> + } else { >> + rcu_read_lock(); >> + hash_for_each_rcu(ovpn->peers->by_id, bkt, peer, >> + hash_entry_id) { >> + /* skip already dumped peers that were dumped by >> + * previous invocations >> + */ >> + if (last_idx > 0) { >> + last_idx--; >> + continue; >> + } > > If a peer that was dumped during a previous invocation is removed in > between, we'll miss one that's still present in the overall dump. I > don't know how much it matters (I guses it depends on how the results > of this dump are used by userspace), so I'll let you decide if this > needs to be fixed immediately or if it can be ignored for now. True, this is a risk I assumed. Not extremely important if you ask me, but do you have any suggestion how to avoid this in an elegant and lockless way? IIRC I got inspired by the station dump in the mac80211 code, which probably assumes the same risk. > >> + >> + if (ovpn_nl_send_peer(skb, info, peer, >> + NETLINK_CB(cb->skb).portid, >> + cb->nlh->nlmsg_seq, >> + NLM_F_MULTI) < 0) >> + break; >> + >> + /* count peers being dumped during this invocation */ >> + dumped++; >> + } >> + rcu_read_unlock(); >> + } >> + >> +out: >> + netdev_put(ovpn->dev, &ovpn->dev_tracker); >> + >> + /* sum up peers dumped in this message, so that at the next invocation >> + * we can continue from where we left >> + */ >> + cb->args[1] += dumped; >> + return skb->len; >> } >> >> int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) >> { >> - return -EOPNOTSUPP; >> + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; >> + struct ovpn_struct *ovpn = info->user_ptr[0]; >> + struct ovpn_peer *peer; >> + u32 peer_id; >> + int ret; >> + >> + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) >> + return -EINVAL; >> + >> + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], >> + ovpn_peer_nl_policy, info->extack); >> + if (ret) >> + return ret; >> + >> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, >> + OVPN_A_PEER_ID)) >> + return -EINVAL; >> + >> + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); >> + >> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >> + if (!peer) > > maybe c/p the extack from ovpn_nl_peer_get_doit? Yes, will do. Thanks a lot. Regards, > >> + return -ENOENT; >> + >> + netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id); >> + ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE); >> + ovpn_peer_put(peer); >> + >> + return ret; >> } >
On 11/11/2024 16:41, Sabrina Dubroca wrote: > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >> +static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info, >> + struct nlattr **attrs) >> +{ >> + struct sockaddr_storage ss = {}; >> + u32 sockfd, interv, timeout; >> + struct socket *sock = NULL; >> + u8 *local_ip = NULL; >> + bool rehash = false; >> + int ret; >> + >> + if (attrs[OVPN_A_PEER_SOCKET]) { >> + /* lookup the fd in the kernel table and extract the socket >> + * object >> + */ >> + sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]); >> + /* sockfd_lookup() increases sock's refcounter */ >> + sock = sockfd_lookup(sockfd, &ret); >> + if (!sock) { >> + NL_SET_ERR_MSG_FMT_MOD(info->extack, >> + "cannot lookup peer socket (fd=%u): %d", >> + sockfd, ret); >> + return -ENOTSOCK; >> + } >> + >> + /* Only when using UDP as transport protocol the remote endpoint >> + * can be configured so that ovpn knows where to send packets >> + * to. >> + * >> + * In case of TCP, the socket is connected to the peer and ovpn >> + * will just send bytes over it, without the need to specify a >> + * destination. >> + */ >> + if (sock->sk->sk_protocol != IPPROTO_UDP && >> + (attrs[OVPN_A_PEER_REMOTE_IPV4] || >> + attrs[OVPN_A_PEER_REMOTE_IPV6])) { >> + NL_SET_ERR_MSG_FMT_MOD(info->extack, >> + "unexpected remote IP address for non UDP socket"); >> + sockfd_put(sock); >> + return -EINVAL; >> + } >> + >> + if (peer->sock) >> + ovpn_socket_put(peer->sock); >> + >> + peer->sock = ovpn_socket_new(sock, peer); > > I don't see anything preventing concurrent updates of peer->sock. I > think peer->lock should be taken from the start of > ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and > peer->keepalive_* are also not prevented with the current code. Yeah, this came up to my mind as well when checking the keepalive worker code. I'll make sure all updates happen under lock. > > >> + if (IS_ERR(peer->sock)) { >> + NL_SET_ERR_MSG_FMT_MOD(info->extack, >> + "cannot encapsulate socket: %ld", >> + PTR_ERR(peer->sock)); >> + sockfd_put(sock); >> + peer->sock = NULL; >> + return -ENOTSOCK; >> + } >> + } >> + >> + if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) { >> + /* we carry the local IP in a generic container. >> + * ovpn_peer_reset_sockaddr() will properly interpret it >> + * based on ss.ss_family >> + */ >> + local_ip = ovpn_nl_attr_local_ip(attrs); >> + >> + spin_lock_bh(&peer->lock); >> + /* set peer sockaddr */ >> + ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip); >> + if (ret < 0) { >> + NL_SET_ERR_MSG_FMT_MOD(info->extack, >> + "cannot set peer sockaddr: %d", >> + ret); >> + spin_unlock_bh(&peer->lock); >> + return ret; >> + } >> + spin_unlock_bh(&peer->lock); >> + } >> + >> + if (attrs[OVPN_A_PEER_VPN_IPV4]) { >> + rehash = true; >> + peer->vpn_addrs.ipv4.s_addr = >> + nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]); >> + } >> + >> + if (attrs[OVPN_A_PEER_VPN_IPV6]) { >> + rehash = true; >> + peer->vpn_addrs.ipv6 = >> + nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]); >> + } >> + >> + /* when setting the keepalive, both parameters have to be configured */ >> + if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] && >> + attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) { >> + interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]); >> + timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]); >> + ovpn_peer_keepalive_set(peer, interv, timeout); >> + } >> + >> + netdev_dbg(peer->ovpn->dev, >> + "%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n", >> + __func__, peer->id, &ss, >> + peer->sock->sock->sk->sk_prot_creator->name, >> + &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6); >> + >> + return rehash ? 1 : 0; >> +} >> + > > [...] >> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) >> + __must_hold(&peer->ovpn->peers->lock) > > Changes to peer->vpn_addrs are not protected by peers->lock, so those > could be getting updated while we're rehashing (and taking peer->lock > in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent > that). > /me screams :-D Indeed peers->lock is only about protecting the lists, not the content of the listed objects. How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? This way we prevent concurrent updates to interfere with each other, while at the same time we avoid concurrent adds/dels of the peer (the second part should already be protected as of today). None of them is time critical and the lock should avoid the issue you mentioned. Thanks a lot. Regards, >> +{ >> + struct hlist_nulls_head *nhead; >> + >> + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { >> + /* remove potential old hashing */ >> + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); >> + >> + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, >> + &peer->vpn_addrs.ipv4, >> + sizeof(peer->vpn_addrs.ipv4)); >> + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); >> + } >> + >> + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { >> + /* remove potential old hashing */ >> + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); >> + >> + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, >> + &peer->vpn_addrs.ipv6, >> + sizeof(peer->vpn_addrs.ipv6)); >> + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); >> + } >> +} >
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: > > +/* When the OpenVPN protocol is ran in AEAD mode, use > > + * the OpenVPN packet ID as the AEAD nonce: > > + * > > + * 00000005 521c3b01 4308c041 > > + * [seq # ] [ nonce_tail ] > > + * [ 12-byte full IV ] -> NONCE_SIZE > > + * [4-bytes -> NONCE_WIRE_SIZE > > + * on wire] > > + */ > > Nice diagram! Can we go futher and define the OpenVPN packet header as a > stucture? Referencing the structure instead of using magic sizes and offsets > can greatly improve the code readability. Especially when it comes to header > construction/parsing in the encryption/decryption code. > > E.g. define a structures like this: > > struct ovpn_pkt_hdr { > __be32 op; > __be32 pktid; > u8 auth[]; > } __attribute__((packed)); > > struct ovpn_aead_iv { > __be32 pktid; > u8 nonce[OVPN_NONCE_TAIL_SIZE]; > } __attribute__((packed)); __attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code. > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > > index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 > > --- a/include/uapi/linux/if_link.h > > +++ b/include/uapi/linux/if_link.h > > @@ -1975,4 +1975,19 @@ enum { > > #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) > > +/* OVPN section */ > > + > > +enum ovpn_mode { > > + OVPN_MODE_P2P, > > + OVPN_MODE_MP, > > +}; > > Mode min/max values can be defined here and the netlink policy can reference > these values: > > enum ovpn_mode { > OVPN_MODE_P2P, > OVPN_MODE_MP, > __OVPN_MODE_MAX > }; > > #define OVPN_MODE_MIN OVPN_MODE_P2P > #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1) > > ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX) I don't think there's much benefit to that, other than making the diff smaller on a (very unlikely) patch that would add a new mode in the future. It even looks more inconvenient to me when reading the code ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they match?").
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: > > An ovpn_peer object holds the whole status of a remote peer > > (regardless whether it is a server or a client). > > > > This includes status for crypto, tx/rx buffers, napi, etc. > > > > Only support for one peer is introduced (P2P mode). > > Multi peer support is introduced with a later patch. > > Reviewing the peer creation/destroying code I came to a generic question. > Did you consider keeping a single P2P peer in the peers table as well? > > Looks like such approach can greatly simply the code by dropping all these > 'switch (ovpn->mode)' checks and implementing a unified peer management. The > 'peer' field in the main private data structure can be kept to accelerate > lookups, still using peers table for management tasks like removing all the > peers on the interface teardown. It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
On 12.11.2024 18:47, Sabrina Dubroca wrote: > 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote: >> On 29.10.2024 12:47, Antonio Quartulli wrote: >>> +/* When the OpenVPN protocol is ran in AEAD mode, use >>> + * the OpenVPN packet ID as the AEAD nonce: >>> + * >>> + * 00000005 521c3b01 4308c041 >>> + * [seq # ] [ nonce_tail ] >>> + * [ 12-byte full IV ] -> NONCE_SIZE >>> + * [4-bytes -> NONCE_WIRE_SIZE >>> + * on wire] >>> + */ >> >> Nice diagram! Can we go futher and define the OpenVPN packet header as a >> stucture? Referencing the structure instead of using magic sizes and offsets >> can greatly improve the code readability. Especially when it comes to header >> construction/parsing in the encryption/decryption code. >> >> E.g. define a structures like this: >> >> struct ovpn_pkt_hdr { >> __be32 op; >> __be32 pktid; >> u8 auth[]; >> } __attribute__((packed)); >> >> struct ovpn_aead_iv { >> __be32 pktid; >> u8 nonce[OVPN_NONCE_TAIL_SIZE]; >> } __attribute__((packed)); > > __attribute__((packed)) should not be needed here as the fields in > both structs look properly aligned, and IIRC using packed can cause > the compiler to generate worse code. True, the fields are pretty good aligned and from code generation perspective packed indication is unneeded. I suggested to mark structs as packed mostly as a documentation to clearly state that these structures represent specific memory layout. >>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h >>> index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 >>> --- a/include/uapi/linux/if_link.h >>> +++ b/include/uapi/linux/if_link.h >>> @@ -1975,4 +1975,19 @@ enum { >>> #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) >>> +/* OVPN section */ >>> + >>> +enum ovpn_mode { >>> + OVPN_MODE_P2P, >>> + OVPN_MODE_MP, >>> +}; >> >> Mode min/max values can be defined here and the netlink policy can reference >> these values: >> >> enum ovpn_mode { >> OVPN_MODE_P2P, >> OVPN_MODE_MP, >> __OVPN_MODE_MAX >> }; >> >> #define OVPN_MODE_MIN OVPN_MODE_P2P >> #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1) >> >> ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX) > > I don't think there's much benefit to that, other than making the diff > smaller on a (very unlikely) patch that would add a new mode in the > future. It even looks more inconvenient to me when reading the code > ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they > match?"). I would answer yes. Just prefer to trust these kind of statements until it crashes badly. Honestly, I never thought that referring to a max value might raise such a question. Can you give an example why it should be meaningful to know exact min/max values of an unordered set? I suggested to define boundaries indeed for documentation purpose. Diff reduction is also desirable, but as you already mentioned, here it is not the case. Using specific values in a range declaration assigns them with extra semantic. Like, MODE_P2P is also a minimal possible value while MODE_MP has this extra meaning of minimal possible value. And we can learn this only from the policy which is specified far way from the modes declarations. I also see policies declaration as referring to already defined information rather than creating new meanings. On another hand the NL policy is the only user, so maybe we should left it as-is for the sake of simplicity. -- Sergey
On 12.11.2024 19:31, Sabrina Dubroca wrote: > 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: >> On 29.10.2024 12:47, Antonio Quartulli wrote: >>> An ovpn_peer object holds the whole status of a remote peer >>> (regardless whether it is a server or a client). >>> >>> This includes status for crypto, tx/rx buffers, napi, etc. >>> >>> Only support for one peer is introduced (P2P mode). >>> Multi peer support is introduced with a later patch. >> >> Reviewing the peer creation/destroying code I came to a generic question. >> Did you consider keeping a single P2P peer in the peers table as well? >> >> Looks like such approach can greatly simply the code by dropping all these >> 'switch (ovpn->mode)' checks and implementing a unified peer management. The >> 'peer' field in the main private data structure can be kept to accelerate >> lookups, still using peers table for management tasks like removing all the >> peers on the interface teardown. > > It would save a few 'switch(mode)', but force every client to allocate > the hashtable for no reason at all. That tradeoff doesn't look very > beneficial to me, the P2P-specific code is really simple. And if you > keep ovpn->peer to make lookups faster, you're not removing that many > 'switch(mode)'. Looking at the done review, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :) Seriously, this module has a highest density of switches per KLOC from what I have seen before and a major part of it dedicated to handle the special case of P2P connection. What together look too unusual, so it feels like a flaw in the design. I racked my brains to come up with a better solution and failed. So I took a different approach, inviting people to discuss item pieces of the code to find a solution collectively or to realize that there is no better solution for now. The problem is that all these hash tables become inefficient with the single entry (P2P case). I was thinking about allocating a table with a single bin, but it still requires hash function run to access the indexed entry. And back to the hashtable(s) size for the MP mode. 8k-bins table looks a good choice for a normal server with 1-2Gb uplink serving up to 1k connections. But it sill unclear, how this choice can affect installations with a bigger number of connections? Or is this module applicable for embedded solutions? E.g. running a couple of VPN servers on a home router with a few actual connections looks like a waste of RAM. I was about to suggest to use rhashtable due to its dynamic sizing feature, but the module needs three tables. Any better idea? -- Sergey
2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote: > On 12.11.2024 19:31, Sabrina Dubroca wrote: > > 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: > > > On 29.10.2024 12:47, Antonio Quartulli wrote: > > > > An ovpn_peer object holds the whole status of a remote peer > > > > (regardless whether it is a server or a client). > > > > > > > > This includes status for crypto, tx/rx buffers, napi, etc. > > > > > > > > Only support for one peer is introduced (P2P mode). > > > > Multi peer support is introduced with a later patch. > > > > > > Reviewing the peer creation/destroying code I came to a generic question. > > > Did you consider keeping a single P2P peer in the peers table as well? > > > > > > Looks like such approach can greatly simply the code by dropping all these > > > 'switch (ovpn->mode)' checks and implementing a unified peer management. The > > > 'peer' field in the main private data structure can be kept to accelerate > > > lookups, still using peers table for management tasks like removing all the > > > peers on the interface teardown. > > > > It would save a few 'switch(mode)', but force every client to allocate > > the hashtable for no reason at all. That tradeoff doesn't look very > > beneficial to me, the P2P-specific code is really simple. And if you > > keep ovpn->peer to make lookups faster, you're not removing that many > > 'switch(mode)'. > > Looking at the done review, I can retrospectively conclude that I personally > do not like short 'switch' statements and special handlers :) > > Seriously, this module has a highest density of switches per KLOC from what > I have seen before and a major part of it dedicated to handle the special > case of P2P connection. I think it's fine. Either way there will be two implementations of whatever mode-dependent operation needs to be done. switch doesn't make it more complex than an ops structure. If you're reading the current version and find ovpn_peer_add, you see directly that it'll do either ovpn_peer_add_mp or ovpn_peer_add_p2p. With an ops structure, you'd have a call to ovpn->ops->peer_add, and you'd have to look up all possible ops structures to know that it can be either ovpn_peer_add_mp or ovpn_peer_add_p2p. If there's an undefined number of implementations living in different modules (like net_device_ops, or L4 protocols), you don't have a choice. xfrm went the opposite way to what you're proposing a few years ago (see commit 0c620e97b349 ("xfrm: remove output indirection from xfrm_mode") and others), and it made the code simpler. > What together look too unusual, so it feels like a > flaw in the design. I don't think it's a flaw in the design, maybe just different needs from other code you've seen (but similar in some ways to xfrm). > I racked my brains to come up with a better solution and > failed. So I took a different approach, inviting people to discuss item > pieces of the code to find a solution collectively or to realize that there > is no better solution for now. Sure. And I think there is no better solution, so I'm answering this thread to say that. > The problem is that all these hash tables become inefficient with the single > entry (P2P case). I was thinking about allocating a table with a single bin, > but it still requires hash function run to access the indexed entry. And the current implementation relies on fixed-size hashtables (hash_for_each_safe -> HASH_SIZE -> ARRAY_SIZE -> sizeof). > And back to the hashtable(s) size for the MP mode. 8k-bins table looks a > good choice for a normal server with 1-2Gb uplink serving up to 1k > connections. But it sill unclear, how this choice can affect installations > with a bigger number of connections? Or is this module applicable for > embedded solutions? E.g. running a couple of VPN servers on a home router > with a few actual connections looks like a waste of RAM. I was about to > suggest to use rhashtable due to its dynamic sizing feature, but the module > needs three tables. Any better idea? For this initial implementation I think it's fine. Sure, converting to rhashtable (or some other type of dynamically-sized hashtable, if rhashtable doesn't fit) in the future would make sense. But I don't think it's necessary to get the patches into net-next.
2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote: > On 11/11/2024 16:41, Sabrina Dubroca wrote: > > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > > > +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) > > > + __must_hold(&peer->ovpn->peers->lock) > > > > Changes to peer->vpn_addrs are not protected by peers->lock, so those > > could be getting updated while we're rehashing (and taking peer->lock > > in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent > > that). > > > > /me screams :-D Sorry :) > Indeed peers->lock is only about protecting the lists, not the content of > the listed objects. > > How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? It seems like it would work. Maybe a bit weird to have conditional locking (MP mode only), but ok. You already have this lock ordering (hold peers->lock before taking peer->lock) in ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing the same thing in the netlink code. Then I would also do that in ovpn_peer_float to protect that rehash. It feels like peers->lock is turning into a duplicate of ovpn->lock. ovpn->lock used for P2P mode, peers->lock used equivalently for MP mode. You might consider merging them (but I wouldn't see it as necessary for merging the series unless there's a locking issue with the current proposal).
2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote: > On 12/11/2024 11:56, Sabrina Dubroca wrote: > > 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: > > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c > > > index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 > > > --- a/drivers/net/ovpn/io.c > > > +++ b/drivers/net/ovpn/io.c > > > @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) > > > /* keep track of last received authenticated packet for keepalive */ > > > peer->last_recv = ktime_get_real_seconds(); > > > + if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) { > > > > What prevents peer->sock from being replaced and released > > concurrently? > > Technically nothing. > Userspace currently does not even support updating a peer socket at runtime, > but I wanted ovpn to be flexible enough from the beginning. Is there a reason to do that? With TCP the peer would have to reconnect, and I guess fully restart the whole process (become a new peer with a new ID etc). With UDP, do you need to replace the socket? > One approach might be to go back to peer->sock being unmutable and forget > about this. > > OTOH, if we want to keep this flexibility (which I think is nice), I think I > should make peer->sock an RCU pointer and access it accordingly. You already use kfree_rcu for ovpn_socket, so the only difference would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in a few places) Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks painful... (another place that I missed where things could go bad if the socket was updated in the current implementation, btw) Maybe save that for later since you don't have a use case for it yet?
2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote: > On 04/11/2024 16:14, Sabrina Dubroca wrote: > > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > > > +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, > > > + struct genl_info *info, > > > + struct nlattr **attrs) > > > +{ > > > + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, > > > + OVPN_A_PEER_ID)) > > > + return -EINVAL; > > > + > > > + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { > > > + NL_SET_ERR_MSG_MOD(info->extack, > > > + "cannot specify both remote IPv4 or IPv6 address"); > > > + return -EINVAL; > > > + } > > > + > > > + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && > > > + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { > > > + NL_SET_ERR_MSG_MOD(info->extack, > > > + "cannot specify remote port without IP address"); > > > + return -EINVAL; > > > + } > > > + > > > + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && > > > + attrs[OVPN_A_PEER_LOCAL_IPV4]) { > > > + NL_SET_ERR_MSG_MOD(info->extack, > > > + "cannot specify local IPv4 address without remote"); > > > + return -EINVAL; > > > + } > > > + > > > + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && > > > + attrs[OVPN_A_PEER_LOCAL_IPV6]) { > > > > I think these consistency checks should account for v4mapped > > addresses. With remote=v4mapped and local=v6 we'll end up with an > > incorrect ipv4 "local" address (taken out of the ipv6 address's first > > 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, > > we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to > > ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) > > out of that. > > Right, a v4mapped address would fool this check. > How about checking if both or none addresses are v4mapped? This way we > should prevent such cases. I don't know when userspace would use v4mapped addresses, but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested): static int get_family(attr_v4, attr_v6) { if (attr_v4) return AF_INET; if (attr_v6) { if (ipv6_addr_v4mapped(attr_v6) return AF_INET; return AF_INET6; } return AF_UNSPEC; } // in _precheck: // keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check // maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6 remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4], attrs[OVPN_A_PEER_REMOTE_IPV6]); local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4], attrs[OVPN_A_PEER_LOCAL_IPV6]); if (remote_family != local_family) { extack "incompatible address families"; return -EINVAL; } That would mirror the conversion that ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do. > > > int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) > > > { > > [...] > > > + } else { > > > + rcu_read_lock(); > > > + hash_for_each_rcu(ovpn->peers->by_id, bkt, peer, > > > + hash_entry_id) { > > > + /* skip already dumped peers that were dumped by > > > + * previous invocations > > > + */ > > > + if (last_idx > 0) { > > > + last_idx--; > > > + continue; > > > + } > > > > If a peer that was dumped during a previous invocation is removed in > > between, we'll miss one that's still present in the overall dump. I > > don't know how much it matters (I guses it depends on how the results > > of this dump are used by userspace), so I'll let you decide if this > > needs to be fixed immediately or if it can be ignored for now. > > True, this is a risk I assumed. > Not extremely important if you ask me, but do you have any suggestion how to > avoid this in an elegant and lockless way? No, inconsistent dumps are an old problem with netlink, so I'm just mentioning it as something to be aware of. You can add genl_dump_check_consistent to let userspace know that it may have gotten incorrect information (you'll need to keep a counter and increment it when a peer is added/removed). On a very busy server you may never manage to get a consistent dump, if peers are going up and down very fast. There's been some progress for dumping netdevices in commit 759ab1edb56c ("net: store netdevs in an xarray"), but that can still return incorrect data.
On 12/11/2024 17:47, Sabrina Dubroca wrote: > 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote: >> On 29.10.2024 12:47, Antonio Quartulli wrote: >>> +/* When the OpenVPN protocol is ran in AEAD mode, use >>> + * the OpenVPN packet ID as the AEAD nonce: >>> + * >>> + * 00000005 521c3b01 4308c041 >>> + * [seq # ] [ nonce_tail ] >>> + * [ 12-byte full IV ] -> NONCE_SIZE >>> + * [4-bytes -> NONCE_WIRE_SIZE >>> + * on wire] >>> + */ >> >> Nice diagram! Can we go futher and define the OpenVPN packet header as a >> stucture? Referencing the structure instead of using magic sizes and offsets >> can greatly improve the code readability. Especially when it comes to header >> construction/parsing in the encryption/decryption code. >> >> E.g. define a structures like this: >> >> struct ovpn_pkt_hdr { >> __be32 op; >> __be32 pktid; >> u8 auth[]; >> } __attribute__((packed)); >> >> struct ovpn_aead_iv { >> __be32 pktid; >> u8 nonce[OVPN_NONCE_TAIL_SIZE]; >> } __attribute__((packed)); > > __attribute__((packed)) should not be needed here as the fields in > both structs look properly aligned, and IIRC using packed can cause > the compiler to generate worse code. Agreed. Using packed will make certain architecture read every field byte by byte (I remember David M. biting us on this in batman-adv :)) This said, I like the idea of using a struct, but I don't feel confident enough to change the code now that we are hitting v12. This kind of change will be better implemented later and tested carefully. (and patches are always welcome! :)) > > >>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h >>> index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 >>> --- a/include/uapi/linux/if_link.h >>> +++ b/include/uapi/linux/if_link.h >>> @@ -1975,4 +1975,19 @@ enum { >>> #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) >>> +/* OVPN section */ >>> + >>> +enum ovpn_mode { >>> + OVPN_MODE_P2P, >>> + OVPN_MODE_MP, >>> +}; >> >> Mode min/max values can be defined here and the netlink policy can reference >> these values: >> >> enum ovpn_mode { >> OVPN_MODE_P2P, >> OVPN_MODE_MP, >> __OVPN_MODE_MAX >> }; >> >> #define OVPN_MODE_MIN OVPN_MODE_P2P >> #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1) >> >> ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX) > > I don't think there's much benefit to that, other than making the diff > smaller on a (very unlikely) patch that would add a new mode in the > future. It even looks more inconvenient to me when reading the code > ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they > match?"). I agree with Sabrina here. I also initially thought about having MIN/MAX, but it wouldn't make things simpler for the ovpn_mode. Regards,
On 13/11/2024 12:25, Sabrina Dubroca wrote: > 2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote: >> On 12/11/2024 11:56, Sabrina Dubroca wrote: >>> 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote: >>>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c >>>> index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 >>>> --- a/drivers/net/ovpn/io.c >>>> +++ b/drivers/net/ovpn/io.c >>>> @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) >>>> /* keep track of last received authenticated packet for keepalive */ >>>> peer->last_recv = ktime_get_real_seconds(); >>>> + if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) { >>> >>> What prevents peer->sock from being replaced and released >>> concurrently? >> >> Technically nothing. >> Userspace currently does not even support updating a peer socket at runtime, >> but I wanted ovpn to be flexible enough from the beginning. > > Is there a reason to do that? With TCP the peer would have to > reconnect, and I guess fully restart the whole process (become a new > peer with a new ID etc). With UDP, do you need to replace the socket? At the moment userspace won't try to do that, but I can foresee some future use cases: i.e. a peer that switches to a different interface and needs to open a new socket to keep sending data. Moreover, in userspace we're currently working on multisocket support (theoretically server side only), therefore I can imagine a peer floating from one socket to the other while keeping the session alive. This is all work in progress, but not that far in the future. For TCP, you're right, although at some point we may even implement transport reconnections without losing the VPN state (this is not even planned, just a brain dump). > >> One approach might be to go back to peer->sock being unmutable and forget >> about this. >> >> OTOH, if we want to keep this flexibility (which I think is nice), I think I >> should make peer->sock an RCU pointer and access it accordingly. > > You already use kfree_rcu for ovpn_socket, so the only difference > would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in > a few places) > > Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks > painful... (another place that I missed where things could go bad if > the socket was updated in the current implementation, btw) > > Maybe save that for later since you don't have a use case for it yet? I agree with you. I'll make the socket unmutable again and I'll work on this later on. Thanks a lot for digging with me into this. Regards,
On 13/11/2024 17:56, Sabrina Dubroca wrote: > 2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote: >> On 04/11/2024 16:14, Sabrina Dubroca wrote: >>> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >>>> +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, >>>> + struct genl_info *info, >>>> + struct nlattr **attrs) >>>> +{ >>>> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, >>>> + OVPN_A_PEER_ID)) >>>> + return -EINVAL; >>>> + >>>> + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { >>>> + NL_SET_ERR_MSG_MOD(info->extack, >>>> + "cannot specify both remote IPv4 or IPv6 address"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && >>>> + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { >>>> + NL_SET_ERR_MSG_MOD(info->extack, >>>> + "cannot specify remote port without IP address"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && >>>> + attrs[OVPN_A_PEER_LOCAL_IPV4]) { >>>> + NL_SET_ERR_MSG_MOD(info->extack, >>>> + "cannot specify local IPv4 address without remote"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && >>>> + attrs[OVPN_A_PEER_LOCAL_IPV6]) { >>> >>> I think these consistency checks should account for v4mapped >>> addresses. With remote=v4mapped and local=v6 we'll end up with an >>> incorrect ipv4 "local" address (taken out of the ipv6 address's first >>> 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, >>> we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to >>> ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) >>> out of that. >> >> Right, a v4mapped address would fool this check. >> How about checking if both or none addresses are v4mapped? This way we >> should prevent such cases. > > I don't know when userspace would use v4mapped addresses, It happens when listening on [::] with a v6 socket that has no "IPV6_V6ONLY" set to true (you can check ipv6(7) for more details). This socket can receive IPv4 connections, which are implemented using v4mapped addresses. In this case both remote and local are going to be v4mapped. However, the sanity check should make sure nobody can inject bogus combinations. > but treating > a v4mapped address as a "proper" ipv4 address should work with the > rest of the code, since you already have the conversion in > ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you > could do something like (rough idea and completely untested): > > static int get_family(attr_v4, attr_v6) > { > if (attr_v4) > return AF_INET; > if (attr_v6) { > if (ipv6_addr_v4mapped(attr_v6) > return AF_INET; > return AF_INET6; > } > return AF_UNSPEC; > } > > > // in _precheck: > // keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check > // maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6 the latter is already covered by: 192 if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && 193 attrs[OVPN_A_PEER_LOCAL_IPV4]) { 194 NL_SET_ERR_MSG_MOD(info->extack, 195 "cannot specify local IPv4 address without remote"); 196 return -EINVAL; 197 } 198 199 if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && 200 attrs[OVPN_A_PEER_LOCAL_IPV6]) { 201 NL_SET_ERR_MSG_MOD(info->extack, 202 "cannot specify local IPV6 address without remote"); 203 return -EINVAL; 204 } > > remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4], attrs[OVPN_A_PEER_REMOTE_IPV6]); > local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4], attrs[OVPN_A_PEER_LOCAL_IPV6]); > if (remote_family != local_family) { > extack "incompatible address families"; > return -EINVAL; > } > > That would mirror the conversion that > ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do. Yeah, pretty much what I was suggested, but in a more explicit manner. I like it. > >>>> int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) >>>> { >>> [...] >>>> + } else { >>>> + rcu_read_lock(); >>>> + hash_for_each_rcu(ovpn->peers->by_id, bkt, peer, >>>> + hash_entry_id) { >>>> + /* skip already dumped peers that were dumped by >>>> + * previous invocations >>>> + */ >>>> + if (last_idx > 0) { >>>> + last_idx--; >>>> + continue; >>>> + } >>> >>> If a peer that was dumped during a previous invocation is removed in >>> between, we'll miss one that's still present in the overall dump. I >>> don't know how much it matters (I guses it depends on how the results >>> of this dump are used by userspace), so I'll let you decide if this >>> needs to be fixed immediately or if it can be ignored for now. >> >> True, this is a risk I assumed. >> Not extremely important if you ask me, but do you have any suggestion how to >> avoid this in an elegant and lockless way? > > No, inconsistent dumps are an old problem with netlink, so I'm just > mentioning it as something to be aware of. You can add > genl_dump_check_consistent to let userspace know that it may have > gotten incorrect information (you'll need to keep a counter and > increment it when a peer is added/removed). On a very busy server you > may never manage to get a consistent dump, if peers are going up and > down very fast. > > There's been some progress for dumping netdevices in commit > 759ab1edb56c ("net: store netdevs in an xarray"), but that can still > return incorrect data. Got it. I'll keep it as it is for now, since this is not critical. Thanks a lot. Regards, >
On 13/11/2024 12:05, Sabrina Dubroca wrote: > 2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote: >> On 11/11/2024 16:41, Sabrina Dubroca wrote: >>> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >>>> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) >>>> + __must_hold(&peer->ovpn->peers->lock) >>> >>> Changes to peer->vpn_addrs are not protected by peers->lock, so those >>> could be getting updated while we're rehashing (and taking peer->lock >>> in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent >>> that). >>> >> >> /me screams :-D > > Sorry :) > >> Indeed peers->lock is only about protecting the lists, not the content of >> the listed objects. >> >> How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? > > It seems like it would work. Maybe a bit weird to have conditional > locking (MP mode only), but ok. You already have this lock ordering > (hold peers->lock before taking peer->lock) in > ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing > the same thing in the netlink code. Yeah. > > Then I would also do that in ovpn_peer_float to protect that rehash. I am not extremely comfortable with this, because it means acquiring peers->lock on every packet (right now we do so only on peer->lock) and it may defeat the advantage of the RCU locking on the hashtables. Wouldn't you agree? An alternative would be to hold peer->lock for the entire function, but this will lead to dead locks...no go either. > > It feels like peers->lock is turning into a duplicate of > ovpn->lock. ovpn->lock used for P2P mode, peers->lock used > equivalently for MP mode. You might consider merging them (but I > wouldn't see it as necessary for merging the series unless there's a > locking issue with the current proposal). I agree: ovpn->lock was introduced to protect ovpn's fields, but actually the only one e protect is peer. They are truly the same and I could therefore get rid of ovpn->peers->lock and always use ovpn->lock. Will see how invasive this is and decide whether to commit it to v12 or not. Thanks! Regards,
On 10/11/2024 20:52, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: > > [...] > >> +static void ovpn_peer_release(struct ovpn_peer *peer) >> +{ >> + ovpn_bind_reset(peer, NULL); >> + > > nit: this empty line after ovpn_bind_reset() is removed in the > 'implement basic TX path (UDP)' patch. What tricks git and it produces a > sensless diff with 'ovpn_bind_reset(...)' line beeing removed and then > introduced again. If you do not like this empty line then remove it > here, please :) Thanks! will make sure it won't be introduced at all. Regards, > >> + dst_cache_destroy(&peer->dst_cache); >> + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); >> + kfree_rcu(peer, rcu); >> +} > > -- > Sergey
On 06/11/2024 02:18, Sergey Ryazanov wrote: > Hi Antonio, > > On 29.10.2024 12:47, Antonio Quartulli wrote: >> Notable changes from v10: >> * extended commit message of 23/23 with brief description of the output >> * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0- >> b87530777be7@openvpn.net >> >> Please note that some patches were already reviewed by Andre Lunn, >> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag >> since no major code modification has happened since the review. >> >> The latest code can also be found at: >> >> https://github.com/OpenVPN/linux-kernel-ovpn > > As I promised many months ago I am starting publishing some nit picks > regarding the series. Thanks and welcome back! > The review was started when series was V3 > "rebasing" the review to every next version to publish it at once. But I > lost this race to the new version releasing velocity :) So, I am going > to publish it patch-by-patch. > > Anyway you and all participants have done a great progress toward making > accelerator part of the kernel. Most of considerable things already > resolved so do not wait me please to finish picking every nit. I'll go through them all and judge what's meaningful to add to v12 and what can be postponed for later improvements. > > Regarding "big" topics I have only two concerns: link creation using > RTNL and a switch statement usage. In the corresponding thread, I asked > Jiri to clarify that "should" regarding .newlink implementation. Hope he > will have a chance to find a time to reply. True, but to be honest at this point I am fine with sticking to RTNL, also because we will soon introduce the ability to create 'persistent' ifaces, which a user should be able to create before starting openvpn. Going through RTNL for this is the best choice IMHO, therefore we have an extra use case in favour of this approach (next to what Jiri already mentioned). > > For the 'switch' statement, I see a repeating pattern of handling mode- > or family-specific cases like this: > > int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) > { > switch (ovpn->mode) { > case OVPN_MODE_MP: > return ovpn_peer_add_mp(ovpn, peer); > case OVPN_MODE_P2P: > return ovpn_peer_add_p2p(ovpn, peer); > default: > return -EOPNOTSUPP; > } > } > > or > > void ovpn_encrypt_post(void *data, int ret) > { > ... > switch (peer->sock->sock->sk->sk_protocol) { > case IPPROTO_UDP: > ovpn_udp_send_skb(peer->ovpn, peer, skb); > break; > case IPPROTO_TCP: > ovpn_tcp_send_skb(peer, skb); > break; > default: > /* no transport configured yet */ > goto err; > } > ... > } > > or > > void ovpn_peer_keepalive_work(...) > { > ... > switch (ovpn->mode) { > case OVPN_MODE_MP: > next_run = ovpn_peer_keepalive_work_mp(ovpn, now); > break; > case OVPN_MODE_P2P: > next_run = ovpn_peer_keepalive_work_p2p(ovpn, now); > break; > } > ... > } > > Did you consider to implement mode specific operations as a set of > operations like this: > > ovpn_ops { > int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer); > int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason > reason); > void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb); > time64_t (*keepalive_work)(...); > }; > > Initialize them during the interface creation and invoke these > operations indirectly. E.g. > > int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) > { > return ovpn->ops->peer_add(ovpn, peer); > } > > void ovpn_encrypt_post(void *data, int ret) > { > ... > ovpn->ops->send_skb(peer, skb); > ... > } > > void ovpn_peer_keepalive_work(...) > { > ... > next_run = ovpn->ops->keepalive_work(ovpn, now); > ... > } > > Anyway the module has all these option values in advance during the > network interface creation phase and I believe replacing 'switch' > statements with indirect calls can make code easy to read. I see this was already discussed with Sabrina under another patch and I have the same opinion. To me the switch/case approach looks cleaner and I truly like it, especially when enums are involved. ops/callbacks are fine when they can be redefined at runtime (i.e. a proto that can be registered by another module), but this is not the case here. I also feel that with ops it's not easy to understand what call is truly being made by just looking at the caller context and reading can be harder. So I truly prefer to stick to this schema. Thanks a lot for sharing your point though. Regards,
On 14.11.2024 17:33, Antonio Quartulli wrote: > On 06/11/2024 02:18, Sergey Ryazanov wrote: >> Regarding "big" topics I have only two concerns: link creation using >> RTNL and a switch statement usage. In the corresponding thread, I >> asked Jiri to clarify that "should" regarding .newlink implementation. >> Hope he will have a chance to find a time to reply. > > True, but to be honest at this point I am fine with sticking to RTNL, > also because we will soon introduce the ability to create 'persistent' > ifaces, which a user should be able to create before starting openvpn. Could you share the use case for this functionality? > Going through RTNL for this is the best choice IMHO, therefore we have > an extra use case in favour of this approach (next to what Jiri already > mentioned). In absence of arguments it's hard to understand, what's the "best" meaning. So, I'm still not sure is it worth to split uAPI between two interfaces. Anyway, it's up to maintainers to decide is it mergeable in this form or not. I just shared some arguments for the full management interface in GENL. -- Sergey
On 14.11.2024 10:07, Antonio Quartulli wrote: > On 12/11/2024 17:47, Sabrina Dubroca wrote: >> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote: >>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>> +/* When the OpenVPN protocol is ran in AEAD mode, use >>>> + * the OpenVPN packet ID as the AEAD nonce: >>>> + * >>>> + * 00000005 521c3b01 4308c041 >>>> + * [seq # ] [ nonce_tail ] >>>> + * [ 12-byte full IV ] -> NONCE_SIZE >>>> + * [4-bytes -> NONCE_WIRE_SIZE >>>> + * on wire] >>>> + */ >>> >>> Nice diagram! Can we go futher and define the OpenVPN packet header as a >>> stucture? Referencing the structure instead of using magic sizes and >>> offsets >>> can greatly improve the code readability. Especially when it comes to >>> header >>> construction/parsing in the encryption/decryption code. >>> >>> E.g. define a structures like this: >>> >>> struct ovpn_pkt_hdr { >>> __be32 op; >>> __be32 pktid; >>> u8 auth[]; >>> } __attribute__((packed)); >>> >>> struct ovpn_aead_iv { >>> __be32 pktid; >>> u8 nonce[OVPN_NONCE_TAIL_SIZE]; >>> } __attribute__((packed)); >> >> __attribute__((packed)) should not be needed here as the fields in >> both structs look properly aligned, and IIRC using packed can cause >> the compiler to generate worse code. > > Agreed. Using packed will make certain architecture read every field > byte by byte (I remember David M. biting us on this in batman-adv :)) Still curious to see an example of that strange architecture/compiler combination. Anyway, as Sabrina mentioned, the header is already pretty aligned. So it's up to you how to document the structure. > This said, I like the idea of using a struct, but I don't feel confident > enough to change the code now that we are hitting v12. > This kind of change will be better implemented later and tested > carefully. (and patches are always welcome! :)) The main reason behind the structure introduction is to improve the code readability. To reduce a shadow, where bugs can reside. I wonder how many people have invested their time to dig through the encryption preparation function? As for risk of breaking something I should say that it can be addressed by connecting the kernel implementation to pure usespace implementation, which can be assumed the reference. And, I believe, it worth the benefit of merging easy to understand code. -- Sergey
On 09/11/2024 02:01, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: >> Add basic infrastructure for handling ovpn interfaces. >> >> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> >> --- >> drivers/net/ovpn/main.c | 115 ++++++++++++++++++++++++++++++++ >> ++++++++-- >> drivers/net/ovpn/main.h | 7 +++ >> drivers/net/ovpn/ovpnstruct.h | 8 +++ >> drivers/net/ovpn/packet.h | 40 +++++++++++++++ >> include/uapi/linux/if_link.h | 15 ++++++ >> 5 files changed, 180 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c >> index >> d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 >> --- a/drivers/net/ovpn/main.c >> +++ b/drivers/net/ovpn/main.c >> @@ -10,18 +10,52 @@ >> #include <linux/genetlink.h> >> #include <linux/module.h> >> #include <linux/netdevice.h> >> +#include <linux/inetdevice.h> >> +#include <net/ip.h> >> #include <net/rtnetlink.h> >> -#include <uapi/linux/ovpn.h> >> +#include <uapi/linux/if_arp.h> >> #include "ovpnstruct.h" >> #include "main.h" >> #include "netlink.h" >> #include "io.h" >> +#include "packet.h" >> /* Driver info */ >> #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" >> #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." >> +static void ovpn_struct_free(struct net_device *net) >> +{ >> +} > > nit: since this handler is not mandatory, its introduction can be moved > to the later patch, which actually fills it with meaningful operations. ehmm sure I will move it > >> +static int ovpn_net_open(struct net_device *dev) >> +{ >> + netif_tx_start_all_queues(dev); >> + return 0; >> +} >> + >> +static int ovpn_net_stop(struct net_device *dev) >> +{ >> + netif_tx_stop_all_queues(dev); >> + return 0; >> +} >> + >> +static const struct net_device_ops ovpn_netdev_ops = { >> + .ndo_open = ovpn_net_open, >> + .ndo_stop = ovpn_net_stop, >> + .ndo_start_xmit = ovpn_net_xmit, >> +}; >> + >> +static const struct device_type ovpn_type = { >> + .name = OVPN_FAMILY_NAME, > > nit: same question here regarding name deriviation. Are you sure that > the device type name is the same as the GENL family name? Like I said in the previous patch, I want all representative strings to be "ovpn", that is already the netlink family name. But I can create another constant to document this explicitly. > >> +}; >> + >> +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = { >> + [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P, >> + OVPN_MODE_MP), >> +}; >> + >> /** >> * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' >> * @dev: the interface to check >> @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) >> return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; >> } >> +static void ovpn_setup(struct net_device *dev) >> +{ >> + /* compute the overhead considering AEAD encryption */ >> + const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 + > > Where are these magic sizeof(u32) and '16' came from? It's in the "nice diagram" you commented later in this patch :-) But I can extend the comment. [...] >> @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct >> notifier_block *nb, >> unsigned long state, void *ptr) >> { >> struct net_device *dev = netdev_notifier_info_to_dev(ptr); >> + struct ovpn_struct *ovpn; >> if (!ovpn_dev_is_valid(dev)) >> return NOTIFY_DONE; >> + ovpn = netdev_priv(dev); > > nit: netdev_priv() returns only a pointer, it is safe to fetch the > pointer in advance, but do not dereference it until we are sure that an > event references the desired interface type. Takin this into > consideration, the assignment of private data pointer can be moved above > to the variable declaration. Just to make code couple of lines shorter. I do it here because it seems more "logically correct" to retrieve the priv pointer after having confirmed that this is a ovpn interface with ovpn_dev_is_valid(). Moving it above kinda says "I already know there is a ovpn object here", but this is not the case until after the valid() check. So I prefer to keep it here. [...] >> --- a/drivers/net/ovpn/main.h >> +++ b/drivers/net/ovpn/main.h >> @@ -12,4 +12,11 @@ >> bool ovpn_dev_is_valid(const struct net_device *dev); >> +#define SKB_HEADER_LEN \ >> + (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \ >> + sizeof(struct udphdr) + NET_SKB_PAD) >> + >> +#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) > > Where is this magic '16' came from? should be the same 16 af the over head above (it's the auth tag len) Will make this more explicit with a comment. > >> +#define OVPN_MAX_PADDING 16 >> + >> #endif /* _NET_OVPN_MAIN_H_ */ >> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ >> ovpnstruct.h >> index >> e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644 >> --- a/drivers/net/ovpn/ovpnstruct.h >> +++ b/drivers/net/ovpn/ovpnstruct.h >> @@ -11,15 +11,23 @@ >> #define _NET_OVPN_OVPNSTRUCT_H_ >> #include <net/net_trackers.h> >> +#include <uapi/linux/if_link.h> >> +#include <uapi/linux/ovpn.h> >> /** >> * struct ovpn_struct - per ovpn interface state >> * @dev: the actual netdev representing the tunnel >> * @dev_tracker: reference tracker for associated dev >> + * @registered: whether dev is still registered with netdev or not >> + * @mode: device operation mode (i.e. p2p, mp, ..) >> + * @dev_list: entry for the module wide device list >> */ >> struct ovpn_struct { >> struct net_device *dev; >> netdevice_tracker dev_tracker; >> + bool registered; >> + enum ovpn_mode mode; >> + struct list_head dev_list; > > dev_list is no more used and should be deleted. ACK [...] >> + >> +/* OpenVPN nonce size */ >> +#define NONCE_SIZE 12 > > nit: is using the common 'OVPN_' prefix here and for other constants any > good idea? E.g. OVPN_NONCE_SIZE. It can give some hints where it comes > from for a code reader. ACK > > And another one question. Could you clarify in the comment to this > constant where it came from? AFAIU, these 12 bytes is the expectation of > the nonce size of AEAD crypto protocol, rigth? Correct: 12bytes/96bits. Will extend the comment. > >> + >> +/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the >> + * size of the AEAD Associated Data (AD) sent over the wire >> + * and is normally the head of the IV >> + */ >> +#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail)) > > If the headers and IV are defined as structures, we no more need this > constant since the header construction will be done by a compiler > according to the structure layout. yap yap. Will do this later as explained in the other email. > >> +/* Last 8 bytes of AEAD nonce >> + * Provided by userspace and usually derived from >> + * key material generated during TLS handshake >> + */ >> +struct ovpn_nonce_tail { >> + u8 u8[OVPN_NONCE_TAIL_SIZE]; >> +}; > > Why do you need a dadicated structure for this array? Can we declare the > corresponding fields like this: > > u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE]; > u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE]; > I think the original reason was to have something to pass to sizeof() without making it harder for the reader. At some point I also wanted to get rid of the struct,but something stopped me. Not sure what was though. Will give it a try. Thanks a lot. Regards,
On 14/11/2024 23:57, Sergey Ryazanov wrote: > On 14.11.2024 10:07, Antonio Quartulli wrote: >> On 12/11/2024 17:47, Sabrina Dubroca wrote: >>> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote: >>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>> +/* When the OpenVPN protocol is ran in AEAD mode, use >>>>> + * the OpenVPN packet ID as the AEAD nonce: >>>>> + * >>>>> + * 00000005 521c3b01 4308c041 >>>>> + * [seq # ] [ nonce_tail ] >>>>> + * [ 12-byte full IV ] -> NONCE_SIZE >>>>> + * [4-bytes -> NONCE_WIRE_SIZE >>>>> + * on wire] >>>>> + */ >>>> >>>> Nice diagram! Can we go futher and define the OpenVPN packet header >>>> as a >>>> stucture? Referencing the structure instead of using magic sizes and >>>> offsets >>>> can greatly improve the code readability. Especially when it comes >>>> to header >>>> construction/parsing in the encryption/decryption code. >>>> >>>> E.g. define a structures like this: >>>> >>>> struct ovpn_pkt_hdr { >>>> __be32 op; >>>> __be32 pktid; >>>> u8 auth[]; >>>> } __attribute__((packed)); >>>> >>>> struct ovpn_aead_iv { >>>> __be32 pktid; >>>> u8 nonce[OVPN_NONCE_TAIL_SIZE]; >>>> } __attribute__((packed)); >>> >>> __attribute__((packed)) should not be needed here as the fields in >>> both structs look properly aligned, and IIRC using packed can cause >>> the compiler to generate worse code. >> >> Agreed. Using packed will make certain architecture read every field >> byte by byte (I remember David M. biting us on this in batman-adv :)) > > Still curious to see an example of that strange architecture/compiler > combination. Anyway, as Sabrina mentioned, the header is already pretty > aligned. So it's up to you how to document the structure. IIRC MIPS was one of those, but don't take my word for granted. > >> This said, I like the idea of using a struct, but I don't feel >> confident enough to change the code now that we are hitting v12. >> This kind of change will be better implemented later and tested >> carefully. (and patches are always welcome! :)) > > The main reason behind the structure introduction is to improve the code > readability. To reduce a shadow, where bugs can reside. I wonder how > many people have invested their time to dig through the encryption > preparation function? > > As for risk of breaking something I should say that it can be addressed > by connecting the kernel implementation to pure usespace implementation, > which can be assumed the reference. And, I believe, it worth the benefit > of merging easy to understand code. > I understand your point, but this is something I need to spend time on because the openvpn packet format is not "very stable", as in "it can vary depending on negotiated features". When implementing ovpn I decided what was the supported set of features so to create a stable packet header, but this may change moving forward (there is already some work going on in userspace regarding new features that ovpn will have to support). Therefore I want to take some time thinking about what's best. Regards,
On 10/11/2024 21:42, Sergey Ryazanov wrote: > Missed the most essential note regarding this patch :) > > On 29.10.2024 12:47, Antonio Quartulli wrote: >> +static int ovpn_net_open(struct net_device *dev) >> +{ >> + netif_tx_start_all_queues(dev); >> + return 0; >> +} >> + >> +static int ovpn_net_stop(struct net_device *dev) >> +{ >> + netif_tx_stop_all_queues(dev); > > Here we stop a user generated traffic in downlink. Shall we take care > about other kinds of traffic: keepalive, uplink? Keepalive is "metadata" and should continue to flow, regardless of whether the user interface is brought down. Uplink traffic directed to *this* device should just be dropped at delivery time. Incoming traffic directed to other peers will continue to work. > > I believe we should remove all the peers here or at least stop the > keepalive generation. But peers removing is better since > administratively down is administratively down, meaning user expected > full traffic stop in any direction. And even if we only stop the > keepalive generation then peer(s) anyway will destroy the tunnel on > their side. Uhm, I don't think the user expects all "protocol" traffic (and client to client) to stop by simply bringing down the interface. > > This way we even should not care about peers removing on the device > unregistering. What do you think? I think you are now mixing data plane and control plane. The fact that the user is stopping payload traffic does not imply we want to stop the VPN. The user may just be doing something with the interface (and on an MP node client-to-client traffic will still continue to flow). This would also be a non-negligible (and user faving) change in behaviour compared to the current openvpn implementation. Thanks for your input though, I can imagine coming from different angles things may look not the same. Regards, > >> + return 0; >> +}
On 09/11/2024 02:11, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: >> An ovpn interface will keep carrier always on and let the user >> decide when an interface should be considered disconnected. >> >> This way, even if an ovpn interface is not connected to any peer, >> it can still retain all IPs and routes and thus prevent any data >> leak. >> >> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> >> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >> --- >> drivers/net/ovpn/main.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c >> index >> eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 >> --- a/drivers/net/ovpn/main.c >> +++ b/drivers/net/ovpn/main.c >> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) >> static int ovpn_net_open(struct net_device *dev) >> { >> + /* ovpn keeps the carrier always on to avoid losing IP or route >> + * configuration upon disconnection. This way it can prevent leaks >> + * of traffic outside of the VPN tunnel. >> + * The user may override this behaviour by tearing down the >> interface >> + * manually. >> + */ >> + netif_carrier_on(dev); > > If a user cares about the traffic leaking, then he can create a > blackhole route with huge metric: > > # ip route add blackhole default metric 10000 > > Why the network interface should implicitly provide this functionality? > And on another hand, how a routing daemon can learn a topology change > without indication from the interface? This was discussed loooong ago with Andrew. Here my last response: https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/ Regards, > >> netif_tx_start_all_queues(dev); >> return 0; >> } >> >
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...] >> +/* Called after decrypt to write the IP packet to the device. >> + * This method is expected to manage/free the skb. >> + */ >> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff >> *skb) >> +{ >> + unsigned int pkt_len; >> + >> + /* we can't guarantee the packet wasn't corrupted before entering >> the >> + * VPN, therefore we give other layers a chance to check that >> + */ >> + skb->ip_summed = CHECKSUM_NONE; >> + >> + /* skb hash for transport packet no longer valid after >> decapsulation */ >> + skb_clear_hash(skb); >> + >> + /* post-decrypt scrub -- prepare to inject encapsulated packet >> onto the >> + * interface, based on __skb_tunnel_rx() in dst.h >> + */ >> + skb->dev = peer->ovpn->dev; >> + skb_set_queue_mapping(skb, 0); >> + skb_scrub_packet(skb, true); >> + > > The skb->protocol field is going to be updated in the upcoming patch in > the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, > why do not touch the protocol field here? Well, I would personally not document missing details in a partly implemented code path. > >> + skb_reset_network_header(skb); > > ovpn_decrypt_post() already reseted the network header. Why do we need > it here again? yeah, I think this can be removed. > >> + skb_reset_transport_header(skb); >> + skb_probe_transport_header(skb); >> + skb_reset_inner_headers(skb); >> + >> + memset(skb->cb, 0, sizeof(skb->cb)); > > Why do we need to zero the control buffer here? To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well. I think this was recommended by Sabrina as well. > >> + /* cause packet to be "received" by the interface */ >> + pkt_len = skb->len; >> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >> + skb) == NET_RX_SUCCESS)) >> + /* update RX stats with the size of decrypted packet */ >> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >> +} >> + >> +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) >> +{ >> + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; >> + >> + if (unlikely(ret < 0)) >> + goto drop; >> + >> + ovpn_netdev_write(peer, skb); >> + /* skb is passed to upper layer - don't free it */ >> + skb = NULL; >> +drop: >> + if (unlikely(skb)) >> + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); >> + ovpn_peer_put(peer); >> + kfree_skb(skb); >> +} >> + >> +/* pick next packet from RX queue, decrypt and forward it to the >> device */ > > The function now receives packets from externel callers. Should we > update the above comment? yap will do. [...] >> --- /dev/null >> +++ b/drivers/net/ovpn/proto.h >> @@ -0,0 +1,75 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> +/* OpenVPN data channel offload >> + * >> + * Copyright (C) 2020-2024 OpenVPN, Inc. >> + * >> + * Author: Antonio Quartulli <antonio@openvpn.net> >> + * James Yonan <james@openvpn.net> >> + */ >> + >> +#ifndef _NET_OVPN_OVPNPROTO_H_ >> +#define _NET_OVPN_OVPNPROTO_H_ >> + >> +#include "main.h" >> + >> +#include <linux/skbuff.h> >> + >> +/* Methods for operating on the initial command >> + * byte of the OpenVPN protocol. >> + */ >> + >> +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in >> + * one byte >> + */ >> +#define OVPN_KEY_ID_MASK 0x07 >> +#define OVPN_OPCODE_SHIFT 3 >> +#define OVPN_OPCODE_MASK 0x1F > > Instead of defining mask(s) and shift(s), we can define only masks and > use bitfield API (see below). > >> +/* upper bounds on opcode and key ID */ >> +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) >> +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) >> +/* packet opcodes of interest to us */ >> +#define OVPN_DATA_V1 6 /* data channel V1 packet */ >> +#define OVPN_DATA_V2 9 /* data channel V2 packet */ >> +/* size of initial packet opcode */ >> +#define OVPN_OP_SIZE_V1 1 >> +#define OVPN_OP_SIZE_V2 4 >> +#define OVPN_PEER_ID_MASK 0x00FFFFFF >> +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF >> +/* first byte of keepalive message */ >> +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a >> +/* first byte of exit message */ >> +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 > > From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, > OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and > OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be > removed. ACK > >> +/** >> + * ovpn_opcode_from_skb - extract OP code from skb at specified offset >> + * @skb: the packet to extract the OP code from >> + * @offset: the offset in the data buffer where the OP code is located >> + * >> + * Note: this function assumes that the skb head was pulled enough >> + * to access the first byte. >> + * >> + * Return: the OP code >> + */ >> +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 >> offset) >> +{ >> + u8 byte = *(skb->data + offset); >> + >> + return byte >> OVPN_OPCODE_SHIFT; > > For example here, the shift can be replaced with bitfield macro: > > #define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000 > #define OVPN_OPCODE_KEYID_MSK 0x07000000 > #define OVPN_OPCODE_PEERID_MSK 0x00ffffff > > static inline u8 ovpn_opcode_from_skb(...) > { > u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset)); > > return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode); > } > > And the upcoming ovpn_opcode_compose() can be implemented like this: > > static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id) > { > return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) | > FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) | > FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id); > } > > And with this size can be even embedded into ovpn_aead_encrypt() to make > the header composing more clear. I wasn't aware of the bitfield API. Yeah, it looks cleaner and gives a better definition of the first 4 bytes of the header. There is also GENMASK() that helps with creating MASKs instead of hardcofing the bits in hex. Will give it a try, thanks! > >> +} >> + >> +/** >> + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset >> + * @skb: the packet to extract the OP code from >> + * @offset: the offset in the data buffer where the OP code is located >> + * >> + * Note: this function assumes that the skb head was pulled enough >> + * to access the first 4 bytes. >> + * >> + * Return: the peer ID. >> + */ >> +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, >> u16 offset) >> +{ >> + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; >> +} >> + >> +#endif /* _NET_OVPN_OVPNPROTO_H_ */ >> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c >> index >> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 >> --- a/drivers/net/ovpn/socket.c >> +++ b/drivers/net/ovpn/socket.c >> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) >> if (!sock) >> return; >> + if (sock->sk->sk_protocol == IPPROTO_UDP) >> + ovpn_udp_socket_detach(sock); >> + >> sockfd_put(sock); >> } >> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, >> struct ovpn_peer *peer) >> return ret; >> } >> +/* Retrieve the corresponding ovpn object from a UDP socket >> + * rcu_read_lock must be held on entry >> + */ >> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) >> +{ >> + struct ovpn_socket *ovpn_sock; >> + >> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != >> UDP_ENCAP_OVPNINUDP)) >> + return NULL; >> + >> + ovpn_sock = rcu_dereference_sk_user_data(sk); >> + if (unlikely(!ovpn_sock)) >> + return NULL; >> + >> + /* make sure that sk matches our stored transport socket */ >> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) >> + return NULL; >> + >> + return ovpn_sock->ovpn; > > Now, returning of this pointer is safe. But the following TCP transport > support calls the socket release via a scheduled work. What extends > socket lifetime and makes it possible to receive a UDP packet way after > the interface private data release. Is it correct assumption? Sorry you lost me when sayng "following *TCP* transp[ort support calls". This function is invoked only in UDP context. Was that a typ0? > > If the above is right then shall we set ->ovpn = NULL before scheduling > the socket releasing work or somehow else mark the socket as half- > destroyed? > >> +} >> + >> /** >> * ovpn_socket_new - create a new socket and initialize it >> * @sock: the kernel socket to embed >> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c >> index >> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 >> --- a/drivers/net/ovpn/udp.c >> +++ b/drivers/net/ovpn/udp.c >> @@ -21,9 +21,95 @@ >> #include "bind.h" >> #include "io.h" >> #include "peer.h" >> +#include "proto.h" >> #include "socket.h" >> #include "udp.h" >> +/** >> + * ovpn_udp_encap_recv - Start processing a received UDP packet. >> + * @sk: socket over which the packet was received >> + * @skb: the received packet >> + * >> + * If the first byte of the payload is DATA_V2, the packet is further >> processed, >> + * otherwise it is forwarded to the UDP stack for delivery to user >> space. >> + * >> + * Return: >> + * 0 if skb was consumed or dropped >> + * >0 if skb should be passed up to userspace as UDP (packet not >> consumed) >> + * <0 if skb should be resubmitted as proto -N (packet not consumed) >> + */ >> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >> +{ >> + struct ovpn_peer *peer = NULL; >> + struct ovpn_struct *ovpn; >> + u32 peer_id; >> + u8 opcode; >> + >> + ovpn = ovpn_from_udp_sock(sk); >> + if (unlikely(!ovpn)) { >> + net_err_ratelimited("%s: cannot obtain ovpn object from UDP >> socket\n", >> + __func__); > > Probably we should zero ovpn pointer in the ovpn_sock to survive > scheduled socket release (see comment in ovpn_from_udp_sock). So, this > print should be removed to avoid printing misguiding errors. I am also not following this. ovpn is already NULL if we are entering this branch, no? And I think this condition is quite improbable as well. > >> + goto drop_noovpn; >> + } >> + >> + /* Make sure the first 4 bytes of the skb data buffer after the UDP >> + * header are accessible. >> + * They are required to fetch the OP code, the key ID and the >> peer ID. >> + */ >> + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + >> + OVPN_OP_SIZE_V2))) { >> + net_dbg_ratelimited("%s: packet too small\n", __func__); >> + goto drop; >> + } >> + >> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >> + if (unlikely(opcode != OVPN_DATA_V2)) { >> + /* DATA_V1 is not supported */ >> + if (opcode == OVPN_DATA_V1) >> + goto drop; > > This packet dropping makes protocol accelerator, intendent to speed up > the data packets processing, a protocol enforcement entity, isn't it? > Shall we follow the principle of beeing liberal in what we accept and > just forward everything besides data packets upstream to a userspace > application? 'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto expect any DATA packet to bubble up as it would not know what to do with it. So any decision regarding data packets should stay in 'ovpn'. We just decided to support the modern DATA_V2 (DATA_V1 is seldomly used nowadays). Moreover, it's quite impossible that a peer will send us DATA_V1 if it passed userspace handshake and negotiation. > >> + >> + /* unknown or control packet: let it bubble up to userspace */ >> + return 1; >> + } >> + >> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >> + /* some OpenVPN server implementations send data packets with the >> + * peer-id set to undef. In this case we skip the peer lookup by >> peer-id >> + * and we try with the transport address >> + */ >> + if (peer_id != OVPN_PEER_ID_UNDEF) { >> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >> + if (!peer) { >> + net_err_ratelimited("%s: received data from unknown peer >> (id: %d)\n", >> + __func__, peer_id); > > Why do we consider a peer sending us garbage our problem? Meaning, this > peer miss can be not our fault but a malformed packet from a 3rd party > side. E.g. nowdays I can see a lot of traces of these "active probers" > in my OpenVPN logs. Shall remove this message or at least make it debug > to avoid bothering users with garbage traveling Internet? Anyway we can > not do anything regarding incoming traffic. It could also be a peer that believes to be connected while 'ovpn' dropped it earlier on. So this message would help the admin/user understanding what's going on. no? Maybe make it an info/notice instead of error? > >> + goto drop; >> + } >> + } >> + >> + if (!peer) { > > AFAIU, this condition can true only in case of peer_id beeing equal to > OVPN_PEER_ID_UNDEF, right? In this case the condition check can be > replaced by simple 'else' statement. > This part was actually rewritten already, so better wait for v12 before further discussing. > And to make code more corresponding to the above comment regarding > implementations that send undefined peer-id, can we swap sides of the > lookup method selection? E.g. > > /* Comment about fancy implementations sending undefined peer-id */ > if (peer_id == OVPN_PEER_ID_UNDEF) { > /* Do transport address based loockup */ > } else { > /* Do peer-id based loockup */ > } > >> + /* data packet with undef peer-id */ >> + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); >> + if (unlikely(!peer)) { >> + net_dbg_ratelimited("%s: received data with undef peer-id >> from unknown source\n", >> + __func__); >> + goto drop; >> + } >> + } >> + >> + /* pop off outer UDP header */ >> + __skb_pull(skb, sizeof(struct udphdr)); >> + ovpn_recv(peer, skb); >> + return 0; >> + >> +drop: >> + if (peer) >> + ovpn_peer_put(peer); > > AFAIU, the peer is alway NULL here. Shall we remove the above check? yeah simplified as well already. Thanks! Regards,
On 12/11/2024 01:16, Sergey Ryazanov wrote: > On 29.10.2024 12:47, Antonio Quartulli wrote: >> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff >> *skb) >> +{ >> + unsigned int pkt_len; >> + >> + /* we can't guarantee the packet wasn't corrupted before entering >> the >> + * VPN, therefore we give other layers a chance to check that >> + */ >> + skb->ip_summed = CHECKSUM_NONE; >> + >> + /* skb hash for transport packet no longer valid after >> decapsulation */ >> + skb_clear_hash(skb); >> + >> + /* post-decrypt scrub -- prepare to inject encapsulated packet >> onto the >> + * interface, based on __skb_tunnel_rx() in dst.h >> + */ >> + skb->dev = peer->ovpn->dev; >> + skb_set_queue_mapping(skb, 0); >> + skb_scrub_packet(skb, true); >> + >> + skb_reset_network_header(skb); >> + skb_reset_transport_header(skb); >> + skb_probe_transport_header(skb); >> + skb_reset_inner_headers(skb); >> + >> + memset(skb->cb, 0, sizeof(skb->cb)); >> + >> + /* cause packet to be "received" by the interface */ >> + pkt_len = skb->len; >> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >> + skb) == NET_RX_SUCCESS)) > > nit: to improve readability, the packet delivery call can be composed > like this: > > pkt_len = skb->len; > res = gro_cells_receive(&peer->ovpn->gro_cells, skb); > if (likely(res == NET_RX_SUCCESS)) > hm, you don't like calls on two lines? :-) ok, will change it. Regards, >> + /* update RX stats with the size of decrypted packet */ >> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >> +}
On 14/11/2024 23:10, Sergey Ryazanov wrote: > On 14.11.2024 17:33, Antonio Quartulli wrote: >> On 06/11/2024 02:18, Sergey Ryazanov wrote: >>> Regarding "big" topics I have only two concerns: link creation using >>> RTNL and a switch statement usage. In the corresponding thread, I >>> asked Jiri to clarify that "should" regarding .newlink >>> implementation. Hope he will have a chance to find a time to reply. >> >> True, but to be honest at this point I am fine with sticking to RTNL, >> also because we will soon introduce the ability to create 'persistent' >> ifaces, which a user should be able to create before starting openvpn. > > Could you share the use case for this functionality? This is better asked to the users mailing list, but, for example, we recently had a discussion about this with the VyOS guys, where they want to create the interface and have it fit the various firewall/routing/chachacha logic, before any daemon is started. In OpenVPN userspace we already support the --mktun directive to help with this specific use case, so it's a long existing use case. > >> Going through RTNL for this is the best choice IMHO, therefore we have >> an extra use case in favour of this approach (next to what Jiri >> already mentioned). > > In absence of arguments it's hard to understand, what's the "best" > meaning. well, that's why I added "IMHO" :) > So, I'm still not sure is it worth to split uAPI between two > interfaces. Anyway, it's up to maintainers to decide is it mergeable in > this form or not. I just shared some arguments for the full management > interface in GENL. well, doing it differently from all other virtual drivers also requires some important reason IMHO. Anyway, I like the idea that iproute2 can be used to create interfaces, without the need to have another userspace tool for that. Regards,
On 15.11.2024 16:03, Antonio Quartulli wrote: > On 10/11/2024 21:42, Sergey Ryazanov wrote: >> Missed the most essential note regarding this patch :) >> >> On 29.10.2024 12:47, Antonio Quartulli wrote: >>> +static int ovpn_net_open(struct net_device *dev) >>> +{ >>> + netif_tx_start_all_queues(dev); >>> + return 0; >>> +} >>> + >>> +static int ovpn_net_stop(struct net_device *dev) >>> +{ >>> + netif_tx_stop_all_queues(dev); >> >> Here we stop a user generated traffic in downlink. Shall we take care >> about other kinds of traffic: keepalive, uplink? > > Keepalive is "metadata" and should continue to flow, regardless of > whether the user interface is brought down. > > Uplink traffic directed to *this* device should just be dropped at > delivery time. > > Incoming traffic directed to other peers will continue to work. How it's possible? AFAIU, the module uses the kernel IP routing subsystem. Putting the interface down will effectively block a client-to-client packet to reenter the interface. >> I believe we should remove all the peers here or at least stop the >> keepalive generation. But peers removing is better since >> administratively down is administratively down, meaning user expected >> full traffic stop in any direction. And even if we only stop the >> keepalive generation then peer(s) anyway will destroy the tunnel on >> their side. > > Uhm, I don't think the user expects all "protocol" traffic (and client > to client) to stop by simply bringing down the interface. > >> >> This way we even should not care about peers removing on the device >> unregistering. What do you think? > > I think you are now mixing data plane and control plane. > > The fact that the user is stopping payload traffic does not imply we > want to stop the VPN. > The user may just be doing something with the interface (and on an MP > node client-to-client traffic will still continue to flow). > > This would also be a non-negligible (and user faving) change in > behaviour compared to the current openvpn implementation. It's not about previous implementation, it's about the interface management procedures. I just cannot image how the proposed approach can be aligned with RFC 2863 section 3.1.13. IfAdminStatus and IfOperStatus. And if we are talking about a user experience, I cannot imagine my WLAN interface maintaining a connection to the access point after shutting it down. Or even better, a WLAN interface in the AP mode still forwarding traffic between wireless clients. Or a bridge interface switching traffic between ports and sending STP frames. > Thanks for your input though, I can imagine coming from different angles > things may look not the same. I believe nobody will mind if a userspace service will do a failover to continue serving connected clients. But from the kernel perspective, when user says 'ip link set down' the party is over. -- Sergey
On 19/11/2024 04:08, Sergey Ryazanov wrote: > On 15.11.2024 16:03, Antonio Quartulli wrote: >> On 10/11/2024 21:42, Sergey Ryazanov wrote: >>> Missed the most essential note regarding this patch :) >>> >>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>> +static int ovpn_net_open(struct net_device *dev) >>>> +{ >>>> + netif_tx_start_all_queues(dev); >>>> + return 0; >>>> +} >>>> + >>>> +static int ovpn_net_stop(struct net_device *dev) >>>> +{ >>>> + netif_tx_stop_all_queues(dev); >>> >>> Here we stop a user generated traffic in downlink. Shall we take care >>> about other kinds of traffic: keepalive, uplink? >> >> Keepalive is "metadata" and should continue to flow, regardless of >> whether the user interface is brought down. >> >> Uplink traffic directed to *this* device should just be dropped at >> delivery time. >> >> Incoming traffic directed to other peers will continue to work. > > How it's possible? AFAIU, the module uses the kernel IP routing > subsystem. Putting the interface down will effectively block a client- > to-client packet to reenter the interface. True. At least part of the traffic is stopped (traffic directed to the VPN IP of a peer will still flow as it does not require a routing table lookup). I circled this discussion through the other devs to see what perspective they would bring and we also agree that if something is stopping, better stop the entire infra. Also, if a user is fumbling with the link state, they are probably trying to bring the VPN down. I will go that way and basically perform the same cleanup as if the interface is being deleted. "the party is over"[cit.] :) Regards,
On 13.11.2024 12:03, Sabrina Dubroca wrote: > 2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote: >> On 12.11.2024 19:31, Sabrina Dubroca wrote: >>> 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: >>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>> An ovpn_peer object holds the whole status of a remote peer >>>>> (regardless whether it is a server or a client). >>>>> >>>>> This includes status for crypto, tx/rx buffers, napi, etc. >>>>> >>>>> Only support for one peer is introduced (P2P mode). >>>>> Multi peer support is introduced with a later patch. >>>> >>>> Reviewing the peer creation/destroying code I came to a generic question. >>>> Did you consider keeping a single P2P peer in the peers table as well? >>>> >>>> Looks like such approach can greatly simply the code by dropping all these >>>> 'switch (ovpn->mode)' checks and implementing a unified peer management. The >>>> 'peer' field in the main private data structure can be kept to accelerate >>>> lookups, still using peers table for management tasks like removing all the >>>> peers on the interface teardown. >>> >>> It would save a few 'switch(mode)', but force every client to allocate >>> the hashtable for no reason at all. That tradeoff doesn't look very >>> beneficial to me, the P2P-specific code is really simple. And if you >>> keep ovpn->peer to make lookups faster, you're not removing that many >>> 'switch(mode)'. >> >> Looking at the done review, I can retrospectively conclude that I personally >> do not like short 'switch' statements and special handlers :) >> >> Seriously, this module has a highest density of switches per KLOC from what >> I have seen before and a major part of it dedicated to handle the special >> case of P2P connection. > > I think it's fine. Either way there will be two implementations of > whatever mode-dependent operation needs to be done. switch doesn't > make it more complex than an ops structure. > > If you're reading the current version and find ovpn_peer_add, you see > directly that it'll do either ovpn_peer_add_mp or > ovpn_peer_add_p2p. With an ops structure, you'd have a call to > ovpn->ops->peer_add, and you'd have to look up all possible ops > structures to know that it can be either ovpn_peer_add_mp or > ovpn_peer_add_p2p. If there's an undefined number of implementations > living in different modules (like net_device_ops, or L4 protocols), > you don't have a choice. > > xfrm went the opposite way to what you're proposing a few years ago > (see commit 0c620e97b349 ("xfrm: remove output indirection from > xfrm_mode") and others), and it made the code simpler. I checked this. Florian did a nice rework. And the way of implementation looks reasonable since there are more than two encapsulation modes and handling is more complex than just selecting a function to call. What I don't like about switches, that it requires extra lines of code and pushes an author to introduce a default case with error handling. It was mentioned that the module unlikely going to support more than two modes. In this context shall we consider ternary operator usage. E.g.: next_run = ovpn->mode == OVPN_MODE_P2P ? ovpn_peer_keepalive_work_p2p(...) : ovpn_peer_keepalive_work_mp(...); >> And back to the hashtable(s) size for the MP mode. 8k-bins table looks a >> good choice for a normal server with 1-2Gb uplink serving up to 1k >> connections. But it sill unclear, how this choice can affect installations >> with a bigger number of connections? Or is this module applicable for >> embedded solutions? E.g. running a couple of VPN servers on a home router >> with a few actual connections looks like a waste of RAM. I was about to >> suggest to use rhashtable due to its dynamic sizing feature, but the module >> needs three tables. Any better idea? > > For this initial implementation I think it's fine. Sure, converting to > rhashtable (or some other type of dynamically-sized hashtable, if > rhashtable doesn't fit) in the future would make sense. But I don't > think it's necessary to get the patches into net-next. Make sense. Thanks for sharing these thoughts. -- Sergey
[I'm still thinking about the locking problems for ovpn_peer_float, but just noticed this while staring at the rehash code] 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) > + __must_hold(&peer->ovpn->peers->lock) > +{ > + struct hlist_nulls_head *nhead; > + > + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { > + /* remove potential old hashing */ > + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); s/hash_entry_transp_addr/hash_entry_addr4/ ? > + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, > + &peer->vpn_addrs.ipv4, > + sizeof(peer->vpn_addrs.ipv4)); > + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); > + } > + > + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { > + /* remove potential old hashing */ > + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); s/hash_entry_transp_addr/hash_entry_addr6/ ? > + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, > + &peer->vpn_addrs.ipv6, > + sizeof(peer->vpn_addrs.ipv6)); > + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); > + } > +}
On 21/11/2024 00:22, Sergey Ryazanov wrote: > On 13.11.2024 12:03, Sabrina Dubroca wrote: >> 2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote: >>> On 12.11.2024 19:31, Sabrina Dubroca wrote: >>>> 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: >>>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>>> An ovpn_peer object holds the whole status of a remote peer >>>>>> (regardless whether it is a server or a client). >>>>>> >>>>>> This includes status for crypto, tx/rx buffers, napi, etc. >>>>>> >>>>>> Only support for one peer is introduced (P2P mode). >>>>>> Multi peer support is introduced with a later patch. >>>>> >>>>> Reviewing the peer creation/destroying code I came to a generic >>>>> question. >>>>> Did you consider keeping a single P2P peer in the peers table as well? >>>>> >>>>> Looks like such approach can greatly simply the code by dropping >>>>> all these >>>>> 'switch (ovpn->mode)' checks and implementing a unified peer >>>>> management. The >>>>> 'peer' field in the main private data structure can be kept to >>>>> accelerate >>>>> lookups, still using peers table for management tasks like removing >>>>> all the >>>>> peers on the interface teardown. >>>> >>>> It would save a few 'switch(mode)', but force every client to allocate >>>> the hashtable for no reason at all. That tradeoff doesn't look very >>>> beneficial to me, the P2P-specific code is really simple. And if you >>>> keep ovpn->peer to make lookups faster, you're not removing that many >>>> 'switch(mode)'. >>> >>> Looking at the done review, I can retrospectively conclude that I >>> personally >>> do not like short 'switch' statements and special handlers :) >>> >>> Seriously, this module has a highest density of switches per KLOC >>> from what >>> I have seen before and a major part of it dedicated to handle the >>> special >>> case of P2P connection. >> >> I think it's fine. Either way there will be two implementations of >> whatever mode-dependent operation needs to be done. switch doesn't >> make it more complex than an ops structure. >> >> If you're reading the current version and find ovpn_peer_add, you see >> directly that it'll do either ovpn_peer_add_mp or >> ovpn_peer_add_p2p. With an ops structure, you'd have a call to >> ovpn->ops->peer_add, and you'd have to look up all possible ops >> structures to know that it can be either ovpn_peer_add_mp or >> ovpn_peer_add_p2p. If there's an undefined number of implementations >> living in different modules (like net_device_ops, or L4 protocols), >> you don't have a choice. >> >> xfrm went the opposite way to what you're proposing a few years ago >> (see commit 0c620e97b349 ("xfrm: remove output indirection from >> xfrm_mode") and others), and it made the code simpler. > > I checked this. Florian did a nice rework. And the way of implementation > looks reasonable since there are more than two encapsulation modes and > handling is more complex than just selecting a function to call. > > What I don't like about switches, that it requires extra lines of code > and pushes an author to introduce a default case with error handling. It > was mentioned that the module unlikely going to support more than two > modes. In this context shall we consider ternary operator usage. E.g.: the default case can actually be dropped. That way we can have the compiler warn when one of the enum values is not handled in the switch (should there be a new one at some point). However, the default is just a sanity check against future code changes which may introduce a bug. > > next_run = ovpn->mode == OVPN_MODE_P2P ? > ovpn_peer_keepalive_work_p2p(...) : > ovpn_peer_keepalive_work_mp(...); I find this ugly to read :-) The switch is much more elegant and straightforward. Do you agree this is getting more into a bike shed coloring discussion? :-D Since there is not much gain in changing approach, I think it is better if the maintainer picks a style that he finds more suitable (or simply likes more). no? > >>> And back to the hashtable(s) size for the MP mode. 8k-bins table looks a >>> good choice for a normal server with 1-2Gb uplink serving up to 1k >>> connections. But it sill unclear, how this choice can affect >>> installations >>> with a bigger number of connections? Or is this module applicable for >>> embedded solutions? E.g. running a couple of VPN servers on a home >>> router >>> with a few actual connections looks like a waste of RAM. I was about to >>> suggest to use rhashtable due to its dynamic sizing feature, but the >>> module >>> needs three tables. Any better idea? >> >> For this initial implementation I think it's fine. Sure, converting to >> rhashtable (or some other type of dynamically-sized hashtable, if >> rhashtable doesn't fit) in the future would make sense. But I don't >> think it's necessary to get the patches into net-next. Agreed. It's in the pipe (along with other features that I have already implemented), but it will come later. Regards, > > Make sense. Thanks for sharing these thoughts. > > -- > Sergey
On 21/11/2024 17:02, Sabrina Dubroca wrote: > [I'm still thinking about the locking problems for ovpn_peer_float, > but just noticed this while staring at the rehash code] > > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) >> + __must_hold(&peer->ovpn->peers->lock) >> +{ >> + struct hlist_nulls_head *nhead; >> + >> + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { >> + /* remove potential old hashing */ >> + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); > > s/hash_entry_transp_addr/hash_entry_addr4/ ? cr0p. very good catch! Thanks > > >> + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, >> + &peer->vpn_addrs.ipv4, >> + sizeof(peer->vpn_addrs.ipv4)); >> + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); >> + } >> + >> + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { >> + /* remove potential old hashing */ >> + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); > > s/hash_entry_transp_addr/hash_entry_addr6/ ? Thanks² This is what happens when you copy/paste code around. > > >> + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, >> + &peer->vpn_addrs.ipv6, >> + sizeof(peer->vpn_addrs.ipv6)); >> + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); >> + } >> +} > Regards,
On 21.11.2024 23:23, Antonio Quartulli wrote: > On 21/11/2024 00:22, Sergey Ryazanov wrote: >> On 13.11.2024 12:03, Sabrina Dubroca wrote: >>> 2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote: >>>> On 12.11.2024 19:31, Sabrina Dubroca wrote: >>>>> 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote: >>>>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>>>> An ovpn_peer object holds the whole status of a remote peer >>>>>>> (regardless whether it is a server or a client). >>>>>>> >>>>>>> This includes status for crypto, tx/rx buffers, napi, etc. >>>>>>> >>>>>>> Only support for one peer is introduced (P2P mode). >>>>>>> Multi peer support is introduced with a later patch. >>>>>> >>>>>> Reviewing the peer creation/destroying code I came to a generic >>>>>> question. >>>>>> Did you consider keeping a single P2P peer in the peers table as >>>>>> well? >>>>>> >>>>>> Looks like such approach can greatly simply the code by dropping >>>>>> all these >>>>>> 'switch (ovpn->mode)' checks and implementing a unified peer >>>>>> management. The >>>>>> 'peer' field in the main private data structure can be kept to >>>>>> accelerate >>>>>> lookups, still using peers table for management tasks like >>>>>> removing all the >>>>>> peers on the interface teardown. >>>>> >>>>> It would save a few 'switch(mode)', but force every client to allocate >>>>> the hashtable for no reason at all. That tradeoff doesn't look very >>>>> beneficial to me, the P2P-specific code is really simple. And if you >>>>> keep ovpn->peer to make lookups faster, you're not removing that many >>>>> 'switch(mode)'. >>>> >>>> Looking at the done review, I can retrospectively conclude that I >>>> personally >>>> do not like short 'switch' statements and special handlers :) >>>> >>>> Seriously, this module has a highest density of switches per KLOC >>>> from what >>>> I have seen before and a major part of it dedicated to handle the >>>> special >>>> case of P2P connection. >>> >>> I think it's fine. Either way there will be two implementations of >>> whatever mode-dependent operation needs to be done. switch doesn't >>> make it more complex than an ops structure. >>> >>> If you're reading the current version and find ovpn_peer_add, you see >>> directly that it'll do either ovpn_peer_add_mp or >>> ovpn_peer_add_p2p. With an ops structure, you'd have a call to >>> ovpn->ops->peer_add, and you'd have to look up all possible ops >>> structures to know that it can be either ovpn_peer_add_mp or >>> ovpn_peer_add_p2p. If there's an undefined number of implementations >>> living in different modules (like net_device_ops, or L4 protocols), >>> you don't have a choice. >>> >>> xfrm went the opposite way to what you're proposing a few years ago >>> (see commit 0c620e97b349 ("xfrm: remove output indirection from >>> xfrm_mode") and others), and it made the code simpler. >> >> I checked this. Florian did a nice rework. And the way of >> implementation looks reasonable since there are more than two >> encapsulation modes and handling is more complex than just selecting a >> function to call. >> >> What I don't like about switches, that it requires extra lines of code >> and pushes an author to introduce a default case with error handling. >> It was mentioned that the module unlikely going to support more than >> two modes. In this context shall we consider ternary operator usage. >> E.g.: > > the default case can actually be dropped. That way we can have the > compiler warn when one of the enum values is not handled in the switch > (should there be a new one at some point). > However, the default is just a sanity check against future code changes > which may introduce a bug. > >> >> next_run = ovpn->mode == OVPN_MODE_P2P ? >> ovpn_peer_keepalive_work_p2p(...) : >> ovpn_peer_keepalive_work_mp(...); > > I find this ugly to read :-) Yeah. Doesn't look pretty as well. Just to conclude the discussion. Considering what we discussed here and the Sabrina's point regarding the trampoline penalty for indirect invocation, we do not have a better solution for now other than using switches everywhere. -- Sergey
On 21.11.2024 23:17, Antonio Quartulli wrote: > On 20/11/2024 23:56, Sergey Ryazanov wrote: >> On 15.11.2024 16:13, Antonio Quartulli wrote: >>> On 09/11/2024 02:11, Sergey Ryazanov wrote: >>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>> An ovpn interface will keep carrier always on and let the user >>>>> decide when an interface should be considered disconnected. >>>>> >>>>> This way, even if an ovpn interface is not connected to any peer, >>>>> it can still retain all IPs and routes and thus prevent any data >>>>> leak. >>>>> >>>>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> >>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >>>>> --- >>>>> drivers/net/ovpn/main.c | 7 +++++++ >>>>> 1 file changed, 7 insertions(+) >>>>> >>>>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c >>>>> index >>>>> eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 >>>>> --- a/drivers/net/ovpn/main.c >>>>> +++ b/drivers/net/ovpn/main.c >>>>> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device >>>>> *net) >>>>> static int ovpn_net_open(struct net_device *dev) >>>>> { >>>>> + /* ovpn keeps the carrier always on to avoid losing IP or route >>>>> + * configuration upon disconnection. This way it can prevent >>>>> leaks >>>>> + * of traffic outside of the VPN tunnel. >>>>> + * The user may override this behaviour by tearing down the >>>>> interface >>>>> + * manually. >>>>> + */ >>>>> + netif_carrier_on(dev); >>>> >>>> If a user cares about the traffic leaking, then he can create a >>>> blackhole route with huge metric: >>>> >>>> # ip route add blackhole default metric 10000 >>>> >>>> Why the network interface should implicitly provide this >>>> functionality? And on another hand, how a routing daemon can learn a >>>> topology change without indication from the interface? >>> >>> This was discussed loooong ago with Andrew. Here my last response: >>> >>> https://lore.kernel.org/all/d896bbd8-2709-4834-a637- >>> f982fc51fc57@openvpn.net/ >> >> Thank you for sharing the link to the beginning of the conversation. >> Till the moment we have 3 topics regarding the operational state >> indication: >> 1. possible absence of a conception of running state, >> 2. influence on routing protocol implementations, >> 3. traffic leaking. >> >> As for conception of the running state, it should exists for tunneling >> protocols with a state tracking. In this specific case, we can assume >> interface running when it has configured peer with keys. The protocol >> even has nice feature for the connection monitoring - keepalive. > > What about a device in MP mode? It doesn't make sense to turn the > carrier off when the MP node has no peers connected. > At the same time I don't like having P2P and MP devices behaving > differently in this regard. MP with a single network interface is an endless headache. Indeed. On the other hand, penalizing P2P users just because protocol support MP doesn't look like a solution either. > Therefore keeping the carrier on seemed the most logical way forward (at > least for now - we can still come back to this once we have something > smarter to implement). It was shown above how to distinguish between running and non-running cases. If an author doesn't want to implement operational state indication now, then I'm Ok with it. Not a big deal now. I just don't like the idea to promote the abuse of the running state indicator. Please see below. >> Routing protocols on one hand could benefit from the operational state >> indication. On another hand, hello/hold timer values mentioned in the >> documentation are comparable with default routing protocols timers. >> So, actual improvement is debatable. >> >> Regarding the traffic leading, as I mentioned before, the blackhole >> route or a firewall rule works better then implicit blackholing with a >> non-running interface. >> >> Long story short, I agree that we might not need a real operational >> state indication now. Still protecting from a traffic leaking is not >> good enough justification. > > Well, it's the so called "persistent interface" concept in VPNs: leave > everything as is, even if the connection is lost. It's called routing framework abuse. The IP router will choose the route and the egress interface not because this route is a good option to deliver a packet, but because someone trick it. At some circumstance, e.g. Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make solution generic and eligible for inclusion into the mainline code. > I know it can be implemented in many other different ways..but I don't > see a real problem with keeping this way. At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running. Generally speaking, saying that interface is running, when module knows for sure that a packet can not be delivered is a user misguiding. > A blackhole/firewall can still be added if the user prefers (and not use > the persistent interface). The solution with false-indication is not so reliable as it might look. Interface shutdown, inability of a user-space application to start, user-space application crash, user-space application restart, each of them will void the trick. Ergo, blackhole/firewall must be employed by a security concerned user. What makes the proposed feature odd. To summaries, I'm Ok if this change will be merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will have a big NACK from my side. -- Sergey
On 23/11/2024 23:25, Sergey Ryazanov wrote: > On 21.11.2024 23:17, Antonio Quartulli wrote: >> On 20/11/2024 23:56, Sergey Ryazanov wrote: >>> On 15.11.2024 16:13, Antonio Quartulli wrote: >>>> On 09/11/2024 02:11, Sergey Ryazanov wrote: >>>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>>> An ovpn interface will keep carrier always on and let the user >>>>>> decide when an interface should be considered disconnected. >>>>>> >>>>>> This way, even if an ovpn interface is not connected to any peer, >>>>>> it can still retain all IPs and routes and thus prevent any data >>>>>> leak. >>>>>> >>>>>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> >>>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >>>>>> --- >>>>>> drivers/net/ovpn/main.c | 7 +++++++ >>>>>> 1 file changed, 7 insertions(+) >>>>>> >>>>>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c >>>>>> index >>>>>> eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 >>>>>> --- a/drivers/net/ovpn/main.c >>>>>> +++ b/drivers/net/ovpn/main.c >>>>>> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device >>>>>> *net) >>>>>> static int ovpn_net_open(struct net_device *dev) >>>>>> { >>>>>> + /* ovpn keeps the carrier always on to avoid losing IP or route >>>>>> + * configuration upon disconnection. This way it can prevent >>>>>> leaks >>>>>> + * of traffic outside of the VPN tunnel. >>>>>> + * The user may override this behaviour by tearing down the >>>>>> interface >>>>>> + * manually. >>>>>> + */ >>>>>> + netif_carrier_on(dev); >>>>> >>>>> If a user cares about the traffic leaking, then he can create a >>>>> blackhole route with huge metric: >>>>> >>>>> # ip route add blackhole default metric 10000 >>>>> >>>>> Why the network interface should implicitly provide this >>>>> functionality? And on another hand, how a routing daemon can learn >>>>> a topology change without indication from the interface? >>>> >>>> This was discussed loooong ago with Andrew. Here my last response: >>>> >>>> https://lore.kernel.org/all/d896bbd8-2709-4834-a637- >>>> f982fc51fc57@openvpn.net/ >>> >>> Thank you for sharing the link to the beginning of the conversation. >>> Till the moment we have 3 topics regarding the operational state >>> indication: >>> 1. possible absence of a conception of running state, >>> 2. influence on routing protocol implementations, >>> 3. traffic leaking. >>> >>> As for conception of the running state, it should exists for >>> tunneling protocols with a state tracking. In this specific case, we >>> can assume interface running when it has configured peer with keys. >>> The protocol even has nice feature for the connection monitoring - >>> keepalive. >> >> What about a device in MP mode? It doesn't make sense to turn the >> carrier off when the MP node has no peers connected. >> At the same time I don't like having P2P and MP devices behaving >> differently in this regard. > > MP with a single network interface is an endless headache. Indeed. On > the other hand, penalizing P2P users just because protocol support MP > doesn't look like a solution either. On the upper side, with "iroutes" implemented using the system routing table, routing protocols will be able to detect new routes only when the related client has connected. (The same for the disconnection) But this is a bit orthogonal compared to the oper state. > >> Therefore keeping the carrier on seemed the most logical way forward >> (at least for now - we can still come back to this once we have >> something smarter to implement). > > It was shown above how to distinguish between running and non-running > cases. > > If an author doesn't want to implement operational state indication now, > then I'm Ok with it. Not a big deal now. I just don't like the idea to > promote the abuse of the running state indicator. Please see below. > >>> Routing protocols on one hand could benefit from the operational >>> state indication. On another hand, hello/hold timer values mentioned >>> in the documentation are comparable with default routing protocols >>> timers. So, actual improvement is debatable. >>> >>> Regarding the traffic leading, as I mentioned before, the blackhole >>> route or a firewall rule works better then implicit blackholing with >>> a non-running interface. >>> >>> Long story short, I agree that we might not need a real operational >>> state indication now. Still protecting from a traffic leaking is not >>> good enough justification. >> >> Well, it's the so called "persistent interface" concept in VPNs: leave >> everything as is, even if the connection is lost. > > It's called routing framework abuse. The IP router will choose the route > and the egress interface not because this route is a good option to > deliver a packet, but because someone trick it. This is what the user wants. OpenVPN (userspace) will tear down the P2P interface upon disconnection, assuming the --persist-tun option was not specified by the user. So the interface is gone in any case. By keeping the netcarrier on we are just ensuring that, if the user wanted persist-tun, the iface is not actually making decisions on its own. With a tun interface this can be done, now you want to basically drop this feature that existed for long time and break existing setups. > > At some circumstance, e.g. Android app, it could be the only way to > prevent traffic leaking. But these special circumstances do not make > solution generic and eligible for inclusion into the mainline code. Why not? We are not changing the general rule, but just defining a specific behaviour for a specific driver. For example, I don't think a tun interface goes down when there is no socket attached to it, still packets are just going to be blackhole'd in that case. No? > >> I know it can be implemented in many other different ways..but I don't >> see a real problem with keeping this way. > > At least routing protocols and network monitoring software will not be > happy to see a dead interface pretending that it's still running. They won't know that the interface is disconnected, they will possibly just see traffic being dropped. > Generally speaking, saying that interface is running, when module knows > for sure that a packet can not be delivered is a user misguiding. Or a feature, wanted by the user. > >> A blackhole/firewall can still be added if the user prefers (and not >> use the persistent interface). > > The solution with false-indication is not so reliable as it might look. > Interface shutdown, inability of a user-space application to start, > user-space application crash, user-space application restart, each of > them will void the trick. Ergo, blackhole/firewall must be employed by a > security concerned user. What makes the proposed feature odd. Yeah, this is what other VPN clients call "kill switch". Persist-tun is just one piece of the puzzle, yet important. > > To summaries, I'm Ok if this change will be merged with a comment like > "For future study" or "To be done" or "To be implemented". But a comment > like "to prevent traffic leaking" or any other comment implying a > "breakthrough security feature" will have a big NACK from my side. What if the comment redirects the user to --persist-tun option in order to clarify the context and the wanted behaviour? Would that help? > > -- > Sergey
On 24.11.2024 00:52, Antonio Quartulli wrote: > On 23/11/2024 23:25, Sergey Ryazanov wrote: >> On 21.11.2024 23:17, Antonio Quartulli wrote: >>> On 20/11/2024 23:56, Sergey Ryazanov wrote: >>>> On 15.11.2024 16:13, Antonio Quartulli wrote: >>>>> On 09/11/2024 02:11, Sergey Ryazanov wrote: >>>>>> On 29.10.2024 12:47, Antonio Quartulli wrote: >>>>>>> An ovpn interface will keep carrier always on and let the user >>>>>>> decide when an interface should be considered disconnected. >>>>>>> >>>>>>> This way, even if an ovpn interface is not connected to any peer, >>>>>>> it can still retain all IPs and routes and thus prevent any data >>>>>>> leak. >>>>>>> >>>>>>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> >>>>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >>>>>>> --- >>>>>>> drivers/net/ovpn/main.c | 7 +++++++ >>>>>>> 1 file changed, 7 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c >>>>>>> index >>>>>>> eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 >>>>>>> --- a/drivers/net/ovpn/main.c >>>>>>> +++ b/drivers/net/ovpn/main.c >>>>>>> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device >>>>>>> *net) >>>>>>> static int ovpn_net_open(struct net_device *dev) >>>>>>> { >>>>>>> + /* ovpn keeps the carrier always on to avoid losing IP or route >>>>>>> + * configuration upon disconnection. This way it can prevent >>>>>>> leaks >>>>>>> + * of traffic outside of the VPN tunnel. >>>>>>> + * The user may override this behaviour by tearing down the >>>>>>> interface >>>>>>> + * manually. >>>>>>> + */ >>>>>>> + netif_carrier_on(dev); >>>>>> >>>>>> If a user cares about the traffic leaking, then he can create a >>>>>> blackhole route with huge metric: >>>>>> >>>>>> # ip route add blackhole default metric 10000 >>>>>> >>>>>> Why the network interface should implicitly provide this >>>>>> functionality? And on another hand, how a routing daemon can learn >>>>>> a topology change without indication from the interface? >>>>> >>>>> This was discussed loooong ago with Andrew. Here my last response: >>>>> >>>>> https://lore.kernel.org/all/d896bbd8-2709-4834-a637- >>>>> f982fc51fc57@openvpn.net/ >>>> >>>> Thank you for sharing the link to the beginning of the conversation. >>>> Till the moment we have 3 topics regarding the operational state >>>> indication: >>>> 1. possible absence of a conception of running state, >>>> 2. influence on routing protocol implementations, >>>> 3. traffic leaking. >>>> >>>> As for conception of the running state, it should exists for >>>> tunneling protocols with a state tracking. In this specific case, we >>>> can assume interface running when it has configured peer with keys. >>>> The protocol even has nice feature for the connection monitoring - >>>> keepalive. >>> >>> What about a device in MP mode? It doesn't make sense to turn the >>> carrier off when the MP node has no peers connected. >>> At the same time I don't like having P2P and MP devices behaving >>> differently in this regard. >> >> MP with a single network interface is an endless headache. Indeed. On >> the other hand, penalizing P2P users just because protocol support MP >> doesn't look like a solution either. > > On the upper side, with "iroutes" implemented using the system routing > table, routing protocols will be able to detect new routes only when the > related client has connected. (The same for the disconnection) > > But this is a bit orthogonal compared to the oper state. The patch has nothing common with the routes configuration. The main concern is forcing of the running state indication. And more specifically, the concern is the given justification for this activity. >>> Therefore keeping the carrier on seemed the most logical way forward >>> (at least for now - we can still come back to this once we have >>> something smarter to implement). >> >> It was shown above how to distinguish between running and non-running >> cases. >> >> If an author doesn't want to implement operational state indication >> now, then I'm Ok with it. Not a big deal now. I just don't like the >> idea to promote the abuse of the running state indicator. Please see >> below. >> >>>> Routing protocols on one hand could benefit from the operational >>>> state indication. On another hand, hello/hold timer values mentioned >>>> in the documentation are comparable with default routing protocols >>>> timers. So, actual improvement is debatable. >>>> >>>> Regarding the traffic leading, as I mentioned before, the blackhole >>>> route or a firewall rule works better then implicit blackholing with >>>> a non-running interface. >>>> >>>> Long story short, I agree that we might not need a real operational >>>> state indication now. Still protecting from a traffic leaking is not >>>> good enough justification. >>> >>> Well, it's the so called "persistent interface" concept in VPNs: >>> leave everything as is, even if the connection is lost. >> >> It's called routing framework abuse. The IP router will choose the >> route and the egress interface not because this route is a good option >> to deliver a packet, but because someone trick it. > > This is what the user wants. Will be happy to see a study on user's preferences. > OpenVPN (userspace) will tear down the P2P interface upon disconnection, > assuming the --persist-tun option was not specified by the user. > > So the interface is gone in any case. > > By keeping the netcarrier on we are just ensuring that, if the user > wanted persist-tun, the iface is not actually making decisions on its own. Regarding a decision on its own. Ethernet interface going to the not-running state upon lost of carrier from a switch. It's hardly could be considered a decision of the interface. It's an indication of the fact. Similarly, beeping of UPS is not its decision to make user's life miserable, it's the indication of the power line failure. I hope, at least we are both agree on that a UPS should indicate the line failure. Back to the 'persist-tun' option. I checked the openvpn(8) man page. It gives a reasonable hints to use this option to avoid negative outcomes on internal openvpn process restart. E.g. in case of privilege dropping. It servers the same purpose as 'persist-key'. And there is no word regarding traffic leaking. If somebody have decided that this option gives the funny side-effect and allows to cut the corners, then I cannot say anything but sorry. > With a tun interface this can be done, now you want to basically drop > this feature that existed for long time and break existing setups. Amicus Plato, sed magis amica veritas Yes, I don't want to see this interface misbehaviour advertised as a security feature. I hope the previous email gives a detailed explanation why. If it's going to break existing setup, then end-users can be supported with a changelog notice explaining how to properly address the risk of the traffic leaking. >> At some circumstance, e.g. Android app, it could be the only way to >> prevent traffic leaking. But these special circumstances do not make >> solution generic and eligible for inclusion into the mainline code. > > Why not? We are not changing the general rule, but just defining a > specific behaviour for a specific driver. Yeah. This patch is not changing the general rule. The patch breaks it and the comment in the code makes proud of it. Looks like an old joke that documented bug become a feature. From a system administrator or a firmware developer perspective, the proposed behaviour will look like inconsistency comparing to other interface types. And this inconsistency requires to be addressed with special configuration or a dedicated script in a worst case. And I cannot see justified reason to make their life harder. > For example, I don't think a tun interface goes down when there is no > socket attached to it, still packets are just going to be blackhole'd in > that case. No? Nope. Tun interface indeed will go into the non-running state on the detach event. Moreover, the tun module supports running/non-running indication change upon a command from userspace. But not every userspace application feel a desire to implement it. >>> I know it can be implemented in many other different ways..but I >>> don't see a real problem with keeping this way. >> >> At least routing protocols and network monitoring software will not be >> happy to see a dead interface pretending that it's still running. > > They won't know that the interface is disconnected, they will possibly > just see traffic being dropped. Packet loss detection is quite complex operation. So yes, they are indeed monitoring the interface operational state to warn operator as soon as possible and take some automatic actions if we are talking about routing protocols. Some sophisticated monitoring systems even capable to generate events like 'link unstable' with higher severity if they see interface operational state flapping in a short period of time. So yeah, for these kinds of systems, proper operational state indication is essential. >> Generally speaking, saying that interface is running, when module >> knows for sure that a packet can not be delivered is a user misguiding. > > Or a feature, wanted by the user. > >>> A blackhole/firewall can still be added if the user prefers (and not >>> use the persistent interface). >> >> The solution with false-indication is not so reliable as it might >> look. Interface shutdown, inability of a user-space application to >> start, user-space application crash, user-space application restart, >> each of them will void the trick. Ergo, blackhole/firewall must be >> employed by a security concerned user. What makes the proposed feature >> odd. > > Yeah, this is what other VPN clients call "kill switch". > Persist-tun is just one piece of the puzzle, yet important. > >> >> To summaries, I'm Ok if this change will be merged with a comment like >> "For future study" or "To be done" or "To be implemented". But a >> comment like "to prevent traffic leaking" or any other comment >> implying a "breakthrough security feature" will have a big NACK from >> my side. > > What if the comment redirects the user to --persist-tun option in order > to clarify the context and the wanted behaviour? > > Would that help? Nope. As it was mentioned above, the are no indication that 'persist-tun' is a 'security' feature even in the current openvpn documentation. If the openvpn developers want to keep implementation bug-to-bug compatible, then feel free to configure the blackhole route on behalf of end-user by means of the userspace daemon. Nobody will mind. -- Sergey
On 25/11/2024 03:26, Sergey Ryazanov wrote: >> OpenVPN (userspace) will tear down the P2P interface upon >> disconnection, assuming the --persist-tun option was not specified by >> the user. >> >> So the interface is gone in any case. >> >> By keeping the netcarrier on we are just ensuring that, if the user >> wanted persist-tun, the iface is not actually making decisions on its >> own. > > Regarding a decision on its own. Ethernet interface going to the not- > running state upon lost of carrier from a switch. It's hardly could be > considered a decision of the interface. It's an indication of the fact. > > Similarly, beeping of UPS is not its decision to make user's life > miserable, it's the indication of the power line failure. I hope, at > least we are both agree on that a UPS should indicate the line failure. The answer is always "it depends". > > Back to the 'persist-tun' option. I checked the openvpn(8) man page. It > gives a reasonable hints to use this option to avoid negative outcomes > on internal openvpn process restart. E.g. in case of privilege dropping. > It servers the same purpose as 'persist-key'. And there is no word > regarding traffic leaking. FTR, here is the text in the manpage: --persist-tun Don't close and reopen TUN/TAP device or run up/down scripts across SIGUSR1 or --ping-restart restarts. SIGUSR1 is a restart signal similar to SIGHUP, but which offers finer-grained control over reset options. SIGUSR1 is a session reconnection, not a process restart. The manpage just indicates what happens at the low level when this option is provided. The next question is: what is this useful for? Many things, among those there is the fact the interface will retain its configuration (i.e. IPs, routes, etc). > > If somebody have decided that this option gives the funny side-effect > and allows to cut the corners, then I cannot say anything but sorry. > Well, OpenVPN is more than 20 years old. If a given API allows a specific user behaviour and had done so for those many years, changing it is still a user breakage. Not much we can do. >> With a tun interface this can be done, now you want to basically drop >> this feature that existed for long time and break existing setups. > > Amicus Plato, sed magis amica veritas > > Yes, I don't want to see this interface misbehaviour advertised as a > security feature. I hope the previous email gives a detailed explanation > why. Let's forget about the traffic leak mention and the "security feature". That comment was probably written in the middle of the night and I agree it gives a false sense or what is happening. > > If it's going to break existing setup, then end-users can be supported > with a changelog notice explaining how to properly address the risk of > the traffic leaking. Nope, we can't just break existing user setups. > >>> At some circumstance, e.g. Android app, it could be the only way to >>> prevent traffic leaking. But these special circumstances do not make >>> solution generic and eligible for inclusion into the mainline code. >> >> Why not? We are not changing the general rule, but just defining a >> specific behaviour for a specific driver. > > Yeah. This patch is not changing the general rule. The patch breaks it > and the comment in the code makes proud of it. Looks like an old joke > that documented bug become a feature. Like I said above, let's make the comment meaningful for the expected goal: implement persist-tun while leaving userspace the chance to decide what to do. > > From a system administrator or a firmware developer perspective, the > proposed behaviour will look like inconsistency comparing to other > interface types. And this inconsistency requires to be addressed with > special configuration or a dedicated script in a worst case. And I > cannot see justified reason to make their life harder. You can configure openvpn to bring the interface down when the connection is lost. Why do you say it requires extra scripting and such? > >> For example, I don't think a tun interface goes down when there is no >> socket attached to it, still packets are just going to be blackhole'd >> in that case. No? > > Nope. Tun interface indeed will go into the non-running state on the > detach event. Moreover, the tun module supports running/non-running > indication change upon a command from userspace. But not every userspace > application feel a desire to implement it. With 'ovpn' we basically want a similar effect: let userspace decide what to do depending on the configuration. > >>>> I know it can be implemented in many other different ways..but I >>>> don't see a real problem with keeping this way. >>> >>> At least routing protocols and network monitoring software will not >>> be happy to see a dead interface pretending that it's still running. >> >> They won't know that the interface is disconnected, they will possibly >> just see traffic being dropped. > > Packet loss detection is quite complex operation. So yes, they are > indeed monitoring the interface operational state to warn operator as > soon as possible and take some automatic actions if we are talking about > routing protocols. Some sophisticated monitoring systems even capable to > generate events like 'link unstable' with higher severity if they see > interface operational state flapping in a short period of time. > > So yeah, for these kinds of systems, proper operational state indication > is essential. Again, if the user has not explicitly allowed the persistent behaviour, the interface will be brought down when a disconnection happens. But if the user/administrator *wants* to avoid that, he has needs a chance to do that. Otherwise people that needs this behaviour will just have to stick to using tun and the full userspace implementation. > >>> Generally speaking, saying that interface is running, when module >>> knows for sure that a packet can not be delivered is a user misguiding. >> >> Or a feature, wanted by the user. >> >>>> A blackhole/firewall can still be added if the user prefers (and not >>>> use the persistent interface). >>> >>> The solution with false-indication is not so reliable as it might >>> look. Interface shutdown, inability of a user-space application to >>> start, user-space application crash, user-space application restart, >>> each of them will void the trick. Ergo, blackhole/firewall must be >>> employed by a security concerned user. What makes the proposed >>> feature odd. >> >> Yeah, this is what other VPN clients call "kill switch". >> Persist-tun is just one piece of the puzzle, yet important. >> >>> >>> To summaries, I'm Ok if this change will be merged with a comment >>> like "For future study" or "To be done" or "To be implemented". But a >>> comment like "to prevent traffic leaking" or any other comment >>> implying a "breakthrough security feature" will have a big NACK from >>> my side. >> >> What if the comment redirects the user to --persist-tun option in >> order to clarify the context and the wanted behaviour? >> >> Would that help? > > Nope. As it was mentioned above, the are no indication that 'persist- > tun' is a 'security' feature even in the current openvpn documentation. > Like I mentioned above, I agree we should get rid of that sentence. The security feature must be implemented by means of extra tools, just the interface staying up is not enough. > If the openvpn developers want to keep implementation bug-to-bug > compatible, then feel free to configure the blackhole route on behalf of > end-user by means of the userspace daemon. Nobody will mind. bug-to-bug compatible? What do you mean? Having userspace configure a blackhole route is something that can be considered by whoeever decides to implement the "kill switch" feature. OpenVPN does not. It just implements --persist-tun. So all in all, the conclusion is that in this case it's usersapce to decide when the interface should go up and down, depending on the configuration. I'd like to keep it as it is to avoid the ovpn interface to make decisions on its own. I can spell this out in the comment (I think it definitely makes sense), to clarify that the netcarrier is expected to be driven by userspace (where the control plane is) rather than having the device make decisions without having the full picture. What do you think? Regards,
On 25.11.2024 15:07, Antonio Quartulli wrote: > On 25/11/2024 03:26, Sergey Ryazanov wrote: >>> OpenVPN (userspace) will tear down the P2P interface upon >>> disconnection, assuming the --persist-tun option was not specified by >>> the user. >>> >>> So the interface is gone in any case. >>> >>> By keeping the netcarrier on we are just ensuring that, if the user >>> wanted persist-tun, the iface is not actually making decisions on its >>> own. >> >> Regarding a decision on its own. Ethernet interface going to the not- >> running state upon lost of carrier from a switch. It's hardly could be >> considered a decision of the interface. It's an indication of the fact. >> >> Similarly, beeping of UPS is not its decision to make user's life >> miserable, it's the indication of the power line failure. I hope, at >> least we are both agree on that a UPS should indicate the line failure. > > The answer is always "it depends". > >> Back to the 'persist-tun' option. I checked the openvpn(8) man page. >> It gives a reasonable hints to use this option to avoid negative >> outcomes on internal openvpn process restart. E.g. in case of >> privilege dropping. It servers the same purpose as 'persist-key'. And >> there is no word regarding traffic leaking. > > FTR, here is the text in the manpage: > > --persist-tun > Don't close and reopen TUN/TAP device or run up/down > scripts across SIGUSR1 or --ping-restart restarts. > > SIGUSR1 is a restart signal similar to SIGHUP, but which > offers finer-grained control over reset options. > > > SIGUSR1 is a session reconnection, not a process restart. > The manpage just indicates what happens at the low level when this > option is provided. Still no mentions of the traffic leaking prevention. Is it? > The next question is: what is this useful for? Many things, among those > there is the fact the interface will retain its configuration (i.e. IPs, > routes, etc). This is unrelated to the correct operational state indication. Addresses and routes are not reset in case of interface going to non-running state. >> If somebody have decided that this option gives the funny side-effect >> and allows to cut the corners, then I cannot say anything but sorry. > > Well, OpenVPN is more than 20 years old. More than 20 years of misguiding users has been duly noted :) Should I mention that RFC 1066 containing ifOperStatus definition was issues 12 years before OpenVPN? And than it was updated with multiple clarifications. > If a given API allows a specific user behaviour and had done so for > those many years, changing it is still a user breakage. Not much we can do. > >>> With a tun interface this can be done, now you want to basically drop >>> this feature that existed for long time and break existing setups. >> >> Amicus Plato, sed magis amica veritas >> >> Yes, I don't want to see this interface misbehaviour advertised as a >> security feature. I hope the previous email gives a detailed >> explanation why. > > Let's forget about the traffic leak mention and the "security feature". > That comment was probably written in the middle of the night and I agree > it gives a false sense or what is happening. > >> If it's going to break existing setup, then end-users can be supported >> with a changelog notice explaining how to properly address the risk of >> the traffic leaking. > > Nope, we can't just break existing user setups. > >>>> At some circumstance, e.g. Android app, it could be the only way to >>>> prevent traffic leaking. But these special circumstances do not make >>>> solution generic and eligible for inclusion into the mainline code. >>> >>> Why not? We are not changing the general rule, but just defining a >>> specific behaviour for a specific driver. >> >> Yeah. This patch is not changing the general rule. The patch breaks it >> and the comment in the code makes proud of it. Looks like an old joke >> that documented bug become a feature. > > Like I said above, let's make the comment meaningful for the expected > goal: implement persist-tun while leaving userspace the chance to decide > what to do. > >> >> From a system administrator or a firmware developer perspective, the >> proposed behaviour will look like inconsistency comparing to other >> interface types. And this inconsistency requires to be addressed with >> special configuration or a dedicated script in a worst case. And I >> cannot see justified reason to make their life harder. > > You can configure openvpn to bring the interface down when the > connection is lost. Why do you say it requires extra scripting and such? Being administratively down and being operationally down are different states. >>> For example, I don't think a tun interface goes down when there is no >>> socket attached to it, still packets are just going to be blackhole'd >>> in that case. No? >> >> Nope. Tun interface indeed will go into the non-running state on the >> detach event. Moreover, the tun module supports running/non-running >> indication change upon a command from userspace. But not every >> userspace application feel a desire to implement it. > > With 'ovpn' we basically want a similar effect: let userspace decide > what to do depending on the configuration. > >> >>>>> I know it can be implemented in many other different ways..but I >>>>> don't see a real problem with keeping this way. >>>> >>>> At least routing protocols and network monitoring software will not >>>> be happy to see a dead interface pretending that it's still running. >>> >>> They won't know that the interface is disconnected, they will >>> possibly just see traffic being dropped. >> >> Packet loss detection is quite complex operation. So yes, they are >> indeed monitoring the interface operational state to warn operator as >> soon as possible and take some automatic actions if we are talking >> about routing protocols. Some sophisticated monitoring systems even >> capable to generate events like 'link unstable' with higher severity >> if they see interface operational state flapping in a short period of >> time. >> >> So yeah, for these kinds of systems, proper operational state >> indication is essential. > > Again, if the user has not explicitly allowed the persistent behaviour, > the interface will be brought down when a disconnection happens. > But if the user/administrator *wants* to avoid that, he has needs a > chance to do that. > > Otherwise people that needs this behaviour will just have to stick to > using tun and the full userspace implementation. > >> >>>> Generally speaking, saying that interface is running, when module >>>> knows for sure that a packet can not be delivered is a user misguiding. >>> >>> Or a feature, wanted by the user. >>> >>>>> A blackhole/firewall can still be added if the user prefers (and >>>>> not use the persistent interface). >>>> >>>> The solution with false-indication is not so reliable as it might >>>> look. Interface shutdown, inability of a user-space application to >>>> start, user-space application crash, user-space application restart, >>>> each of them will void the trick. Ergo, blackhole/firewall must be >>>> employed by a security concerned user. What makes the proposed >>>> feature odd. >>> >>> Yeah, this is what other VPN clients call "kill switch". >>> Persist-tun is just one piece of the puzzle, yet important. >>> >>>> >>>> To summaries, I'm Ok if this change will be merged with a comment >>>> like "For future study" or "To be done" or "To be implemented". But >>>> a comment like "to prevent traffic leaking" or any other comment >>>> implying a "breakthrough security feature" will have a big NACK from >>>> my side. >>> >>> What if the comment redirects the user to --persist-tun option in >>> order to clarify the context and the wanted behaviour? >>> >>> Would that help? >> >> Nope. As it was mentioned above, the are no indication that 'persist- >> tun' is a 'security' feature even in the current openvpn documentation. >> > > Like I mentioned above, I agree we should get rid of that sentence. > The security feature must be implemented by means of extra tools, just > the interface staying up is not enough. > >> If the openvpn developers want to keep implementation bug-to-bug >> compatible, then feel free to configure the blackhole route on behalf >> of end-user by means of the userspace daemon. Nobody will mind. > > bug-to-bug compatible? What do you mean? http://www.jargon.net/jargonfile/b/bug-compatible.html With that difference, the local operational state indication does not break compatibility between hosts. > Having userspace configure a blackhole route is something that can be > considered by whoeever decides to implement the "kill switch" feature. > > OpenVPN does not. It just implements --persist-tun. > > So all in all, the conclusion is that in this case it's usersapce to > decide when the interface should go up and down, depending on the > configuration. I'd like to keep it as it is to avoid the ovpn interface > to make decisions on its own. > > I can spell this out in the comment (I think it definitely makes sense), > to clarify that the netcarrier is expected to be driven by userspace > (where the control plane is) rather than having the device make > decisions without having the full picture. > > What do you think? It wasn't suggested to destroy the interface in case of interface becoming non-operational. I apologize if something I wrote earlier sounded like that. The interface existence stays unquestionable. It's going to be solid persistent. Back to the proposed rephrasing. If the 'full picture' means forcing the running state indication even when the netdev is not capable to deliver packets, then it looks like an attempt to hide the control knob of the misguiding feature somewhere else. And since the concept of on-purpose false indication is still here, many words regarding the control plane and a full picture do not sound good either. -- Sergey
On 26/11/2024 01:32, Sergey Ryazanov wrote: > On 15.11.2024 17:02, Antonio Quartulli wrote: >> On 11/11/2024 02:54, Sergey Ryazanov wrote: >> [...] >> >>>> +/* Called after decrypt to write the IP packet to the device. >>>> + * This method is expected to manage/free the skb. >>>> + */ >>>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct >>>> sk_buff *skb) >>>> +{ >>>> + unsigned int pkt_len; >>>> + >>>> + /* we can't guarantee the packet wasn't corrupted before >>>> entering the >>>> + * VPN, therefore we give other layers a chance to check that >>>> + */ >>>> + skb->ip_summed = CHECKSUM_NONE; >>>> + >>>> + /* skb hash for transport packet no longer valid after >>>> decapsulation */ >>>> + skb_clear_hash(skb); >>>> + >>>> + /* post-decrypt scrub -- prepare to inject encapsulated packet >>>> onto the >>>> + * interface, based on __skb_tunnel_rx() in dst.h >>>> + */ >>>> + skb->dev = peer->ovpn->dev; >>>> + skb_set_queue_mapping(skb, 0); >>>> + skb_scrub_packet(skb, true); >>>> + >>> >>> The skb->protocol field is going to be updated in the upcoming patch >>> in the caller (ovpn_decrypt_post). Shall we put a comment here >>> clarifying, why do not touch the protocol field here? >> >> Well, I would personally not document missing details in a partly >> implemented code path. > > Looks like the question wasn't precisely emphrased. By bad. Let me > elaborate it in more details: > 1. usually skb->protocol is updated just before a packet leaves a module > 2. I've not found it were it was expected > 3. skb->protocol is updated in the caller function - > ovpn_decrypt_post(), along with the skb_reset_network_header() call. > > The question is, shall we put some comment here in the > ovpn_netdev_write() function elaborating that this was done in the > caller? Or is such comment odd? Ok, got it. Mah personally I don't think it's truly needed. But I have no strong opinion. > >>>> + skb_reset_network_header(skb); >>> >>> ovpn_decrypt_post() already reseted the network header. Why do we >>> need it here again? >> >> yeah, I think this can be removed. >> >>> >>>> + skb_reset_transport_header(skb); >>>> + skb_probe_transport_header(skb); >>>> + skb_reset_inner_headers(skb); >>>> + >>>> + memset(skb->cb, 0, sizeof(skb->cb)); >>> >>> Why do we need to zero the control buffer here? >> >> To avoid the next layer to assume the cb is clean while it is not. >> Other drivers do the same as well. > > AFAIR, there is no convention to clean the control buffer before the > handing over. The common practice is a bit opposite, programmer shall > not assume that the control buffer has been zeroed. > > Not a big deal to clean it here, we just can save some CPU cycles > avoiding it. If there is no convention, then I agree with you and I'd remove it. > >> I think this was recommended by Sabrina as well. > > Curious. It's macsec that does not zero it, or I've not understood how > it was done. I don't see it being zero'd. So I possibly misunderstood the suggestion. I'll remove the memset. > >>>> + /* cause packet to be "received" by the interface */ >>>> + pkt_len = skb->len; >>>> + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, >>>> + skb) == NET_RX_SUCCESS)) >>>> + /* update RX stats with the size of decrypted packet */ >>>> + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); >>>> +} >>>> + > > [...] > >>>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c >>>> index >>>> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 >>>> --- a/drivers/net/ovpn/socket.c >>>> +++ b/drivers/net/ovpn/socket.c >>>> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) >>>> if (!sock) >>>> return; >>>> + if (sock->sk->sk_protocol == IPPROTO_UDP) >>>> + ovpn_udp_socket_detach(sock); >>>> + >>>> sockfd_put(sock); >>>> } >>>> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket >>>> *sock, struct ovpn_peer *peer) >>>> return ret; >>>> } >>>> +/* Retrieve the corresponding ovpn object from a UDP socket >>>> + * rcu_read_lock must be held on entry >>>> + */ >>>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) >>>> +{ >>>> + struct ovpn_socket *ovpn_sock; >>>> + >>>> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != >>>> UDP_ENCAP_OVPNINUDP)) >>>> + return NULL; >>>> + >>>> + ovpn_sock = rcu_dereference_sk_user_data(sk); >>>> + if (unlikely(!ovpn_sock)) >>>> + return NULL; >>>> + >>>> + /* make sure that sk matches our stored transport socket */ >>>> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) >>>> + return NULL; >>>> + >>>> + return ovpn_sock->ovpn; >>> >>> Now, returning of this pointer is safe. But the following TCP >>> transport support calls the socket release via a scheduled work. What >>> extends socket lifetime and makes it possible to receive a UDP packet >>> way after the interface private data release. Is it correct assumption? >> >> Sorry you lost me when sayng "following *TCP* transp[ort support calls". >> This function is invoked only in UDP context. >> Was that a typ0? > > Yeah, you are right. The question sounds like a riddle. I should > eventually stop composing emails at midnight. Let me paraphrase it. :) > > The potential issue is tricky since we create it patch-by-patch. > > Up to this patch the socket releasing procedure looks solid and > reliable. E.g. the P2P netdev destroying: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > netdev_run_todo > rcu_barrier <- no running ovpn_udp_encap_recv after this point > free_netdev > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will > be spawned. And after the rcu_barrier() all running > ovpn_udp_encap_recv() will be done. All good. > ok > Then, the following patch 'ovpn: implement TCP transport' disjoin > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the > socket detach function call: > > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > > And long time after the socket will be actually detached: > > ovpn_socket_release_work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > And until this detaching will take a place, UDP handler can call > ovpn_udp_encap_recv() whatever number of times. > > So, we can end up with this scenario: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > netdev_run_todo > rcu_barrier > free_netdev > > ovpn_udp_encap_recv <- called for an incoming UDP packet > ovpn_from_udp_sock <- returns pointer to freed memory > // Any access to ovpn pointer is the use-after-free > > ovpn_socket_release_work <- kernel finally ivoke the work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > To address the issue, I see two possible solutions: > 1. flush the workqueue somewhere before the netdev release yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?" > 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach This makes sense too. But 1 is definitely what we need. > >>> If the above is right then shall we set ->ovpn = NULL before >>> scheduling the socket releasing work or somehow else mark the socket >>> as half- destroyed? Will think about it, it may make sense to nullify ->ovpn as well. >>> >>>> +} >>>> + >>>> /** >>>> * ovpn_socket_new - create a new socket and initialize it >>>> * @sock: the kernel socket to embed >>>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c >>>> index >>>> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 >>>> --- a/drivers/net/ovpn/udp.c >>>> +++ b/drivers/net/ovpn/udp.c >>>> @@ -21,9 +21,95 @@ >>>> #include "bind.h" >>>> #include "io.h" >>>> #include "peer.h" >>>> +#include "proto.h" >>>> #include "socket.h" >>>> #include "udp.h" >>>> +/** >>>> + * ovpn_udp_encap_recv - Start processing a received UDP packet. >>>> + * @sk: socket over which the packet was received >>>> + * @skb: the received packet >>>> + * >>>> + * If the first byte of the payload is DATA_V2, the packet is >>>> further processed, >>>> + * otherwise it is forwarded to the UDP stack for delivery to user >>>> space. >>>> + * >>>> + * Return: >>>> + * 0 if skb was consumed or dropped >>>> + * >0 if skb should be passed up to userspace as UDP (packet not >>>> consumed) >>>> + * <0 if skb should be resubmitted as proto -N (packet not consumed) >>>> + */ >>>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) >>>> +{ >>>> + struct ovpn_peer *peer = NULL; >>>> + struct ovpn_struct *ovpn; >>>> + u32 peer_id; >>>> + u8 opcode; >>>> + >>>> + ovpn = ovpn_from_udp_sock(sk); >>>> + if (unlikely(!ovpn)) { >>>> + net_err_ratelimited("%s: cannot obtain ovpn object from UDP >>>> socket\n", >>>> + __func__); >>> >>> Probably we should zero ovpn pointer in the ovpn_sock to survive >>> scheduled socket release (see comment in ovpn_from_udp_sock). So, >>> this print should be removed to avoid printing misguiding errors. >> >> I am also not following this. ovpn is already NULL if we are entering >> this branch, no? >> >> And I think this condition is quite improbable as well. > > Here, due to the scheduled nature of the detach function invocation, > ovpn_from_udp_sock() can return us a pointer to the freed memory. > > So we should prevent ovpn_udp_encap_recv() invocation after the netdev > release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL > even before scheduling the socket detaching. And in this case, > ovpn_from_udp_sock() returning NULL will be a legitimate case and we > should drop the error printing. ok got it. it is related with the comment above. > >>>> + goto drop_noovpn; >>>> + } >>>> + >>>> + /* Make sure the first 4 bytes of the skb data buffer after the >>>> UDP >>>> + * header are accessible. >>>> + * They are required to fetch the OP code, the key ID and the >>>> peer ID. >>>> + */ >>>> + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + >>>> + OVPN_OP_SIZE_V2))) { >>>> + net_dbg_ratelimited("%s: packet too small\n", __func__); >>>> + goto drop; >>>> + } >>>> + >>>> + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); >>>> + if (unlikely(opcode != OVPN_DATA_V2)) { >>>> + /* DATA_V1 is not supported */ >>>> + if (opcode == OVPN_DATA_V1) >>>> + goto drop; >>> >>> This packet dropping makes protocol accelerator, intendent to speed >>> up the data packets processing, a protocol enforcement entity, isn't >>> it? Shall we follow the principle of beeing liberal in what we accept >>> and just forward everything besides data packets upstream to a >>> userspace application? >> >> 'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto >> expect any DATA packet to bubble up as it would not know what to do >> with it. >> >> So any decision regarding data packets should stay in 'ovpn'. >> >> We just decided to support the modern DATA_V2 (DATA_V1 is seldomly >> used nowadays). >> >> Moreover, it's quite impossible that a peer will send us DATA_V1 if it >> passed userspace handshake and negotiation. > > The question was about the special handling of this packet type. If this > packet type is unlikely, then why should the kernel take special care of > it? Is this specific packet type going to crash the userspace application? Not crash (hopefully) but will create confusion because it is unexpected. The userspace dataplane path is technically inactive when 'ovpn' is in use. The idea is that any DATA_V* packet should be handled in kernelspace and userspace should not need to care. > >>>> + >>>> + /* unknown or control packet: let it bubble up to userspace */ >>>> + return 1; >>>> + } >>>> + >>>> + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); >>>> + /* some OpenVPN server implementations send data packets with the >>>> + * peer-id set to undef. In this case we skip the peer lookup >>>> by peer-id >>>> + * and we try with the transport address >>>> + */ >>>> + if (peer_id != OVPN_PEER_ID_UNDEF) { >>>> + peer = ovpn_peer_get_by_id(ovpn, peer_id); >>>> + if (!peer) { >>>> + net_err_ratelimited("%s: received data from unknown >>>> peer (id: %d)\n", >>>> + __func__, peer_id); >>> >>> Why do we consider a peer sending us garbage our problem? Meaning, >>> this peer miss can be not our fault but a malformed packet from a 3rd >>> party side. E.g. nowdays I can see a lot of traces of these "active >>> probers" in my OpenVPN logs. Shall remove this message or at least >>> make it debug to avoid bothering users with garbage traveling >>> Internet? Anyway we can not do anything regarding incoming traffic. >> >> It could also be a peer that believes to be connected while 'ovpn' >> dropped it earlier on. So this message would help the admin/user >> understanding what's going on. no? > > It could help troubleshooting, potentionally. On the other hand, it will > flood the kernel log with whatever junk is floating around the Internet. > For sure. Well, only packets having the right opcode in it and being large enough. Because we have already dropped anything that doesn't look like a DATA_V2 packet at this point. > >> Maybe make it an info/notice instead of error? > > At best it can be a debug message for developers. But IMHO the really > best choice is to get rid of it. But yeah, I agree with you. Will just silently drop. > >>>> + goto drop; >>>> + } >>>> + } > > -- > Sergey Thanks, Regards,
On 26/11/2024 09:49, Antonio Quartulli wrote: [...] >> >> The potential issue is tricky since we create it patch-by-patch. >> >> Up to this patch the socket releasing procedure looks solid and >> reliable. E.g. the P2P netdev destroying: >> >> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >> ovpn_peer_release_p2p >> ovpn_peer_del_p2p >> ovpn_peer_put >> ovpn_peer_release_kref >> ovpn_peer_release >> ovpn_socket_put >> ovpn_socket_release_kref >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> netdev_run_todo >> rcu_barrier <- no running ovpn_udp_encap_recv after this point >> free_netdev >> >> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() >> will be spawned. And after the rcu_barrier() all running >> ovpn_udp_encap_recv() will be done. All good. >> > > ok > >> Then, the following patch 'ovpn: implement TCP transport' disjoin >> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the >> socket detach function call: >> >> ovpn_socket_release_kref >> ovpn_socket_schedule_release >> schedule_work(&sock->work) >> >> And long time after the socket will be actually detached: >> >> ovpn_socket_release_work >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> >> And until this detaching will take a place, UDP handler can call >> ovpn_udp_encap_recv() whatever number of times. >> >> So, we can end up with this scenario: >> >> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >> ovpn_peer_release_p2p >> ovpn_peer_del_p2p >> ovpn_peer_put >> ovpn_peer_release_kref >> ovpn_peer_release >> ovpn_socket_put >> ovpn_socket_release_kref >> ovpn_socket_schedule_release >> schedule_work(&sock->work) >> netdev_run_todo >> rcu_barrier >> free_netdev >> >> ovpn_udp_encap_recv <- called for an incoming UDP packet >> ovpn_from_udp_sock <- returns pointer to freed memory >> // Any access to ovpn pointer is the use-after-free >> >> ovpn_socket_release_work <- kernel finally ivoke the work >> ovpn_socket_detach >> ovpn_udp_socket_detach >> setup_udp_tunnel_sock >> >> To address the issue, I see two possible solutions: >> 1. flush the workqueue somewhere before the netdev release > > yes! This is what I was missing. This will also solve the "how can the > module wait for all workers to be done before unloading?" > Actually there might be even a simpler solution: each ovpn_socket will hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). I can simply increase the refcounter those objects while they are referenced by the socket and decrease it when the socket is fully released (in the detach() function called by the worker). This way the netdev cannot be released until all socket (and all peers) are gone. This approach doesn't require any local workqueue or any other special coordination as we'll just force the whole cleanup to happen in a specific order. Does it make sense? Regards,
2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote: > On 26/11/2024 09:49, Antonio Quartulli wrote: > [...] > > > > > > The potential issue is tricky since we create it patch-by-patch. > > > > > > Up to this patch the socket releasing procedure looks solid and > > > reliable. E.g. the P2P netdev destroying: > > > > > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > > > ovpn_peer_release_p2p > > > ovpn_peer_del_p2p > > > ovpn_peer_put > > > ovpn_peer_release_kref > > > ovpn_peer_release > > > ovpn_socket_put > > > ovpn_socket_release_kref > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > netdev_run_todo > > > rcu_barrier <- no running ovpn_udp_encap_recv after this point > > > free_netdev > > > > > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() > > > will be spawned. And after the rcu_barrier() all running > > > ovpn_udp_encap_recv() will be done. All good. > > > > > > > ok > > > > > Then, the following patch 'ovpn: implement TCP transport' disjoin > > > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling > > > the socket detach function call: > > > > > > ovpn_socket_release_kref > > > ovpn_socket_schedule_release > > > schedule_work(&sock->work) > > > > > > And long time after the socket will be actually detached: > > > > > > ovpn_socket_release_work > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > > > > And until this detaching will take a place, UDP handler can call > > > ovpn_udp_encap_recv() whatever number of times. > > > > > > So, we can end up with this scenario: > > > > > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > > > ovpn_peer_release_p2p > > > ovpn_peer_del_p2p > > > ovpn_peer_put > > > ovpn_peer_release_kref > > > ovpn_peer_release > > > ovpn_socket_put > > > ovpn_socket_release_kref > > > ovpn_socket_schedule_release > > > schedule_work(&sock->work) > > > netdev_run_todo > > > rcu_barrier > > > free_netdev > > > > > > ovpn_udp_encap_recv <- called for an incoming UDP packet > > > ovpn_from_udp_sock <- returns pointer to freed memory > > > // Any access to ovpn pointer is the use-after-free > > > > > > ovpn_socket_release_work <- kernel finally ivoke the work > > > ovpn_socket_detach > > > ovpn_udp_socket_detach > > > setup_udp_tunnel_sock > > > > > > To address the issue, I see two possible solutions: > > > 1. flush the workqueue somewhere before the netdev release > > > > yes! This is what I was missing. This will also solve the "how can the > > module wait for all workers to be done before unloading?" > > > > Actually there might be even a simpler solution: each ovpn_socket will hold > a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). > I can simply increase the refcounter those objects while they are referenced > by the socket and decrease it when the socket is fully released (in the > detach() function called by the worker). > > This way the netdev cannot be released until all socket (and all peers) are > gone. > > This approach doesn't require any local workqueue or any other special > coordination as we'll just force the whole cleanup to happen in a specific > order. > > Does it make sense? This dependency between refcounts worries me. I'm already having a hard time remembering how all objects interact together. And since ovpn_peer_release already calls ovpn_socket_put, you'd get a refcount loop if ovpn_socket now also has a ref on the peer, no?
2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote: > On 15.11.2024 17:02, Antonio Quartulli wrote: > > On 11/11/2024 02:54, Sergey Ryazanov wrote: > > [...] > > > > + skb_reset_transport_header(skb); > > > > + skb_probe_transport_header(skb); > > > > + skb_reset_inner_headers(skb); > > > > + > > > > + memset(skb->cb, 0, sizeof(skb->cb)); > > > > > > Why do we need to zero the control buffer here? > > > > To avoid the next layer to assume the cb is clean while it is not. > > Other drivers do the same as well. > > AFAIR, there is no convention to clean the control buffer before the handing > over. The common practice is a bit opposite, programmer shall not assume > that the control buffer has been zeroed. > > Not a big deal to clean it here, we just can save some CPU cycles avoiding > it. > > > I think this was recommended by Sabrina as well. > > Curious. It's macsec that does not zero it, or I've not understood how it > was done. I only remember discussing a case [1] where one function within ovpn was expecting a cleared skb->cb to behave correctly but the caller did not clear it. In general, as you said, clearing cb "to be nice to other layers" is not expected. Sorry if some comments I made were confusing. [1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog > > > > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) > > > > +{ > > > > + struct ovpn_socket *ovpn_sock; > > > > + > > > > + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != > > > > UDP_ENCAP_OVPNINUDP)) > > > > + return NULL; > > > > + > > > > + ovpn_sock = rcu_dereference_sk_user_data(sk); > > > > + if (unlikely(!ovpn_sock)) > > > > + return NULL; > > > > + > > > > + /* make sure that sk matches our stored transport socket */ > > > > + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) > > > > + return NULL; > > > > + > > > > + return ovpn_sock->ovpn; > > > > > > Now, returning of this pointer is safe. But the following TCP > > > transport support calls the socket release via a scheduled work. > > > What extends socket lifetime and makes it possible to receive a UDP > > > packet way after the interface private data release. Is it correct > > > assumption? > > > > Sorry you lost me when sayng "following *TCP* transp[ort support calls". > > This function is invoked only in UDP context. > > Was that a typ0? > > Yeah, you are right. The question sounds like a riddle. I should eventually > stop composing emails at midnight. Let me paraphrase it. > > The potential issue is tricky since we create it patch-by-patch. > > Up to this patch the socket releasing procedure looks solid and reliable. > E.g. the P2P netdev destroying: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > netdev_run_todo > rcu_barrier <- no running ovpn_udp_encap_recv after this point It's more the synchronize_net in unregister_netdevice_many_notify? rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu waits for rcu_read_lock sections (see the comments for rcu_barrier and synchronize_rcu in kernel/rcu/tree.c). > free_netdev > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be > spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will > be done. All good. > > Then, the following patch 'ovpn: implement TCP transport' disjoin > ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket > detach function call: > > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > > And long time after the socket will be actually detached: > > ovpn_socket_release_work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > And until this detaching will take a place, UDP handler can call > ovpn_udp_encap_recv() whatever number of times. > > So, we can end up with this scenario: > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER) > ovpn_peer_release_p2p > ovpn_peer_del_p2p > ovpn_peer_put > ovpn_peer_release_kref > ovpn_peer_release > ovpn_socket_put > ovpn_socket_release_kref > ovpn_socket_schedule_release > schedule_work(&sock->work) > netdev_run_todo > rcu_barrier > free_netdev > > ovpn_udp_encap_recv <- called for an incoming UDP packet > ovpn_from_udp_sock <- returns pointer to freed memory > // Any access to ovpn pointer is the use-after-free > > ovpn_socket_release_work <- kernel finally ivoke the work > ovpn_socket_detach > ovpn_udp_socket_detach > setup_udp_tunnel_sock > > To address the issue, I see two possible solutions: > 1. flush the workqueue somewhere before the netdev release > 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach Going with #2, we could fully split detach into a synchronous part and async part (with async not needed for UDP). detach_sync clears the pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so that no more packets will be sent through the ovpn driver. Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket?
2024-11-14, 11:32:36 +0100, Antonio Quartulli wrote: > On 13/11/2024 12:05, Sabrina Dubroca wrote: > > 2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote: > > > On 11/11/2024 16:41, Sabrina Dubroca wrote: > > > > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: > > > > > +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) > > > > > + __must_hold(&peer->ovpn->peers->lock) > > > > > > > > Changes to peer->vpn_addrs are not protected by peers->lock, so those > > > > could be getting updated while we're rehashing (and taking peer->lock > > > > in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent > > > > that). > > > > > > > > > > /me screams :-D > > > > Sorry :) > > > > > Indeed peers->lock is only about protecting the lists, not the content of > > > the listed objects. > > > > > > How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? > > > > It seems like it would work. Maybe a bit weird to have conditional > > locking (MP mode only), but ok. You already have this lock ordering > > (hold peers->lock before taking peer->lock) in > > ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing > > the same thing in the netlink code. > > Yeah. > > > > > Then I would also do that in ovpn_peer_float to protect that rehash. > > I am not extremely comfortable with this, because it means acquiring > peers->lock on every packet (right now we do so only on peer->lock) and it > may defeat the advantage of the RCU locking on the hashtables. > Wouldn't you agree? Hmpf, yeah. Then I think you could keep most of the current code, except doing the rehash under both locks (peers + peer), and get ss+sa_len for the rehash directly from peer->bind (instead of using the ones we just defined locally in ovpn_peer_float, since they may have changed while we released peer->lock to grab peers->lock). We may end up "rehashing" twice into the same bucket if we have 2 concurrent peer_float calls (call 1 sets remote r1, call 2 sets a new one r2, call 1 hashes according to r2, call 2 also rehashes based on r2). That should be ok (it can happen anyway that a "real" rehash lands in the same bucket). peer_float { spin_lock(peer) match/update bind spin_unlock(peer) if (MP) { spin_lock(peers) spin_lock(peer) rehash using peer->bind->remote rather than ss spin_unlock(peer) spin_unlock(peers) } } Does that sound reasonable?
On 29/11/2024 14:20, Sabrina Dubroca wrote: > 2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote: >> On 26/11/2024 09:49, Antonio Quartulli wrote: >> [...] >>>> >>>> The potential issue is tricky since we create it patch-by-patch. >>>> >>>> Up to this patch the socket releasing procedure looks solid and >>>> reliable. E.g. the P2P netdev destroying: >>>> >>>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >>>> ovpn_peer_release_p2p >>>> ovpn_peer_del_p2p >>>> ovpn_peer_put >>>> ovpn_peer_release_kref >>>> ovpn_peer_release >>>> ovpn_socket_put >>>> ovpn_socket_release_kref >>>> ovpn_socket_detach >>>> ovpn_udp_socket_detach >>>> setup_udp_tunnel_sock >>>> netdev_run_todo >>>> rcu_barrier <- no running ovpn_udp_encap_recv after this point >>>> free_netdev >>>> >>>> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() >>>> will be spawned. And after the rcu_barrier() all running >>>> ovpn_udp_encap_recv() will be done. All good. >>>> >>> >>> ok >>> >>>> Then, the following patch 'ovpn: implement TCP transport' disjoin >>>> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling >>>> the socket detach function call: >>>> >>>> ovpn_socket_release_kref >>>> ovpn_socket_schedule_release >>>> schedule_work(&sock->work) >>>> >>>> And long time after the socket will be actually detached: >>>> >>>> ovpn_socket_release_work >>>> ovpn_socket_detach >>>> ovpn_udp_socket_detach >>>> setup_udp_tunnel_sock >>>> >>>> And until this detaching will take a place, UDP handler can call >>>> ovpn_udp_encap_recv() whatever number of times. >>>> >>>> So, we can end up with this scenario: >>>> >>>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER) >>>> ovpn_peer_release_p2p >>>> ovpn_peer_del_p2p >>>> ovpn_peer_put >>>> ovpn_peer_release_kref >>>> ovpn_peer_release >>>> ovpn_socket_put >>>> ovpn_socket_release_kref >>>> ovpn_socket_schedule_release >>>> schedule_work(&sock->work) >>>> netdev_run_todo >>>> rcu_barrier >>>> free_netdev >>>> >>>> ovpn_udp_encap_recv <- called for an incoming UDP packet >>>> ovpn_from_udp_sock <- returns pointer to freed memory >>>> // Any access to ovpn pointer is the use-after-free >>>> >>>> ovpn_socket_release_work <- kernel finally ivoke the work >>>> ovpn_socket_detach >>>> ovpn_udp_socket_detach >>>> setup_udp_tunnel_sock >>>> >>>> To address the issue, I see two possible solutions: >>>> 1. flush the workqueue somewhere before the netdev release >>> >>> yes! This is what I was missing. This will also solve the "how can the >>> module wait for all workers to be done before unloading?" >>> >> >> Actually there might be even a simpler solution: each ovpn_socket will hold >> a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). >> I can simply increase the refcounter those objects while they are referenced >> by the socket and decrease it when the socket is fully released (in the >> detach() function called by the worker). >> >> This way the netdev cannot be released until all socket (and all peers) are >> gone. >> >> This approach doesn't require any local workqueue or any other special >> coordination as we'll just force the whole cleanup to happen in a specific >> order. >> >> Does it make sense? > > This dependency between refcounts worries me. I'm already having a > hard time remembering how all objects interact together. > > And since ovpn_peer_release already calls ovpn_socket_put, you'd get a > refcount loop if ovpn_socket now also has a ref on the peer, no? You're right. Therefore I started playing with the following approach: * implement ovpn_peer_remove() that is invoked by ovpn_peer_del(), i.e. when ovpn wants to remove the peer from its state * ovpn_peer_remove() will do all kind of cleanup and unhash, including calling ovpn_socket_put() * in turn, when the socket is released from all other contexts, it will also call ovpn_peer_put() and allow the peer to be free'd for good. On one hand it sounds a bit clumsy, but on the other hand it allows each component to keep relying on any reference it is holding until the end. The only downside is that we will start shutting down a peer and then keep it around until any reference is dropped. But it should work. Regards,
On 29/11/2024 18:00, Sabrina Dubroca wrote: > 2024-11-14, 11:32:36 +0100, Antonio Quartulli wrote: >> On 13/11/2024 12:05, Sabrina Dubroca wrote: >>> 2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote: >>>> On 11/11/2024 16:41, Sabrina Dubroca wrote: >>>>> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote: >>>>>> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) >>>>>> + __must_hold(&peer->ovpn->peers->lock) >>>>> >>>>> Changes to peer->vpn_addrs are not protected by peers->lock, so those >>>>> could be getting updated while we're rehashing (and taking peer->lock >>>>> in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent >>>>> that). >>>>> >>>> >>>> /me screams :-D >>> >>> Sorry :) >>> >>>> Indeed peers->lock is only about protecting the lists, not the content of >>>> the listed objects. >>>> >>>> How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? >>> >>> It seems like it would work. Maybe a bit weird to have conditional >>> locking (MP mode only), but ok. You already have this lock ordering >>> (hold peers->lock before taking peer->lock) in >>> ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing >>> the same thing in the netlink code. >> >> Yeah. >> >>> >>> Then I would also do that in ovpn_peer_float to protect that rehash. >> >> I am not extremely comfortable with this, because it means acquiring >> peers->lock on every packet (right now we do so only on peer->lock) and it >> may defeat the advantage of the RCU locking on the hashtables. >> Wouldn't you agree? > > Hmpf, yeah. Then I think you could keep most of the current code, > except doing the rehash under both locks (peers + peer), and get > ss+sa_len for the rehash directly from peer->bind (instead of using > the ones we just defined locally in ovpn_peer_float, since they may > have changed while we released peer->lock to grab peers->lock). We may > end up "rehashing" twice into the same bucket if we have 2 concurrent > peer_float calls (call 1 sets remote r1, call 2 sets a new one r2, > call 1 hashes according to r2, call 2 also rehashes based on r2). That > should be ok (it can happen anyway that a "real" rehash lands in the > same bucket). I think the double rehashing is ok. It's a double float happening so we expect a double rehashing in any case. > > peer_float { > spin_lock(peer) > match/update bind > spin_unlock(peer) > > if (MP) { > spin_lock(peers) > spin_lock(peer) > rehash using peer->bind->remote rather than ss > spin_unlock(peer) > spin_unlock(peers) > } > } > > > Does that sound reasonable? Yeah, not very elegant, but this is what we need :) Thanks! Regards,
On 26/11/2024 09:17, Antonio Quartulli wrote: [...] >> It wasn't suggested to destroy the interface in case of interface >> becoming non-operational. I apologize if something I wrote earlier >> sounded like that. The interface existence stays unquestionable. It's >> going to be solid persistent. >> >> Back to the proposed rephrasing. If the 'full picture' means forcing >> the running state indication even when the netdev is not capable to >> deliver packets, then it looks like an attempt to hide the control >> knob of the misguiding feature somewhere else. >> >> And since the concept of on-purpose false indication is still here, >> many words regarding the control plane and a full picture do not sound >> good either. > Sergey, I have played a bit with this and, if I understood your idea correctly, the following should be an acceptable design for a P2P interface: * iface created -> netif_carrier_off * peer added -> netif_carrier_on * peer deleted -> netif_carrier_off * iface goes down -> peer deleted -> netif_carrier_off * iface goes up -> carrier stays down until peer is added P2MP interface behaviour is not changed: when interface is brought up carrier goes on and it is never turned off. How does it sound? My main concern was about bringing the interface down, but this is actually not happening. Correct me if I am wrong. Thanks. Regards,