From patchwork Thu Aug 20 23:49:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luke Hsiao
X-Patchwork-Id: 262162
From: Luke Hsiao
To: David Miller
Cc: netdev@vger.kernel.org, Jens Axboe, Luke Hsiao, Soheil Hassas Yeganeh, Arjun Roy, Eric Dumazet, Jann Horn
Subject: [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
Date: Thu, 20 Aug 2020 16:49:53 -0700
Message-Id: <20200820234954.1784522-2-luke.w.hsiao@gmail.com>
X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog
In-Reply-To: <20200820234954.1784522-1-luke.w.hsiao@gmail.com>
References: <20200820234954.1784522-1-luke.w.hsiao@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Luke Hsiao

For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read TCP tx
zero-copy completion notifications from the MSG_ERRQUEUE.

Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Arjun Roy
Acked-by: Eric Dumazet
Reviewed-by: Jann Horn
Signed-off-by: Luke Hsiao
---
 include/linux/net.h | 3 +++
 net/ipv4/af_inet.c  | 1 +
 net/ipv6/af_inet6.c | 1 +
 net/socket.c        | 8 +++++---
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED 3
 #define SOCK_PASSSEC 4
 
+#define PROTO_CMSG_DATA_ONLY 0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int family;
+	unsigned int flags;
 	struct module *owner;
 	int (*release) (struct socket *sock);
 	int (*bind) (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family = PF_INET,
+	.flags = PROTO_CMSG_DATA_ONLY,
 	.owner = THIS_MODULE,
 	.release = inet_release,
 	.bind = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family = PF_INET6,
+	.flags = PROTO_CMSG_DATA_ONLY,
 	.owner = THIS_MODULE,
 	.release = inet6_release,
 	.bind = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }

From patchwork Thu Aug 20 23:49:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luke Hsiao
X-Patchwork-Id: 262163
From: Luke Hsiao
To: David Miller
Cc: netdev@vger.kernel.org, Jens Axboe, Luke Hsiao, Arjun Roy, Soheil Hassas Yeganeh, Eric Dumazet
Subject: [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
Date: Thu, 20 Aug 2020 16:49:54 -0700
Message-Id: <20200820234954.1784522-3-luke.w.hsiao@gmail.com>
X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog
In-Reply-To: <20200820234954.1784522-1-luke.w.hsiao@gmail.com>
References: <20200820234954.1784522-1-luke.w.hsiao@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Luke Hsiao

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that
the POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Signed-off-by: Arjun Roy
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Eric Dumazet
Signed-off-by: Luke Hsiao
---
 fs/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..664ce8739615 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -4902,7 +4903,8 @@ static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+static bool io_arm_poll_handler(struct io_kiocb *req,
+				const struct io_uring_sqe *sqe)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
+		mask &= ~(POLLIN);
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
@@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	 * doesn't support non-blocking read/write attempts
 	 */
 	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		if (!io_arm_poll_handler(req, sqe)) {
 punt:
 			ret = io_prep_work_files(req);
 			if (unlikely(ret))
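
The mask selection changed by this patch can be modelled in user space
as a rough sketch. The function model_poll_mask() and its parameters
are hypothetical stand-ins for def->pollin, def->pollout, the
IORING_OP_RECVMSG opcode check and sqe->msg_flags; only the order of
the bit operations is taken from the hunk above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>		/* POLLIN, POLLRDNORM, POLLOUT, POLLWRNORM, ... */
#include <sys/socket.h>		/* MSG_ERRQUEUE on Linux */

#ifndef MSG_ERRQUEUE
#define MSG_ERRQUEUE 0x2000	/* Linux value, used only if the header lacks it */
#endif

/*
 * Hypothetical user-space model of the poll mask computed in
 * io_arm_poll_handler() after this patch. This is not kernel code.
 *
 *   pollin, pollout - whether the opcode polls for input / output
 *   is_recvmsg      - non-zero if the request is a recvmsg request
 *   msg_flags       - the recvmsg flags supplied with the request
 */
unsigned int model_poll_mask(int pollin, int pollout, int is_recvmsg,
			     unsigned int msg_flags)
{
	unsigned int mask = 0;

	assert(pollin == 0 || pollin == 1);
	assert(pollout == 0 || pollout == 1);

	if (pollin)
		mask |= POLLIN | POLLRDNORM;
	if (pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* Reading the error queue via recvmsg: do not wait for POLLIN. */
	if (is_recvmsg && (msg_flags & MSG_ERRQUEUE))
		mask &= ~(unsigned int)POLLIN;

	/* Error and priority events are always of interest. */
	mask |= POLLERR | POLLPRI;

	return mask;
}
```

With pollin set, is_recvmsg set and MSG_ERRQUEUE in msg_flags, the
model returns a mask without POLLIN but with POLLERR and POLLPRI. As in
the hunk, only POLLIN is cleared, so POLLRDNORM remains set in that
case.
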