From patchwork Wed Jun 24 17:17:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217175 From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 01/11] cgroup: Export cgroup_{procs, threads}_start and cgroup_procs_next Date: Wed, 24 Jun 2020 10:17:40 -0700 Message-Id: <20200624171749.11927-2-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk X-Mailing-List: netdev@vger.kernel.org Export the functions and put prototypes in linux/cgroup.h. This allows creating cgroup entries that provide per task information. --- include/linux/cgroup.h | 3 +++ kernel/cgroup/cgroup.c | 9 ++++++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 4598e4da6b1b..59837f6f4e54 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -119,6 +119,9 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen); int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry); int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *tsk); +void *cgroup_procs_start(struct seq_file *s, loff_t *pos); +void *cgroup_threads_start(struct seq_file *s, loff_t *pos); +void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos); void cgroup_fork(struct task_struct *p); extern int cgroup_can_fork(struct task_struct *p, diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 1ea181a58465..69cd14201cf0 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -4597,7 +4597,7 @@ static void cgroup_procs_release(struct kernfs_open_file *of) } } -static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos) 
+void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos) { struct kernfs_open_file *of = s->private; struct css_task_iter *it = of->priv; @@ -4607,6 +4607,7 @@ static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos) return css_task_iter_next(it); } +EXPORT_SYMBOL_GPL(cgroup_procs_next); static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos, unsigned int iter_flags) @@ -4637,7 +4638,7 @@ static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos, return cgroup_procs_next(s, NULL, NULL); } -static void *cgroup_procs_start(struct seq_file *s, loff_t *pos) +void *cgroup_procs_start(struct seq_file *s, loff_t *pos) { struct cgroup *cgrp = seq_css(s)->cgroup; @@ -4653,6 +4654,7 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos) return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED); } +EXPORT_SYMBOL_GPL(cgroup_procs_start); static int cgroup_procs_show(struct seq_file *s, void *v) { @@ -4764,10 +4766,11 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of, return ret ?: nbytes; } -static void *cgroup_threads_start(struct seq_file *s, loff_t *pos) +void *cgroup_threads_start(struct seq_file *s, loff_t *pos) { return __cgroup_procs_start(s, pos, 0); } +EXPORT_SYMBOL_GPL(cgroup_threads_start); static ssize_t cgroup_threads_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) 
From patchwork Wed Jun 24 17:17:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217174 From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 03/11] arfs: Create set_arfs_queue Date: Wed, 24 Jun 2020 10:17:42 -0700 Message-Id: <20200624171749.11927-4-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk X-Mailing-List: netdev@vger.kernel.org Abstract out the code for steering a flow to an aRFS queue (via ndo_rx_flow_steer) into its own function. This allows the function to be called in other use cases. 
--- net/core/dev.c | 67 +++++++++++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 28 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 6bc2388141f6..9f7a3e78e23a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4250,42 +4250,53 @@ EXPORT_SYMBOL(rps_needed); struct static_key_false rfs_needed __read_mostly; EXPORT_SYMBOL(rfs_needed); +#ifdef CONFIG_RFS_ACCEL +static void set_arfs_queue(struct net_device *dev, struct sk_buff *skb, + struct rps_dev_flow *rflow, u16 rxq_index) +{ + struct rps_dev_flow_table *flow_table; + struct netdev_rx_queue *rxqueue; + struct rps_dev_flow *old_rflow; + u32 flow_id; + int rc; + + rxqueue = dev->_rx + rxq_index; + + flow_table = rcu_dereference(rxqueue->rps_flow_table); + if (!flow_table) + return; + + flow_id = skb_get_hash(skb) & flow_table->mask; + rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb, + rxq_index, flow_id); + if (rc < 0) + return; + + old_rflow = rflow; + rflow = &flow_table->flows[flow_id]; + rflow->filter = rc; + if (old_rflow->filter == rflow->filter) + old_rflow->filter = RPS_NO_FILTER; +} +#endif + static struct rps_dev_flow * set_rps_cpu(struct net_device *dev, struct sk_buff *skb, struct rps_dev_flow *rflow, u16 next_cpu) { if (next_cpu < nr_cpu_ids) { #ifdef CONFIG_RFS_ACCEL - struct netdev_rx_queue *rxqueue; - struct rps_dev_flow_table *flow_table; - struct rps_dev_flow *old_rflow; - u32 flow_id; - u16 rxq_index; - int rc; /* Should we steer this flow to a different hardware queue? 
*/ - if (!skb_rx_queue_recorded(skb) || !dev->rx_cpu_rmap || - !(dev->features & NETIF_F_NTUPLE)) - goto out; - rxq_index = cpu_rmap_lookup_index(dev->rx_cpu_rmap, next_cpu); - if (rxq_index == skb_get_rx_queue(skb)) - goto out; - - rxqueue = dev->_rx + rxq_index; - flow_table = rcu_dereference(rxqueue->rps_flow_table); - if (!flow_table) - goto out; - flow_id = skb_get_hash(skb) & flow_table->mask; - rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb, - rxq_index, flow_id); - if (rc < 0) - goto out; - old_rflow = rflow; - rflow = &flow_table->flows[flow_id]; - rflow->filter = rc; - if (old_rflow->filter == rflow->filter) - old_rflow->filter = RPS_NO_FILTER; - out: + if (skb_rx_queue_recorded(skb) && dev->rx_cpu_rmap && + (dev->features & NETIF_F_NTUPLE)) { + u16 rxq_index; + + rxq_index = cpu_rmap_lookup_index(dev->rx_cpu_rmap, + next_cpu); + if (rxq_index != skb_get_rx_queue(skb)) + set_arfs_queue(dev, skb, rflow, rxq_index); + } #endif rflow->last_qtail = per_cpu(softnet_data, next_cpu).input_queue_head; 
From patchwork Wed Jun 24 17:17:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217173 From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 05/11] net: Infrastructure for per queue aRFS Date: Wed, 24 Jun 2020 10:17:44 -0700 Message-Id: <20200624171749.11927-6-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk X-Mailing-List: netdev@vger.kernel.org Infrastructure changes to allow aRFS to be based on Per Thread Queues instead of just CPU. The basic change is to create a field in rps_dev_flow that holds either a CPU or a queue index, rather than only a CPU. Changes include: - Replace the u16 cpu field in the rps_dev_flow structure with an rps_cpu_qid structure that contains either a CPU or a device queue index. Note the structure is still sixteen bits - Add helper functions to clear and set the cpu in the rps_cpu_qid of rps_dev_flow - Create a sock_masks structure that contains the partition of the thirty-two bit entry in rps_sock_flow_table. 
The structure contains two masks, one to extract the upper bits of the hash and one to extract the CPU number or queue index - Replace rps_cpu_mask with sock_masks from rps_sock_flow_table - Add rps_max_num_queues which will be used when creating sock_masks for queue entries in rps_sock_flow_table --- include/linux/netdevice.h | 94 +++++++++++++++++++++++++++++++++----- net/core/dev.c | 47 ++++++++++++------- net/core/net-sysfs.c | 2 +- net/core/sysctl_net_core.c | 6 ++- 4 files changed, 119 insertions(+), 30 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index bf5f2a85da97..d528aa61fea3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -674,18 +674,65 @@ struct rps_map { }; #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + ((_num) * sizeof(u16))) +/* The rps_cpu_qid structure is sixteen bits and holds either a CPU number or + * a queue index. The use_qid field specifies which type of value is set (i.e. + * if use_qid is 1 then cpu_qid contains a fifteen bit queue identifier, and if + * use_qid is 0 then cpu_qid contains a fifteen bit CPU number). No entry is + * signified by RPS_NO_CPU_QID in val which is set to NO_QUEUE (0xffff). So the + * range of CPU numbers that can be stored is 0..32,767 (0x7fff) and the range + * of queue identifiers is 0..32,766. Note that CPU numbers are limited by + * CONFIG_NR_CPUS which currently has a maximum supported value of 8,192 (per + * arch/x86/Kconfig), so WARN_ON is used to check that a CPU number is less + * than 0x8000 when setting the cpu in rps_cpu_qid. The queue index is limited + * by configuration. 
+ */ +struct rps_cpu_qid { + union { + u16 val; + struct { + u16 use_qid: 1; + union { + u16 cpu: 15; + u16 qid: 15; + }; + }; + }; +}; + +#define RPS_NO_CPU_QID NO_QUEUE /* No CPU or qid in rps_cpu_qid */ +#define RPS_MAX_CPU 0x7fff /* Maximum cpu in rps_cpu_qid */ +#define RPS_MAX_QID 0x7ffe /* Maximum qid in rps_cpu_qid */ + /* * The rps_dev_flow structure contains the mapping of a flow to a CPU, the * tail pointer for that CPU's input queue at the time of last enqueue, and * a hardware filter index. */ struct rps_dev_flow { - u16 cpu; + struct rps_cpu_qid cpu_qid; u16 filter; unsigned int last_qtail; }; #define RPS_NO_FILTER 0xffff +static inline void rps_dev_flow_clear(struct rps_dev_flow *dev_flow) +{ + dev_flow->cpu_qid.val = RPS_NO_CPU_QID; +} + +static inline void rps_dev_flow_set_cpu(struct rps_dev_flow *dev_flow, u16 cpu) +{ + struct rps_cpu_qid cpu_qid; + + if (WARN_ON(cpu > RPS_MAX_CPU)) + return; + + /* Set the rflow target to the CPU atomically */ + cpu_qid.use_qid = 0; + cpu_qid.cpu = cpu; + dev_flow->cpu_qid = cpu_qid; +} + /* * The rps_dev_flow_table structure contains a table of flow mappings. */ @@ -697,34 +744,57 @@ struct rps_dev_flow_table { #define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \ ((_num) * sizeof(struct rps_dev_flow))) +struct rps_sock_masks { + u32 mask; + u32 hash_mask; +}; + /* - * The rps_sock_flow_table contains mappings of flows to the last CPU - * on which they were processed by the application (set in recvmsg). - * Each entry is a 32bit value. Upper part is the high-order bits - * of flow hash, lower part is CPU number. - * rps_cpu_mask is used to partition the space, depending on number of - * possible CPUs : rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1 - * For example, if 64 CPUs are possible, rps_cpu_mask = 0x3f, - * meaning we use 32-6=26 bits for the hash. 
+ * The rps_sock_flow_table contains mappings of flows to the last CPU on which + * they were processed by the application (set in recvmsg), or the mapping of + * the flow to a per thread queue for the application. Each entry is a 32bit + * value. The high order bit indicates whether a CPU number or a queue index is + * stored. The next high-order bits contain the flow hash, and the lower bits + * contain the CPU number or queue index. The sock_flow table contains two + * sets of masks, one for CPU entries (cpu_masks) and one for queue entries + * (queue_masks), that are used to partition the space between the hash bits + * and the CPU number or queue index. For the cpu masks, cpu_masks.mask is set + * to roundup_pow_of_two(nr_cpu_ids) - 1 and the corresponding hash mask, + * cpu_masks.hash_mask, is set to (~cpu_masks.mask & ~RPS_SOCK_FLOW_USE_QID). + * For example, if 64 CPUs are possible, cpu_masks.mask == 0x3f, meaning we use + * 31-6=25 bits for the hash (so cpu_masks.hash_mask == 0x7fffffc0). Similarly, + * queue_masks in rps_sock_flow_table is used to partition the space when a + * queue index is present. 
*/ struct rps_sock_flow_table { u32 mask; + struct rps_sock_masks cpu_masks; + struct rps_sock_masks queue_masks; u32 ents[] ____cacheline_aligned_in_smp; }; #define RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num])) -#define RPS_NO_CPU 0xffff +#define RPS_SOCK_FLOW_USE_QID (1 << 31) +#define RPS_SOCK_FLOW_NO_IDENT -1U -extern u32 rps_cpu_mask; extern struct rps_sock_flow_table __rcu *rps_sock_flow_table; +extern unsigned int rps_max_num_queues; + +static inline void rps_init_sock_masks(struct rps_sock_masks *masks, u32 num) +{ + u32 mask = roundup_pow_of_two(num) - 1; + + masks->mask = mask; + masks->hash_mask = (~mask & ~RPS_SOCK_FLOW_USE_QID); +} static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, u32 hash) { if (table && hash) { + u32 val = hash & table->cpu_masks.hash_mask; unsigned int index = hash & table->mask; - u32 val = hash & ~rps_cpu_mask; /* We only give a hint, preemption can change CPU under us */ val |= raw_smp_processor_id(); diff --git a/net/core/dev.c b/net/core/dev.c index 9f7a3e78e23a..946940bdd583 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4242,8 +4242,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, /* One global table that all flow-based protocols share. 
*/ struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly; EXPORT_SYMBOL(rps_sock_flow_table); -u32 rps_cpu_mask __read_mostly; -EXPORT_SYMBOL(rps_cpu_mask); +unsigned int rps_max_num_queues; struct static_key_false rps_needed __read_mostly; EXPORT_SYMBOL(rps_needed); @@ -4302,7 +4301,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb, per_cpu(softnet_data, next_cpu).input_queue_head; } - rflow->cpu = next_cpu; + rps_dev_flow_set_cpu(rflow, next_cpu); return rflow; } @@ -4349,22 +4348,39 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, sock_flow_table = rcu_dereference(rps_sock_flow_table); if (flow_table && sock_flow_table) { + u32 next_cpu, comparator, ident; struct rps_dev_flow *rflow; - u32 next_cpu; - u32 ident; /* First check into global flow table if there is a match */ ident = sock_flow_table->ents[hash & sock_flow_table->mask]; - if ((ident ^ hash) & ~rps_cpu_mask) - goto try_rps; + comparator = ((ident & RPS_SOCK_FLOW_USE_QID) ? + sock_flow_table->queue_masks.hash_mask : + sock_flow_table->cpu_masks.hash_mask); - next_cpu = ident & rps_cpu_mask; + if ((ident ^ hash) & comparator) + goto try_rps; /* OK, now we know there is a match, * we can look at the local (per receive queue) flow table */ rflow = &flow_table->flows[hash & flow_table->mask]; - tcpu = rflow->cpu; + + /* The flow_sock entry may refer to either a queue or a + * CPU. Proceed accordingly. + */ + if (ident & RPS_SOCK_FLOW_USE_QID) { + /* A queue identifier is in the sock_flow_table entry */ + + /* Don't use aRFS to set CPU in this case, skip to + * trying RPS + */ + goto try_rps; + } + + /* A CPU number is in the sock_flow_table entry */ + + next_cpu = ident & sock_flow_table->cpu_masks.mask; + tcpu = rflow->cpu_qid.use_qid ? 
NO_QUEUE : rflow->cpu_qid.cpu; /* * If the desired CPU (where last recvmsg was done) is @@ -4396,10 +4412,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, if (map) { tcpu = map->cpus[reciprocal_scale(hash, map->len)]; - if (cpu_online(tcpu)) { + if (cpu_online(tcpu)) cpu = tcpu; - goto done; - } } done: @@ -4424,17 +4438,18 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, { struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index; struct rps_dev_flow_table *flow_table; + struct rps_cpu_qid cpu_qid; struct rps_dev_flow *rflow; bool expire = true; - unsigned int cpu; rcu_read_lock(); flow_table = rcu_dereference(rxqueue->rps_flow_table); if (flow_table && flow_id <= flow_table->mask) { rflow = &flow_table->flows[flow_id]; - cpu = READ_ONCE(rflow->cpu); - if (rflow->filter == filter_id && cpu < nr_cpu_ids && - ((int)(per_cpu(softnet_data, cpu).input_queue_head - + cpu_qid = READ_ONCE(rflow->cpu_qid); + if (rflow->filter == filter_id && !cpu_qid.use_qid && + cpu_qid.cpu < nr_cpu_ids && + ((int)(per_cpu(softnet_data, cpu_qid.cpu).input_queue_head - rflow->last_qtail) < (int)(10 * flow_table->mask))) expire = false; diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index e353b822bb15..56d27463d466 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -858,7 +858,7 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, table->mask = mask; for (count = 0; count <= mask; count++) - table->flows[count].cpu = RPS_NO_CPU; + rps_dev_flow_clear(&table->flows[count]); } else { table = NULL; } diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 9c7d46fbb75a..d09471f29d89 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -65,12 +65,16 @@ static int rps_create_sock_flow_table(size_t size, size_t orig_size, return -ENOMEM; sock_table->mask = size - 1; + rps_init_sock_masks(&sock_table->cpu_masks, + nr_cpu_ids); + 
rps_init_sock_masks(&sock_table->queue_masks, + rps_max_num_queues); } else { sock_table = orig_table; } for (i = 0; i < size; i++) - sock_table->ents[i] = RPS_NO_CPU; + sock_table->ents[i] = RPS_NO_CPU_QID; } else { sock_table = NULL; } 
From patchwork Wed Jun 24 17:17:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217172 From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 07/11] net: Introduce global queues Date: Wed, 24 Jun 2020 10:17:46 -0700 Message-Id: <20200624171749.11927-8-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk X-Mailing-List: netdev@vger.kernel.org Global queues, or gqids, are an abstract representation of NIC device queues. They are global in the sense that each gqid can be mapped to a queue in each device, i.e. if there are multiple devices in the system, a gqid can map to a different queue, a dqid, in each device in a one to many mapping. gqids are used for configuring packet steering on both send and receive in a generic way not bound to a particular device. Each transmit or receive device queue may be reverse mapped to one gqid. Each device maintains a table mapping gqids to local device queues; those tables are used in the data path to convert a gqid receive or transmit queue into a device queue relative to the sending or receiving device. Changes in the patch: - Add a simple index to netdev_queue and netdev_rx_queue. This serves as the dqid (it's just the index in the receive or transmit queue array for the device) - Add gqid to netdev_queue and netdev_rx_queue. This is the mapping of a device queue to gqid. 
If gqid is NO_QUEUE then the queue is unmapped - The per device gqid to dqid maps are maintained in an array of netdev_queue_map structures in a net_device for both transmit and receive - Functions that return a dqid given a gqid and a net_device - Sysfs to set device queue mappings in global_queue_mapping attribute of the sysfs rx- and tx- queue directory - Create per device gqid to dqid maps in the sysfs function --- include/linux/netdevice.h | 75 ++++++++++++++ net/core/dev.c | 20 +++- net/core/net-sysfs.c | 199 +++++++++++++++++++++++++++++++++++++- 3 files changed, 290 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 48ba1c1fc644..ca163925211a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -606,6 +606,10 @@ struct netdev_queue { #endif #if defined(CONFIG_XPS) && defined(CONFIG_NUMA) int numa_node; +#endif +#ifdef CONFIG_RPS + u16 index; + u16 gqid; #endif unsigned long tx_maxrate; /* @@ -823,6 +827,8 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id, /* This structure contains an instance of an RX queue. 
*/ struct netdev_rx_queue { #ifdef CONFIG_RPS + u16 index; + u16 gqid; struct rps_map __rcu *rps_map; struct rps_dev_flow_table __rcu *rps_flow_table; #endif @@ -875,6 +881,25 @@ struct xps_dev_maps { #endif /* CONFIG_XPS */ +#ifdef CONFIG_RPS +/* Structure to map a global queue to a device queue */ +struct netdev_queue_map { + struct rcu_head rcu; + unsigned int max_ents; + unsigned int set_count; + u16 map[0]; +}; + +/* Allocate queue map in blocks to avoid thrashing */ +#define QUEUE_MAP_ALLOC_BLOCK 128 + +#define QUEUE_MAP_ALLOC_NUMBER(_num) \ + ((((_num - 1) / QUEUE_MAP_ALLOC_BLOCK) + 1) * QUEUE_MAP_ALLOC_BLOCK) + +#define QUEUE_MAP_ALLOC_SIZE(_num) (sizeof(struct netdev_queue_map) + \ + (_num) * sizeof(u16)) +#endif /* CONFIG_RPS */ + #define TC_MAX_QUEUE 16 #define TC_BITMASK 15 /* HW offloaded queuing disciplines txq count and offset maps */ @@ -2092,6 +2117,10 @@ struct net_device { rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; +#ifdef CONFIG_RPS + struct netdev_queue_map __rcu *rx_gqueue_map; +#endif + #ifdef CONFIG_NET_CLS_ACT struct mini_Qdisc __rcu *miniq_ingress; #endif @@ -2122,6 +2151,9 @@ struct net_device { struct xps_dev_maps __rcu *xps_cpus_map; struct xps_dev_maps __rcu *xps_rxqs_map; #endif +#ifdef CONFIG_RPS + struct netdev_queue_map __rcu *tx_gqueue_map; +#endif #ifdef CONFIG_NET_CLS_ACT struct mini_Qdisc __rcu *miniq_egress; #endif @@ -2218,6 +2250,36 @@ struct net_device { }; #define to_net_dev(d) container_of(d, struct net_device, dev) +#ifdef CONFIG_RPS +static inline u16 netdev_gqid_to_dqid(const struct netdev_queue_map *map, + u16 gqid) +{ + return (map && gqid < map->max_ents) ? 
map->map[gqid] : NO_QUEUE; +} + +static inline u16 netdev_tx_gqid_to_dqid(const struct net_device *dev, u16 gqid) +{ + u16 dqid; + + rcu_read_lock(); + dqid = netdev_gqid_to_dqid(rcu_dereference(dev->tx_gqueue_map), gqid); + rcu_read_unlock(); + + return dqid; +} + +static inline u16 netdev_rx_gqid_to_dqid(const struct net_device *dev, u16 gqid) +{ + u16 dqid; + + rcu_read_lock(); + dqid = netdev_gqid_to_dqid(rcu_dereference(dev->rx_gqueue_map), gqid); + rcu_read_unlock(); + + return dqid; +} +#endif + static inline bool netif_elide_gro(const struct net_device *dev) { if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) @@ -2290,6 +2352,19 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, f(dev, &dev->_tx[i], arg); } +static inline void netdev_for_each_tx_queue_index(struct net_device *dev, + void (*f)(struct net_device *, + struct netdev_queue *, + unsigned int index, + void *), + void *arg) +{ + unsigned int i; + + for (i = 0; i < dev->num_tx_queues; i++) + f(dev, &dev->_tx[i], i, arg); +} + #define netdev_lockdep_set_classes(dev) \ { \ static struct lock_class_key qdisc_tx_busylock_key; \ diff --git a/net/core/dev.c b/net/core/dev.c index 946940bdd583..f64bf6608775 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9331,6 +9331,10 @@ static int netif_alloc_rx_queues(struct net_device *dev) for (i = 0; i < count; i++) { rx[i].dev = dev; +#ifdef CONFIG_RPS + rx[i].index = i; + rx[i].gqid = NO_QUEUE; +#endif /* XDP RX-queue setup */ err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i); @@ -9363,7 +9367,8 @@ static void netif_free_rx_queues(struct net_device *dev) } static void netdev_init_one_queue(struct net_device *dev, - struct netdev_queue *queue, void *_unused) + struct netdev_queue *queue, + unsigned int index, void *_unused) { /* Initialize queue lock */ spin_lock_init(&queue->_xmit_lock); @@ -9371,6 +9376,10 @@ static void netdev_init_one_queue(struct net_device *dev, queue->xmit_lock_owner = -1; netdev_queue_numa_node_write(queue, 
NUMA_NO_NODE); queue->dev = dev; +#ifdef CONFIG_RPS + queue->index = index; + queue->gqid = NO_QUEUE; +#endif #ifdef CONFIG_BQL dql_init(&queue->dql, HZ); #endif @@ -9396,7 +9405,7 @@ static int netif_alloc_netdev_queues(struct net_device *dev) dev->_tx = tx; - netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL); + netdev_for_each_tx_queue_index(dev, netdev_init_one_queue, NULL); spin_lock_init(&dev->tx_global_lock); return 0; @@ -9884,7 +9893,7 @@ struct netdev_queue *dev_ingress_queue_create(struct net_device *dev) queue = kzalloc(sizeof(*queue), GFP_KERNEL); if (!queue) return NULL; - netdev_init_one_queue(dev, queue, NULL); + netdev_init_one_queue(dev, queue, 0, NULL); RCU_INIT_POINTER(queue->qdisc, &noop_qdisc); queue->qdisc_sleeping = &noop_qdisc; rcu_assign_pointer(dev->ingress_queue, queue); @@ -10041,6 +10050,11 @@ void free_netdev(struct net_device *dev) { struct napi_struct *p, *n; +#ifdef CONFIG_RPS + WARN_ON(rcu_dereference_protected(dev->tx_gqueue_map, 1)); + WARN_ON(rcu_dereference_protected(dev->rx_gqueue_map, 1)); +#endif + might_sleep(); netif_free_tx_queues(dev); netif_free_rx_queues(dev); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 56d27463d466..3a9d3d9ee8e0 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -875,18 +875,166 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, return len; } +static void queue_map_release(struct rcu_head *rcu) +{ + struct netdev_queue_map *q_map = container_of(rcu, + struct netdev_queue_map, rcu); + vfree(q_map); +} + +static int set_device_queue_mapping(struct netdev_queue_map **pmap, + u16 gqid, u16 dqid, u16 *p_gqid) +{ + static DEFINE_MUTEX(global_mapping_table); + struct netdev_queue_map *gq_map, *old_gq_map; + u16 old_gqid; + int ret = 0; + + mutex_lock(&global_mapping_table); + + old_gqid = *p_gqid; + if (old_gqid == gqid) { + /* Nothing changing */ + goto out; + } + + gq_map = rcu_dereference_protected(*pmap, + 
lockdep_is_held(&global_mapping_table)); + old_gq_map = gq_map; + + if (gqid == NO_QUEUE) { + /* Remove any old mapping (we know that old_gqid cannot be + * NO_QUEUE from above) + */ + if (!WARN_ON(!gq_map || old_gqid > gq_map->max_ents || + gq_map->map[old_gqid] != dqid)) { + /* Unset old mapping */ + gq_map->map[old_gqid] = NO_QUEUE; + if (--gq_map->set_count == 0) { + /* Done with map so free */ + rcu_assign_pointer(*pmap, NULL); + call_rcu(&gq_map->rcu, queue_map_release); + } + } + *p_gqid = NO_QUEUE; + + goto out; + } + + if (!gq_map || gqid >= gq_map->max_ents) { + unsigned int max_queues; + int i = 0; + + /* Need to create or expand queue map */ + + max_queues = QUEUE_MAP_ALLOC_NUMBER(gqid + 1); + + gq_map = vmalloc(QUEUE_MAP_ALLOC_SIZE(max_queues)); + if (!gq_map) { + ret = -ENOMEM; + goto out; + } + + gq_map->max_ents = max_queues; + + if (old_gq_map) { + /* Copy old map entries */ + + memcpy(gq_map->map, old_gq_map->map, + old_gq_map->max_ents * sizeof(gq_map->map[0])); + gq_map->set_count = old_gq_map->set_count; + i = old_gq_map->max_ents; + } else { + gq_map->set_count = 0; + } + + /* Initialize entries not copied from old map */ + for (; i < max_queues; i++) + gq_map->map[i] = NO_QUEUE; + } else if (gq_map->map[gqid] != NO_QUEUE) { + /* The global qid is already mapped to another device qid */ + ret = -EBUSY; + goto out; + } + + /* Set map entry */ + gq_map->map[gqid] = dqid; + gq_map->set_count++; + + if (old_gqid != NO_QUEUE) { + /* We know old_gqid is not equal to gqid */ + if (!WARN_ON(!old_gq_map || + old_gqid > old_gq_map->max_ents || + old_gq_map->map[old_gqid] != dqid)) { + /* Unset old mapping in (new) table */ + gq_map->map[old_gqid] = NO_QUEUE; + gq_map->set_count--; + } + } + + if (gq_map != old_gq_map) { + rcu_assign_pointer(*pmap, gq_map); + if (old_gq_map) + call_rcu(&old_gq_map->rcu, queue_map_release); + } + + /* Save for caller */ + *p_gqid = gqid; + +out: + mutex_unlock(&global_mapping_table); + + return ret; +} + +static ssize_t 
show_rx_queue_global_mapping(struct netdev_rx_queue *queue, + char *buf) +{ + u16 gqid = queue->gqid; + + if (gqid == NO_QUEUE) + return sprintf(buf, "none\n"); + else + return sprintf(buf, "%u\n", gqid); +} + +static ssize_t store_rx_queue_global_mapping(struct netdev_rx_queue *queue, + const char *buf, size_t len) +{ + unsigned long gqid; + int ret; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + ret = kstrtoul(buf, 0, &gqid); + if (ret < 0) + return ret; + + if (gqid > RPS_MAX_QID || WARN_ON(queue->index > RPS_MAX_QID)) + return -EINVAL; + + ret = set_device_queue_mapping(&queue->dev->rx_gqueue_map, + gqid, queue->index, &queue->gqid); + return ret ? : len; +} + static struct rx_queue_attribute rps_cpus_attribute __ro_after_init = __ATTR(rps_cpus, 0644, show_rps_map, store_rps_map); static struct rx_queue_attribute rps_dev_flow_table_cnt_attribute __ro_after_init = __ATTR(rps_flow_cnt, 0644, show_rps_dev_flow_table_cnt, store_rps_dev_flow_table_cnt); +static struct rx_queue_attribute rx_queue_global_mapping_attribute __ro_after_init = + __ATTR(global_queue_mapping, 0644, + show_rx_queue_global_mapping, store_rx_queue_global_mapping); #endif /* CONFIG_RPS */ static struct attribute *rx_queue_default_attrs[] __ro_after_init = { #ifdef CONFIG_RPS &rps_cpus_attribute.attr, &rps_dev_flow_table_cnt_attribute.attr, + &rx_queue_global_mapping_attribute.attr, #endif NULL }; @@ -896,8 +1044,11 @@ static void rx_queue_release(struct kobject *kobj) { struct netdev_rx_queue *queue = to_rx_queue(kobj); #ifdef CONFIG_RPS - struct rps_map *map; struct rps_dev_flow_table *flow_table; + struct rps_map *map; + + set_device_queue_mapping(&queue->dev->rx_gqueue_map, NO_QUEUE, + queue->index, &queue->gqid); map = rcu_dereference_protected(queue->rps_map, 1); if (map) { @@ -1152,6 +1303,46 @@ static ssize_t traffic_class_show(struct netdev_queue *queue, sprintf(buf, "%u\n", tc); } +#ifdef CONFIG_RPS +static ssize_t show_queue_global_queue_mapping(struct netdev_queue *queue, + 
char *buf) +{ + u16 gqid = queue->gqid; + + if (gqid == NO_QUEUE) + return sprintf(buf, "none\n"); + else + return sprintf(buf, "%u\n", gqid); + return 0; +} + +static ssize_t store_queue_global_queue_mapping(struct netdev_queue *queue, + const char *buf, size_t len) +{ + unsigned long gqid; + int ret; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + ret = kstrtoul(buf, 0, &gqid); + if (ret < 0) + return ret; + + if (gqid > RPS_MAX_QID || WARN_ON(queue->index > RPS_MAX_QID)) + return -EINVAL; + + ret = set_device_queue_mapping(&queue->dev->tx_gqueue_map, + gqid, queue->index, &queue->gqid); + return ret ? : len; +} + +static struct netdev_queue_attribute global_queue_mapping_attribute __ro_after_init = + __ATTR(global_queue_mapping, 0644, + show_queue_global_queue_mapping, + store_queue_global_queue_mapping); +#endif + #ifdef CONFIG_XPS static ssize_t tx_maxrate_show(struct netdev_queue *queue, char *buf) @@ -1483,6 +1674,9 @@ static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init static struct attribute *netdev_queue_default_attrs[] __ro_after_init = { &queue_trans_timeout.attr, &queue_traffic_class.attr, +#ifdef CONFIG_RPS + &global_queue_mapping_attribute.attr, +#endif #ifdef CONFIG_XPS &xps_cpus_attribute.attr, &xps_rxqs_attribute.attr, @@ -1496,6 +1690,9 @@ static void netdev_queue_release(struct kobject *kobj) { struct netdev_queue *queue = to_netdev_queue(kobj); + set_device_queue_mapping(&queue->dev->tx_gqueue_map, NO_QUEUE, + queue->index, &queue->gqid); + memset(kobj, 0, sizeof(*kobj)); dev_put(queue->dev); } From patchwork Wed Jun 24 17:17:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, 
INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02381C433DF for ; Wed, 24 Jun 2020 17:19:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BF36820823 for ; Wed, 24 Jun 2020 17:19:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.b="gmhtc1+M" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405427AbgFXRTs (ORCPT ); Wed, 24 Jun 2020 13:19:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405486AbgFXRTn (ORCPT ); Wed, 24 Jun 2020 13:19:43 -0400 Received: from mail-pj1-x1042.google.com (mail-pj1-x1042.google.com [IPv6:2607:f8b0:4864:20::1042]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDE02C061573 for ; Wed, 24 Jun 2020 10:19:42 -0700 (PDT) Received: by mail-pj1-x1042.google.com with SMTP id h22so1421788pjf.1 for ; Wed, 24 Jun 2020 10:19:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2OQX8d52ytdpe1I2jFepdSnY7MQ3IgjSgV7EDnS0Lvk=; b=gmhtc1+MV80WWG0qj3axJDEtLOorRgPozciZiqL3nxVjJaIMql+Ed5k4JxwLkez+jX /r8IZk7SIhmCggN+pk0x7s0nt0nzjW83K9nzKuq+M0DNR2exUew/QAc3P/AmuBR/RuVZ 5/1cueb+Iv1vrdi2dQUYRVRVWHeWupsC/2+G3jVCuJ85yT2iQsltRD3/+5HQceix75bx Cl85qB7jmQbjBppmOHynl90RGjmiPYbSq7dcZk2opNRP/lv3FfVWHgMXqqdWVwUQnJ6F hMdwbo2k+EtjfhUQaKaMNHCgPfO/PvBxnqkZYPAOC60Kp6HV9wCcm0hEb0U3tqUKPHLE AtyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2OQX8d52ytdpe1I2jFepdSnY7MQ3IgjSgV7EDnS0Lvk=; b=NEbWIThuyvgqSqpEeqxLv0OxedYYIASq71lAhJcxeOCrOlg9AaoMWoMjbu61y4ETYo wB0WST2jR5Jidudu6dKsnTnINpOJfL2tkNDeYvlZUnZubLj6jejeJKVGarKdssrZ8iFz lEeEktCdFPUouUWib1wYuvA8t74tzLHdY5f1X3YwUR/e2AEBFjijDcdWoaW9tUtx0FCV IzAaPUSX68MEaE7DwN/B40c/iPeoIwRthqma7YP8i/SgB0wDqnLJ2pP7r43/JEPulQN8 ciQnIJXjxIYQkwhu9ZwIUv96PqmI+JQWiirlLL7KhbkX4sRmU0tSAqye3XQYcQ4E2A87 ikZg== X-Gm-Message-State: AOAM530oUawO+eeHwdYg6O3pWoLNmrjcio4h6LIulxYXUbzFOOx74wN6 5kmpKkF5KZlJnAKcCiENpc7fWzerwtg= X-Google-Smtp-Source: ABdhPJxBY+EJ+1WlNxRYRcW9/DQ3X+LXUb6EKTotiex2pfRuujuWNtWYlvu87p0OHHBbXwPm70hnbQ== X-Received: by 2002:a17:902:ab98:: with SMTP id f24mr30802753plr.154.1593019181924; Wed, 24 Jun 2020 10:19:41 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:41 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 09/11] ptq: Hook up transmit side of Per Queue Threads Date: Wed, 24 Jun 2020 10:17:48 -0700 Message-Id: <20200624171749.11927-10-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Support to select device queue for transmit based on the per thread transmit queue. 
Patch includes: - Add a global queue (gqid) mapping to sock - Function to convert gqid in a sock to a device queue (dqid) by calling sk_tx_gqid_to_dqid_get - Function sock_record_tx_queue to record a queue in a socket taken from ptq_threads in struct task - Call sock_record_tx_queue from af_inet send, listen, and accept functions to populate the socket's gqid for steerig - In netdev_pick_tx try to take the queue index from the socket using sk_tx_gqid_to_dqid_get --- include/net/sock.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++ net/core/dev.c | 9 ++++--- net/ipv4/af_inet.c | 6 +++++ 3 files changed, 75 insertions(+), 3 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index acb76cfaae1b..5ec9d02e7ad0 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -140,6 +140,7 @@ typedef __u64 __bitwise __addrpair; * @skc_node: main hash linkage for various protocol lookup tables * @skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol * @skc_tx_queue_mapping: tx queue number for this connection + * @skc_tx_gqid_mapping: global tx queue number for sending * @skc_rx_queue_mapping: rx queue number for this connection * @skc_flags: place holder for sk_flags * %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, @@ -225,6 +226,9 @@ struct sock_common { struct hlist_nulls_node skc_nulls_node; }; unsigned short skc_tx_queue_mapping; +#ifdef CONFIG_RPS + unsigned short skc_tx_gqid_mapping; +#endif #ifdef CONFIG_XPS unsigned short skc_rx_queue_mapping; #endif @@ -353,6 +357,9 @@ struct sock { #define sk_nulls_node __sk_common.skc_nulls_node #define sk_refcnt __sk_common.skc_refcnt #define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping +#ifdef CONFIG_RPS +#define sk_tx_gqid_mapping __sk_common.skc_tx_gqid_mapping +#endif #ifdef CONFIG_XPS #define sk_rx_queue_mapping __sk_common.skc_rx_queue_mapping #endif @@ -1792,6 +1799,34 @@ static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, return __sk_receive_skb(sk, skb, nested, 
1, true); } +static inline int sk_tx_gqid_get(const struct sock *sk) +{ +#ifdef CONFIG_RPS + if (sk && sk->sk_tx_gqid_mapping != NO_QUEUE) + return sk->sk_tx_gqid_mapping; +#endif + + return -1; +} + +static inline void sk_tx_gqid_set(struct sock *sk, int gqid) +{ +#ifdef CONFIG_RPS + /* sk_tx_queue_mapping accept only up to RPS_MAX_QID (0x7ffe) */ + if (WARN_ON_ONCE((unsigned int)gqid > RPS_MAX_QID && + gqid != NO_QUEUE)) + return; + sk->sk_tx_gqid_mapping = gqid; +#endif +} + +static inline void sk_tx_gqid_clear(struct sock *sk) +{ +#ifdef CONFIG_RPS + sk->sk_tx_gqid_mapping = NO_QUEUE; +#endif +} + static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) { /* sk_tx_queue_mapping accept only upto a 16-bit value */ @@ -1803,6 +1838,9 @@ static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) static inline void sk_tx_queue_clear(struct sock *sk) { sk->sk_tx_queue_mapping = NO_QUEUE; + + /* Clear tx_gqid at same points */ + sk_tx_gqid_clear(sk); } static inline int sk_tx_queue_get(const struct sock *sk) @@ -1813,6 +1851,31 @@ static inline int sk_tx_queue_get(const struct sock *sk) return -1; } +static inline int sk_tx_gqid_to_dqid_get(const struct net_device *dev, + const struct sock *sk) +{ + int ret = -1; +#ifdef CONFIG_RPS + int gqid; + u16 dqid; + + gqid = sk_tx_gqid_get(sk); + if (gqid >= 0) { + dqid = netdev_tx_gqid_to_dqid(dev, gqid); + if (dqid != NO_QUEUE) + ret = dqid; + } +#endif + return ret; +} + +static inline void sock_record_tx_queue(struct sock *sk) +{ +#ifdef CONFIG_PER_THREAD_QUEUES + sk_tx_gqid_set(sk, current->ptq_queues.txq_id); +#endif +} + static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb) { #ifdef CONFIG_XPS diff --git a/net/core/dev.c b/net/core/dev.c index f64bf6608775..f4478c9b1c9c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3982,10 +3982,13 @@ u16 netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, if (queue_index < 0 || skb->ooo_okay || queue_index >= 
dev->real_num_tx_queues) { - int new_index = get_xps_queue(dev, sb_dev, skb); + int new_index = sk_tx_gqid_to_dqid_get(dev, sk); - if (new_index < 0) - new_index = skb_tx_hash(dev, sb_dev, skb); + if (new_index < 0) { + new_index = get_xps_queue(dev, sb_dev, skb); + if (new_index < 0) + new_index = skb_tx_hash(dev, sb_dev, skb); + } if (queue_index != new_index && sk && sk_fullsock(sk) && diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 02aa5cb3a4fd..9b36aa3d1622 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -201,6 +201,8 @@ int inet_listen(struct socket *sock, int backlog) lock_sock(sk); + sock_record_tx_queue(sk); + err = -EINVAL; if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM) goto out; @@ -630,6 +632,8 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, } } + sock_record_tx_queue(sk); + switch (sock->state) { default: err = -EINVAL; @@ -742,6 +746,7 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, lock_sock(sk2); sock_rps_record_flow(sk2); + sock_record_tx_queue(sk2); WARN_ON(!((1 << sk2->sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_CLOSE_WAIT | TCPF_CLOSE))); @@ -794,6 +799,7 @@ EXPORT_SYMBOL(inet_getname); int inet_send_prepare(struct sock *sk) { sock_rps_record_flow(sk); + sock_record_tx_queue(sk); /* We may need to bind the socket. 
*/ if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind && From patchwork Wed Jun 24 17:17:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 217170 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A0B2C433DF for ; Wed, 24 Jun 2020 17:19:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E495520823 for ; Wed, 24 Jun 2020 17:19:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.b="rXqb529D" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405510AbgFXRT4 (ORCPT ); Wed, 24 Jun 2020 13:19:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48930 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405495AbgFXRTu (ORCPT ); Wed, 24 Jun 2020 13:19:50 -0400 Received: from mail-pj1-x1041.google.com (mail-pj1-x1041.google.com [IPv6:2607:f8b0:4864:20::1041]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DA9DC061573 for ; Wed, 24 Jun 2020 10:19:50 -0700 (PDT) Received: by mail-pj1-x1041.google.com with SMTP id d6so1418490pjs.3 for ; Wed, 24 Jun 2020 10:19:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=/5gyd7jvxJZUbppQ1X68CyB7v126AL9t5W3BAs6GaxY=; b=rXqb529DtF/YuUK9qskzQ0Qn9nEqOj9aPV4jKAlmSCPFN4GJ/uj59yCEZozKARpjfE nNUoDwXVxVHgDt/FIOElIljTF0+y1N65dqW77/ovJxpmi7TWKrruSiwFLWM1jRz6Xeui Lb1RH8kQA6AnJv1wIl7a9EIlA+xbW+bqahxD0r+rsTGNz66xTq18ezF21OnyA9DFRwx0 LYkylQcoTGzO5CifL3N2JelaNsRZZwBJ9YdwhsIiS49W6vDscEXdUylaz/f7r8C5Nbxc /obDWChyL5dfcP5HoZXpp1mQfQ3c14F86ci2cGJbf2QPmnqaByDb7IZ2muJMK1nW4T3S 7llw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/5gyd7jvxJZUbppQ1X68CyB7v126AL9t5W3BAs6GaxY=; b=iZUsU9t2qO7wxGqaZC/XwqAhtTxM/oSuhrqSzVK6c7nk03DLCxyqOantI9mjz1fbgc d14B/Dt0ZEt30cq2benfq3U7U6+4Ys7s0EWEidwgE8X7UX5h7UoWYo1DoBmWGz2brQ7K H7rnfJPB7KdlCPbIQRS1HGYs+98L7pRWKJSV5buGOrCwUAUxbQtqZtY7KI4hBsSUOpSk 6F5y3bU3eAYrMBKbp9hpXx8A32tuw1OWVf94x4odew5WxziUDtFi/LtSd0Waoi8LHvVJ wSMzZFlY4hxndNU+Vfg4A0HTn6zR1Ym5/ZYn0lt9QdudeWF8mClxe5ualwjYwRZd203d 84sQ== X-Gm-Message-State: AOAM533tKr2qvMj6T5Ex/OztSLUX0o8V2DkSqgeEb8EDx+MoY7I29VrA JnI9npvAVY2QuojnhR1tkisQQVu3gkM= X-Google-Smtp-Source: ABdhPJxotMCEQ5yjrW2HzJVe+MPR0y1VlLmWWiKbJ3JIngBOzab8MSbGiPmBv56CNRG699mczYg9eA== X-Received: by 2002:a17:902:8b86:: with SMTP id ay6mr28572322plb.329.1593019188745; Wed, 24 Jun 2020 10:19:48 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. 
[73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:47 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 11/11] doc: Documentation for Per Thread Queues Date: Wed, 24 Jun 2020 10:17:50 -0700 Message-Id: <20200624171749.11927-12-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a section on Per Thread Queues to scaling.rst. --- Documentation/networking/scaling.rst | 195 ++++++++++++++++++++++++++- 1 file changed, 194 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst index 8f0347b9fb3d..42f1dc639ab7 100644 --- a/Documentation/networking/scaling.rst +++ b/Documentation/networking/scaling.rst @@ -250,7 +250,7 @@ RFS: Receive Flow Steering While RPS steers packets solely based on hash, and thus generally provides good load distribution, it does not take into account application locality. This is accomplished by Receive Flow Steering -(RFS). The goal of RFS is to increase datacache hitrate by steering +(RFS). The goal of RFS is to increase datacache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running. RFS relies on the same RPS mechanisms to enqueue packets onto the backlog of another CPU and to wake up that @@ -508,6 +508,199 @@ a max-rate attribute is supported, by setting a Mbps value to:: A value of zero means disabled, and this is the default. +PTQ: Per Thread Queues +====================== + +Per Thread Queues allows application threads to be assigned dedicated +hardware network queues for both transmit and receive. 
This facility +provides a high degree of traffic isolation between applications and +can also help facilitate high performance due to fine grained packet +steering. + +PTQ has three major design components: + - A method to assign transmit and receive queues to threads + - A means to associate packets with threads and then to steer + those packets to the queues assigned to the threads + - Mechanisms to process the per thread hardware queues + +Global network queues +~~~~~~~~~~~~~~~~~~~~~ + +Global network queues are an abstraction of hardware networking +queues that can be used in generic non-device specific configuration. +Global queues may mapped to real device queues. The mapping is +performed on a per device queue basis. A device sysfs parameter +"global_queue_mapping" in queues/{tx,rx}- indicates the mapping +of a device queue to a global queue. Each device maintains a table +that maps global queues to device queues for the device. Note that +for a single device, the global to device queue mapping is 1 to 1, +however each device may map a global queue to a different device +queue. + +net_queues cgroup controller +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For assigning queues to the threads, a cgroup controller named +"net_queues" is used. A cgroup can be configured with pools of transmit +and receive global queues from which individual threads are assigned +queues. The contents of the net_queues controller are described below in +the configuration section. + +Handling PTQ in the transmit path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a socket operation is performed that may result in sending packets +(i.e. listen, accept, sendmsg, sendpage), the task structure for the +current thread is consulted to see if there is an assigned transmit +queue for the thread. If there is a queue assignment, the queue index is +set in a field of the sock structure for the corresponding socket. 
+Subsequently, when transmit queue selection is performed, the sock +structure associated with the packet being sent is consulted. If a transmit +global queue is set in the sock then that index is mapped to a device +queue for the output networking device. If a valid device queue is +discovered then that queue is used; if a device queue is not found +then queue selection proceeds to XPS. + +Handling PTQ in the receive path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The receive path uses the infrastructure of RFS, which is extended +to steer based on the assigned receive global queue for a thread in +addition to steering based on the CPU. The rps_sock_flow_table is +modified to contain either the desired CPU for flows or the desired +receive global queue. A queue is updated at the same time that the +desired CPU would be updated during calls to recvmsg and sendmsg (see RFS +description above). The process is to consult the running task structure +to see if a receive queue is assigned to the task. If a queue is assigned +to the task then the corresponding queue index is set in the +rps_sock_flow_table; if no queue is assigned then the current CPU is +set as the desired CPU, per canonical RFS. + +When packets are received, the rps_sock_flow_table is consulted to check +if they were received on the proper queue. If the rps_sock_flow_table +entry for a corresponding flow of a received packet contains a global +queue index, then the index is mapped to a device queue on the receiving +device. If the mapped device queue is equal to the receive queue then +packets are being steered properly. If there is a mismatch then the +local flow to queue mapping in the device is changed and +ndo_rx_flow_steer is invoked to set the receive queue for the flow in +the device as described in the aRFS section. + +Processing queues in Per Thread Queues +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When Per Thread Queues is used, the queue "follows" the thread. 
So when +a thread is rescheduled from one CPU to another we expect that the +device queues that map to the thread are processed on +the CPU where the thread is currently running. This is a bit tricky, +especially with respect to the canonical device interrupt-driven model. +There are at least three possible approaches: + - Arrange for interrupts to follow threads as they are + rescheduled, or alternatively pin threads to CPUs and + statically configure the interrupt mappings for the queues for + each thread + - Use busy polling + - Use "sleeping busy-poll" with completion queues. The basic + idea is to have one CPU busy poll a device completion queue + that reports device queues with received or completed transmit + packets. When a queue is ready, the thread associated with the + queue (derived by reverse mapping the queue back to its + assigned thread) is scheduled. When the thread runs it polls + its queues to process any packets. + +Future work may further elaborate on solutions in this area. + +Reducing flow state in devices +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +PTQ (and aRFS as well) potentially create per flow state in a device. +This is costly in at least two ways: 1) State requires device memory +which is almost always much smaller than host memory, and thus the +number of flows that can be instantiated in a device is less than that +in the host. 2) State requires instantiation and synchronization +messages, i.e. ndo_rx_flow_steer causes a message over the PCIe bus; if +there is a high turnover rate of connections this messaging becomes +a bottleneck. + +Mitigations to reduce the amount of flow state in the device should be +considered. + +In PTQ (and aRFS) the device flow state is considered a cache. A flow +entry is only set in the device on a cache miss, which occurs when the +receive queue for a packet doesn't match the desired receive queue. 
+So conceptually, if packets for a flow are always received on the
+desired queue from the beginning of the flow then flow state might
+never need to be instantiated in the device. This motivates a strategy
+of trying stateless steering mechanisms before resorting to stateful ones.
+
+As an example of applying this strategy, consider an application that
+creates four threads where each thread creates a TCP listener socket
+for some port that is shared amongst the threads via SO_REUSEPORT.
+Four global queues can be assigned to the application (via a cgroup
+for the application), and a filter rule can be set up in each device
+that matches the listener port and any bound destination address. The
+filter maps to a set of four device queues that map to the four global
+queues for the application. When a packet is received that matches the
+filter, one of the four queues is chosen via a hash over the packet's
+four-tuple. In this manner, packets for the application are
+distributed amongst the four threads. As long as processing for sockets
+doesn't move between threads and the number of listener threads is
+constant then packets are always received on the desired queue and no
+flow state needs to be instantiated. In practice, we want to allow
+elasticity in applications to create and destroy threads on demand, so
+additional techniques, such as consistent hashing, are probably needed.
+
+Per Thread Queues Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Per Thread Queues is only available if the kernel is compiled with
+CONFIG_PER_THREAD_QUEUES. For PTQ in the receive path, aRFS needs to be
+supported and configured (see aRFS section above).
+
+The net_queues cgroup controller is in:
+        /sys/fs/cgroup//net_queues
+
+The net_queues controller contains the following attributes:
+        - tx-queues, rx-queues
+                Specifies the transmit queue pool and receive queue pool
+                respectively as a range of global queue indices.
+                The format of these entries is two values separated by
+                ":", where the first value is the first queue index in
+                the pool and the second is the number of queues in the
+                pool. If the second value is zero the queue pool is empty.
+        - tx-assign, rx-assign
+                Boolean attributes ("0" or "1") that indicate unique
+                queue assignment from the respective transmit or receive
+                queue pool. When the "assign" attribute is enabled, a
+                thread is assigned a queue that is not already assigned
+                to another thread.
+        - symmetric
+                A boolean attribute ("0" or "1") that indicates the
+                receive and transmit queue assignment for a thread
+                should be the same. That is, the assigned transmit queue
+                index is equal to the assigned receive queue index.
+        - task-queues
+                A read-only attribute that lists the threads of the
+                cgroup and their assigned queues.
+
+The mapping of global queues to device queues is in:
+
+        /sys/class/net//queues/tx-/global_queue_mapping
+        - and -
+        /sys/class/net//queues/rx-/global_queue_mapping
+
+A value of "none" indicates no mapping; an integer value (up to
+a maximum of 32,766) indicates a global queue.
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Unlike aRFS, PTQ requires per-application configuration. To use PTQ
+most effectively, some understanding of the threading model of
+the application is warranted. The section above describes one possible
+configuration strategy for a canonical application using SO_REUSEPORT.
+
+
 Further Information
 ===================
 RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into