From patchwork Tue Aug 3 17:10:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kishen Maloor X-Patchwork-Id: 491345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94065C4320A for ; Tue, 3 Aug 2021 17:10:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 73A4A60230 for ; Tue, 3 Aug 2021 17:10:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237724AbhHCRKm (ORCPT ); Tue, 3 Aug 2021 13:10:42 -0400 Received: from mga14.intel.com ([192.55.52.115]:16898 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237502AbhHCRKl (ORCPT ); Tue, 3 Aug 2021 13:10:41 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="213466138" X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="213466138" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:18 -0700 X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="521327118" Received: from shyamasr-mobl.amr.corp.intel.com ([10.209.65.83]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:17 -0700 From: Kishen Maloor To: bpf@vger.kernel.org, netdev@vger.kernel.org, hawk@kernel.org, magnus.karlsson@intel.com Cc: Kishen Maloor Subject: [RFC bpf-next 2/5] libbpf: SO_TXTIME support in AF_XDP Date: Tue, 3 Aug 2021 13:10:03 -0400 
Message-Id: <20210803171006.13915-3-kishen.maloor@intel.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20210803171006.13915-1-kishen.maloor@intel.com> References: <20210803171006.13915-1-kishen.maloor@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

This change adds userspace support for SO_TXTIME in AF_XDP, so that userspace XDP applications can attach a specific TXTIME (aka "Launch Time") to the XDP frames they issue.

The userspace API has been expanded with two helper functions:

- int xsk_socket__enable_so_txtime(struct xsk_socket *xsk, bool enable)

  Sets the SO_TXTIME option on the AF_XDP socket (using setsockopt()).

- void xsk_umem__set_md_txtime(void *umem_area, __u64 addr, __s64 txtime)

  Packages the application-supplied TXTIME into struct xdp_user_tx_metadata:

    struct xdp_user_tx_metadata {
            __u64 timestamp;
            __u32 md_valid;
            __u32 btf_id;
    };

  and stores it in the XDP metadata area, which precedes the XDP frame.

Signed-off-by: Kishen Maloor --- tools/include/uapi/linux/if_xdp.h | 2 ++ tools/include/uapi/linux/xdp_md_std.h | 14 ++++++++++++++ tools/lib/bpf/xsk.h | 27 ++++++++++++++++++++++++++- 3 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 tools/include/uapi/linux/xdp_md_std.h diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index a78a8096f4ce..31f81f82ed86 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -106,6 +106,8 @@ struct xdp_desc { __u32 options; }; +#define XDP_DESC_OPTION_METADATA (1 << 0) + /* UMEM descriptor is __u64 */ #endif /* _LINUX_IF_XDP_H */ diff --git a/tools/include/uapi/linux/xdp_md_std.h b/tools/include/uapi/linux/xdp_md_std.h new file mode 100644 index 000000000000..f00996a61639 --- /dev/null +++ b/tools/include/uapi/linux/xdp_md_std.h @@ -0,0 +1,14 @@ +#ifndef _UAPI_LINUX_XDP_MD_STD_H +#define _UAPI_LINUX_XDP_MD_STD_H + +#include + +#define XDP_METADATA_USER_TX_TIMESTAMP 0x1 + +struct 
xdp_user_tx_metadata { + __u64 timestamp; + __u32 md_valid; + __u32 btf_id; +}; + +#endif /* _UAPI_LINUX_XDP_MD_STD_H */ diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h index 01c12dca9c10..1b52ffe1c9a3 100644 --- a/tools/lib/bpf/xsk.h +++ b/tools/lib/bpf/xsk.h @@ -16,7 +16,8 @@ #include #include #include - +#include +#include #include "libbpf.h" #ifdef __cplusplus @@ -248,6 +249,30 @@ static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr) LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem); LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk); +/* Helpers for SO_TXTIME */ + +static inline void xsk_umem__set_md_txtime(void *umem_area, __u64 addr, __s64 txtime) +{ + struct xdp_user_tx_metadata *md; + + md = (struct xdp_user_tx_metadata *)&((char *)umem_area)[addr]; + + md->timestamp = txtime; + md->md_valid |= XDP_METADATA_USER_TX_TIMESTAMP; +} + +static inline int xsk_socket__enable_so_txtime(struct xsk_socket *xsk, bool enable) +{ + unsigned int val = (enable) ? 
1 : 0; + int err; + + err = setsockopt(xsk_socket__fd(xsk), SOL_XDP, SO_TXTIME, &val, sizeof(val)); + + if (err) + return -errno; + return 0; +} + #define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048 #define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048 #define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */ From patchwork Tue Aug 3 17:10:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kishen Maloor X-Patchwork-Id: 491344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B515C4320A for ; Tue, 3 Aug 2021 17:10:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6CC161029 for ; Tue, 3 Aug 2021 17:10:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237798AbhHCRKq (ORCPT ); Tue, 3 Aug 2021 13:10:46 -0400 Received: from mga14.intel.com ([192.55.52.115]:16903 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237691AbhHCRKm (ORCPT ); Tue, 3 Aug 2021 13:10:42 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="213466142" X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="213466142" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:21 -0700 X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="521327149" Received: from shyamasr-mobl.amr.corp.intel.com ([10.209.65.83]) by fmsmga002-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:19 -0700 From: Kishen Maloor To: bpf@vger.kernel.org, netdev@vger.kernel.org, hawk@kernel.org, magnus.karlsson@intel.com Cc: Jithu Joseph Subject: [RFC bpf-next 4/5] samples/bpf/xdpsock_user.c: Make get_nsecs() generic Date: Tue, 3 Aug 2021 13:10:05 -0400 Message-Id: <20210803171006.13915-5-kishen.maloor@intel.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20210803171006.13915-1-kishen.maloor@intel.com> References: <20210803171006.13915-1-kishen.maloor@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Jithu Joseph

The helper function get_nsecs() assumes the clock to be CLOCK_MONOTONIC. TSN features like Launchtime use CLOCK_TAI. A subsequent patch extends this sample to show how the Launchtime APIs may be used in an XDP context. In preparation for this, extend the function with a clockid parameter.

Signed-off-by: Jithu Joseph --- samples/bpf/xdpsock_user.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index 33d0bdebbed8..3fd2f6a0d1eb 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -159,11 +159,11 @@ static int num_socks; struct xsk_socket_info *xsks[MAX_SOCKS]; int sock; -static unsigned long get_nsecs(void) +static unsigned long get_nsecs(int clockid) { struct timespec ts; - clock_gettime(CLOCK_MONOTONIC, &ts); + clock_gettime(clockid, &ts); return ts.tv_sec * 1000000000UL + ts.tv_nsec; } @@ -354,7 +354,7 @@ static void dump_driver_stats(long dt) static void dump_stats(void) { - unsigned long now = get_nsecs(); + unsigned long now = get_nsecs(CLOCK_MONOTONIC); long dt = now - prev_time; int i; @@ -443,7 +443,7 @@ static void dump_stats(void) static bool is_benchmark_done(void) { if (opt_duration > 0) { - unsigned long dt = (get_nsecs() - start_time); + unsigned long dt = (get_nsecs(CLOCK_MONOTONIC) - start_time); if (dt >= opt_duration) 
benchmark_done = true; @@ -1683,7 +1683,7 @@ int main(int argc, char **argv) exit_with_error(ret); } - prev_time = get_nsecs(); + prev_time = get_nsecs(CLOCK_MONOTONIC); start_time = prev_time; if (opt_bench == BENCH_RXDROP) From patchwork Tue Aug 3 17:10:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kishen Maloor X-Patchwork-Id: 491343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C39CC4338F for ; Tue, 3 Aug 2021 17:10:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E862C6105A for ; Tue, 3 Aug 2021 17:10:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237820AbhHCRKu (ORCPT ); Tue, 3 Aug 2021 13:10:50 -0400 Received: from mga14.intel.com ([192.55.52.115]:16898 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237732AbhHCRKn (ORCPT ); Tue, 3 Aug 2021 13:10:43 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10065"; a="213466147" X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="213466147" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:22 -0700 X-IronPort-AV: E=Sophos;i="5.84,292,1620716400"; d="scan'208";a="521327165" Received: from shyamasr-mobl.amr.corp.intel.com ([10.209.65.83]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2021 10:10:20 -0700 From: Kishen Maloor To: 
bpf@vger.kernel.org, netdev@vger.kernel.org, hawk@kernel.org, magnus.karlsson@intel.com Cc: Jithu Joseph Subject: [RFC bpf-next 5/5] samples/bpf/xdpsock_user.c: Launchtime/TXTIME API usage Date: Tue, 3 Aug 2021 13:10:06 -0400 Message-Id: <20210803171006.13915-6-kishen.maloor@intel.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20210803171006.13915-1-kishen.maloor@intel.com> References: <20210803171006.13915-1-kishen.maloor@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Jithu Joseph

Add a -L flag to the xdpsock command-line options. Using it in conjunction with the "-t" (txonly) option exercises the Launchtime/Txtime APIs. These allow the application to specify when each packet should be transmitted by the NIC.

Below is an example of how this option may be exercised:

  sudo xdpsock -i enp1s0 -t -q 1 -N -s 128 -z -L

The above invocation transmits batches of "batch_size" packets. The first packet in each batch is to be launched "LAUNCH_TIME_ADVANCE_NS" into the future, and the remaining packets in the batch are spaced 1us apart.

Since launch-time enabled NICs would wait LAUNCH_TIME_ADVANCE_NS (500us) between batches of packets, the emphasis of this path is not throughput.

Launch-time should be enabled for the chosen hardware queue using the appropriate tc qdisc command before starting the application, and the NIC hardware clock should be synchronized to the system clock using a mechanism like phc2sys.
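The launch-time arithmetic described in this commit message can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not code from the patch: `launch_time_for()`, `fill_batch_launch_times()` and `PKT_SPACING_NS` are hypothetical names introduced here, while `LAUNCH_TIME_ADVANCE_NS` (500us) and the 1us spacing mirror the values used in the sample.

```c
#include <stddef.h>
#include <stdint.h>

#define LAUNCH_TIME_ADVANCE_NS 500000ULL /* 500us, same value as in the sample */
#define PKT_SPACING_NS         1000ULL   /* hypothetical name: 1us between packets */

/*
 * Hypothetical helper: launch time of the i-th packet of a batch, given a
 * timestamp (in ns) taken once at the start of the batch.
 */
static inline uint64_t launch_time_for(uint64_t cur_ts_ns, unsigned int i)
{
	return cur_ts_ns + LAUNCH_TIME_ADVANCE_NS + PKT_SPACING_NS * i;
}

/*
 * Hypothetical helper: compute the launch times for a whole batch.
 * out must have room for batch_size entries.
 */
static inline void fill_batch_launch_times(uint64_t cur_ts_ns, uint64_t *out,
					   size_t batch_size)
{
	for (size_t i = 0; i < batch_size; i++)
		out[i] = launch_time_for(cur_ts_ns, (unsigned int)i);
}
```

In the sample, the batch timestamp would presumably come from a single get_nsecs(CLOCK_TAI) call per batch, and each computed value would be handed to xsk_umem__set_md_txtime() for the corresponding frame.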
Signed-off-by: Jithu Joseph --- samples/bpf/xdpsock_user.c | 64 +++++++++++++++++++++++++++++++++++--- 1 file changed, 60 insertions(+), 4 deletions(-) diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index 3fd2f6a0d1eb..a0fd3d5414ba 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -55,6 +55,9 @@ #define DEBUG_HEXDUMP 0 +#define LAUNCH_TIME_ADVANCE_NS (500000) +#define CLOCK_SYNC_DELAY (60) + typedef __u64 u64; typedef __u32 u32; typedef __u16 u16; @@ -99,6 +102,7 @@ static u32 opt_num_xsks = 1; static u32 prog_id; static bool opt_busy_poll; static bool opt_reduced_cap; +static bool opt_launch_time; struct xsk_ring_stats { unsigned long rx_npkts; @@ -741,6 +745,8 @@ static inline u16 udp_csum(u32 saddr, u32 daddr, u32 len, #define ETH_FCS_SIZE 4 +#define MD_SIZE (sizeof(struct xdp_user_tx_metadata)) + #define PKT_HDR_SIZE (sizeof(struct ethhdr) + sizeof(struct iphdr) + \ sizeof(struct udphdr)) @@ -798,8 +804,10 @@ static void gen_eth_hdr_data(void) static void gen_eth_frame(struct xsk_umem_info *umem, u64 addr) { - memcpy(xsk_umem__get_data(umem->buffer, addr), pkt_data, - PKT_SIZE); + if (opt_launch_time) + memcpy(xsk_umem__get_data(umem->buffer, addr) + MD_SIZE, pkt_data, PKT_SIZE); + else + memcpy(xsk_umem__get_data(umem->buffer, addr), pkt_data, PKT_SIZE); } static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size) @@ -927,6 +935,7 @@ static struct option long_options[] = { {"irq-string", no_argument, 0, 'I'}, {"busy-poll", no_argument, 0, 'B'}, {"reduce-cap", no_argument, 0, 'R'}, + {"launch-time", no_argument, 0, 'L'}, {0, 0, 0, 0} }; @@ -967,6 +976,7 @@ static void usage(const char *prog) " -I, --irq-string Display driver interrupt statistics for interface associated with irq-string.\n" " -B, --busy-poll Busy poll.\n" " -R, --reduce-cap Use reduced capabilities (cannot be used with -M)\n" + " -L, --launch-time Toy example of launchtime using XDP\n" "\n"; fprintf(stderr, str, prog, 
XSK_UMEM__DEFAULT_FRAME_SIZE, opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE, @@ -982,7 +992,7 @@ static void parse_command_line(int argc, char **argv) opterr = 0; for (;;) { - c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:BR", + c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:BRL", long_options, &option_index); if (c == -1) break; @@ -1087,6 +1097,9 @@ static void parse_command_line(int argc, char **argv) case 'R': opt_reduced_cap = true; break; + case 'L': + opt_launch_time = true; + break; default: usage(basename(argv[0])); } @@ -1272,6 +1285,7 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) { u32 idx; unsigned int i; + u64 cur_ts, launch_time; while (xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx) < batch_size) { @@ -1280,10 +1294,28 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) return; } + cur_ts = get_nsecs(CLOCK_TAI); + for (i = 0; i < batch_size; i++) { struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); - tx_desc->addr = (*frame_nb + i) * opt_xsk_frame_size; + if (opt_launch_time) { + /* + * Direct the NIC to launch "batch_size" packets each spaced 1us apart. + * The below calculation specifies that the first packet in the batch + * is to be launched "LAUNCH_TIME_ADVANCE_NS" into the future and the + * remaining packets in the batch should be spaced 1us apart. 
+ */ + launch_time = cur_ts + LAUNCH_TIME_ADVANCE_NS + (1000 * i); + xsk_umem__set_md_txtime(xsk->umem->buffer, + ((*frame_nb + i) * opt_xsk_frame_size), + launch_time); + tx_desc->addr = (*frame_nb + i) * opt_xsk_frame_size + MD_SIZE; + tx_desc->options = XDP_DESC_OPTION_METADATA; + } else { + tx_desc->addr = (*frame_nb + i) * opt_xsk_frame_size; + } + tx_desc->len = PKT_SIZE; } @@ -1293,6 +1325,16 @@ static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size) *frame_nb += batch_size; *frame_nb %= NUM_FRAMES; complete_tx_only(xsk, batch_size); + if (opt_launch_time) { + /* + * Hold the Tx loop until all the frames from this batch have been + * transmitted by the driver. This also ensures that all packets from + * this batch reach the driver ASAP before the proposed future launchtime + * becomes stale. + */ + while (xsk->outstanding_tx) + complete_tx_only(xsk, opt_batch_size); + } } static inline int get_batch_size(int pkt_cnt) @@ -1334,6 +1376,20 @@ static void tx_only_all(void) fds[0].events = POLLOUT; } + if (opt_launch_time) { + /* + * For launch-time to be meaningful, the system clock and NIC h/w clock + * need to be synchronized. Many NIC driver implementations reset the NIC + * during the bind operation in the XDP initialization flow path. + * The delay below is intended as a best-effort approach to hold off packet + * transmission until synchronization is achieved. + */ + xsk_socket__enable_so_txtime(xsks[0]->xsk, true); + printf("Waiting for %ds for the NIC clock to synchronize with the system clock\n", + CLOCK_SYNC_DELAY); + sleep(CLOCK_SYNC_DELAY); + } + while ((opt_pkt_count && pkt_cnt < opt_pkt_count) || !opt_pkt_count) { int batch_size = get_batch_size(pkt_cnt);