From patchwork Tue Jun 9 17:26:21 2020
X-Patchwork-Submitter: Björn Töpel
X-Patchwork-Id: 217895
From: Björn Töpel
To: ast@kernel.org, daniel@iogearbox.net, bpf@vger.kernel.org,
	john.fastabend@gmail.com, toke@redhat.com
Cc: Björn Töpel, magnus.karlsson@intel.com, netdev@vger.kernel.org,
	brouer@redhat.com, maciej.fijalkowski@intel.com
Subject: [RFC PATCH bpf-next 1/2] bpf, xdp: add naive bpf_redirect_map() tail call detection
Date: Tue, 9 Jun 2020 19:26:21 +0200
Message-Id: <20200609172622.37990-2-bjorn.topel@gmail.com>
In-Reply-To: <20200609172622.37990-1-bjorn.topel@gmail.com>
References: <20200609172622.37990-1-bjorn.topel@gmail.com>

From: Björn Töpel

The bpf_redirect_map() BPF helper is used to redirect a packet to the
endpoint referenced by a map element. Currently, there is no
restriction on how the helper may be called; e.g.

  ret = bpf_redirect_map(...);
  ...
  ret = bpf_redirect_map(...);
  ...
  // e.g. modify packet
  return ret;

is a valid, albeit weird, program.

The unconstrained nature of the BPF (redirect) helpers has
implications for the implementation of XDP_REDIRECT. Today, the
redirect flow for a packet is along the lines of:

  for_each_packet() {
          act = bpf_prog_run_xdp(xdp_prog, xdp); // does the bpf_redirect_map() call
          if (act == XDP_REDIRECT)
                  ret = xdp_do_redirect(...);
  }
  xdp_do_flush();

The bpf_redirect_map() helper populates a per-CPU structure, which is
picked up by the xdp_do_redirect() function that then performs the
actual redirect action.

What if the verifier could detect that certain programs obey certain
constraints, so that the functionality performed by xdp_do_redirect()
could be done directly from bpf_redirect_map()?

Here, the verifier is taught to detect programs where all
bpf_redirect_map() calls are tail calls. If a function call is the
last action performed in a function, it is said to be a tail call --
not to be confused with the BPF tail call helper.

This implementation is just a PoC, and is not complete. It detects
tail calls with a naive peephole scheme. For the following program
(the built-in AF_XDP libbpf program):

  ...
  call bpf_redirect_map
  if w0 != 0 goto pc+X (goto exit)
  ...
  call bpf_redirect_map
  exit

the verifier detects that all calls are tail calls.

TODO:
* Stateful tail call detection, instead of the current peephole one.
* Add all helpers capable of returning XDP_REDIRECT, and not just
  bpf_redirect_map().

Signed-off-by: Björn Töpel
---
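
As a reference for the kind of program the peephole check targets,
below is a minimal sketch (not part of this patch) of an AF_XDP-style
XDP program in which every bpf_redirect_map() call is in tail
position. The map name xsks_map, the program name and the section
names are made up for the example:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_XSKMAP);
          __uint(max_entries, 64);
          __type(key, __u32);
          __type(value, __u32);
  } xsks_map SEC(".maps");

  SEC("xdp")
  int xsk_redirect_prog(struct xdp_md *ctx)
  {
          /* The helper's return value is returned directly, so the
           * call is in tail position. The XDP_PASS flags value is what
           * the kernel returns if the map lookup fails.
           */
          return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
  }

  char _license[] SEC("license") = "GPL";

For a program of this shape, the verifier sets prog->redirect_tail_call,
and bpf_redirect_map() can take the tail-call path added below.
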
 include/linux/bpf_verifier.h |  2 ++
 include/linux/filter.h       | 19 +++++++++++-
 kernel/bpf/verifier.c        | 53 +++++++++++++++++++++++++++++++++
 net/core/filter.c            | 57 ++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index ca08db4ffb5f..e41fd9264c94 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -411,6 +411,8 @@ struct bpf_verifier_env {
 	u32 peak_states;
 	/* longest register parentage chain walked for liveness marking */
 	u32 longest_mark_read_walk;
+	bool all_redirect_map_tail;
+	u32 redirect_map_call_cnt;
 };
 
 __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 259377723603..cdbc2ca8db4f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -534,7 +534,8 @@ struct bpf_prog {
 				is_func:1,	/* program is a bpf function */
 				kprobe_override:1, /* Do we override a kprobe? */
 				has_callchain_buf:1, /* callchain buffer allocated? */
-				enforce_expected_attach_type:1; /* Enforce expected_attach_type checking at attach time */
+				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
+				redirect_tail_call:1;
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -613,6 +614,9 @@ struct bpf_redirect_info {
 	void *tgt_value;
 	struct bpf_map *map;
 	u32 kern_flags;
+	struct bpf_prog *xdp_prog_redirect_tailcall; // if !NULL, we can do "everything" from BPF context.
+	struct xdp_buff *xdp;
+	struct net_device *dev;
 };
 
 DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
@@ -620,6 +624,19 @@ DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
 
 /* flags for bpf_redirect_info kern_flags */
 #define BPF_RI_F_RF_NO_DIRECT	BIT(0) /* no napi_direct on return_frame */
+
+static inline void xdp_set_redirect_tailcall(struct bpf_prog *xdp_prog,
+					     struct xdp_buff *xdp,
+					     struct net_device *dev)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	ri->xdp_prog_redirect_tailcall = xdp_prog;
+	ri->xdp = xdp;
+	ri->dev = dev;
+}
+
+
 /* Compute the linear packet data range [data, data_end) which
  * will be accessed by various program types (cls_bpf, act_bpf,
  * lwt, ...). Subsystems allowing direct data access must (!)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5c7bbaac81ef..27a4fd5c8b8b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4594,6 +4594,54 @@ static int check_reference_leak(struct bpf_verifier_env *env)
 	return state->acquired_refs ? -EINVAL : 0;
 }
 
+static void check_redirect_map_tail_call(struct bpf_verifier_env *env, int func_id,
+					 int insn_idx)
+{
+	struct bpf_insn *insns = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	bool is_tail_call = false;
+	struct bpf_insn *insn;
+
+	if (func_id != BPF_FUNC_redirect_map)
+		return;
+
+	/* Naive peephole tail call checking */
+	insn_idx++;
+again:
+	if (insn_idx >= insn_cnt)
+		goto out;
+
+	insn = &insns[insn_idx];
+	insn_idx++;
+
+	switch (insn->code) {
+	/* 1. Is the instruction following the call, an exit? */
+	case BPF_JMP | BPF_EXIT:
+		is_tail_call = true;
+		break;
+	/* 2. True branch of "if w0 != 0 goto pc+X", an exit? */
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP32 | BPF_JSGT | BPF_K:
+		if (insn->dst_reg == BPF_REG_0 && insn->imm == 0) {
+			insn_idx += insn->off;
+			goto again;
+		}
+		break;
+	/* 3... more checks. XXX */
+	default:
+		break;
+	}
+
+out:
+	if (!env->redirect_map_call_cnt++) {
+		env->all_redirect_map_tail = is_tail_call;
+		return;
+	}
+
+	if (!is_tail_call)
+		env->all_redirect_map_tail = false;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 {
 	const struct bpf_func_proto *fn = NULL;
@@ -4685,6 +4733,8 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		}
 	}
 
+	check_redirect_map_tail_call(env, func_id, insn_idx);
+
 	regs = cur_regs(env);
 
 	/* check that flags argument in get_local_storage(map, flags) is 0,
@@ -11064,6 +11114,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (ret == 0)
 		adjust_btf_func(env);
 
+	if (ret == 0)
+		env->prog->redirect_tail_call = env->all_redirect_map_tail;
+
 err_release_maps:
 	if (!env->prog->aux->used_maps)
 		/* if we didn't copy map pointers into bpf_prog_info, release
diff --git a/net/core/filter.c b/net/core/filter.c
index d01a244b5087..8cd8dee9359a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3701,6 +3701,60 @@ static const struct bpf_func_proto bpf_xdp_redirect_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
+static u64 __bpf_xdp_redirect_map_tailcall(struct bpf_map *map, u32 index,
+					   u64 flags,
+					   struct bpf_redirect_info *ri)
+{
+	struct xdp_buff *xdp = ri->xdp;
+	struct net_device *d = ri->dev;
+	struct bpf_prog *xdp_prog;
+	void *val;
+	int err;
+
+	switch (map->map_type) {
+	case BPF_MAP_TYPE_DEVMAP: {
+		val = __dev_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = dev_map_enqueue(val, xdp, d);
+		break;
+	}
+	case BPF_MAP_TYPE_DEVMAP_HASH: {
+		val = __dev_map_hash_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = dev_map_enqueue(val, xdp, d);
+		break;
+	}
+	case BPF_MAP_TYPE_CPUMAP: {
+		val = __cpu_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = cpu_map_enqueue(val, xdp, d);
+		break;
+	}
+	case BPF_MAP_TYPE_XSKMAP: {
+		val = __xsk_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = __xsk_map_redirect(val, xdp);
+		break;
+	}
+	default:
+		return flags;
+	}
+
+	xdp_prog = ri->xdp_prog_redirect_tailcall;
+	if (unlikely(err))
+		goto err;
+
+	_trace_xdp_redirect_map(d, xdp_prog, val, map, index);
+	return XDP_REDIRECT;
+err:
+	_trace_xdp_redirect_map_err(d, xdp_prog, val, map, index, err);
+	return XDP_DROP;
+}
+
 BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 	   u64, flags)
 {
@@ -3710,6 +3764,9 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 	if (unlikely(flags > XDP_TX))
 		return XDP_ABORTED;
 
+	if (ri->xdp_prog_redirect_tailcall)
+		return __bpf_xdp_redirect_map_tailcall(map, ifindex, flags, ri);
+
 	ri->tgt_value = __xdp_map_lookup_elem(map, ifindex);
 	if (unlikely(!ri->tgt_value)) {
 		/* If the lookup fails we want to clear out the state in the
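
The xdp_set_redirect_tailcall() helper added above is not called
anywhere in this patch; a driver-side change (presumably elsewhere in
the series) is expected to arm the per-CPU state before running the
program. A rough, hypothetical sketch of such a call site, with all
function and variable names made up for illustration, could look like:

  /* Hypothetical driver-side sketch, not taken from this series: arm
   * the tail-call fast path before running a program that the verifier
   * has marked with redirect_tail_call.
   */
  static u32 example_run_xdp(struct net_device *dev,
                             struct bpf_prog *xdp_prog,
                             struct xdp_buff *xdp)
  {
          if (xdp_prog->redirect_tail_call)
                  /* bpf_redirect_map() will enqueue directly from BPF context */
                  xdp_set_redirect_tailcall(xdp_prog, xdp, dev);

          /* With the fast path armed, an XDP_REDIRECT/XDP_DROP return value
           * already reflects the enqueue result, so xdp_do_redirect() can be
           * skipped for this packet.
           */
          return bpf_prog_run_xdp(xdp_prog, xdp);
  }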