diff mbox series

[bpf-next,11/11] bpf: Fall back to nospec for spec path verification

Message ID 20250313175312.1120183-2-luis.gerhorst@fau.de
State New
Headers show
Series bpf: Mitigate Spectre v1 using barriers | expand

Commit Message

Luis Gerhorst March 13, 2025, 5:53 p.m. UTC
This trades verification complexity for runtime overheads due to the
nospec inserted because of the EINVAL.

With increased limits this allows applying mitigations to large BPF
progs such as the Parca Continuous Profiler's prog. However, this
requires a jump-seq limit of 256k. In any case, the same principle
should apply to smaller programs therefore include it even if the limit
stays at 8k for now. Most programs in [1] only require a limit of 32k.

[1] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
---
 kernel/bpf/verifier.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Alexei Starovoitov March 19, 2025, 2:40 a.m. UTC | #1
On Thu, Mar 13, 2025 at 10:57 AM Luis Gerhorst <luis.gerhorst@fau.de> wrote:
>
> This trades verification complexity for runtime overheads due to the
> nospec inserted because of the EINVAL.
>
> With increased limits this allows applying mitigations to large BPF
> progs such as the Parca Continuous Profiler's prog. However, this
> requires a jump-seq limit of 256k. In any case, the same principle
> should apply to smaller programs therefore include it even if the limit
> stays at 8k for now. Most programs in [1] only require a limit of 32k.

Do you mean that without this change the verifier needs 256k
jmp limit to load Parca's prog as unpriv due to speculative
path exploration with push_stack ?

And this change uses 4k as a trade-off between prog runtime
and verification time ?

But tracing progs use bpf_probe_read_kernel(), so they're never going
to be unpriv.

> @@ -2010,6 +2011,19 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
>         struct bpf_verifier_stack_elem *elem;
>         int err;
>
> +       if (!env->bypass_spec_v1 &&
> +           cur->speculative &&

Should this be
(cur->speculative || speculative)
?

In general I'm not convinced that the approach is safe.

This recoverable EINVAL means that exploration under speculation
stops early, but there could be more branches and they won't be
sanitized with extra lfence.
So speculative execution can still happen at later insns.

Similar concern in patch 7:
+ if (state->speculative && cur_aux(env)->nospec)
+   goto process_bpf_exit;

One lfence at this insn doesn't stop speculation until the program end.
Only at this insn. The rest of the code is free to speculate.

The refactoring in patches 1-3 is nice.
Patches 4-5 are tricky and somewhat questionable, but make sense.
Patch 7 without early goto process_bpf_exit looks correct too,
Patch 8 is borderline. Feels like it's opening the door for
new vulnerabilities and space to explore for security researchers.
We disabled unpriv bpf by default and have no intentions to enable it.
Even if we land the whole thing the unpriv will stay disabled.
So trade offs don't appear favorable.
Luis Gerhorst March 19, 2025, 9:06 a.m. UTC | #2
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Mar 13, 2025 at 10:57 AM Luis Gerhorst <luis.gerhorst@fau.de> wrote:
>> With increased limits this allows applying mitigations to large BPF
>> progs such as the Parca Continuous Profiler's prog. However, this
>> requires a jump-seq limit of 256k. In any case, the same principle
>> should apply to smaller programs therefore include it even if the limit
>> stays at 8k for now. Most programs in [1] only require a limit of 32k.
>
> Do you mean that without this change the verifier needs 256k
> jmp limit to load Parca's prog as unpriv due to speculative
> path exploration with push_stack ?
>

Only with this change Parca is loadable when manually enabling Spectre
defenses for privileged programs and setting the following limits:
- BPF_COMPLEXITY_LIMIT_JMP_SEQ=256k
- BPF_COMPLEXITY_LIMIT_SPEC_V1_VERIFICATION=128k
- BPF_COMPLEXITY_LIMIT_INSNS=32M

>
> And this change uses 4k as a trade-off between prog runtime
> and verification time ?
>

Yes, this change uses 4k to limited nested speculative path exploration.
At the top-level (i.e., on architectural paths), spawned speculative
paths are still explored.

>
> But tracing progs use bpf_probe_read_kernel(), so they're never going
> to be unpriv.
>

I'm sorry this was not clear. Parca is only used as an example here
to test whether this improves expressiveness in general.

If you do not see this as a favorable tradeoff (because of the code
complexity), I think it would be acceptable to drop the last patch for
now. The biggest improvements is already achieved with the other patches
as evident from the selftests. I can do a more exhaustive analysis on
patch 11 later.

>
>> @@ -2010,6 +2011,19 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
>>         struct bpf_verifier_stack_elem *elem;
>>         int err;
>>
>> +       if (!env->bypass_spec_v1 &&
>> +           cur->speculative &&
>
> Should this be
> (cur->speculative || speculative)
> ?

No, I think it will be unsafe to add `|| speculative` here. If we were
to return -EINVAL from push_stack() when pushing a speculative path from
a non-speculative path (e.g., in check_cond_jmp_op() through
sanitize_speculative_path()), this will cause do_check() to add an
lfence before the jump-op.

Here's a minimal example program:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

`|| speculative` might cause us to only add an lfence before `if A goto
e`, which would not be sufficient. Intel recommends adding the lfence
after the jump [1].

>
> In general I'm not convinced that the approach is safe.
>
> This recoverable EINVAL means that exploration under speculation
> stops early, but there could be more branches and they won't be
> sanitized with extra lfence.
> So speculative execution can still happen at later insns.
>

`goto process_bpf_exit` only causes us to stop analyzing this particular
path, not the rest of the program.

This is based on the assumption, that the lfence stops the CPU from ever
reaching those branches (if they are not reachable through other means).

>
> Similar concern in patch 7:
> + if (state->speculative && cur_aux(env)->nospec)
> +   goto process_bpf_exit;
>
> One lfence at this insn doesn't stop speculation until the program end.
> Only at this insn. The rest of the code is free to speculate.
>

Taking the program above as an example, this allows us to stop before
f() if an lfence was inserted there.

Fully patched program would be:

	A = true
	B = true
	if A goto e
	lfence
	f()
	if B goto e
	unsafe()
e:	exit

In this example, all instructions after the lfence are dead code (and
with the lfence they are also dead code speculatively).

I believe this is in line with Intel's guidance [1]:

	An LFENCE instruction or a serializing instruction will ensure that
	no later instructions execute, even speculatively, until all prior
	instructions complete locally. Developers might prefer LFENCE over a
	serializing instruction because LFENCE may have lower latency.
	Inserting an LFENCE instruction after a bounds check prevents later
	operations from executing before the bound check completes.

With regards to the example, this implies that `if B goto e` will not
execute before `if A goto e` completes. Once `if A goto e` completes,
the CPU should find that the speculation was wrong and continue with
`exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then an lfence might of
course still be needed there. However, I assume this other path will be
explored separately and therefore be discovered by the verifier even if
the exploration discussed here stops at the lfence. If this assumption
is wrong, please let me know.

>
> The refactoring in patches 1-3 is nice.
> Patches 4-5 are tricky and somewhat questionable, but make sense.
> Patch 7 without early goto process_bpf_exit looks correct too,
> Patch 8 is borderline. Feels like it's opening the door for
> new vulnerabilities and space to explore for security researchers.
> We disabled unpriv bpf by default and have no intentions to enable it.
> Even if we land the whole thing the unpriv will stay disabled.
> So trade offs don't appear favorable.
>

Thank you very much for having a look. Let me know whether the above
resolves your concern.

In any case, should I separate patches 1-3 into another series?

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")
diff mbox series

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 03af82f52a02..49c7e2608ccd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -187,6 +187,7 @@  struct bpf_verifier_stack_elem {
 };
 
 #define BPF_COMPLEXITY_LIMIT_JMP_SEQ	8192
+#define BPF_COMPLEXITY_LIMIT_SPEC_V1_VERIFICATION	(BPF_COMPLEXITY_LIMIT_JMP_SEQ / 2)
 #define BPF_COMPLEXITY_LIMIT_STATES	64
 
 #define BPF_MAP_KEY_POISON	(1ULL << 63)
@@ -2010,6 +2011,19 @@  static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	struct bpf_verifier_stack_elem *elem;
 	int err;
 
+	if (!env->bypass_spec_v1 &&
+	    cur->speculative &&
+	    env->stack_size > BPF_COMPLEXITY_LIMIT_SPEC_V1_VERIFICATION) {
+		/* Avoiding nested speculative path verification because we are
+		 * close to exceeding the jump sequence complexity limit. Will
+		 * instead insert a speculation barrier which will impact
+		 * performace. To improve performance, authors should reduce the
+		 * program's complexity. Barrier will be inserted in
+		 * do_check().
+		 */
+		return ERR_PTR(-EINVAL);
+	}
+
 	elem = kzalloc(sizeof(struct bpf_verifier_stack_elem), GFP_KERNEL);
 	if (!elem) {
 		err = -ENOMEM;