[2/3] Revert "bpf: Fix issue in verifying allow_ptr_leaks"

Message ID 20230913122827.91591-1-gerhorst@amazon.de
State New
Series [1/3] Revert "selftests/bpf: Add selftest for allow_ptr_leaks"

Commit Message

Luis Gerhorst Sept. 13, 2023, 12:28 p.m. UTC
This reverts commit d75e30dddf73449bc2d10bb8e2f1a2c446bc67a2.

To mitigate Spectre v1, the verifier relies on static analysis to deduce
constant pointer bounds, which can then be enforced by rewriting pointer
arithmetic [1] or by index masking [2]. This relies on the fact that
every memory region to be accessed has a static upper bound and that
every byte below that bound is accessible. The verifier can only rewrite
pointer arithmetic or insert masking instructions to mitigate Spectre v1
if such a static upper bound, below which every access is valid, can be
given.
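
For illustration, the index masking of [2] conceptually clamps the offset
before the access, so that even a mispredicted bounds check cannot read
outside the region. A minimal C sketch of the idea, with illustrative
names and assuming a power-of-2-sized region (the verifier's actual
rewrites differ in detail):

  #define REGION_SIZE 512 /* static upper bound, power of 2 */
  extern unsigned char region[REGION_SIZE];

  unsigned char load_clamped(unsigned long idx)
  {
          unsigned char val = 0;

          if (idx < REGION_SIZE) {
                  /* Architecturally a no-op here, but keeps idx inside
                   * the region even when the branch above was
                   * mispredicted. */
                  idx &= REGION_SIZE - 1;
                  val = region[idx];
          }
          return val;
  }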

Allowing packet pointer comparisons introduces a way for the program to
effectively construct an accessible pointer for which no static upper
bound is known. Intuitively, this is obvious as a packet might be of any
size; therefore, 0 is the only statically known upper bound below which
every byte is always accessible (i.e., no byte is).

To clarify, the problem is not that comparing two pointers can be used
for pointer leaks in the same way that comparing a pointer to a known
scalar can. That is because the "secret" components of the addresses
cancel each other out if both pointers point into the same region.

With [3] applied, the following malicious BPF program can be loaded into
the kernel without CAP_PERFMON:

r2 = *(u32 *)(r1 + 76) // data
r3 = *(u32 *)(r1 + 80) // data_end
r4 = r2
r4 += 1
if r4 > r3 goto exit
r5 = *(u8 *)(r2 + 0) // speculatively read secret
r5 &= 1 // choose bit to leak
// ... side channel to leak secret bit
exit:
// ...

This is jited to the following amd64 code, which still contains the
gadget:

   0:   endbr64
   4:   nopl   0x0(%rax,%rax,1)
   9:   xchg   %ax,%ax
   b:   push   %rbp
   c:   mov    %rsp,%rbp
   f:   endbr64
  13:   push   %rbx
  14:   mov    0xc8(%rdi),%rsi // data
  1b:   mov    0x50(%rdi),%rdx // data_end
  1f:   mov    %rsi,%rcx
  22:   add    $0x1,%rcx
  26:   cmp    %rdx,%rcx
  29:   ja     0x000000000000003f // branch to mispredict
  2b:   movzbq 0x0(%rsi),%r8 // speculative load of secret
  30:   and    $0x1,%r8 // choose bit to leak
  34:   xor    %ebx,%ebx
  36:   cmp    %rbx,%r8
  39:   je     0x000000000000003f // branch based on secret
  3b:   imul   $0x61,%r8,%r8 // leak using port contention side channel
  3f:   xor    %eax,%eax
  41:   pop    %rbx
  42:   leaveq
  43:   retq

Here I'm using a port contention side channel because storing the secret
to the stack causes the verifier to insert an lfence for unrelated
reasons (SSB mitigation), which would terminate the speculation.
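
For reference, the receiving end of such a channel can be a thread on the
SMT sibling timing its own multiplications. A rough user-space sketch
(names and constants illustrative, heavily simplified compared to what a
real exploit would need):

  #include <stdint.h>
  #include <x86intrin.h>

  /* Pinned to the SMT sibling of the CPU running the BPF gadget. */
  static uint64_t probe_imul_port(void)
  {
          volatile long long x = 1;
          uint64_t t0 = __rdtsc();

          for (int i = 0; i < 10000; i++)
                  x *= 97; /* competes for the multiplier port */
          return __rdtsc() - t0; /* higher when the gadget also ran its
                                  * imul, i.e. the leaked bit was 1 */
  }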

As Daniel already pointed out to me, data_end is even attacker
controlled, as one could send many packets of sufficient length to train
the branch predictor into assuming that data + X > data_end will never
be true. When the attacker then sends a packet with insufficient data,
the Spectre v1 gadget leaks the chosen bit of some value that lies
behind data_end.

To make it clear that the problem is not the pointer comparison but the
missing masking instruction, it can be useful to transform the code
above into the following equivalent pseudocode:

r2 = data
r3 = data_end
r6 = ... // index to access, constant does not help
r7 = data_end - data // only known at runtime, could be [0,PKT_MAX)
if !(r6 < r7) goto exit
// no masking of index in r6 happens
r2 += r6 // addr. to access
r5 = *(u8 *)(r2 + 0) // speculatively read secret
// ... leak secret as above

One idea to resolve this while still allowing for unprivileged packet
access would be to always allocate a power of 2 for the packet data and
then also pass the respective index mask in the skb structure. The
verifier would then have to check that this index mask is always applied
to the offset before a packet pointer is dereferenced. This patch does
not implement this extension, but only reverts [3].
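
In pseudo-C, the access pattern the verifier would then have to enforce
might look as follows (pkt_index_mask is hypothetical and, like the rest
of this sketch, not implemented by this patch):

  u8 *data = (u8 *)(long)skb->data;
  u8 *data_end = (u8 *)(long)skb->data_end;
  u64 off = ...; /* offset the program wants to access */

  if (data + off + 1 > data_end) /* architectural bounds check */
          goto exit;
  /* Hypothetical: packet buffer size is a power of 2 and
   * pkt_index_mask == buffer_size - 1, so the mask keeps the
   * speculative load inside the allocation even when the check
   * above is mispredicted. */
  off &= skb->pkt_index_mask;
  val = *(data + off);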

[1] 979d63d50c0c0f7bc537bf821e056cc9fe5abd38 ("bpf: prevent out of bounds speculation on pointer arithmetic")
[2] b2157399cc9898260d6031c5bfe45fe137c1fbe7 ("bpf: prevent out-of-bounds speculation")
[3] d75e30dddf73449bc2d10bb8e2f1a2c446bc67a2 ("bpf: Fix issue in verifying allow_ptr_leaks")

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Luis Gerhorst <gerhorst@amazon.de>
Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de>
Acked-by: Hagar Gamal Halim Hemdan <hagarhem@amazon.de>
Cc: Puranjay Mohan <puranjay12@gmail.com>
---
 kernel/bpf/verifier.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

Comments

Alexei Starovoitov Sept. 14, 2023, 4:20 p.m. UTC | #1
On Wed, Sep 13, 2023 at 5:30 AM Luis Gerhorst <gerhorst@amazon.de> wrote:
>
> [...]
>
> As Daniel already pointed out to me, data_end is even attacker
> controlled, as one could send many packets of sufficient length to train
> the branch predictor into assuming that data + X > data_end will never
> be true. When the attacker then sends a packet with insufficient data,
> the Spectre v1 gadget leaks the chosen bit of some value that lies
> behind data_end.

The above analysis is correct, but unlike traditional spec_v1
the attacker doesn't control data/data_end.
The attacker can send many large packets to train that data + X < data_end
and then send a small packet where the CPU will mispredict that branch
and data + X will speculatively read past data_end,
so the attacker can extract a bit past data_end,
but data/data_end themselves cannot be controlled.
So whether this bit is 0 or 1 has no bearing.
The attack cannot be repeated for the same location.
The attacker can read one bit 8 times in a row and all of them
will be from different locations in memory.
Same as reading 8 random bits from 8 random locations.
Hence I don't think this revert is necessary.
I don't believe you can craft an actual exploit.

Your patch 3 says:
       /* Speculative access to be prevented. */
+       char secret = *((char *) iph);
+
+       /* Leak the first bit of the secret value that lies behind data_end to an
+        * SMP sibling thread that also executes imul instructions. If the bit
+        * is 1, the sibling will experience a slowdown. */
+       long long x = secret;
+       if (secret & 1) {
+               x *= 97;
+       }

the comment is correct, but speculative access alone is not enough
to leak data.
Daniel Borkmann Sept. 14, 2023, 5:24 p.m. UTC | #2
On 9/14/23 6:20 PM, Alexei Starovoitov wrote:
> On Wed, Sep 13, 2023 at 5:30 AM Luis Gerhorst <gerhorst@amazon.de> wrote:
>>
>> [...]
> 
> The above analysis is correct, but unlike traditional spec_v1
> the attacker doesn't control data/data_end.
> The attacker can send many large packets to train that data + X < data_end
> and then send a small packet where the CPU will mispredict that branch
> and data + X will speculatively read past data_end,
> so the attacker can extract a bit past data_end,
> but data/data_end themselves cannot be controlled.
> So whether this bit is 0 or 1 has no bearing.
> The attack cannot be repeated for the same location.
> The attacker can read one bit 8 times in a row and all of them
> will be from different locations in memory.
> Same as reading 8 random bits from 8 random locations.
> Hence I don't think this revert is necessary.
> I don't believe you can craft an actual exploit.
> 
> Your patch 3 says:
>         /* Speculative access to be prevented. */
> +       char secret = *((char *) iph);
> +
> +       /* Leak the first bit of the secret value that lies behind data_end to an
> +        * SMP sibling thread that also executes imul instructions. If the bit
> +        * is 1, the sibling will experience a slowdown. */
> +       long long x = secret;
> +       if (secret & 1) {
> +               x *= 97;
> +       }
> 
> the comment is correct, but speculative access alone is not enough
> to leak data.

What you write makes sense; it will probably be hard to craft an exploit.
Where it's a bit more of an unknown to me is whether struct skb_shared_info
could have e.g. destructor_arg rather static (at least the upper address
bits), so that you could leak out kernel addresses.
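
(For context: skb_shared_info sits directly past the linear data buffer,
so a speculative read behind data_end can land in it. A simplified sketch
of the layout in question:)

  /* skb_shinfo() resolves to the area right behind skb->end. */
  struct skb_shared_info *shinfo = skb_shinfo(skb);

  /* destructor_arg is a kernel pointer; if it is identical across many
   * skbs (e.g. zero-copy), repeated speculative out-of-bounds reads
   * past data_end could all target the same value. */
  void *p = shinfo->destructor_arg;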
Alexei Starovoitov Sept. 14, 2023, 7:47 p.m. UTC | #3
On Thu, Sep 14, 2023 at 10:24 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 9/14/23 6:20 PM, Alexei Starovoitov wrote:
> > On Wed, Sep 13, 2023 at 5:30 AM Luis Gerhorst <gerhorst@amazon.de> wrote:
> >>
> >> [...]
> >
> > [...]
>
> What you write makes sense; it will probably be hard to craft an exploit.
> Where it's a bit more of an unknown to me is whether struct skb_shared_info
> could have e.g. destructor_arg rather static (at least the upper address
> bits), so that you could leak out kernel addresses.

You mean since skb_shared_info is placed after skb->end
and in the zero-copy case destructor_arg may be initialized with the same
kernel pointer for multiple skbs?
The attacker cannot construct the address from data_end.
The verifier explicitly prohibits any ALU with PTR_TO_PACKET_END.
But the attacker can do skb->data + X.
The idea is that they can train the branch to mispredict with
a large packet and then send a small one so that shared_info
after skb->end has the same uarg pointer in all packets?
So every skb->data + X is a different location, but all of them
point to data that has uarg == destructor_arg?

That would be feasible in theory, but in order to speculate the loads
the branch mispredict has to be reliable.
The spec v1 attack requires that one of the two loads feeding
into the compare operation be slow.
In this case both the data and data_end loads are going to be fast.
The attacker cannot evict skb->data or skb->data_end from the cache.
Remember that we rearranged the 'max_entries' field in struct bpf_map
specifically to be in a different cache line vs fields
controlled by user space. It was a necessary part of the spec v1 attack.
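
(Roughly along these lines; the field list is abridged and illustrative:)

  struct bpf_map {
          const struct bpf_map_ops *ops;
          u32 max_entries; /* read by verifier-emitted bounds checks */
          u32 key_size;
          u32 value_size;
          /* Refcounts and other members user space can hammer on live
           * on their own cache line, so activity on them cannot be
           * used to evict the line holding max_entries. */
          atomic64_t refcnt ____cacheline_aligned;
          atomic64_t usercnt;
  };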

So I still believe this revert is unnecessary and this speculative
execution is not exploitable.
Yafang Shao Sept. 15, 2023, 2:26 a.m. UTC | #4
On Wed, Sep 13, 2023 at 8:30 PM Luis Gerhorst <gerhorst@amazon.de> wrote:
>
> [...]
>
> One idea to resolve this while still allowing for unprivileged packet
> access would be to always allocate a power of 2 for the packet data and
> then also pass the respective index mask in the skb structure. The
> verifier would then have to check that this index mask is always applied
> to the offset before a packet pointer is dereferenced. This patch does
> not implement this extension, but only reverts [3].

Hi Luis,

The skb pointer comparison is a reasonable operation in a networking bpf prog.
If we just prohibit a reasonable operation to prevent a possible
spectre v1 attack, it looks a little weird, right?
Should we figure out a real fix to prevent it?
Yafang Shao Sept. 19, 2023, 3:43 a.m. UTC | #5
On Mon, Sep 18, 2023 at 7:52 PM Luis Gerhorst <gerhorst@cs.fau.de> wrote:
>
> On 15/09/2023 04:26, Yafang Shao wrote:
> > On Wed, Sep 13, 2023 at 8:30 PM Luis Gerhorst <gerhorst@amazon.de> wrote:
> >>
> >> [...]
> >
> > Hi Luis,
> >
> > The skb pointer comparison is a reasonable operation in a networking bpf prog.
> > If we just prohibit a reasonable operation to prevent a possible
> > spectre v1 attack, it looks a little weird, right?
> > Should we figure out a real fix to prevent it?
> >
>
> I see your point, but this has been the case since Spectre v1 was
> mitigated for BPF. I actually did a few statistics on that in [1] and
>  >50 out of ~350 programs are rejected because of the Spectre v1
> mitigations. However, to repeat: The operation is not completely
> prohibited, only prohibited without CAP_PERFMON.
>
> Maybe it would be possible to expose the allow_ptr_leaks/bpf_spec_vX
> flags in sysfs? It would be helpful for debugging, and you could set it
> to 1 permanently for your purposes. However, I'm not sure if the others
> would like that...

I really appreciate that idea. I actually shared a similar concept earlier [1].
Nonetheless, I believe it would be prudent to align with the system
settings regarding CPU security mitigations within the BPF subsystem
as well. In our production environment, we consistently configure it
with "mitigations=off"[2] to enhance performance, essentially
deactivating all optional CPU mitigations. Consequently, if we
implement a system-wide "mitigations=off" setting, it should also
inherently bypass Spectre v1 and Spectre v4 in the BPF subsystem.

Alexei, Daniel, any comments?

[1]. https://lore.kernel.org/bpf/CALOAHbDDT=paFEdTb1Jsqu7eGkFXAh6A+f21VTrMdAeq5F60kg@mail.gmail.com/
[2]. https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

>
> Another solution I have been working on in [2] is to change the
> mitigations to insert an lfence instead of rejecting the program
> entirely. That would have bad performance, but would still be better
> than completely rejecting the program. However, these patches are far
> from going upstream currently.
>
> A detail: The patches in [2] currently do not support the case we are
> discussing here; they only insert fences when the speculative paths fail
> to verify.
>
> [1]
> https://sys.cs.fau.de/extern/person/gerhorst/23-03_fgbs-spring-2023-presentation.pdf
> - Slide 9
> [2]
> https://gitlab.cs.fau.de/un65esoq/linux/-/commits/v6.5-rc6-bpf-spectre-nospec/
>
> --
> Luis



--
Regards
Yafang

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bb78212fa5b2..b415a81149ed 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14050,12 +14050,6 @@  static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	/* check src2 operand */
-	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
-	if (err)
-		return err;
-
-	dst_reg = &regs[insn->dst_reg];
 	if (BPF_SRC(insn->code) == BPF_X) {
 		if (insn->imm != 0) {
 			verbose(env, "BPF_JMP/JMP32 uses reserved fields\n");
@@ -14067,13 +14061,12 @@  static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		if (err)
 			return err;
 
-		src_reg = &regs[insn->src_reg];
-		if (!(reg_is_pkt_pointer_any(dst_reg) && reg_is_pkt_pointer_any(src_reg)) &&
-		    is_pointer_value(env, insn->src_reg)) {
+		if (is_pointer_value(env, insn->src_reg)) {
 			verbose(env, "R%d pointer comparison prohibited\n",
 				insn->src_reg);
 			return -EACCES;
 		}
+		src_reg = &regs[insn->src_reg];
 	} else {
 		if (insn->src_reg != BPF_REG_0) {
 			verbose(env, "BPF_JMP/JMP32 uses reserved fields\n");
@@ -14081,6 +14074,12 @@  static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		}
 	}
 
+	/* check src2 operand */
+	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
+	if (err)
+		return err;
+
+	dst_reg = &regs[insn->dst_reg];
 	is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
 
 	if (BPF_SRC(insn->code) == BPF_K) {