[0/3] Add __builtin_load_no_speculate

Message ID cover.1515072356.git.Richard.Earnshaw@arm.com

Message

Richard Earnshaw (lists) Jan. 4, 2018, 1:58 p.m. UTC
Recently, Google Project Zero disclosed several classes of attack
against speculative execution. One of these, known as variant-1
(CVE-2017-5753), allows explicit bounds checks to be bypassed under
speculation, providing an arbitrary read gadget. Further details can
be found on the GPZ blog [1] and the documentation that is included
with the first patch.

This patch set adds a new builtin function for GCC to provide a
mechanism for limiting speculation by a CPU after a bounds-checked
memory access.  I've tried to design this in such a way that it can be
used for any target where this might be necessary.  The patch set
provides a generic implementation of the builtin and then
target-specific support for Arm and AArch64.  Other architectures can
utilize the internal infrastructure as needed.

Most of the details of the builtin and the hooks that need to be
implemented to support it are described in the updates to the manual,
but a short summary is given below.

TYP __builtin_load_no_speculate
        (const volatile TYP *ptr,
         const volatile void *lower,
         const volatile void *upper,
         TYP failval,
         const volatile void *cmpptr)

Where TYP can be any integral type (signed or unsigned char, int,
short, long, etc) or any pointer type.

The builtin implements the following logical behaviour:

inline TYP __builtin_load_no_speculate
         (const volatile TYP *ptr,
          const volatile void *lower,
          const volatile void *upper,
          TYP failval,
          const volatile void *cmpptr)
{
  TYP result;

  if (cmpptr >= lower && cmpptr < upper)
    result = *ptr;
  else
    result = failval;
  return result;
}

In addition, the specification of the builtin ensures that subsequent
speculation using *ptr may only continue if cmpptr lies within the
specified bounds.

Some optimizations are permitted to make the builtin easier to use.
The final two arguments can both be omitted (C++ style): failval then
defaults to 0, and if cmpptr is omitted, ptr is used when expanding the
range check.  In addition, either lower or upper (but not both) may be
a literal NULL, in which case the expansion ignores that boundary
condition.
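As a concrete illustration, here is a sketch of how the builtin might be used in a classic variant-1 gadget.  Since no released compiler provides `__builtin_load_no_speculate`, the sketch models only the builtin's *logical* behaviour with a plain helper function (it provides no actual speculation barrier); the names `load_no_speculate`, `lookup` and `arr` are illustrative, not part of the proposal.

```c
#include <stddef.h>

/* Model of the builtin's *logical* behaviour only; the real builtin
   would be expanded by the compiler and would also inhibit
   speculative use of the loaded value.  */
static inline int
load_no_speculate (const int *ptr, const void *lower, const void *upper,
                   int failval, const void *cmpptr)
{
  const char *p = (const char *) cmpptr;
  if (p >= (const char *) lower && p < (const char *) upper)
    return *ptr;
  return failval;
}

static int arr[16] = { 0, 10, 20, 30 };

/* Classic variant-1 pattern: the load is architecturally reachable only
   when untrusted_idx is in bounds, but may still execute under
   speculation.  With the proposed builtin the call would be
   __builtin_load_no_speculate (&arr[untrusted_idx], arr, arr + 16),
   relying on failval defaulting to 0 and cmpptr defaulting to ptr.  */
int
lookup (size_t untrusted_idx)
{
  if (untrusted_idx < 16)
    return load_no_speculate (&arr[untrusted_idx], arr, arr + 16,
                              0, &arr[untrusted_idx]);
  return 0;
}
```

An out-of-bounds cmpptr yields failval; the real builtin additionally pins that property down at the instruction level so the CPU cannot speculate past it.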

The patch set is constructed as follows:
1 - generic modifications to GCC providing the builtin function for all
    architectures and expanding to an implementation that gives only the
    logical behaviour of the builtin.  If this expansion path is used, a
    warning is generated: the code will execute correctly, but without
    protection against speculative use.
2 - AArch64 support
3 - AArch32 support (arm) for A32 and Thumb-2 states.

These patches can be used with the header file that Arm recently
published here: https://github.com/ARM-software/speculation-barrier.

Kernel patches are also being developed, eg:
https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
code like this will be able to use support directly from the compiler
in a portable manner.

Similar patches are also being developed for LLVM and will be posted
to their development lists shortly.

[1] More information on the topic can be found here:
https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
Arm specific information can be found here: https://www.arm.com/security-update



Richard Earnshaw (3):
  [builtins] Generic support for __builtin_load_no_speculate()
  [aarch64] Implement support for __builtin_load_no_speculate.
  [arm] Implement support for the de-speculation intrinsic

 gcc/builtin-types.def         |  16 +++++
 gcc/builtins.c                |  99 +++++++++++++++++++++++++
 gcc/builtins.def              |  22 ++++++
 gcc/c-family/c-common.c       | 164 ++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config/aarch64/aarch64.c  |  92 ++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md |  28 ++++++++
 gcc/config/arm/arm.c          | 107 +++++++++++++++++++++++++++
 gcc/config/arm/arm.md         |  40 ++++++++++-
 gcc/config/arm/unspecs.md     |   1 +
 gcc/doc/cpp.texi              |   4 ++
 gcc/doc/extend.texi           |  53 ++++++++++++++
 gcc/doc/tm.texi               |   6 ++
 gcc/doc/tm.texi.in            |   2 +
 gcc/target.def                |  20 ++++++
 gcc/targhooks.c               |  69 ++++++++++++++++++
 gcc/targhooks.h               |   3 +
 17 files changed, 729 insertions(+), 2 deletions(-)

Comments

Joseph Myers Jan. 4, 2018, 6:20 p.m. UTC | #1
On Thu, 4 Jan 2018, Richard Earnshaw wrote:

> 1 - generic modifications to GCC providing the builtin function for all
>     architectures and expanding to an implementation that gives the
>     logical behaviour of the builtin only.  A warning is generated if
>     this expansion path is used that code will execute correctly but
>     without providing protection against speculative use.


Presumably it would make sense to have a standard way for architectures 
with no speculative execution to just use the generic version, but without 
the warning?  E.g., split up default_inhibit_load_speculation so that it 
generates the warning then calls another function with the same prototype, 
so such architectures can just define the hook to use that other function?

-- 
Joseph S. Myers
joseph@codesourcery.com
Richard Biener Jan. 5, 2018, 9:44 a.m. UTC | #2
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
> [...]
>
> The builtin implements the following logical behaviour:
>
> inline TYP __builtin_load_no_speculate
>          (const volatile TYP *ptr,
>           const volatile void *lower,
>           const volatile void *upper,
>           TYP failval,
>           const volatile void *cmpptr)
> {
>   TYP result;
>
>   if (cmpptr >= lower && cmpptr < upper)
>     result = *ptr;
>   else
>     result = failval;
>   return result;
> }
>
> in addition the specification of the builtin ensures that future
> speculation using *ptr may only continue iff cmpptr lies within the
> bounds specified.

I fail to see how the vulnerability doesn't affect aggregate copies
or bitfield accesses.  The vulnerability doesn't cause the loaded
value to leak but (part of) the address by populating the CPU cache
side-channel.

So why isn't this just

 T __builtin_load_no_speculate (T *);

?  Why that "fallback" and why the lower/upper bounds?

Did you talk with other compiler vendors (intel, llvm, ...?)?

Richard.

Richard Earnshaw (lists) Jan. 5, 2018, 10:40 a.m. UTC | #3
On 05/01/18 09:44, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>> [...]
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
>
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.

It's not quite as simple as that.  You'll need to read the white paper
on Arm's web site to get a full grasp of this (linked from
https://www.arm.com/security-update).

> So why isn't this just
>
>  T __builtin_load_no_speculate (T *);
>
> ?  Why that "fallback" and why the lower/upper bounds?

Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
to restrict subsequent speculation: the CSEL needs a condition, and the
compiler must not be able to optimize it away based on logical
reachability.  The fallback is used for the other operand of the CSEL
instruction.
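As a rough illustration of the shape of that selection (my assumption from the description above, not the actual output of the patches), the CSEL acts as a branchless choice between the loaded value and the fallback.  That choice can be modelled in portable C with a mask; the real sequence additionally needs the CSDB barrier and must stay opaque to the optimizers:

```c
#include <stdint.h>

/* Branchless select in the style of AArch64 CSEL: returns 'val' when
   cmpptr lies in [lower, upper), otherwise 'failval'.  Illustrative
   model only; it carries no speculation barrier.  */
static int64_t
csel_like (int64_t val, int64_t failval,
           const char *cmpptr, const char *lower, const char *upper)
{
  /* in_bounds is 1 or 0; negating it gives an all-ones or all-zeros
     mask, so exactly one operand survives the and/or below.  */
  int64_t in_bounds = (cmpptr >= lower) & (cmpptr < upper);
  int64_t mask = -in_bounds;
  return (val & mask) | (failval & ~mask);
}
```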

> Did you talk with other compiler vendors (intel, llvm, ...?)?

As stated we'll shortly be submitting similar patches for LLVM.

R.

Richard Biener Jan. 5, 2018, 10:47 a.m. UTC | #4
On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> [...]
>> So why isn't this just
>>
>>  T __builtin_load_no_speculate (T *);
>>
>> ?  Why that "fallback" and why the lower/upper bounds?
>
> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
> to restrict subsequent speculation, the CSEL needs a condition and the
> compiler must not be able to optimize it away based on logical
> reachability.  The fallback is used for the other operand of the CSEL
> instruction.

But the condition could be just 'true' in the instruction encoding?  That is,
why do you do both the jump-around and the csel?  Maybe I misunderstood
the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
the compiler messing up things.

>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>
> As stated we'll shortly be submitting similar patches for LLVM.

How do you solve the aggregate load issue?  I would imagine
that speculating stores (by pre-fetching the affected cache line!)
could be another attack vector?  Not sure if any CPU speculatively
does that (but I see no good reason why a CPU shouldn't do that).

Does ARM have a barrier-like instruction suitable for use in the
kernel patches that are floating around?

Richard.

Richard Earnshaw (lists) Jan. 5, 2018, 10:59 a.m. UTC | #5
On 05/01/18 10:47, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> [...]
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
>
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.

That's why the intrinsic takes explicit bounds, not a simple bool
expressing the condition.  It prevents the optimizers from replacing
the condition by value-range propagation.  That does restrict the
flexibility of the builtin somewhat, but it allows it to work
correctly at all optimization levels.

Jakub Jelinek Jan. 5, 2018, 11:35 a.m. UTC | #6
On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
> > But the condition could be just 'true' in the instruction encoding?  That is,
> > why do you do both the jump-around and the csel?  Maybe I misunderstood
> > the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> > the compiler messing up things.
>
> That's why the intrinsic takes explicit bounds not a simple bool
> expressing the conditions.  It prevents the optimizers from replacing
> the condition by value range propagation.  That does restrict the

Preventing optimizers from replacing the condition can be done in many ways,
IFN, UNSPEC, ...
The question is whether you really need it at the assembly level, or if
you can just do an unconditional branch, or a branch conditional on, say,
a comparison of two constants, without any dynamic bounds.

	Jakub
Alexander Monakov Jan. 5, 2018, 11:37 a.m. UTC | #7
I believe the proposed behavior of the new builtin is over-specialized.
In principle the following snippet may be open to exploitation too:

  if (predicate)
    foo = arr[(secret >> untrusted) & 64];

assuming the attacker has a means to observe which part of 'arr' is brought into
cache, but cannot set 'predicate' to true (so has to rely on the speculative
population of the cache); and likewise if a store is predicated-off rather than
a load.

So shouldn't, for generality, the new builtin work "as if" by cleansing the
address rather than the result of the load, like the following? It would also be
applicable to stores then.

On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> inline TYP __builtin_load_no_speculate
>          (const volatile TYP *ptr,
>           const volatile void *lower,
>           const volatile void *upper,
>           TYP failval,
>           const volatile void *cmpptr)
> {
>   TYP result;
> 
>   if (cmpptr >= lower && cmpptr < upper)
>     result = *ptr;
>   else
>     result = failval;
>   return result;
> }


{
  if (!(cmpptr >= lower && cmpptr < upper))
    ptr = NULL;

  return *ptr;
}

Alexander
Richard Earnshaw (lists) Jan. 5, 2018, 11:53 a.m. UTC | #8
On 05/01/18 11:37, Alexander Monakov wrote:
> I believe the proposed behavior of the new builtin is over-specialized.
> In principle the following snippet may be open to exploitation too:
> 
>   if (predicate)
>     foo = arr[(secret >> untrusted) & 64];
> 
> assuming the attacker has a means to observe which part of 'arr' is
> brought into cache, but cannot set 'predicate' to true (so has to rely
> on the speculative population of the cache); and likewise if a store is
> predicated-off rather than a load.
> 
> So shouldn't, for generality, the new builtin work "as if" by cleansing
> the address rather than the result of the load, like the following? It
> would also be applicable to stores then.

This is quite tricky.  For ARM we have to have a speculated load.  So
one way to recast this might be to push 'untrusted' into a stack slot
that was then speculatively loaded back by the builtin.  To do that
you'd need to find a way of expressing predicate as a range check, so
something like

  if (predicate)
   {
     foo = arr[(secret >> __builtin_load_no_speculate (&untrusted,
				predicate_base, predicate_upper, 0,
				predicate_val)) & 64];
   }
You could use similar techniques with stores as well.

I'm open to suggestions as to how we might express the conditions
better.  It's certainly the least pleasant part of this proposal, but
the seemingly obvious '_Bool condition' parameter won't work because the
compiler will just substitute true in too many situations where it
thinks it has proved that it knows the conditions.

R.


> 
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>> inline TYP __builtin_load_no_speculate
>>          (const volatile TYP *ptr,
>>           const volatile void *lower,
>>           const volatile void *upper,
>>           TYP failval,
>>           const volatile void *cmpptr)
>> {
>>   TYP result;
>> 
>>   if (cmpptr >= lower && cmpptr < upper)
>>     result = *ptr;
>>   else
>>     result = failval;
>>   return result;
>> }
> 
> {
>   if (!(cmpptr >= lower && cmpptr < upper))
>     ptr = NULL;
> 
>   return *ptr;
> }
> 
> Alexander
Richard Earnshaw (lists) Jan. 5, 2018, 11:57 a.m. UTC | #9
On 05/01/18 11:35, Jakub Jelinek wrote:
> On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
>>> But the condition could be just 'true' in the instruction encoding?  That is,
>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>> the compiler messing up things.
>>>
>>
>> That's why the intrinsic takes explicit bounds not a simple bool
>> expressing the conditions.  It prevents the optimizers from replacing
>> the condition by value range propagation.  That does restrict the
> 
> Preventing optimizers from replacing the condition can be done in many ways,
> IFN, UNSPEC, ...
> The question is if you really need it at the assembly level, or if you can
> just do an unconditional branch or branch conditional on say comparison of
> two constants, without any dynamic bounds.
> 
> 	Jakub
> 


The bounds /have/ to reflect the real speculation condition.  Otherwise
the CSEL becomes ineffective.

R.
Alexander Monakov Jan. 5, 2018, 1:08 p.m. UTC | #10
On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
> This is quite tricky.  For ARM we have to have a speculated load.


Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
wouldn't work (applying CSEL to the address rather than loaded value), and
if it wouldn't, then ARM-specific lowering of the builtin can handle that
anyhow, right? (by spilling the pointer)

(on x86 the current Intel's recommendation is to emit LFENCE prior to the load)

Is the main issue expressing the CSEL condition in the source code? Perhaps it is
possible to introduce

  int guard = __builtin_nontransparent(predicate);

  if (predicate)
    foo = __builtin_load_no_speculate(&arr[addr], guard);

... or maybe even

  if (predicate)
    foo = arr[__builtin_loadspecbarrier(addr, guard)];

where internally __builtin_nontransparent is the same as

  guard = predicate;
  asm volatile("" : "+g"(guard));

although admittedly this is not perfect since it forces evaluation of 'guard'
before the branch.

Alexander
Jeff Law Jan. 5, 2018, 4:37 p.m. UTC | #11
On 01/05/2018 02:44 AM, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>>         (const volatile TYP *ptr,
>>          const volatile void *lower,
>>          const volatile void *upper,
>>          TYP failval,
>>          const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>          (const volatile TYP *ptr,
>>           const volatile void *lower,
>>           const volatile void *upper,
>>           TYP failval,
>>           const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>>     result = *ptr;
>>   else
>>     result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 
> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?
> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?

I think "fallback" could potentially be any value; I don't think its
actual value matters much -- it could just as easily be "0" across the
board or some indeterminate value (as long as the indeterminate value
doesn't itself present an information leak).

The bounds are used to build a condition for the csel to select between
the loaded value and the fail value.  You need some way to select
between them.

If I've got anything wrong, I'm sure Richard E. will correct me.

Jeff
Jeff Law Jan. 5, 2018, 4:41 p.m. UTC | #12
On 01/05/2018 03:40 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>>         (const volatile TYP *ptr,
>>>          const volatile void *lower,
>>>          const volatile void *upper,
>>>          TYP failval,
>>>          const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>          (const volatile TYP *ptr,
>>>           const volatile void *lower,
>>>           const volatile void *upper,
>>>           TYP failval,
>>>           const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>>     result = *ptr;
>>>   else
>>>     result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
> 
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).

I actually recommend reading the original papers referenced by Google in
addition to the ARM whitepaper.  This stuff is far from intuitive.  I've
been loosely involved for a month or so, but it really didn't start to
click for me until right before the holiday break.



jeff
Jeff Law Jan. 5, 2018, 4:44 p.m. UTC | #13
On 01/05/2018 04:57 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 11:35, Jakub Jelinek wrote:
>> On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
>>>> But the condition could be just 'true' in the instruction encoding?  That is,
>>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>>> the compiler messing up things.
>>>>
>>>
>>> That's why the intrinsic takes explicit bounds not a simple bool
>>> expressing the conditions.  It prevents the optimizers from replacing
>>> the condition by value range propagation.  That does restrict the
>>
>> Preventing optimizers from replacing the condition can be done in many ways,
>> IFN, UNSPEC, ...
>> The question is if you really need it at the assembly level, or if you can
>> just do an unconditional branch or branch conditional on say comparison of
>> two constants, without any dynamic bounds.
>>
>> 	Jakub
>>
> 
> The bounds /have/ to reflect the real speculation condition.  Otherwise
> the CSEL becomes ineffective.

I think this is probably the key that Jakub and Richard B. were missing.
It certainly hadn't clicked for me when we spoke earlier.

Jeff
Jeff Law Jan. 5, 2018, 4:50 p.m. UTC | #14
On 01/05/2018 03:47 AM, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> Recently, Google Project Zero disclosed several classes of attack
>>>> against speculative execution. One of these, known as variant-1
>>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>>> speculation, providing an arbitrary read gadget. Further details can
>>>> be found on the GPZ blog [1] and the documentation that is included
>>>> with the first patch.
>>>>
>>>> This patch set adds a new builtin function for GCC to provide a
>>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>>> memory access.  I've tried to design this in such a way that it can be
>>>> used for any target where this might be necessary.  The patch set
>>>> provides a generic implementation of the builtin and then
>>>> target-specific support for Arm and AArch64.  Other architectures can
>>>> utilize the internal infrastructure as needed.
>>>>
>>>> Most of the details of the builtin and the hooks that need to be
>>>> implemented to support it are described in the updates to the manual,
>>>> but a short summary is given below.
>>>>
>>>> TYP __builtin_load_no_speculate
>>>>         (const volatile TYP *ptr,
>>>>          const volatile void *lower,
>>>>          const volatile void *upper,
>>>>          TYP failval,
>>>>          const volatile void *cmpptr)
>>>>
>>>> Where TYP can be any integral type (signed or unsigned char, int,
>>>> short, long, etc) or any pointer type.
>>>>
>>>> The builtin implements the following logical behaviour:
>>>>
>>>> inline TYP __builtin_load_no_speculate
>>>>          (const volatile TYP *ptr,
>>>>           const volatile void *lower,
>>>>           const volatile void *upper,
>>>>           TYP failval,
>>>>           const volatile void *cmpptr)
>>>> {
>>>>   TYP result;
>>>>
>>>>   if (cmpptr >= lower && cmpptr < upper)
>>>>     result = *ptr;
>>>>   else
>>>>     result = failval;
>>>>   return result;
>>>> }
>>>>
>>>> in addition the specification of the builtin ensures that future
>>>> speculation using *ptr may only continue iff cmpptr lies within the
>>>> bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 
>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).

Without going into details, I've speculated to the appropriate folks
that there's other vectors for these kinds of attacks, some more
exploitable than others.  Attacking the cache via loads/stores is just
one instance of a class of problems IMHO.

I fully expect we'll be fighting this class of problems for a while.
This is just the tip of the iceberg.

Jeff
Jeff Law Jan. 5, 2018, 5:24 p.m. UTC | #15
On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
> 
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.

So I think it's important for anyone reading this stuff to read
Richard's paragraph carefully --  "an arbitrary read gadget".

I fully expect that over the course of time we're going to see other
arbitrary read gadgets to be found.  Those gadgets may have lower
bandwidth, have a higher degree of jitter or be harder to exploit, but
they're out there just waiting to be discovered.

So I don't expect this to be the only mitigation we have to put into place.

Anyway...


> 
> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (c++ style): failval will
> default to 0 in this case and if cmpptr is omitted ptr will be used
> for expansions of the range check.  In addition either lower or upper
> (but not both) may be a literal NULL and the expansion will then
> ignore that boundary condition when expanding.

So what are the cases where FAILVAL is useful rather than just using
zero (or some other constant) all the time?

Similarly under what conditions would one want to use CMPPTR rather than
PTR?

I wandered down through the LKML thread but didn't see anything which
would shed light on those two questions.

jeff
>
Jeff Law Jan. 5, 2018, 5:26 p.m. UTC | #16
On 01/04/2018 11:20 AM, Joseph Myers wrote:
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> 
>> 1 - generic modifications to GCC providing the builtin function for all
>>     architectures and expanding to an implementation that gives the
>>     logical behaviour of the builtin only.  A warning is generated if
>>     this expansion path is used that code will execute correctly but
>>     without providing protection against speculative use.
> 
> Presumably it would make sense to have a standard way for architectures
> with no speculative execution to just use the generic version, but without
> the warning?  E.g., split up default_inhibit_load_speculation so that it
> generates the warning then calls another function with the same prototype,
> so such architectures can just define the hook to use that other function?

Seems wise.  There's still architectures that don't speculate or don't
speculate enough to trigger these problems.

Jeff
Richard Earnshaw (lists) Jan. 9, 2018, 10:47 a.m. UTC | #17
On 05/01/18 13:08, Alexander Monakov wrote:
> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>> This is quite tricky.  For ARM we have to have a speculated load.
> 
> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> wouldn't work (applying CSEL to the address rather than loaded value), and
> if it wouldn't, then ARM-specific lowering of the builtin can handle that
> anyhow, right? (by spilling the pointer)


The load has to feed /in/ to the csel/csdb sequence, not come after it.

> 
> (on x86 the current Intel's recommendation is to emit LFENCE prior to the load)


That can be supported in the way you expand the builtin.  The builtin
expander is given a (MEM (ptr)) , but it's up to the back-end where to
put that in the expanded sequence to materialize the load, so you could
write (sorry, don't know x86 asm very well, but I think this is how
you'd put it)

	lfence
	mov	(ptr), dest

with branches around that as appropriate to support the remainder of the
builtin's behaviour.
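As a sketch of that shape in plain C (illustrative only -- this is not the actual GCC expansion, the helper name is invented here, and the fence is only emitted when compiling for x86):

```c
#include <stdint.h>

/* Hypothetical x86-flavoured expansion of the builtin's logical
   behaviour: the LFENCE sits immediately before the load, so the load
   cannot execute until all prior instructions (including the bounds
   check) have completed.  On non-x86 hosts this sketch degrades to the
   plain logical behaviour with no barrier.  */
static inline int
load_no_speculate_int_x86 (const int *ptr, const void *lower,
                           const void *upper, int failval,
                           const void *cmpptr)
{
  int result = failval;
  if ((uintptr_t) cmpptr >= (uintptr_t) lower
      && (uintptr_t) cmpptr < (uintptr_t) upper)
    {
#if defined(__i386__) || defined(__x86_64__)
      __asm__ volatile ("lfence" ::: "memory"); /* serializing barrier */
#endif
      result = *ptr;                            /* the guarded load */
    }
  return result;
}
```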

> Is the main issue expressing the CSEL condition in the source code? Perhaps it is
> possible to introduce
> 
>   int guard = __builtin_nontransparent(predicate);
> 
>   if (predicate)
>     foo = __builtin_load_no_speculate(&arr[addr], guard);
> 
> ... or maybe even
> 
>   if (predicate)
>     foo = arr[__builtin_loadspecbarrier(addr, guard)];
> 
> where internally __builtin_nontransparent is the same as
> 
>   guard = predicate;
>   asm volatile("" : "+g"(guard));
> 
> although admittedly this is not perfect since it forces evaluation of 'guard'
> before the branch.


As I explained to Bernd last night, I think this is likely to be unsafe.
If there's some control path before __builtin_nontransparent that allows
'predicate' to be simplified (e.g. by value range propagation), then your
guard doesn't protect against the speculation that you think it does.
Changing all the optimizers to guarantee that wouldn't happen (and
guaranteeing that all future optimizers won't introduce new problems of
that nature) is, I suspect, very non-trivial.

R.

> 
> Alexander
Richard Earnshaw (lists) Jan. 9, 2018, 11:40 a.m. UTC | #18
On 05/01/18 17:24, Jeff Law wrote:
> On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
> So I think it's important for anyone reading this stuff to read
> Richard's paragraph carefully --  "an arbitrary read gadget".
> 
> I fully expect that over the course of time we're going to see other
> arbitrary read gadgets to be found.  Those gadgets may have lower
> bandwidth, have a higher degree of jitter or be harder to exploit, but
> they're out there just waiting to be discovered.
> 
> So I don't expect this to be the only mitigation we have to put into place.
> 
> Anyway...
> 
>>
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
> So what are the cases where FAILVAL is useful rather than just using
> zero (or some other constant) all the time?


So some background first.  My initial attempts to define a builtin were
based entirely around trying to specify the builtin without any hard
functional behaviour at all.  The specification was a mess.  Things
just became so much easier to specify when I moved to defining a logical
behaviour on top of the intended side effects on speculation.  Having
done that it seemed sensible to permit the user to use the builtin in
more creative ways by allowing it to select the failure value.  The idea
was that code such as

  if (ptr >= base && ptr < limit) // bounds check
    return *ptr;
  return FAILVAL;

could simply be rewritten as

  return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

and now you don't have to worry about writing the condition out twice or
any other such nonsense.  In this case the builtin does exactly what you
want it to do.  (It's also easier now to write test cases that check the
correctness of the builtin's expansion, since you can test for a
behaviour of the code's execution, even if you can't check the
speculation behaviour properly.)
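For concreteness, that rewrite looks like this in compilable C, using a pure-C stand-in for the builtin since the real one only exists in a patched GCC (the names `load_no_speculate_int` and `lookup` are invented here; the stand-in models only the logical behaviour, not the speculation barrier):

```c
#include <stddef.h>
#include <stdint.h>

/* Pure-C model of the builtin's *logical* behaviour.  The real builtin
   would additionally emit a speculation barrier (e.g. CSEL+CSDB on
   AArch64) so that speculation past the range check is inhibited.  */
static inline int
load_no_speculate_int (const int *ptr, const void *lower,
                       const void *upper, int failval, const void *cmpptr)
{
  if ((uintptr_t) cmpptr >= (uintptr_t) lower
      && (uintptr_t) cmpptr < (uintptr_t) upper)
    return *ptr;
  return failval;
}

/* Replaces the open-coded bounds check:
     if (ptr >= base && ptr < base + n) return *ptr;
     return -1;  */
int
lookup (const int *base, size_t n, const int *ptr)
{
  return load_no_speculate_int (ptr, base, base + n, -1, ptr);
}
```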



> 

> Similarly under what conditions would one want to use CMPPTR rather than

> PTR?


This was at the request of some kernel folk.  They have some cases where
CMPPTR may be a pointer to an atomic type and want to do something like

  if (cmpptr >= lower && cmpptr < upper)
    val = __atomic_read_and_inc (cmpptr);

The atomics are complicated enough already and rewriting them all to
additionally inhibit speculation based on the result would be a
nightmare.  By separating out cmpptr from ptr you can now write

  if (cmpptr >= lower && cmpptr < upper)
    {
      TYPE tmp_val = __atomic_read_and_inc (cmpptr);
      val = builtin_load_no_speculate (&tmp_val, lower, upper, 0,
                                       cmpptr);
    }

It's meaningless in this case to check the bounds of tmp_val - it's just
a value pushed onto the stack in order to satisfy the requirement that
we need a load that is being speculatively executed; but we can still
test against the speculation conditions, which are still the range check
for cmpptr.
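A self-contained sketch of that pattern (GCC's `__atomic_fetch_add` stands in for the kernel's atomic helper, and a pure-C model with an invented name stands in for the builtin -- logical behaviour only, no speculation barrier):

```c
#include <stdint.h>

/* Pure-C stand-in modelling only the builtin's logical behaviour.  */
static inline int
load_no_speculate_int (const int *ptr, const void *lower,
                       const void *upper, int failval, const void *cmpptr)
{
  if ((uintptr_t) cmpptr >= (uintptr_t) lower
      && (uintptr_t) cmpptr < (uintptr_t) upper)
    return *ptr;
  return failval;
}

int
guarded_read_and_inc (int *cmpptr, int *lower, int *upper)
{
  int val = 0;
  if (cmpptr >= lower && cmpptr < upper)
    {
      /* __atomic_fetch_add stands in for the kernel's atomic
         read-and-increment; it returns the value before the add.  */
      int tmp_val = __atomic_fetch_add (cmpptr, 1, __ATOMIC_SEQ_CST);
      /* tmp_val exists only to give the builtin a load to guard; the
         speculation condition is still the range check on cmpptr.  */
      val = load_no_speculate_int (&tmp_val, lower, upper, 0, cmpptr);
    }
  return val;
}
```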

There may be other similar cases as well where you have an object that
you want to clean up that is somehow derived from cmpptr but is not
cmpptr itself.  You do have to be more careful with the extended form,
since it is possible to write sequences that don't inhibit speculation
in the way you might think they do, but with greater flexibility also
comes greater risk.  I don't think there's ever a problem if ptr is
somehow derived from dereferencing cmpptr.

R.

> 
> I wandered down through the LKML thread but didn't see anything which
> would shed light on those two questions.
> 
> jeff
>>
Alexander Monakov Jan. 9, 2018, 1:27 p.m. UTC | #19
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> > wouldn't work (applying CSEL to the address rather than loaded value), and
> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
> > anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.


Again, I'm sorry, but I have to insist that what you're saying here contradicts
the documentation linked from https://developer.arm.com/support/security-update
The PDF currently says, in "Details of the CSDB barrier":

    Until the barrier completes:
    1) For any load, store, data or instruction preload, RW2, appearing in
    program order *after the barrier* [...]

    2) For any indirect branch (B2), appearing in program order
    *after the barrier* [...]

    [...] the speculative execution of RW2/B2 does not influence the
    allocations of entries in a cache [...]

It doesn't say anything about the behavior of CSDB being dependent on the loads
encountered prior to it.  (and imho it doesn't make sense for a hardware
implementation to work that way)

> As I explained to Bernd last night, I think this is likely to be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.


But note that in that case the compiler could have performed the same
simplification in the original code as well, emitting straight-line machine code
lacking speculatively executable parts in the first place.

Alexander
Richard Earnshaw (lists) Jan. 9, 2018, 1:40 p.m. UTC | #20
On 09/01/18 13:27, Alexander Monakov wrote:
> On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
>> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
>> > wouldn't work (applying CSEL to the address rather than loaded value), and
>> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> > anyhow, right? (by spilling the pointer)
>> 
>> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
> Again, I'm sorry, but I have to insist that what you're saying here
> contradicts the documentation linked from
> https://developer.arm.com/support/security-update
> The PDF currently says, in "Details of the CSDB barrier":
> 
>     Until the barrier completes:
>     1) For any load, store, data or instruction preload, RW2, appearing in
>     program order *after the barrier* [...]
> 
>     2) For any indirect branch (B2), appearing in program order
>     *after the barrier* [...]
> 
>     [...] the speculative execution of RW2/B2 does not influence the
>     allocations of entries in a cache [...]
> 
> It doesn't say anything about the behavior of CSDB being dependent on
> the loads encountered prior to it.  (and imho it doesn't make sense for
> a hardware implementation to work that way)


Read condition 1 i) again.


i) the conditional select instruction has a register data dependency on
a load R1, that has been executed speculatively, for one of its input
registers, and

To be executed speculatively it must be *after* a predicted operation
that has not yet resolved.  Once it has resolved it is no longer
speculative; it's committed (one way or another).

R.

> 
>> As I explained to Bernd last night, I think this is likely to be unsafe.
>> If there's some control path before __builtin_nontransparent that allows
>> 'predicate' to be simplified (eg by value range propagation), then your
>> guard doesn't protect against the speculation that you think it does.
>> Changing all the optimizers to guarantee that wouldn't happen (and
>> guaranteeing that all future optimizers won't introduce new problems of
>> that nature) is, I suspect, very non-trivial.
> 
> But note that in that case the compiler could have performed the same
> simplification in the original code as well, emitting straight-line
> machine code lacking speculatively executable parts in the first place.
> 
> Alexander
Alexander Monakov Jan. 9, 2018, 2:32 p.m. UTC | #21
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> Read condition 1 i) again.
> 
> i) the conditional select instruction has a register data dependency on
> a load R1, that has been executed speculatively, for one of its input
> registers, and


Sorry - I don't know how I missed that. I understand the motivation for the
behavior of the new built-in now; the CSDB specification is surprising, but
that is off-topic I guess. Seems quite unfortunate that CSDB is constraining
the generic built-in like this, though.

Thank you for clearing this up.

I suggest that the user documentation in extend.texi should explicitly call out
that the load itself still may be speculatively executed - only its consumers
can be expected to be fenced off. For example, after

+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.

it can go on to say e.g.,

  However note that the load of @code{*ptr} itself and the conditional branch
  leading to it may still be subject to speculative execution (and the load may
  have observable effects on the cache as a result).

Thanks.
Alexander
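
[For concreteness, the builtin's documented logical behaviour (as given in
the cover letter) can be modelled in plain C.  This sketch models only the
architectural result; the real builtin additionally emits a target-specific
speculation barrier (e.g. CSEL+CSDB on AArch64), which cannot be expressed
in portable C.  The function and array names here are illustrative, not
from the patch series.]

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural model of __builtin_load_no_speculate for uint8_t,
   following the logical behaviour given in the cover letter.  Only the
   value selection is modelled; the speculation barrier is not.  */
static uint8_t
load_no_speculate_u8 (const volatile uint8_t *ptr,
                      const volatile void *lower,
                      const volatile void *upper,
                      uint8_t failval,
                      const volatile void *cmpptr)
{
  /* If cmpptr lies within [lower, upper), perform the load;
     otherwise return the fail value.  */
  if (cmpptr >= lower && cmpptr < upper)
    return *ptr;
  return failval;
}

static uint8_t arr[16];

/* A bounds-checked table lookup in the Spectre variant-1 style.  */
uint8_t
lookup (size_t idx)
{
  /* lower/upper describe the array bounds; cmpptr is the access address.  */
  return load_no_speculate_u8 (&arr[idx], arr, arr + 16, 0, &arr[idx]);
}
```

In the real builtin, an out-of-bounds cmpptr additionally guarantees that
later speculation using *ptr cannot observe the loaded value.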
Richard Earnshaw (lists) Jan. 9, 2018, 3:29 p.m. UTC | #22
On 05/01/18 17:26, Jeff Law wrote:
> On 01/04/2018 11:20 AM, Joseph Myers wrote:
>> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>>
>>> 1 - generic modifications to GCC providing the builtin function for all
>>>     architectures and expanding to an implementation that gives the
>>>     logical behaviour of the builtin only.  A warning is generated if
>>>     this expansion path is used, since the code will execute correctly
>>>     but without providing protection against speculative use.
>>
>> Presumably it would make sense to have a standard way for architectures
>> with no speculative execution to just use the generic version, but without
>> the warning?  E.g., split up default_inhibit_load_speculation so that it
>> generates the warning then calls another function with the same prototype,
>> so such architectures can just define the hook to use that other function?
>>
> Seems wise.  There are still architectures that don't speculate, or don't
> speculate enough to trigger these problems.
> 
> Jeff

I'm in two minds about that.  There are certainly cases where you might
want to use the generic expansion as part of handling a case where you
have a more standard speculation barrier; in those cases you might want
to emit your barrier and then let the generic code expand and provide
the logical behaviour of the builtin.

However, if you don't have a barrier you need to ask yourself: will my
architecture ever have an implementation that does speculate?  Unless
you can be certain that it won't, you probably need to be warned that
some code the programmer thinks might be vulnerable to Spectre has not
been compiled with that protection; otherwise, if you run that code on a
later implementation that does speculate, it might be vulnerable.

Let me give an example: we use the generic code expansion if we
encounter the builtin when generating code for Thumb on pre-ARMv7
devices.  We don't have instructions in 'thumb1' to guard against
speculation, and we really want to warn users that they've done this (it
might be a mistake in how they're invoking the compiler).

I could add an extra parameter to the helper (bool warn_unimplemented),
which would be true if called directly from the expanders; that would
then allow back-ends to implement a trivial variant that just suppresses
the warning.  They could also then use the generic expansion if all that
was needed was a subsequent fence instruction.

R.
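
[A hypothetical sketch of the split being discussed, with invented names;
the actual GCC helper operates on rtx operands and uses the real warning
machinery, neither of which is shown here.  The point is only the shape:
the expanders call the warning variant, a never-speculating target wires
its hook to the quiet variant, and a fence-based target emits its fence
and then reuses the quiet generic expansion.]

```c
#include <stdbool.h>

/* Stand-in for GCC's warning machinery, for illustration only.  */
static int warning_count;

/* Quiet variant: expands the logical (unprotected) behaviour of the
   builtin.  A target with no speculation can use this directly.  */
static void
generic_expand_quiet (void)
{
  /* ... emit the bounds test, load, and failval selection ... */
}

/* Warning variant, called from the generic expanders: same expansion,
   but diagnoses that no speculation protection is being provided.  */
static void
generic_expand (bool warn_unimplemented)
{
  if (warn_unimplemented)
    warning_count++;   /* "this call provides no protection on this target" */
  generic_expand_quiet ();
}
```

A fence-based back-end would then emit its barrier instruction first and
call generic_expand (false) for the rest.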
Jeff Law Jan. 10, 2018, 11:48 p.m. UTC | #23
On 01/09/2018 03:47 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 13:08, Alexander Monakov wrote:
>> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>>> This is quite tricky.  For ARM we have to have a speculated load.
>>
>> Sorry, I don't follow. On ARM, it is surprising that a CSEL-CSDB-LDR sequence
>> wouldn't work (applying CSEL to the address rather than the loaded value), and
>> if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
>> (on x86, Intel's current recommendation is to emit LFENCE prior to the load)
> 
> That can be supported in the way you expand the builtin.  The builtin
> expander is given a (MEM (ptr)), but it's up to the back-end where to
> put that in the expanded sequence to materialize the load, so you could
> write (sorry, I don't know x86 asm very well, but I think this is how
> you'd put it)
> 
> 	lfence
> 	mov	(ptr), dest
> 
> with branches around that as appropriate to support the remainder of the
> builtin's behaviour.

I think the argument is going to be that they don't want the branches
around to support the other test + failval semantics.  Essentially the
same position as IBM has with PPC.
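
[As an aside, the lfence-then-load sequence quoted above could be written
as GCC-style inline asm roughly as below.  This is x86-64 only and is an
illustration of the idea, not the builtin's actual expansion; the function
name is invented.]

```c
#include <stdint.h>

/* Fence-then-load: LFENCE does not let younger instructions (including
   the following load) execute until all older instructions have
   completed, so the load cannot execute under an unresolved branch
   prediction.  */
static inline uint8_t
load_after_lfence (const uint8_t *ptr)
{
  unsigned int dest;
  __asm__ volatile ("lfence\n\t"           /* speculation barrier */
                    "movzbl (%1), %0"      /* the guarded load     */
                    : "=r" (dest)
                    : "r" (ptr)
                    : "memory");
  return (uint8_t) dest;
}
```

The full builtin semantics would additionally need branches around this
to substitute failval when cmpptr is out of bounds.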

> 
>> Is the main issue expressing the CSEL condition in the source code? Perhaps it is
>> possible to introduce
>>
>>   int guard = __builtin_nontransparent(predicate);
>>
>>   if (predicate)
>>     foo = __builtin_load_no_speculate(&arr[addr], guard);
>>
>> ... or maybe even
>>
>>   if (predicate)
>>     foo = arr[__builtin_loadspecbarrier(addr, guard)];
>>
>> where internally __builtin_nontransparent is the same as
>>
>>   guard = predicate;
>>   asm volatile("" : "+g"(guard));
>>
>> although admittedly this is not perfect since it forces evaluation of 'guard'
>> before the branch.
> 
> As I explained to Bernd last night, I think this is likely to be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (e.g. by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.

Agreed.  Whatever PREDICATE happens to be, the compiler is going to go
through extreme measures to try to collapse PREDICATE down to a
compile-time constant, including splitting paths to the point where
PREDICATE is used in the conditional so that on one side it's constant
and on the other it's non-constant.  It seems like this approach is
likely to be compromised by the optimizers.


Jeff
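
[To make the technique under discussion concrete: the empty asm with a
"+g" (read-write, any-operand) constraint makes the guard's value opaque
to the optimizers, so it cannot be folded to a constant even where the
dominating test would let value-range propagation do so.  The objection
above concerns paths *before* that point: if the compiler has already
simplified the predicate itself, the guard is computed from the
simplified value and protects nothing.  Function names here are
illustrative; this requires GCC-style extended asm.]

```c
/* Alexander's __builtin_nontransparent idea as a helper: the asm emits
   no code, but the compiler must treat v as potentially modified by it,
   which blocks constant propagation through this point.  */
static inline int
nontransparent (int v)
{
  __asm__ volatile ("" : "+g" (v));
  return v;
}

/* Bounds-checked read where the guard survives as a runtime value even
   under optimization (subject to the caveat about earlier paths).  */
int
guarded_read (const int *arr, unsigned idx, unsigned len)
{
  int predicate = idx < len;
  int guard = nontransparent (predicate);
  if (predicate)
    return guard ? arr[idx] : 0;
  return -1;
}
```

Compiling this at -O2 shows the compiler keeping a real test of guard
inside the taken branch rather than folding it to the constant 1.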
Jeff Law Jan. 10, 2018, 11:56 p.m. UTC | #24
On 01/09/2018 08:29 AM, Richard Earnshaw (lists) wrote:
> I'm in two minds about that.  There are certainly cases where you might
> want to use the generic expansion as part of handling a case where you
> have a more standard speculation barrier; in those cases you might want
> to emit your barrier and then let the generic code expand and provide
> the logical behaviour of the builtin.
> 
> However, if you don't have a barrier you need to ask yourself: will my
> architecture ever have an implementation that does speculate?  Unless
> you can be certain that it won't, you probably need to be warned that
> some code the programmer thinks might be vulnerable to Spectre has not
> been compiled with that protection; otherwise, if you run that code on a
> later implementation that does speculate, it might be vulnerable.
> 
> Let me give an example: we use the generic code expansion if we
> encounter the builtin when generating code for Thumb on pre-ARMv7
> devices.  We don't have instructions in 'thumb1' to guard against
> speculation, and we really want to warn users that they've done this (it
> might be a mistake in how they're invoking the compiler).
> 
> I could add an extra parameter to the helper (bool warn_unimplemented),
> which would be true if called directly from the expanders; that would
> then allow back-ends to implement a trivial variant that just suppresses
> the warning.  They could also then use the generic expansion if all that
> was needed was a subsequent fence instruction.

Yea, I see your side as well on this -- advisable or not, I suspect we're
going to see folks saying "my embedded chip doesn't have these problems,
I don't want to pay any of these costs", and I don't want to be warned
about a problem I know can't happen (today).

Anyway, I think relatively speaking this is minor compared to the stuff
we're discussing around the semantics of the builtin.  I can live with
iterating on this aspect based on feedback.

jeff