[v3,00/29] riscv control-flow integrity for usermode

Message ID 20240403234054.2020347-1-debug@rivosinc.com
Series: riscv control-flow integrity for usermode

Message

Deepak Gupta April 3, 2024, 11:34 p.m. UTC
Sending out v3 for CPU-assisted riscv user mode control flow integrity.

v2 [9] was sent a week ago for this riscv usermode control flow integrity
enabling. The RFC patchset (v1) was sent early this year (January) [7].

changes in v3
--------------
envcfg:
logic to pick up the base envcfg had a bug where `ENVCFG_CBZE` could have
been picked up on a per-task basis even though the CPU didn't implement it.
Fixed in this series.

dt-bindings:
As suggested, split into a separate commit. Fixed the messaging that the
spec is in public review.

arch_is_shadow_stack change:
arch_is_shadow_stack changed to vma_is_shadow_stack

hwprobe:
zicfiss / zicfilp, if present, now get enumerated in hwprobe.

selftests:
As suggested, added object and binary filenames to .gitignore.
The selftest binary needs to be compiled with a CFI-enabled compiler anyway,
which ensures that landing pad and shadow stack are enabled. Thus removed the
separate enable/disable tests. Cleaned up the tests a bit.

changes in v2
---------------
As part of the testing effort, compiled a rootfs with shadow stack and
landing pad enabled (libraries and binaries) and booted to a shell. As part
of long running tests, I have been able to run some SPEC 2006 benchmarks [8]
(the link is provided only for the list of benchmarks exercised in the long
running tests; the spreadsheet there actually holds some static stats like
code size growth on SPEC binaries). Thus converting from RFC to a regular
patchset.

Securing control-flow integrity for usermode requires the following:

    - Securing forward control flow: all callsites must reach
      a target that they actually intend to reach.

    - Securing backward control flow: all function returns must
      return to the location they were called from.

This patch series uses the riscv cpu extension `zicfilp` [2] to secure
forward control flow and `zicfiss` [2] to secure backward control flow.
`zicfilp` enforces that all indirect calls or jumps must land on a landing
pad instruction, and the label embedded in the landing pad instruction must
match a value programmed into the `x7` register (at the callsite, by the
compiler). `zicfiss` introduces a shadow stack which can only be written via
shadow stack instructions (sspush and ssamoswap) and thus can't be tampered
with via inadvertent stores. More details about the extensions can be found
in [2] and in the documentation included in this patch series.
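
For a concrete feel, below is a minimal user-space sketch (not part of the
series) that pokes at the backward-edge primitives directly. It assumes the
Zicfiss/Zicfilp-aware toolchain referenced later in this thread, a kernel with
this series applied, shadow stack already enabled for the task, and the `ssp`
CSR number (0x011) used by the selftest header further down; `read_ssp` is a
name made up for this example.

#include <stdio.h>

/*
 * Read the current shadow stack pointer; mirrors csr_read(CSR_SSP)
 * from the selftests in this series (CSR 0x011).
 */
static inline unsigned long read_ssp(void)
{
	unsigned long ssp;

	__asm__ volatile ("csrr %0, 0x011" : "=r" (ssp) : : "memory");
	return ssp;
}

int main(void)
{
	unsigned long before = read_ssp();

	/*
	 * Balanced push/pop-and-check of the return address register:
	 * sspush writes ra to the shadow stack, sspopchk pops the top
	 * entry and compares it with ra, raising the software check
	 * exception on a mismatch. The pair is balanced here, so it
	 * completes without a fault.
	 */
	__asm__ volatile ("sspush ra\n\tsspopchk ra" : : : "memory");

	printf("ssp before %lx, after %lx\n", before, read_ssp());
	return 0;
}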

Kernel support for riscv control flow integrity for user mode programs can
be compiled in via the config option `CONFIG_RISCV_USER_CFI`.

Enabling of control flow integrity for user programs is left to the user
runtime (specifically, it is expected from the dynamic loader). There has
been a lot of earlier discussion on this enabling topic around x86 shadow
stack enabling [3, 4, 5], and the overall consensus has been to let the
dynamic loader (or usermode) decide whether to enable the feature.

This patch series introduces arch-agnostic `prctls` to enable shadow stack
and indirect branch tracking, and implements them on riscv. arm64 is expected
to implement the shadow stack part of these arch-agnostic `prctls` [6].
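
As a rough illustration of the query side of these prctls (mirroring what the
selftests later in this thread do), a runtime or a test can check whether the
loader already enabled both features. This is a minimal sketch, assuming a
kernel with this series applied and uapi headers that carry the new PR_*
constants:

#include <stdio.h>
#include <sys/prctl.h>
#include <linux/prctl.h>	/* PR_GET_SHADOW_STACK_STATUS etc. from this series */

int main(void)
{
	unsigned long ss_status = 0, lp_status = 0;

	/* Both prctls write the current per-task status into the pointed-to word. */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0))
		perror("PR_GET_SHADOW_STACK_STATUS");
	if (prctl(PR_GET_INDIR_BR_LP_STATUS, &lp_status, 0, 0, 0))
		perror("PR_GET_INDIR_BR_LP_STATUS");

	printf("shadow stack: %s, landing pads: %s\n",
	       (ss_status & PR_SHADOW_STACK_ENABLE) ? "enabled" : "disabled",
	       (lp_status & PR_INDIR_BR_LP_ENABLE) ? "enabled" : "disabled");
	return 0;
}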

Changes since last time
***********************

Spec changes
------------
- The forward cfi spec has become much simpler. The `lpad` instruction is a
  pseudo for `auipc rd, <20bit_imm>`. `lpad` checks x7 against the 20-bit
  immediate embedded in the instruction, so the label width is 20 bits.

- Shadow stack management instructions are reduced to:
    sspush - push x1/x5 onto the shadow stack
    sspopchk - pop from the shadow stack and compare with x1/x5
    ssamoswap - atomically swap a value on the shadow stack
    rdssp - read the current shadow stack pointer

- Shadow stack accesses to read-only memory always raise an AMO/store page
  fault. `sspopchk` is a load, but if the underlying page is read-only it
  raises a store page fault. This simplifies hardware and kernel COW handling
  for shadow stack pages.

- riscv defines a new exception type, `software check exception`, and
  control flow violations raise a software check exception.

- Enabling controls for shadow stack and landing pads live in the xenvcfg
  CSRs and control enabling for the next lower privilege mode. As an example,
  senvcfg controls enabling for U mode and menvcfg controls enabling for S
  mode.

core mm shadow stack enabling
-----------------------------
Shadow stack support for x86 usermode is now in mainline, and this patch
series builds on top of that for the arch-agnostic mm changes. Big thanks
and a shout out to Rick Edgecombe for that.

selftests
---------
Created some minimal selftests to test the patch series.


[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#mb121cd8b33d564e64234595a0ec52211479cf474
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com/
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRhQntRA@mail.gmail.com/
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kernel.org/
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_iTSa3Tw2U/edit#gid=0
[9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/

Comments

David Hildenbrand April 4, 2024, 7:02 p.m. UTC | #1
On 04.04.24 01:34, Deepak Gupta wrote:
> VM_SHADOW_STACK (alias to VM_HIGH_ARCH_5) is used to encode a shadow stack VMA.
> 
> This patch changes checks of the VM_SHADOW_STACK flag in generic code into
> calls to a function `vma_is_shadow_stack`, which returns true if it is a
> shadow stack vma; the default stub (when support doesn't exist) returns false.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> ---
>   include/linux/mm.h | 13 ++++++++++++-
>   mm/gup.c           |  5 +++--
>   mm/internal.h      |  2 +-
>   3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 64109f6c70f5..9952937be659 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -363,8 +363,19 @@ extern unsigned int kobjsize(const void *objp);
>   
>   #ifndef VM_SHADOW_STACK
>   # define VM_SHADOW_STACK	VM_NONE
> +
> +static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
> +{
> +	return false;
> +}
> +#else
> +static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
> +{
> +	return (vm_flags & VM_SHADOW_STACK);
> +}
>   #endif

You can simply do this outside the ifdef:

static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
{
	return !!(vm_flags & VM_SHADOW_STACK);
}

This will work even when VM_SHADOW_STACK is defined to be VM_NONE.

>   
> +

unrelated code change

>   #if defined(CONFIG_X86)
>   # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
>   #elif defined(CONFIG_PPC)
> @@ -3473,7 +3484,7 @@ static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
>   		return stack_guard_gap;
>   
>   	/* See reasoning around the VM_SHADOW_STACK definition */
> -	if (vma->vm_flags & VM_SHADOW_STACK)
> +	if (vma->vm_flags && vma_is_shadow_stack(vma->vm_flags))

Pretty sure:

if (vma_is_shadow_stack(vma->vm_flags))

>   		return PAGE_SIZE;
>   
>   	return 0;
> diff --git a/mm/gup.c b/mm/gup.c
> index df83182ec72d..a7a02eb0a6b3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1053,7 +1053,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   		    !writable_file_mapping_allowed(vma, gup_flags))
>   			return -EFAULT;
>   
> -		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
> +		if (!(vm_flags & VM_WRITE) || vma_is_shadow_stack(vm_flags)) {
>   			if (!(gup_flags & FOLL_FORCE))
>   				return -EFAULT;
>   			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
> @@ -1071,7 +1071,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   			if (!is_cow_mapping(vm_flags))
>   				return -EFAULT;
>   		}
> -	} else if (!(vm_flags & VM_READ)) {
> +	} else if (!(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags)) {
> +	/* reads allowed if its shadow stack vma */
>   		if (!(gup_flags & FOLL_FORCE))
>   			return -EFAULT;
>   		/*

Unless I am missing something, this is not a simple cleanup. It should 
go into a separate patch with a clearly documented reason for that change.
Andy Chiu May 9, 2024, midnight UTC | #2
Hi Deepak,

On Thu, Apr 4, 2024 at 7:41 AM Deepak Gupta <debug@rivosinc.com> wrote:
>
> This patch adds support for detecting zicfiss and zicfilp. zicfiss and
> zicfilp stand for the unprivileged integer spec extensions for shadow stack
> and branch tracking on indirect branches, respectively.
>
> This patch looks for zicfiss and zicfilp in the device tree and accordingly
> lights up the bit in the cpu feature bitmap. Furthermore this patch adds
> detection utility functions to return whether shadow stack or landing pads
> are supported by the cpu.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++
>  arch/riscv/include/asm/hwcap.h      |  2 ++
>  arch/riscv/include/asm/processor.h  |  1 +
>  arch/riscv/kernel/cpufeature.c      |  2 ++
>  4 files changed, 18 insertions(+)
>
> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> index 0bd11862b760..f0fb8d8ae273 100644
> --- a/arch/riscv/include/asm/cpufeature.h
> +++ b/arch/riscv/include/asm/cpufeature.h
> @@ -8,6 +8,7 @@
>
>  #include <linux/bitmap.h>
>  #include <linux/jump_label.h>
> +#include <linux/smp.h>
>  #include <asm/hwcap.h>
>  #include <asm/alternative-macros.h>
>  #include <asm/errno.h>
> @@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
>
>  DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
>
> +static inline bool cpu_supports_shadow_stack(void)
> +{
> +       return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
> +                   riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS));
> +}
> +
> +static inline bool cpu_supports_indirect_br_lp_instr(void)
> +{
> +       return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
> +                   riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP));
> +}
> +
>  #endif
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 1f2d2599c655..74b6c727f545 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -80,6 +80,8 @@
>  #define RISCV_ISA_EXT_ZFA              71
>  #define RISCV_ISA_EXT_ZTSO             72
>  #define RISCV_ISA_EXT_ZACAS            73
nit: two tabs for alignment


> +#define RISCV_ISA_EXT_ZICFILP  74
> +#define RISCV_ISA_EXT_ZICFISS  75
>
>  #define RISCV_ISA_EXT_XLINUXENVCFG     127
>
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index a8509cc31ab2..6c5b3d928b12 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -13,6 +13,7 @@
>  #include <vdso/processor.h>
>
>  #include <asm/ptrace.h>
> +#include <asm/hwcap.h>
>
>  #ifdef CONFIG_64BIT
>  #define DEFAULT_MAP_WINDOW     (UL(1) << (MMAP_VA_BITS - 1))
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 79a5a35fab96..d052cad5b82f 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
>         __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
>         __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts),
>         __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
> +       __RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts),
> +       __RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts),
>         __RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
>         __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND),
>         __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
> --
> 2.43.2
>

Thanks,
Andy
Charlie Jenkins May 9, 2024, 12:07 a.m. UTC | #3
On Thu, May 09, 2024 at 08:00:00AM +0800, Andy Chiu wrote:
> Hi Deepak,
> 
> On Thu, Apr 4, 2024 at 7:41 AM Deepak Gupta <debug@rivosinc.com> wrote:
> >
> > This patch adds support for detecting zicfiss and zicfilp. zicfiss and
> > zicfilp stands for unprivleged integer spec extension for shadow stack
> > and branch tracking on indirect branches, respectively.
> >
> > This patch looks for zicfiss and zicfilp in device tree and accordinlgy
> > lights up bit in cpu feature bitmap. Furthermore this patch adds detection
> > utility functions to return whether shadow stack or landing pads are
> > supported by cpu.
> >
> > Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++
> >  arch/riscv/include/asm/hwcap.h      |  2 ++
> >  arch/riscv/include/asm/processor.h  |  1 +
> >  arch/riscv/kernel/cpufeature.c      |  2 ++
> >  4 files changed, 18 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> > index 0bd11862b760..f0fb8d8ae273 100644
> > --- a/arch/riscv/include/asm/cpufeature.h
> > +++ b/arch/riscv/include/asm/cpufeature.h
> > @@ -8,6 +8,7 @@
> >
> >  #include <linux/bitmap.h>
> >  #include <linux/jump_label.h>
> > +#include <linux/smp.h>
> >  #include <asm/hwcap.h>
> >  #include <asm/alternative-macros.h>
> >  #include <asm/errno.h>
> > @@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
> >
> >  DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
> >
> > +static inline bool cpu_supports_shadow_stack(void)
> > +{
> > +       return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
> > +                   riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS));
> > +}
> > +
> > +static inline bool cpu_supports_indirect_br_lp_instr(void)
> > +{
> > +       return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
> > +                   riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP));
> > +}
> > +
> >  #endif
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 1f2d2599c655..74b6c727f545 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -80,6 +80,8 @@
> >  #define RISCV_ISA_EXT_ZFA              71
> >  #define RISCV_ISA_EXT_ZTSO             72
> >  #define RISCV_ISA_EXT_ZACAS            73
> nit: two tabs for alignment
> 

Deepak, I think you might be using tabs with a display width of 4 spaces,
which causes a couple of places to have incorrect alignment that would look
correct with 4-space tabs. Linux uses 8 spaces for tabs.

- Charlie

> 
> > +#define RISCV_ISA_EXT_ZICFILP  74
> > +#define RISCV_ISA_EXT_ZICFISS  75
> >
> >  #define RISCV_ISA_EXT_XLINUXENVCFG     127
> >
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index a8509cc31ab2..6c5b3d928b12 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -13,6 +13,7 @@
> >  #include <vdso/processor.h>
> >
> >  #include <asm/ptrace.h>
> > +#include <asm/hwcap.h>
> >
> >  #ifdef CONFIG_64BIT
> >  #define DEFAULT_MAP_WINDOW     (UL(1) << (MMAP_VA_BITS - 1))
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 79a5a35fab96..d052cad5b82f 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
> >         __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
> >         __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts),
> >         __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
> > +       __RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts),
> > +       __RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts),
> >         __RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
> >         __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND),
> >         __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
> > --
> > 2.43.2
> >
> 
> Thanks,
> Andy
Charlie Jenkins May 9, 2024, 12:10 a.m. UTC | #4
On Wed, Apr 03, 2024 at 04:34:49PM -0700, Deepak Gupta wrote:
> envcfg CSR defines enabling bits for cache management instructions and
> soon will control enabling for control flow integrity and pointer
> masking features.
> 
> Control flow integrity enabling for forward cfi and backward cfi is
> controlled via envcfg and thus needs to be enabled on a per-thread basis.
> 
> This patch creates a placeholder for the envcfg CSR in `thread_info` and
> adds logic to save and restore it on task switching.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/include/asm/switch_to.h   | 10 ++++++++++
>  arch/riscv/include/asm/thread_info.h |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 7efdb0584d47..2d9a00a30394 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>  #endif
>  
> +static inline void __switch_to_envcfg(struct task_struct *next)
> +{
> +	register unsigned long envcfg = next->thread_info.envcfg;

This doesn't need the register storage class.

> +
> +	asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
> +							  RISCV_ISA_EXT_XLINUXENVCFG, 1)
> +							  :: "r" (envcfg) : "memory");
> +}
> +

Something like:

static inline void __switch_to_envcfg(struct task_struct *next)
{
	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_XLINUXENVCFG))
		csr_write(CSR_ENVCFG, next->thread_info.envcfg);
}

would be easier to read, but the alternative you have written doesn't
have the jump that riscv_has_extension_unlikely has, so what you have
will be more performant.

Does envcfg need to be saved/restored always, or just with
CONFIG_RISCV_USER_CFI?

- Charlie

>  extern struct task_struct *__switch_to(struct task_struct *,
>  				       struct task_struct *);
>  
> @@ -80,6 +89,7 @@ do {							\
>  		__switch_to_fpu(__prev, __next);	\
>  	if (has_vector())					\
>  		__switch_to_vector(__prev, __next);	\
> +	__switch_to_envcfg(__next);				\
>  	((last) = __switch_to(__prev, __next));		\
>  } while (0)
>  
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index 5d473343634b..a503bdc2f6dd 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -56,6 +56,7 @@ struct thread_info {
>  	long			user_sp;	/* User stack pointer */
>  	int			cpu;
>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
> +	unsigned long envcfg;
>  #ifdef CONFIG_SHADOW_CALL_STACK
>  	void			*scs_base;
>  	void			*scs_sp;
> -- 
> 2.43.2
>
Charlie Jenkins May 9, 2024, 12:33 a.m. UTC | #5
On Wed, Apr 03, 2024 at 04:34:48PM -0700, Deepak Gupta wrote:
> Sending out v3 for cpu assisted riscv user mode control flow integrity.
> 
> v2 [9] was sent a week ago for this riscv usermode control flow integrity
> enabling. RFC patchset was (v1) early this year (January) [7].
> 
> changes in v3
> --------------
> envcfg:
> logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
> picked on per task basis, even though CPU didn't implement it. Fixed in
> this series.
> 
> dt-bindings:
> As suggested, split into separate commit. fixed the messaging that spec is
> in public review
> 
> arch_is_shadow_stack change:
> arch_is_shadow_stack changed to vma_is_shadow_stack
> 
> hwprobe:
> zicfiss / zicfilp if present will get enumerated in hwprobe
> 
> selftests:
> As suggested, added object and binary filenames to .gitignore
> Selftest binary anyways need to be compiled with cfi enabled compiler which
> will make sure that landing pad and shadow stack are enabled. Thus removed
> separate enable/disable tests. Cleaned up tests a bit.
> 
> changes in v2
> ---------------
> As part of testing effort, compiled a rootfs with shadow stack and landing
> pad enabled (libraries and binaries) and booted to shell. As part of long
> running tests, I have been able to run some spec 2006 benchmarks [8] (here
> link is provided only for list of benchmarks that were tested for long
> running tests, excel sheet provided here actually is for some static stats
> like code size growth on spec binaries). Thus converting from RFC to
> regular patchset.
> 
> Securing control-flow integrity for usermode requires following
> 
>     - Securing forward control flow : All callsites must reach
>       reach a target that they actually intend to reach.
> 
>     - Securing backward control flow : All function returns must
>       return to location where they were called from.
> 
> This patch series use riscv cpu extension `zicfilp` [2] to secure forward
> control flow and `zicfiss` [2] to secure backward control flow. `zicfilp`
> enforces that all indirect calls or jmps must land on a landing pad instr
> and label embedded in landing pad instr must match a value programmed in
> `x7` register (at callsite via compiler). `zicfiss` introduces shadow stack
> which can only be writeable via shadow stack instructions (sspush and
> ssamoswap) and thus can't be tampered with via inadvertent stores. More
> details about extension can be read from [2] and there are details in
> documentation as well (in this patch series).
> 
> Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
> integrity for user mode programs can be compiled in the kernel.
> 
> Enabling of control flow integrity for user programs is left to user runtime
> (specifically expected from dynamic loader). There has been a lot of earlier
> discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5] and
> overall consensus had been to let dynamic loader (or usermode) to decide for
> enabling the feature.
> 
> This patch series introduces arch agnostic `prctls` to enable shadow stack
> and indirect branch tracking. And implements them on riscv. arm64 is expected
> to implement shadow stack part of these arch agnostic `prctls` [6]
> 
> Changes since last time
> ***********************
> 
> Spec changes
> ------------
> - Forward cfi spec has become much simpler. `lpad` instruction is pseudo for
>   `auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr.
>   Thus label width is 20bit.
> 
> - Shadow stack management instructions are reduced to
>     sspush - to push x1/x5 on shadow stack
>     sspopchk - pops from shadow stack and comapres with x1/x5.
>     ssamoswap - atomically swap value on shadow stack.
>     rdssp - reads current shadow stack pointer
> 
> - Shadow stack accesses on readonly memory always raise AMO/store page fault.
>   `sspopchk` is load but if underlying page is readonly, it'll raise a store
>   page fault. It simplifies hardware and kernel for COW handling for shadow
>   stack pages.
> 
> - riscv defines a new exception type `software check exception` and control flow
>   violations raise software check exception.
> 
> - enabling controls for shadow stack and landing are in xenvcfg CSR and controls
>   lower privilege mode enabling. As an example senvcfg controls enabling for U and
>   menvcfg controls enabling for S mode.
> 
> core mm shadow stack enabling
> -----------------------------
> Shadow stack for x86 usermode are now in mainline and thus this patch
> series builds on top of that for arch-agnostic mm related changes. Big
> thanks and shout out to Rick Edgecombe for that.
> 
> selftests
> ---------
> Created some minimal selftests to test the patch series.
> 
> 
> [1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
> [2] - https://github.com/riscv/riscv-cfi
> [3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#mb121cd8b33d564e64234595a0ec52211479cf474
> [4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com/
> [5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRhQntRA@mail.gmail.com/
> [6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kernel.org/
> [7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
> [8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_iTSa3Tw2U/edit#gid=0
> [9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/
> 

This is a note for people wanting to test this series.

1. Need a toolchain that has CFI support

$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb  --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

2. QEMU

$ git clone git@github.com:deepak0414/qemu.git -b zicfilp_zicfiss_mar24_spec_v8.1.1
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

3. OpenSBI

$ git clone git@github.com:deepak0414/opensbi.git -b cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

4. Linux

Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.

$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

5. Running

Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true

- Charlie
Charlie Jenkins May 9, 2024, 6:21 p.m. UTC | #6
On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
> Adds a kselftest for the RISC-V control flow integrity implementation for
> user mode. There is not a lot going on in the kernel for enabling landing
> pads for user mode. The cfi selftests are intended to be compiled with a
> zicfilp and zicfiss enabled compiler, so the kselftest simply checks whether
> landing pad and shadow stack are enabled for the binary and process. The
> selftest then registers a signal handler for SIGSEGV. Any control flow
> violation is reported as SIGSEGV with si_code = SEGV_CPERR. The test will
> fail on receiving any SEGV_CPERR. The shadow stack part has more changes in
> the kernel and thus there are separate tests for that:
>   - Exercise the `map_shadow_stack` syscall
>   - `fork` test to make sure COW works for shadow stack pages
>   - gup tests
>     As of today the kernel uses FOLL_FORCE when access happens to memory
>     via /proc/<pid>/mem. Not breaking that for shadow stack.
>   - signal test. Make sure signal delivery results in token creation on
>     the shadow stack and consumes (and verifies) the token on sigreturn.
>   - shadow stack protection test. Attempts to write using a regular store
>     instruction to shadow stack memory must result in access faults.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  tools/testing/selftests/riscv/Makefile        |   2 +-
>  tools/testing/selftests/riscv/cfi/.gitignore  |   3 +
>  tools/testing/selftests/riscv/cfi/Makefile    |  10 +
>  .../testing/selftests/riscv/cfi/cfi_rv_test.h |  83 ++++
>  .../selftests/riscv/cfi/riscv_cfi_test.c      |  82 ++++
>  .../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
>  .../testing/selftests/riscv/cfi/shadowstack.h |  37 ++
>  7 files changed, 578 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
>  create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
>  create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
>  create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
>  create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
>  create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
> 
> diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
> index 4a9ff515a3a0..867e5875b7ce 100644
> --- a/tools/testing/selftests/riscv/Makefile
> +++ b/tools/testing/selftests/riscv/Makefile
> @@ -5,7 +5,7 @@
>  ARCH ?= $(shell uname -m 2>/dev/null || echo not)
>  
>  ifneq (,$(filter $(ARCH),riscv))
> -RISCV_SUBTARGETS ?= hwprobe vector mm
> +RISCV_SUBTARGETS ?= hwprobe vector mm cfi
>  else
>  RISCV_SUBTARGETS :=
>  endif
> diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore
> new file mode 100644
> index 000000000000..ce7623f9da28
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/.gitignore
> @@ -0,0 +1,3 @@
> +cfitests
> +riscv_cfi_test
> +shadowstack
> \ No newline at end of file
> diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile
> new file mode 100644
> index 000000000000..b65f7ff38a32
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/Makefile
> @@ -0,0 +1,10 @@
> +CFLAGS += -I$(top_srcdir)/tools/include
> +
> +CFLAGS += -march=rv64gc_zicfilp_zicfiss
> +
> +TEST_GEN_PROGS := cfitests
> +
> +include ../../lib.mk
> +
> +$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c
> +	$(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
> diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
> new file mode 100644
> index 000000000000..fa1cf7183672
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef SELFTEST_RISCV_CFI_H
> +#define SELFTEST_RISCV_CFI_H
> +#include <stddef.h>
> +#include <sys/types.h>
> +#include "shadowstack.h"
> +
> +#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS
> +
> +#define CHILD_EXIT_CODE_SSWRITE		10
> +#define CHILD_EXIT_CODE_SIG_TEST	11
> +
> +#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5)		\
> +({															\
> +	register long _num  __asm__ ("a7") = (num);				\
> +	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
> +	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
> +	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
> +	register long _arg4 __asm__ ("a3") = (long)(arg4);		\
> +	register long _arg5 __asm__ ("a4") = (long)(arg5);		\
> +															\
> +	__asm__ volatile (										\
> +		"ecall\n"											\
> +		: "+r"(_arg1)										\
> +		: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5),	\
> +		  "r"(_num)											\
> +		: "memory", "cc"									\
> +	);														\
> +	_arg1;													\
> +})
> +
> +#define my_syscall3(num, arg1, arg2, arg3)					\
> +({															\
> +	register long _num  __asm__ ("a7") = (num);				\
> +	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
> +	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
> +	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
> +															\
> +	__asm__ volatile (										\
> +		"ecall\n"											\
> +		: "+r"(_arg1)										\
> +		: "r"(_arg2), "r"(_arg3),							\
> +		  "r"(_num)											\
> +		: "memory", "cc"									\
> +	);														\
> +	_arg1;													\
> +})
> +
> +#ifndef __NR_prctl
> +#define __NR_prctl 167
> +#endif
> +
> +#ifndef __NR_map_shadow_stack
> +#define __NR_map_shadow_stack 453
> +#endif
> +
> +#define CSR_SSP 0x011
> +
> +#ifdef __ASSEMBLY__
> +#define __ASM_STR(x)    x
> +#else
> +#define __ASM_STR(x)    #x
> +#endif
> +
> +#define csr_read(csr)									\
> +({														\
> +	register unsigned long __v;							\
> +	__asm__ __volatile__ ("csrr %0, " __ASM_STR(csr)	\
> +						  : "=r" (__v) :				\
> +						  : "memory");					\
> +	__v;												\
> +})
> +
> +#define csr_write(csr, val)								\
> +({														\
> +	unsigned long __v = (unsigned long) (val);			\
> +	__asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0"	\
> +						  : : "rK" (__v)				\
> +						  : "memory");					\
> +})
> +
> +#endif
> diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
> new file mode 100644
> index 000000000000..f22b3f0f24de
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
> @@ -0,0 +1,82 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "../../kselftest.h"
> +#include <signal.h>
> +#include <asm/ucontext.h>
> +#include <linux/prctl.h>
> +#include "cfi_rv_test.h"
> +
> +/* do not optimize cfi related test functions */
> +#pragma GCC push_options
> +#pragma GCC optimize("O0")
> +
> +void sigsegv_handler(int signum, siginfo_t *si, void *uc)
> +{
> +	struct ucontext *ctx = (struct ucontext *) uc;
> +
> +	if (si->si_code == SEGV_CPERR) {
> +		printf("Control flow violation happened somewhere\n");
> +		printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]);
> +		exit(-1);
> +	}
> +
> +	printf("In sigsegv handler\n");
> +	/* all other cases are expected to be of shadow stack write case */
> +	exit(CHILD_EXIT_CODE_SSWRITE);
> +}
> +
> +bool register_signal_handler(void)
> +{
> +	struct sigaction sa = {};
> +
> +	sa.sa_sigaction = sigsegv_handler;
> +	sa.sa_flags = SA_SIGINFO;
> +	if (sigaction(SIGSEGV, &sa, NULL)) {
> +		printf("registering signal handler for landing pad violation failed\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int ret = 0;
> +	unsigned long lpad_status = 0, ss_status = 0;
> +
> +	ksft_print_header();
> +
> +	ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
> +
> +	ksft_print_msg("starting risc-v tests\n");
> +
> +	/*
> +	 * Landing pad test. Not a lot of kernel changes to support landing
> +	 * pad for user mode except lighting up a bit in senvcfg via a prctl
> +	 * Enable landing pad through out the execution of test binary
> +	 */
> +	ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);

There is an assumption here that the libc supports setting
INDIR_BR_LP_STATUS but does not support the standard prctl interface
defined in <sys/prctl.h>. my_syscall5() is defined to fill in gaps in
the libc, so this test case should also set the status manually rather
than relying on the libc.

I don't think it's necessary to define my_syscall5() since every libc
should have a prctl() definition. However, these CFI prctls are very new
and glibc does not yet support them (correct me if I am wrong), so these
prctls should be enabled by the test cases.

- Charlie

> +	if (ret)
> +		ksft_exit_skip("Get landing pad status failed with %d\n", ret);
> +
> +	if (!(lpad_status & PR_INDIR_BR_LP_ENABLE))
> +		ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret)
> +		ksft_exit_skip("Get shadow stack failed with %d\n", ret);
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	if (!register_signal_handler())
> +		ksft_exit_skip("registering signal handler for SIGSEGV failed\n");
> +
> +	ksft_print_msg("landing pad and shadow stack are enabled for binary\n");
> +	ksft_print_msg("starting risc-v shadow stack tests\n");
> +	execute_shadow_stack_tests();
> +
> +	ksft_finished();
> +}
> +
> +#pragma GCC pop_options
> diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c
> new file mode 100644
> index 000000000000..2f65eb970c44
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/shadowstack.c
> @@ -0,0 +1,362 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "../../kselftest.h"
> +#include <sys/wait.h>
> +#include <signal.h>
> +#include <fcntl.h>
> +#include <asm-generic/unistd.h>
> +#include <sys/mman.h>
> +#include "shadowstack.h"
> +#include "cfi_rv_test.h"
> +
> +/* do not optimize shadow stack related test functions */
> +#pragma GCC push_options
> +#pragma GCC optimize("O0")
> +
> +void zar(void)
> +{
> +	unsigned long ssp = 0;
> +
> +	ssp = csr_read(CSR_SSP);
> +	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
> +}
> +
> +void bar(void)
> +{
> +	printf("inside %s\n", __func__);
> +	zar();
> +}
> +
> +void foo(void)
> +{
> +	printf("inside %s\n", __func__);
> +	bar();
> +}
> +
> +void zar_child(void)
> +{
> +	unsigned long ssp = 0;
> +
> +	ssp = csr_read(CSR_SSP);
> +	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
> +}
> +
> +void bar_child(void)
> +{
> +	printf("inside %s\n", __func__);
> +	zar_child();
> +}
> +
> +void foo_child(void)
> +{
> +	printf("inside %s\n", __func__);
> +	bar_child();
> +}
> +
> +typedef void (call_func_ptr)(void);
> +/*
> + * call couple of functions to test push pop.
> + */
> +int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent)
> +{
> +	if (parent)
> +		printf("call test for parent\n");
> +	else
> +		printf("call test for child\n");
> +
> +	(fn_ptr)();
> +
> +	return 0;
> +}
> +
> +/* forks a thread, and ensure shadow stacks fork out */
> +bool shadow_stack_fork_test(unsigned long test_num, void *ctx)
> +{
> +	int pid = 0, child_status = 0, parent_pid = 0, ret = 0;
> +	unsigned long ss_status = 0;
> +
> +	printf("exercising shadow stack fork test\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret) {
> +		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
> +		return false;
> +	}
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	parent_pid = getpid();
> +	pid = fork();
> +
> +	if (pid) {
> +		printf("Parent pid %d and child pid %d\n", parent_pid, pid);
> +		shadow_stack_call_tests(&foo, true);
> +	} else
> +		shadow_stack_call_tests(&foo_child, false);
> +
> +	if (pid) {
> +		printf("waiting on child to finish\n");
> +		wait(&child_status);
> +	} else {
> +		/* exit child gracefully */
> +		exit(0);
> +	}
> +
> +	if (pid && WIFSIGNALED(child_status)) {
> +		printf("child faulted");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */
> +#define SHADOW_STACK_ALLOC_SIZE 4096
> +bool shadow_stack_map_test(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr;
> +	int ret = 0;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE);
> +
> +	if (ret) {
> +		printf("munmap failed with error code %d\n", ret);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * shadow stack protection tests. map a shadow stack and
> + * validate all memory protections work on it
> + */
> +bool shadow_stack_protection_test(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr;
> +	unsigned long *write_addr = NULL;
> +	int ret = 0, pid = 0, child_status = 0;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	write_addr = (unsigned long *) shdw_addr;
> +	pid = fork();
> +
> +	/* no child was created, return false */
> +	if (pid == -1)
> +		return false;
> +
> +	/*
> +	 * try to perform a store from child on shadow stack memory
> +	 * it should result in SIGSEGV
> +	 */
> +	if (!pid) {
> +		/* below write must lead to SIGSEGV */
> +		*write_addr = 0xdeadbeef;
> +	} else {
> +		wait(&child_status);
> +	}
> +
> +	/* test fail, if 0xdeadbeef present on shadow stack address */
> +	if (*write_addr == 0xdeadbeef) {
> +		printf("write suceeded\n");
> +		return false;
> +	}
> +
> +	/* if child reached here, then fail */
> +	if (!pid) {
> +		printf("child reached unreachable state\n");
> +		return false;
> +	}
> +
> +	/* if child exited via signal handler but not for write on ss */
> +	if (WIFEXITED(child_status) &&
> +		WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) {
> +		printf("child wasn't signaled for write on shadow stack\n");
> +		return false;
> +	}
> +
> +	ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE);
> +	if (ret) {
> +		printf("munmap failed with error code %d\n", ret);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +#define SS_MAGIC_WRITE_VAL 0xbeefdead
> +
> +int gup_tests(int mem_fd, unsigned long *shdw_addr)
> +{
> +	unsigned long val = 0;
> +
> +	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
> +	if (read(mem_fd, &val, sizeof(val)) < 0) {
> +		printf("reading shadow stack mem via gup failed\n");
> +		return 1;
> +	}
> +
> +	val = SS_MAGIC_WRITE_VAL;
> +	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
> +	if (write(mem_fd, &val, sizeof(val)) < 0) {
> +		printf("writing shadow stack mem via gup failed\n");
> +		return 1;
> +	}
> +
> +	if (*shdw_addr != SS_MAGIC_WRITE_VAL) {
> +		printf("GUP write to shadow stack memory didn't happen\n");
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr = 0;
> +	unsigned long *write_addr = NULL;
> +	int fd = 0;
> +	bool ret = false;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	write_addr = (unsigned long *) shdw_addr;
> +
> +	fd = open("/proc/self/mem", O_RDWR);
> +	if (fd == -1)
> +		return false;
> +
> +	if (gup_tests(fd, write_addr)) {
> +		printf("gup tests failed\n");
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) {
> +		printf("munmap failed with error code %d\n", ret);
> +		ret = false;
> +	}
> +
> +	return ret;
> +}
> +
> +volatile bool break_loop;
> +
> +void sigusr1_handler(int signo)
> +{
> +	printf("In sigusr1 handler\n");
> +	break_loop = true;
> +}
> +
> +bool sigusr1_signal_test(void)
> +{
> +	struct sigaction sa = {};
> +
> +	sa.sa_handler = sigusr1_handler;
> +	sa.sa_flags = 0;
> +	sigemptyset(&sa.sa_mask);
> +	if (sigaction(SIGUSR1, &sa, NULL)) {
> +		printf("registering signal handler for SIGUSR1 failed\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +/*
> + * shadow stack signal test. shadow stack must be enabled.
> + * register a signal, fork another thread which is waiting
> + * on signal. Send a signal from parent to child, verify
> + * that signal was received by child. If not test fails
> + */
> +bool shadow_stack_signal_test(unsigned long test_num, void *ctx)
> +{
> +	int pid = 0, child_status = 0, ret = 0;
> +	unsigned long ss_status = 0;
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret) {
> +		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
> +		return false;
> +	}
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	/* this should be caught by signal handler and do an exit */
> +	if (!sigusr1_signal_test()) {
> +		printf("registering sigusr1 handler failed\n");
> +		exit(-1);
> +	}
> +
> +	pid = fork();
> +
> +	if (pid == -1) {
> +		printf("signal test: fork failed\n");
> +		goto out;
> +	}
> +
> +	if (pid == 0) {
> +		while (!break_loop)
> +			sleep(1);
> +
> +		exit(11);
> +		/* child shouldn't go beyond here */
> +	}
> +
> +	/* send SIGUSR1 to child */
> +	kill(pid, SIGUSR1);
> +	wait(&child_status);
> +
> +out:
> +
> +	return (WIFEXITED(child_status) &&
> +			WEXITSTATUS(child_status) == 11);
> +}
> +
> +int execute_shadow_stack_tests(void)
> +{
> +	int ret = 0;
> +	unsigned long test_count = 0;
> +	unsigned long shstk_status = 0;
> +
> +	printf("Executing RISC-V shadow stack self tests\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
> +
> +	if (ret != 0)
> +		ksft_exit_skip("Get shadow stack status failed with %d\n", ret);
> +
> +	/*
> +	 * If we are here that means get shadow stack status succeeded and
> +	 * thus shadow stack support is baked in the kernel.
> +	 */
> +	while (test_count < ARRAY_SIZE(shstk_tests)) {
> +		ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
> +						 shstk_tests[test_count].name);
> +		test_count++;
> +	}
> +
> +	return 0;
> +}
> +
> +#pragma GCC pop_options
> diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h
> new file mode 100644
> index 000000000000..b43e74136a26
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/shadowstack.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef SELFTEST_SHADOWSTACK_TEST_H
> +#define SELFTEST_SHADOWSTACK_TEST_H
> +#include <stddef.h>
> +#include <linux/prctl.h>
> +
> +/*
> + * a cfi test returns true for success or false for fail
> + * takes a number for test number to index into array and void pointer.
> + */
> +typedef bool (*shstk_test_func)(unsigned long test_num, void *);
> +
> +struct shadow_stack_tests {
> +	char *name;
> +	shstk_test_func t_func;
> +};
> +
> +bool shadow_stack_fork_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_map_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_protection_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx);
> +bool shadow_stack_signal_test(unsigned long test_num, void *ctx);
> +
> +static struct shadow_stack_tests shstk_tests[] = {
> +	{ "shstk fork test\n", shadow_stack_fork_test },
> +	{ "map shadow stack syscall\n", shadow_stack_map_test },
> +	{ "shadow stack gup tests\n", shadow_stack_gup_tests },
> +	{ "shadow stack signal tests\n", shadow_stack_signal_test},
> +	{ "memory protections of shadow stack memory\n", shadow_stack_protection_test }
> +};
> +
> +#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests)
> +
> +int execute_shadow_stack_tests(void);
> +
> +#endif
> -- 
> 2.43.2
>
Deepak Gupta May 9, 2024, 7 p.m. UTC | #7
On Wed, May 08, 2024 at 05:10:46PM -0700, Charlie Jenkins wrote:
>On Wed, Apr 03, 2024 at 04:34:49PM -0700, Deepak Gupta wrote:
>> envcfg CSR defines enabling bits for cache management instructions and
>> soon will control enabling for control flow integrity and pointer
>> masking features.
>>
>> Control flow integrity enabling for forward cfi and backward cfi are
>> controlled via envcfg and thus need to be enabled on per thread basis.
>>
>> This patch creates a place holder for envcfg CSR in `thread_info` and
>> adds logic to save and restore on task switching.
>>
>> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>> ---
>>  arch/riscv/include/asm/switch_to.h   | 10 ++++++++++
>>  arch/riscv/include/asm/thread_info.h |  1 +
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
>> index 7efdb0584d47..2d9a00a30394 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>  #endif
>>
>> +static inline void __switch_to_envcfg(struct task_struct *next)
>> +{
>> +	register unsigned long envcfg = next->thread_info.envcfg;
>
>This doesn't need the register storage class.
>

yeah. will fix it. thanks.

>> +
>> +	asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
>> +							  RISCV_ISA_EXT_XLINUXENVCFG, 1)
>> +							  :: "r" (envcfg) : "memory");
>> +}
>> +
>
>Something like:
>
>static inline void __switch_to_envcfg(struct task_struct *next)
>{
>	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_XLINUXENVCFG))
>		csr_write(CSR_ENVCFG, next->thread_info.envcfg);
>}
>
>would be easier to read, but the alternative you have written doesn't
>have the jump that riscv_has_extension_unlikely has so what you have
>will be more performant.

Yeah, I looked at the codegen of `riscv_has_extension_unlikely` and I didn't
like the unnecessary jumps, especially in the switch_to path. All I want is a
CSR write, so I used an alternative to patch the nop with a CSR write.

>
>Does envcfg need to be save/restored always or just with
>CONFIG_RISCV_USER_CFI?

There is no save (no read of CSR). Only restore (writes to CSR).

There are pointer masking patches from Samuel Holland where senvcfg needs to
be context switched on a per-task basis.
https://lore.kernel.org/lkml/20240319215915.832127-1-samuel.holland@sifive.com/T/

Given that this CSR controls the user execution environment and is managed on
a per-task basis, I thought it's better not to wrap it under
CONFIG_RISCV_USER_CFI and rather make it dependent on
RISCV_ISA_EXT_XLINUXENVCFG. If any of the extensions that require senvcfg is
present, then simply restore this CSR on a per-task basis.

>
>- Charlie
>
>>  extern struct task_struct *__switch_to(struct task_struct *,
>>  				       struct task_struct *);
>>
>> @@ -80,6 +89,7 @@ do {							\
>>  		__switch_to_fpu(__prev, __next);	\
>>  	if (has_vector())					\
>>  		__switch_to_vector(__prev, __next);	\
>> +	__switch_to_envcfg(__next);				\
>>  	((last) = __switch_to(__prev, __next));		\
>>  } while (0)
>>
>> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
>> index 5d473343634b..a503bdc2f6dd 100644
>> --- a/arch/riscv/include/asm/thread_info.h
>> +++ b/arch/riscv/include/asm/thread_info.h
>> @@ -56,6 +56,7 @@ struct thread_info {
>>  	long			user_sp;	/* User stack pointer */
>>  	int			cpu;
>>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
>> +	unsigned long envcfg;
>>  #ifdef CONFIG_SHADOW_CALL_STACK
>>  	void			*scs_base;
>>  	void			*scs_sp;
>> --
>> 2.43.2
>>
Deepak Gupta May 9, 2024, 7:16 p.m. UTC | #8
On Thu, May 09, 2024 at 11:21:15AM -0700, Charlie Jenkins wrote:
>On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +	int ret = 0;
>> +	unsigned long lpad_status = 0, ss_status = 0;
>> +
>> +	ksft_print_header();
>> +
>> +	ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
>> +
>> +	ksft_print_msg("starting risc-v tests\n");
>> +
>> +	/*
>> +	 * Landing pad test. Not a lot of kernel changes to support landing
>> +	 * pad for user mode except lighting up a bit in senvcfg via a prctl
>> +	 * Enable landing pad through out the execution of test binary
>> +	 */
>> +	ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
>
>There is an assumption here that the libc supports setting
>INDIR_BR_LP_STATUS but does not support the standard prctl interface
>defined in <sys/prctl.h>. my_syscall5() is defined to fill in gaps in
>the libc, so this test case should also set the status manually rather
>than relying on the libc.
>
>I don't think it's necessary to define my_syscall5() since every libc
>should have a prctl() definition. However, these CFI prctls are very new
>and glibc does not yet support (correct me if I am wrong) it so these
>prctls should be enabled by the test cases.

In one of my previous patches, the test was enabling landing pad and shadow stack
directly via a handcrafted prctl macro. I changed it to check the status instead, for
the following reasons:

- If this binary is compiled with the landing pad and shadow stack options, then the
  toolchain being used already has a libc with shadow stack and landing pad enabling.

- Currently the upstream glibc toolchain doesn't have the support, but the libc shipped
  with the toolchain does.

In the case of shadow stack enabling, a macro is needed and the `prctl` function can't
be used, because you enter the `prctl` function with no shadow stack but exit with a
shadow stack, which leads to a fault in its epilogue.

For all these reasons, the kselftests have to be compiled with a toolchain with cfi
codegen, and thus the libc should have the support to light them up. Here the test only
checks whether they are already lit up; if not, it fails.

Although you're spot on about one thing here: since this test assumes libc has already
lit up landing pad and shadow stack, it doesn't need a macro for the feature status
check and can simply use the `prctl` syscall interface.
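
For reference, here is a minimal sketch of that macro-based enabling, built on
the my_syscall5 raw-syscall helper from cfi_rv_test.h above. The set-side name
PR_SET_SHADOW_STACK_STATUS is an assumption taken from this series' uapi
additions; it is not in current glibc headers.

/*
 * Sketch of the handcrafted approach: the macro expands inline, so no
 * helper function that was entered without a shadow stack has to return
 * after the shadow stack springs into existence.
 */
#define enable_shadow_stack()					\
	my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,	\
		    PR_SHADOW_STACK_ENABLE, 0, 0, 0)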

>
>- Charlie
>
>> +	if (ret)
Charlie Jenkins May 10, 2024, 1:20 a.m. UTC | #9
On Wed, Apr 03, 2024 at 04:35:17PM -0700, Deepak Gupta wrote:
> Adds kselftest for RISC-V control flow integrity implementation for user
> mode. There is not a lot going on in kernel for enabling landing pad for
> user mode. cfi selftest are intended to be compiled with zicfilp and
> zicfiss enabled compiler. Thus kselftest simply checks if landing pad and
> shadow stack for the binary and process are enabled or not. selftest then
> register a signal handler for SIGSEGV. Any control flow violation are
> reported as SIGSEGV with si_code = SEGV_CPERR. Test will fail on recieving
> any SEGV_CPERR. Shadow stack part has more changes in kernel and thus there
> are separate tests for that
> 	- Exercise `map_shadow_stack` syscall
> 	- `fork` test to make sure COW works for shadow stack pages
> 	- gup tests
> 	  As of today kernel uses FOLL_FORCE when access happens to memory via
> 	  /proc/<pid>/mem. Not breaking that for shadow stack
> 	- signal test. Make sure signal delivery results in token creation on
>       shadow stack and consumes (and verifies) token on sigreturn
>     - shadow stack protection test. attempts to write using regular store
> 	  instruction on shadow stack memory must result in access faults
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  tools/testing/selftests/riscv/Makefile        |   2 +-
>  tools/testing/selftests/riscv/cfi/.gitignore  |   3 +
>  tools/testing/selftests/riscv/cfi/Makefile    |  10 +
>  .../testing/selftests/riscv/cfi/cfi_rv_test.h |  83 ++++
>  .../selftests/riscv/cfi/riscv_cfi_test.c      |  82 ++++
>  .../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
>  .../testing/selftests/riscv/cfi/shadowstack.h |  37 ++
>  7 files changed, 578 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
>  create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
>  create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
>  create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
>  create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
>  create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
> 
> diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
> index 4a9ff515a3a0..867e5875b7ce 100644
> --- a/tools/testing/selftests/riscv/Makefile
> +++ b/tools/testing/selftests/riscv/Makefile
> @@ -5,7 +5,7 @@
>  ARCH ?= $(shell uname -m 2>/dev/null || echo not)
>  
>  ifneq (,$(filter $(ARCH),riscv))
> -RISCV_SUBTARGETS ?= hwprobe vector mm
> +RISCV_SUBTARGETS ?= hwprobe vector mm cfi
>  else
>  RISCV_SUBTARGETS :=
>  endif
> diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore
> new file mode 100644
> index 000000000000..ce7623f9da28
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/.gitignore
> @@ -0,0 +1,3 @@
> +cfitests
> +riscv_cfi_test
> +shadowstack
> \ No newline at end of file
> diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile
> new file mode 100644
> index 000000000000..b65f7ff38a32
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/Makefile
> @@ -0,0 +1,10 @@
> +CFLAGS += -I$(top_srcdir)/tools/include
> +
> +CFLAGS += -march=rv64gc_zicfilp_zicfiss
> +
> +TEST_GEN_PROGS := cfitests
> +
> +include ../../lib.mk
> +
> +$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c
> +	$(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
> diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
> new file mode 100644
> index 000000000000..fa1cf7183672
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef SELFTEST_RISCV_CFI_H
> +#define SELFTEST_RISCV_CFI_H
> +#include <stddef.h>
> +#include <sys/types.h>
> +#include "shadowstack.h"
> +
> +#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS
> +
> +#define CHILD_EXIT_CODE_SSWRITE		10
> +#define CHILD_EXIT_CODE_SIG_TEST	11
> +
> +#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5)		\
> +({															\
> +	register long _num  __asm__ ("a7") = (num);				\
> +	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
> +	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
> +	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
> +	register long _arg4 __asm__ ("a3") = (long)(arg4);		\
> +	register long _arg5 __asm__ ("a4") = (long)(arg5);		\
> +															\
> +	__asm__ volatile (										\
> +		"ecall\n"											\
> +		: "+r"(_arg1)										\
> +		: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5),	\
> +		  "r"(_num)											\
> +		: "memory", "cc"									\
> +	);														\
> +	_arg1;													\
> +})
> +
> +#define my_syscall3(num, arg1, arg2, arg3)					\
> +({															\
> +	register long _num  __asm__ ("a7") = (num);				\
> +	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
> +	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
> +	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
> +															\
> +	__asm__ volatile (										\
> +		"ecall\n"											\
> +		: "+r"(_arg1)										\
> +		: "r"(_arg2), "r"(_arg3),							\
> +		  "r"(_num)											\
> +		: "memory", "cc"									\
> +	);														\
> +	_arg1;													\
> +})
> +
> +#ifndef __NR_prctl
> +#define __NR_prctl 167
> +#endif
> +
> +#ifndef __NR_map_shadow_stack
> +#define __NR_map_shadow_stack 453
> +#endif
> +
> +#define CSR_SSP 0x011
> +
> +#ifdef __ASSEMBLY__
> +#define __ASM_STR(x)    x
> +#else
> +#define __ASM_STR(x)    #x
> +#endif
> +
> +#define csr_read(csr)									\
> +({														\
> +	register unsigned long __v;							\
> +	__asm__ __volatile__ ("csrr %0, " __ASM_STR(csr)	\
> +						  : "=r" (__v) :				\
> +						  : "memory");					\
> +	__v;												\
> +})
> +
> +#define csr_write(csr, val)								\
> +({														\
> +	unsigned long __v = (unsigned long) (val);			\
> +	__asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0"	\
> +						  : : "rK" (__v)				\
> +						  : "memory");					\
> +})
> +
> +#endif
> diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
> new file mode 100644
> index 000000000000..f22b3f0f24de
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
> @@ -0,0 +1,82 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "../../kselftest.h"
> +#include <signal.h>
> +#include <asm/ucontext.h>
> +#include <linux/prctl.h>
> +#include "cfi_rv_test.h"
> +
> +/* do not optimize cfi related test functions */
> +#pragma GCC push_options
> +#pragma GCC optimize("O0")
> +
> +void sigsegv_handler(int signum, siginfo_t *si, void *uc)
> +{
> +	struct ucontext *ctx = (struct ucontext *) uc;
> +
> +	if (si->si_code == SEGV_CPERR) {
> +		printf("Control flow violation happened somewhere\n");
> +		printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]);
> +		exit(-1);
> +	}
> +
> +	printf("In sigsegv handler\n");
> +	/* all other cases are expected to be of shadow stack write case */
> +	exit(CHILD_EXIT_CODE_SSWRITE);
> +}
> +
> +bool register_signal_handler(void)
> +{
> +	struct sigaction sa = {};
> +
> +	sa.sa_sigaction = sigsegv_handler;
> +	sa.sa_flags = SA_SIGINFO;
> +	if (sigaction(SIGSEGV, &sa, NULL)) {
> +		printf("registering signal handler for landing pad violation failed\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int ret = 0;
> +	unsigned long lpad_status = 0, ss_status = 0;
> +
> +	ksft_print_header();
> +
> +	ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
> +
> +	ksft_print_msg("starting risc-v tests\n");
> +
> +	/*
> +	 * Landing pad test. Not a lot of kernel changes to support landing
> +	 * pad for user mode except lighting up a bit in senvcfg via a prctl
> +	 * Enable landing pad throughout the execution of test binary
> +	 */
> +	ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
> +	if (ret)
> +		ksft_exit_skip("Get landing pad status failed with %d\n", ret);
> +
> +	if (!(lpad_status & PR_INDIR_BR_LP_ENABLE))
> +		ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret)
> +		ksft_exit_skip("Get shadow stack failed with %d\n", ret);
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	if (!register_signal_handler())
> +		ksft_exit_skip("registering signal handler for SIGSEGV failed\n");
> +
> +	ksft_print_msg("landing pad and shadow stack are enabled for binary\n");
> +	ksft_print_msg("starting risc-v shadow stack tests\n");
> +	execute_shadow_stack_tests();
> +
> +	ksft_finished();

The kselftest framework keeps its counters in static variables, so these tests
actually report that nothing passed: the plan and summary are in this file
while the test cases run (and update their own copy of the counters) in a
different file. This can be remedied by moving
ksft_set_plan(RISCV_CFI_SELFTEST_COUNT) and ksft_finished() into
execute_shadow_stack_tests(), e.g. as sketched below.
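Untested sketch, reusing only the code already in this patch (the plan and
the summary move next to the tests that update the counters):

int execute_shadow_stack_tests(void)
{
	int ret = 0;
	unsigned long test_count = 0;
	unsigned long shstk_status = 0;

	printf("Executing RISC-V shadow stack self tests\n");
	ksft_set_plan(RISCV_SHADOW_STACK_TESTS);

	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
	if (ret != 0)
		ksft_exit_skip("Get shadow stack status failed with %d\n", ret);

	/* run the test cases; ksft_test_result() updates this file's counters */
	while (test_count < ARRAY_SIZE(shstk_tests)) {
		ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
				 shstk_tests[test_count].name);
		test_count++;
	}

	/* prints the summary and exits, using the same counters */
	ksft_finished();
}

main() in riscv_cfi_test.c then keeps ksft_print_header(), the prctl checks and
the signal handler setup, and simply ends with a call to execute_shadow_stack_tests().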

There are two versions of the kselftest framework, and the one used here
is the low-level version, which carries this note in its header:

   kselftest.h:	low-level kselftest framework to include from
		selftest programs. When possible, please use
 		kselftest_harness.h instead.

There is not a good enough reason for you to change up this code to use
kselftest_harness.h instead, but just something to think about for any
future test cases you may write.
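For reference only, a harness-based test looks roughly like this. This is an
illustrative sketch, not something this series needs to switch to; the
__NR_map_shadow_stack number is the one already used in cfi_rv_test.h in this
patch:

#include "../../kselftest_harness.h"
#include <sys/mman.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453
#endif

#define SS_ALLOC_SIZE 4096

/* map a small shadow stack via the new syscall and unmap it again */
TEST(map_shadow_stack)
{
	long addr = syscall(__NR_map_shadow_stack, 0, SS_ALLOC_SIZE, 0);

	ASSERT_TRUE(addr > 0);
	EXPECT_EQ(0, munmap((void *)addr, SS_ALLOC_SIZE));
}

TEST_HARNESS_MAIN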

- Charlie

> +}
> +
> +#pragma GCC pop_options
> diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c
> new file mode 100644
> index 000000000000..2f65eb970c44
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/shadowstack.c
> @@ -0,0 +1,362 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "../../kselftest.h"
> +#include <sys/wait.h>
> +#include <signal.h>
> +#include <fcntl.h>
> +#include <asm-generic/unistd.h>
> +#include <sys/mman.h>
> +#include "shadowstack.h"
> +#include "cfi_rv_test.h"
> +
> +/* do not optimize shadow stack related test functions */
> +#pragma GCC push_options
> +#pragma GCC optimize("O0")
> +
> +void zar(void)
> +{
> +	unsigned long ssp = 0;
> +
> +	ssp = csr_read(CSR_SSP);
> +	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
> +}
> +
> +void bar(void)
> +{
> +	printf("inside %s\n", __func__);
> +	zar();
> +}
> +
> +void foo(void)
> +{
> +	printf("inside %s\n", __func__);
> +	bar();
> +}
> +
> +void zar_child(void)
> +{
> +	unsigned long ssp = 0;
> +
> +	ssp = csr_read(CSR_SSP);
> +	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
> +}
> +
> +void bar_child(void)
> +{
> +	printf("inside %s\n", __func__);
> +	zar_child();
> +}
> +
> +void foo_child(void)
> +{
> +	printf("inside %s\n", __func__);
> +	bar_child();
> +}
> +
> +typedef void (call_func_ptr)(void);
> +/*
> + * call couple of functions to test push pop.
> + */
> +int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent)
> +{
> +	if (parent)
> +		printf("call test for parent\n");
> +	else
> +		printf("call test for child\n");
> +
> +	(fn_ptr)();
> +
> +	return 0;
> +}
> +
> +/* forks a thread, and ensure shadow stacks fork out */
> +bool shadow_stack_fork_test(unsigned long test_num, void *ctx)
> +{
> +	int pid = 0, child_status = 0, parent_pid = 0, ret = 0;
> +	unsigned long ss_status = 0;
> +
> +	printf("exercising shadow stack fork test\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret) {
> +		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
> +		return false;
> +	}
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	parent_pid = getpid();
> +	pid = fork();
> +
> +	if (pid) {
> +		printf("Parent pid %d and child pid %d\n", parent_pid, pid);
> +		shadow_stack_call_tests(&foo, true);
> +	} else
> +		shadow_stack_call_tests(&foo_child, false);
> +
> +	if (pid) {
> +		printf("waiting on child to finish\n");
> +		wait(&child_status);
> +	} else {
> +		/* exit child gracefully */
> +		exit(0);
> +	}
> +
> +	if (pid && WIFSIGNALED(child_status)) {
> +		printf("child faulted");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */
> +#define SHADOW_STACK_ALLOC_SIZE 4096
> +bool shadow_stack_map_test(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr;
> +	int ret = 0;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE);
> +
> +	if (ret) {
> +		printf("munmap failed with error code %d\n", ret);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * shadow stack protection tests. map a shadow stack and
> + * validate all memory protections work on it
> + */
> +bool shadow_stack_protection_test(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr;
> +	unsigned long *write_addr = NULL;
> +	int ret = 0, pid = 0, child_status = 0;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	write_addr = (unsigned long *) shdw_addr;
> +	pid = fork();
> +
> +	/* no child was created, return false */
> +	if (pid == -1)
> +		return false;
> +
> +	/*
> +	 * try to perform a store from child on shadow stack memory
> +	 * it should result in SIGSEGV
> +	 */
> +	if (!pid) {
> +		/* below write must lead to SIGSEGV */
> +		*write_addr = 0xdeadbeef;
> +	} else {
> +		wait(&child_status);
> +	}
> +
> +	/* test fail, if 0xdeadbeef present on shadow stack address */
> +	if (*write_addr == 0xdeadbeef) {
> +		printf("write succeeded\n");
> +		return false;
> +	}
> +
> +	/* if child reached here, then fail */
> +	if (!pid) {
> +		printf("child reached unreachable state\n");
> +		return false;
> +	}
> +
> +	/* if child exited via signal handler but not for write on ss */
> +	if (WIFEXITED(child_status) &&
> +		WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) {
> +		printf("child wasn't signaled for write on shadow stack\n");
> +		return false;
> +	}
> +
> +	ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE);
> +	if (ret) {
> +		printf("munmap failed with error code %d\n", ret);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +#define SS_MAGIC_WRITE_VAL 0xbeefdead
> +
> +int gup_tests(int mem_fd, unsigned long *shdw_addr)
> +{
> +	unsigned long val = 0;
> +
> +	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
> +	if (read(mem_fd, &val, sizeof(val)) < 0) {
> +		printf("reading shadow stack mem via gup failed\n");
> +		return 1;
> +	}
> +
> +	val = SS_MAGIC_WRITE_VAL;
> +	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
> +	if (write(mem_fd, &val, sizeof(val)) < 0) {
> +		printf("writing shadow stack mem via gup failed\n");
> +		return 1;
> +	}
> +
> +	if (*shdw_addr != SS_MAGIC_WRITE_VAL) {
> +		printf("GUP write to shadow stack memory didn't happen\n");
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx)
> +{
> +	unsigned long shdw_addr = 0;
> +	unsigned long *write_addr = NULL;
> +	int fd = 0;
> +	bool ret = false;
> +
> +	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
> +
> +	if (((long) shdw_addr) <= 0) {
> +		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
> +		return false;
> +	}
> +
> +	write_addr = (unsigned long *) shdw_addr;
> +
> +	fd = open("/proc/self/mem", O_RDWR);
> +	if (fd == -1)
> +		return false;
> +
> +	if (gup_tests(fd, write_addr)) {
> +		printf("gup tests failed\n");
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) {
> +		printf("munmap failed with error code %d\n", ret);
> +		ret = false;
> +	}
> +
> +	return ret;
> +}
> +
> +volatile bool break_loop;
> +
> +void sigusr1_handler(int signo)
> +{
> +	printf("In sigusr1 handler\n");
> +	break_loop = true;
> +}
> +
> +bool sigusr1_signal_test(void)
> +{
> +	struct sigaction sa = {};
> +
> +	sa.sa_handler = sigusr1_handler;
> +	sa.sa_flags = 0;
> +	sigemptyset(&sa.sa_mask);
> +	if (sigaction(SIGUSR1, &sa, NULL)) {
> +		printf("registering signal handler for SIGUSR1 failed\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +/*
> + * shadow stack signal test. shadow stack must be enabled.
> + * register a signal, fork another thread which is waiting
> + * on signal. Send a signal from parent to child, verify
> + * that signal was received by child. If not test fails
> + */
> +bool shadow_stack_signal_test(unsigned long test_num, void *ctx)
> +{
> +	int pid = 0, child_status = 0, ret = 0;
> +	unsigned long ss_status = 0;
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
> +	if (ret) {
> +		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
> +		return false;
> +	}
> +
> +	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
> +		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
> +
> +	/* this should be caught by signal handler and do an exit */
> +	if (!sigusr1_signal_test()) {
> +		printf("registering sigusr1 handler failed\n");
> +		exit(-1);
> +	}
> +
> +	pid = fork();
> +
> +	if (pid == -1) {
> +		printf("signal test: fork failed\n");
> +		goto out;
> +	}
> +
> +	if (pid == 0) {
> +		while (!break_loop)
> +			sleep(1);
> +
> +		exit(11);
> +		/* child shouldn't go beyond here */
> +	}
> +
> +	/* send SIGUSR1 to child */
> +	kill(pid, SIGUSR1);
> +	wait(&child_status);
> +
> +out:
> +
> +	return (WIFEXITED(child_status) &&
> +			WEXITSTATUS(child_status) == 11);
> +}
> +
> +int execute_shadow_stack_tests(void)
> +{
> +	int ret = 0;
> +	unsigned long test_count = 0;
> +	unsigned long shstk_status = 0;
> +
> +	printf("Executing RISC-V shadow stack self tests\n");
> +
> +	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
> +
> +	if (ret != 0)
> +		ksft_exit_skip("Get shadow stack status failed with %d\n", ret);
> +
> +	/*
> +	 * If we are here that means get shadow stack status succeeded and
> +	 * thus shadow stack support is baked in the kernel.
> +	 */
> +	while (test_count < ARRAY_SIZE(shstk_tests)) {
> +		ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
> +						 shstk_tests[test_count].name);
> +		test_count++;
> +	}
> +
> +	return 0;
> +}
> +
> +#pragma GCC pop_options
> diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h
> new file mode 100644
> index 000000000000..b43e74136a26
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/cfi/shadowstack.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef SELFTEST_SHADOWSTACK_TEST_H
> +#define SELFTEST_SHADOWSTACK_TEST_H
> +#include <stddef.h>
> +#include <linux/prctl.h>
> +
> +/*
> + * a cfi test returns true for success or false for fail
> + * takes a number for test number to index into array and void pointer.
> + */
> +typedef bool (*shstk_test_func)(unsigned long test_num, void *);
> +
> +struct shadow_stack_tests {
> +	char *name;
> +	shstk_test_func t_func;
> +};
> +
> +bool shadow_stack_fork_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_map_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_protection_test(unsigned long test_num, void *ctx);
> +bool shadow_stack_gup_tests(unsigned long test_num, void *ctx);
> +bool shadow_stack_signal_test(unsigned long test_num, void *ctx);
> +
> +static struct shadow_stack_tests shstk_tests[] = {
> +	{ "shstk fork test\n", shadow_stack_fork_test },
> +	{ "map shadow stack syscall\n", shadow_stack_map_test },
> +	{ "shadow stack gup tests\n", shadow_stack_gup_tests },
> +	{ "shadow stack signal tests\n", shadow_stack_signal_test},
> +	{ "memory protections of shadow stack memory\n", shadow_stack_protection_test }
> +};
> +
> +#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests)
> +
> +int execute_shadow_stack_tests(void);
> +
> +#endif
> -- 
> 2.43.2
>
Charlie Jenkins May 10, 2024, 8:30 p.m. UTC | #10
On Wed, Apr 03, 2024 at 04:35:15PM -0700, Deepak Gupta wrote:
> Adding documentation on landing pad aka indirect branch tracking on riscv
> and kernel interfaces exposed so that user tasks can enable it.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++
>  1 file changed, 104 insertions(+)
>  create mode 100644 Documentation/arch/riscv/zicfilp.rst
> 
> diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst
> new file mode 100644
> index 000000000000..3007c81f0465
> --- /dev/null
> +++ b/Documentation/arch/riscv/zicfilp.rst
> @@ -0,0 +1,104 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +:Author: Deepak Gupta <debug@rivosinc.com>
> +:Date:   12 January 2024
> +
> +====================================================
> +Tracking indirect control transfers on RISC-V Linux
> +====================================================
> +
> +This document briefly describes the interface provided to userspace by Linux
> +to enable indirect branch tracking for user mode applications on RISC-V
> +
> +1. Feature Overview
> +--------------------
> +
> +Memory corruption issues usually result in crashes, however when in the hands of
> +an adversary and used creatively they can result in a variety of security issues.
> +
> +One of those security issues can be code re-use attacks on program where adversary
> +can use corrupt function pointers and chain them together to perform jump oriented
> +programming (JOP) or call oriented programming (COP) and thus compromising control
> +flow integrity (CFI) of the program.
> +
> +Function pointers live in read-write memory and thus are susceptible to corruption
> +and allow an adversary to reach any program counter (PC) in the address space. On
> +RISC-V, the zicfilp extension enforces a restriction on such indirect control transfers
> +
> +	- indirect control transfers must land on a landing pad instruction `lpad`.
> +	  There are two exceptions to this rule
> +		- rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are

What is a return that is not a return from a function?

> +		  protected using shadow stack (see zicfiss.rst)
> +
> +		- rs1 = x7. On RISC-V the compiler usually emits the below sequence to reach a function
> +		  which is beyond the offset reachable with a J-type instruction.
> +
> +			"auipc x7, <imm>"
> +			"jalr (x7)"
> +
> +		  Such forms of indirect control transfer are still immutable and don't rely
> +		  on memory and thus rs1=x7 is exempted from tracking and considered software
> +		  guarded jumps.
> +
> +`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad`

I think this should say "x0" instead of "rd", or mention that rd=x0.

> +instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7.
> +If `imm_20bit` == 0, the CPU doesn't perform any comparison with x7. If `imm_20bit` != 0,
> +then `imm_20bit` must match x7, else the CPU will raise a `software check exception`
> +(cause=18) with `*tval = 2`.
> +
> +Compiler can generate a hash over function signatures and setup them (truncated
> +to 20bit) in x7 at callsites and function proglogs can have `lpad` with same

"prologues" instead of "proglogs"

> +function hash. This further reduces number of program counters a call site can
> +reach.
> +
> +2. ELF and psABI
> +-----------------
> +
> +Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property
> +`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
> +
> +3. Linux enabling
> +------------------
> +
> +User space programs can have multiple shared objects loaded in its address space
> +and it's a difficult task to make sure all the dependencies have been compiled
> +with support of indirect branch. Thus it's left to dynamic loader to enable
> +indirect branch tracking for the program.
> +
> +4. prctl() enabling
> +--------------------
> +
> +`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` /
> +`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch
> +tracking. prctls are arch agnostic and returns -EINVAL on other arches.
> +
> +`PR_SET_INDIR_BR_LP_STATUS`: If arg1 `PR_INDIR_BR_LP_ENABLE` and if CPU supports
> +`zicfilp` then kernel will enable indirect branch tracking for the task.
> +Dynamic loader can issue this `prctl` once it has determined that all the objects
> +loaded in address space support indirect branch tracking. Additionally if there is
> +a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can
> +issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear)
> +
> +`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking.
> +If enabled it'll return `PR_INDIR_BR_LP_ENABLE`
> +
> +`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on
> +the task. User space may want to run with strict security posture and wouldn't want
> +loading of objects without `zicfilp` support in it and thus would want to disallow
> +disabling of indirect branch tracking. In that case user space can use this prctl
> +to lock current settings.
> +
> +5. violations related to indirect branch tracking
> +--------------------------------------------------
> +
> +Pertaining to indirect branch tracking, CPU raises software check exception in
> +following conditions
> +	- missing `lpad` after indirect call / jmp
> +	- `lpad` not on 4 byte boundary
> +	- `imm_20bit` embedded in `lpad` instruction doesn't match with `x7`
> +
> +In all 3 cases, `*tval = 2` is captured and software check exception is raised
> +(cause=18)
> +
> +Linux kernel will treat this as `SIGSEGV` with code = `SEGV_CPERR` and follow
> +the normal course of signal delivery.
> -- 
> 2.43.2
>
Charlie Jenkins May 10, 2024, 10:36 p.m. UTC | #11
On Wed, Apr 03, 2024 at 04:34:51PM -0700, Deepak Gupta wrote:
> riscv will need an implementation for exit_thread to clean up shadow stack
> when thread exits. If current thread had shadow stack enabled, shadow
> stack is allocated by default for any new thread.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/Kconfig          | 1 +
>  arch/riscv/kernel/process.c | 5 +++++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e3142ce531a0..7e0b2bcc388f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -149,6 +149,7 @@ config RISCV
>  	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
>  	select HAVE_STACKPROTECTOR
>  	select HAVE_SYSCALL_TRACEPOINTS
> +	select HAVE_EXIT_THREAD
>  	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>  	select IRQ_DOMAIN
>  	select IRQ_FORCED_THREADING
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index d3109557f951..ce577cdc2af3 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -200,6 +200,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	return 0;
>  }
>  
> +void exit_thread(struct task_struct *tsk)
> +{
> +
> +}
> +
>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  {
>  	unsigned long clone_flags = args->flags;
> -- 
> 2.43.2
> 

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Charlie Jenkins May 10, 2024, 10:51 p.m. UTC | #12
On Wed, Apr 03, 2024 at 04:34:55PM -0700, Deepak Gupta wrote:
> Carves out space in arch specific thread struct for cfi status and shadow
> stack in usermode on riscv.
> 
> This patch does following
> - defines a new structure cfi_status with status bit for cfi feature
> - defines shadow stack pointer, base and size in cfi_status structure
> - defines offsets to new member fields in thread in asm-offsets.c
> - Saves and restore shadow stack pointer on trap entry (U --> S) and exit
>   (S --> U)
> 
> Shadow stack save/restore is gated on feature availability and implemented
> using alternative. CSR can be context switched in `switch_to` as well but
> soon as kernel shadow stack support gets rolled in, shadow stack pointer
> will need to be switched at trap entry/exit point (much like `sp`). It can
> be argued that a kernel shadow stack deployment scenario may not be as
> prevalent as user mode using this feature. But even if there is some
> minimal deployment of kernel shadow stack, that means that it needs to be
> supported. And thus save/restore of shadow stack pointer in entry.S instead
> of in `switch_to.h`.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  arch/riscv/include/asm/processor.h   |  1 +
>  arch/riscv/include/asm/thread_info.h |  3 +++
>  arch/riscv/include/asm/usercfi.h     | 24 ++++++++++++++++++++++++
>  arch/riscv/kernel/asm-offsets.c      |  4 ++++
>  arch/riscv/kernel/entry.S            | 26 ++++++++++++++++++++++++++
>  5 files changed, 58 insertions(+)
>  create mode 100644 arch/riscv/include/asm/usercfi.h
> 
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 6c5b3d928b12..f8decf357804 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -14,6 +14,7 @@
>  
>  #include <asm/ptrace.h>
>  #include <asm/hwcap.h>
> +#include <asm/usercfi.h>
>  
>  #ifdef CONFIG_64BIT
>  #define DEFAULT_MAP_WINDOW	(UL(1) << (MMAP_VA_BITS - 1))
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index a503bdc2f6dd..f1dee307806e 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -57,6 +57,9 @@ struct thread_info {
>  	int			cpu;
>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
>  	unsigned long envcfg;
> +#ifdef CONFIG_RISCV_USER_CFI
> +	struct cfi_status       user_cfi_state;
> +#endif
>  #ifdef CONFIG_SHADOW_CALL_STACK
>  	void			*scs_base;
>  	void			*scs_sp;
> diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
> new file mode 100644
> index 000000000000..4fa201b4fc4e
> --- /dev/null
> +++ b/arch/riscv/include/asm/usercfi.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * Copyright (C) 2024 Rivos, Inc.
> + * Deepak Gupta <debug@rivosinc.com>
> + */
> +#ifndef _ASM_RISCV_USERCFI_H
> +#define _ASM_RISCV_USERCFI_H
> +
> +#ifndef __ASSEMBLY__
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_RISCV_USER_CFI
> +struct cfi_status {
> +	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
> +	unsigned long rsvd : ((sizeof(unsigned long)*8) - 1);
> +	unsigned long user_shdw_stk; /* Current user shadow stack pointer */
> +	unsigned long shdw_stk_base; /* Base address of shadow stack */
> +	unsigned long shdw_stk_size; /* size of shadow stack */
> +};
> +
> +#endif /* CONFIG_RISCV_USER_CFI */
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* _ASM_RISCV_USERCFI_H */
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index a03129f40c46..5c5ea015c776 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -44,6 +44,10 @@ void asm_offsets(void)
>  #endif
>  
>  	OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu);
> +#ifdef CONFIG_RISCV_USER_CFI
> +	OFFSET(TASK_TI_CFI_STATUS, task_struct, thread_info.user_cfi_state);
> +	OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk);
> +#endif
>  	OFFSET(TASK_THREAD_F0,  task_struct, thread.fstate.f[0]);
>  	OFFSET(TASK_THREAD_F1,  task_struct, thread.fstate.f[1]);
>  	OFFSET(TASK_THREAD_F2,  task_struct, thread.fstate.f[2]);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 9d1a305d5508..7245a0ea25c1 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -60,6 +60,20 @@ SYM_CODE_START(handle_exception)
>  
>  	REG_L s0, TASK_TI_USER_SP(tp)
>  	csrrc s1, CSR_STATUS, t0
> +	/*
> +	 * If previous mode was U, capture shadow stack pointer and save it away
> +	 * Zero CSR_SSP at the same time for sanitization.
> +	 */
> +	ALTERNATIVE("nop; nop; nop; nop",
> +				__stringify(			\
> +				andi s2, s1, SR_SPP;	\
> +				bnez s2, skip_ssp_save;	\
> +				csrrw s2, CSR_SSP, x0;	\
> +				REG_S s2, TASK_TI_USER_SSP(tp); \
> +				skip_ssp_save:),
> +				0,
> +				RISCV_ISA_EXT_ZICFISS,
> +				CONFIG_RISCV_USER_CFI)
>  	csrr s2, CSR_EPC
>  	csrr s3, CSR_TVAL
>  	csrr s4, CSR_CAUSE
> @@ -141,6 +155,18 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
>  	 * structures again.
>  	 */
>  	csrw CSR_SCRATCH, tp
> +
> +	/*
> +	 * Going back to U mode, restore shadow stack pointer
> +	 */
> +	ALTERNATIVE("nop; nop",
> +				__stringify(					\
> +				REG_L s3, TASK_TI_USER_SSP(tp); \
> +				csrw CSR_SSP, s3),
> +				0,
> +				RISCV_ISA_EXT_ZICFISS,
> +				CONFIG_RISCV_USER_CFI)
> +
>  1:
>  #ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
>  	move a0, sp
> -- 
> 2.43.2
> 

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Charlie Jenkins May 10, 2024, 11:29 p.m. UTC | #13
On Wed, Apr 03, 2024 at 04:35:05PM -0700, Deepak Gupta wrote:
> Three architectures (x86, aarch64, riscv) have support for indirect branch
> tracking feature in a very similar fashion. On a very high level, indirect
> branch tracking is a CPU feature where CPU tracks branches which uses
> memory operand to perform control transfer in program. As part of this
> tracking on indirect branches, CPU goes in a state where it expects a
> landing pad instr on target and if not found then CPU raises some fault
> (architecture dependent)
> 
> x86 landing pad instr - `ENDBRANCH`
> aarch64 landing pad instr - `BTI`
> riscv landing instr - `lpad`
> 
> Given that three major arches have support for indirect branch tracking,
> This patch makes `prctl` for indirect branch tracking arch agnostic.
> 
> To allow userspace to enable this feature for itself, the following prctls are
> defined:
>  - PR_GET_INDIR_BR_LP_STATUS: Gets current configured status for indirect
>    branch tracking.
>  - PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch
>    tracking.
>    Following status options are allowed
>        - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on user
>          thread.
>        - PR_INDIR_BR_LP_DISABLE; Disables indirect branch tracking on user
>          thread.
>  - PR_LOCK_INDIR_BR_LP_STATUS: Locks configured status for indirect branch
>    tracking for user thread.
> 
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>  include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
>  kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 3c66ed8f46d8..b7a8212a068e 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -328,4 +328,31 @@ struct prctl_mm_map {
>   */
>  #define PR_LOCK_SHADOW_STACK_STATUS      73
>  
> +/*
> + * Get the current indirect branch tracking configuration for the current
> + * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
> + */
> +#define PR_GET_INDIR_BR_LP_STATUS      74
> +
> +/*
> + * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
> + * enable cpu feature for user thread, to track all indirect branches and ensure
> + * they land on arch defined landing pad instruction.
> + * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction.
> + * arch64 - If enabled, an indirect branch must land on `BTI` instruction.
> + * riscv - If enabled, an indirect branch must land on `lpad` instruction.
> + * PR_INDIR_BR_LP_DISABLE will disable feature for user thread and indirect
> + * branches will no more be tracked by cpu to land on arch defined landing pad
> + * instruction.
> + */
> +#define PR_SET_INDIR_BR_LP_STATUS      75
> +# define PR_INDIR_BR_LP_ENABLE		   (1UL << 0)
> +
> +/*
> + * Prevent further changes to the specified indirect branch tracking
> + * configuration.  All bits may be locked via this call, including
> + * undefined bits.
> + */
> +#define PR_LOCK_INDIR_BR_LP_STATUS      76
> +
>  #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 242e9f147791..c770060c3f06 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
>  	return -EINVAL;
>  }
>  
> +int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
> +{
> +	return -EINVAL;
> +}
> +
> +int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
> +{
> +	return -EINVAL;
> +}
> +
> +int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
> +{
> +	return -EINVAL;
> +}
> +

These weak references each cause a warning:

kernel/sys.c:2333:12: warning: no previous prototype for 'arch_get_indir_br_lp_status' [-Wmissing-prototypes]
 2333 | int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/sys.c:2338:12: warning: no previous prototype for 'arch_set_indir_br_lp_status' [-Wmissing-prototypes]
 2338 | int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/sys.c:2343:12: warning: no previous prototype for 'arch_lock_indir_br_lp_status' [-Wmissing-prototypes]
 2343 | int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)

Can prototypes for these be added to include/linux/mm.h alongside the
*_shadow_stack_status() declarations?
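i.e. something like the below (sketch only; the signatures match the __weak
definitions added in this patch):

int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);
int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);
int arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);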

- Charlie

>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>  
>  #ifdef CONFIG_ANON_VMA_NAME
> @@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			return -EINVAL;
>  		error = arch_lock_shadow_stack_status(me, arg2);
>  		break;
> +	case PR_GET_INDIR_BR_LP_STATUS:
> +		if (arg3 || arg4 || arg5)
> +			return -EINVAL;
> +		error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
> +		break;
> +	case PR_SET_INDIR_BR_LP_STATUS:
> +		if (arg3 || arg4 || arg5)
> +			return -EINVAL;
> +		error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
> +		break;
> +	case PR_LOCK_INDIR_BR_LP_STATUS:
> +		if (arg3 || arg4 || arg5)
> +			return -EINVAL;
> +		error = arch_lock_indir_br_lp_status(me, (unsigned long __user *) arg2);
> +		break;
>  	default:
>  		error = -EINVAL;
>  		break;
> -- 
> 2.43.2
>
Alexandre Ghiti May 12, 2024, 4:26 p.m. UTC | #14
On 04/04/2024 01:34, Deepak Gupta wrote:
> This patch implements creating shadow stack pte (on riscv). Creating
> shadow stack PTE on riscv means that clearing RWX and then setting W=1.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>   arch/riscv/include/asm/pgtable.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 4d5983bc6766..6362407f1e83 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -408,6 +408,12 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
>   	return __pte(pte_val(pte) | _PAGE_WRITE);
>   }
>   
> +static inline pte_t pte_mkwrite_shstk(pte_t pte)
> +{
> +	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */


Nit: Not sure the comment is necessary


> +	return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
> +}
> +
>   /* static inline pte_t pte_mkexec(pte_t pte) */
>   
>   static inline pte_t pte_mkdirty(pte_t pte)
> @@ -693,6 +699,12 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
>   	return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
>   }
>   
> +static inline pmd_t pmd_mkwrite_shstk(pmd_t pte)
> +{
> +	/* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */
> +	return __pmd((pmd_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
> +}
> +
>   static inline pmd_t pmd_wrprotect(pmd_t pmd)
>   {
>   	return pte_pmd(pte_wrprotect(pmd_pte(pmd)));

Otherwise:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
Alexandre Ghiti May 12, 2024, 4:31 p.m. UTC | #15
On 04/04/2024 01:35, Deepak Gupta wrote:
> `fork` implements copy on write (COW) by making pages readonly in child
> and parent both.
>
> ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE.
> Assumption is that page is readable and on fault copy on write happens.
>
> To implement COW on such pages,


I guess you mean "shadow stack pages" here.


>   clearing up W bit makes them XWR = 000.
> This will result in wrong PTE setting which says no perms but V=1 and PFN
> field pointing to final page. Instead desired behavior is to turn it into
> a readable page, take an access (load/store) fault on sspush/sspop
> (shadow stack) and then perform COW on such pages.
> This way regular reads
> would still be allowed and not lead to COW maintaining current behavior
> of COW on non-shadow stack but writeable memory.
>
> On the other hand it doesn't interfere with existing COW for read-write
> memory. Assumption is always that _PAGE_READ must have been set and thus
> setting _PAGE_READ is harmless.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>   arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 9b837239d3e8..7a1c2a98d272 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
>   
>   static inline pte_t pte_wrprotect(pte_t pte)
>   {
> -	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
> +	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
>   }
>   
>   /* static inline pte_t pte_mkread(pte_t pte) */
> @@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>   static inline void ptep_set_wrprotect(struct mm_struct *mm,
>   				      unsigned long address, pte_t *ptep)
>   {
> -	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
> +	volatile pte_t read_pte = *ptep;
> +	/*
> +	 * ptep_set_wrprotect can be called for shadow stack ranges too.
> +	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
> +	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
> +	 * but we dont want this wrong configuration to be set in page tables.
> +	 */
> +	atomic_long_set((atomic_long_t *)ptep,
> +			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
>   }
>   
>   #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH


Doesn't making the shadow stack page readable allow "normal" loads to 
access the page? If it does, isn't that an issue (security-wise)?
Alexandre Ghiti May 12, 2024, 5:05 p.m. UTC | #16
On 04/04/2024 01:35, Deepak Gupta wrote:
> Userspace specifies VM_CLONE to share address space and spawn new thread.


CLONE_VM?


> `clone` allow userspace to specify a new stack for new thread. However
> there is no way to specify new shadow stack base address without changing
> API. This patch allocates a new shadow stack whenever VM_CLONE is given.
>
> In case of VM_FORK, parent is suspended until child finishes and thus can


You mean CLONE_VFORK here right?


> child use parent shadow stack. In case of !VM_CLONE, COW kicks in because
> entire address space is copied from parent to child.
>
> `clone3` is extensible and can provide mechanisms using which shadow stack
> as an input parameter can be provided. This is not settled yet and being
> extensively discussed on mailing list. Once that's settled, this commit
> will adapt to that.
>
> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> ---
>   arch/riscv/include/asm/usercfi.h |  39 ++++++++++
>   arch/riscv/kernel/process.c      |  12 ++-
>   arch/riscv/kernel/usercfi.c      | 121 +++++++++++++++++++++++++++++++
>   3 files changed, 171 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
> index 4fa201b4fc4e..b47574a7a8c9 100644
> --- a/arch/riscv/include/asm/usercfi.h
> +++ b/arch/riscv/include/asm/usercfi.h
> @@ -8,6 +8,9 @@
>   #ifndef __ASSEMBLY__
>   #include <linux/types.h>
>   
> +struct task_struct;
> +struct kernel_clone_args;
> +
>   #ifdef CONFIG_RISCV_USER_CFI
>   struct cfi_status {
>   	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
> @@ -17,6 +20,42 @@ struct cfi_status {
>   	unsigned long shdw_stk_size; /* size of shadow stack */
>   };
>   
> +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +							const struct kernel_clone_args *args);
> +void shstk_release(struct task_struct *tsk);
> +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
> +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
> +bool is_shstk_enabled(struct task_struct *task);
> +
> +#else
> +
> +static inline unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +					   const struct kernel_clone_args *args)
> +{
> +	return 0;
> +}
> +
> +static inline void shstk_release(struct task_struct *tsk)
> +{
> +
> +}
> +
> +static inline void set_shstk_base(struct task_struct *task, unsigned long shstk_addr,
> +								unsigned long size)
> +{
> +
> +}
> +
> +static inline void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
> +{
> +
> +}
> +
> +static inline bool is_shstk_enabled(struct task_struct *task)
> +{
> +	return false;
> +}
> +
>   #endif /* CONFIG_RISCV_USER_CFI */
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index ce577cdc2af3..ef48a25b0eff 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -26,6 +26,7 @@
>   #include <asm/cpuidle.h>
>   #include <asm/vector.h>
>   #include <asm/cpufeature.h>
> +#include <asm/usercfi.h>
>   
>   register unsigned long gp_in_global __asm__("gp");
>   
> @@ -202,7 +203,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>   
>   void exit_thread(struct task_struct *tsk)
>   {
> -
> +	if (IS_ENABLED(CONFIG_RISCV_USER_CFI))
> +		shstk_release(tsk);
>   }
>   
>   int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> @@ -210,6 +212,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   	unsigned long clone_flags = args->flags;
>   	unsigned long usp = args->stack;
>   	unsigned long tls = args->tls;
> +	unsigned long ssp = 0;
>   	struct pt_regs *childregs = task_pt_regs(p);
>   
>   	memset(&p->thread.s, 0, sizeof(p->thread.s));
> @@ -225,11 +228,18 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   		p->thread.s[0] = (unsigned long)args->fn;
>   		p->thread.s[1] = (unsigned long)args->fn_arg;
>   	} else {
> +		/* allocate new shadow stack if needed. In case of CLONE_VM we have to */
> +		ssp = shstk_alloc_thread_stack(p, args);
> +		if (IS_ERR_VALUE(ssp))
> +			return PTR_ERR((void *)ssp);
> +
>   		*childregs = *(current_pt_regs());
>   		/* Turn off status.VS */
>   		riscv_v_vstate_off(childregs);
>   		if (usp) /* User fork */
>   			childregs->sp = usp;
> +		if (ssp) /* if needed, set new ssp */
> +			set_active_shstk(p, ssp);
>   		if (clone_flags & CLONE_SETTLS)
>   			childregs->tp = tls;
>   		childregs->a0 = 0; /* Return value of fork() */
> diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
> index c4ed0d4e33d6..11ef7ab925c9 100644
> --- a/arch/riscv/kernel/usercfi.c
> +++ b/arch/riscv/kernel/usercfi.c
> @@ -19,6 +19,41 @@
>   
>   #define SHSTK_ENTRY_SIZE sizeof(void *)
>   
> +bool is_shstk_enabled(struct task_struct *task)
> +{
> +	return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
> +}
> +
> +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
> +{
> +	task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
> +	task->thread_info.user_cfi_state.shdw_stk_size = size;
> +}
> +
> +unsigned long get_shstk_base(struct task_struct *task, unsigned long *size)
> +{
> +	if (size)
> +		*size = task->thread_info.user_cfi_state.shdw_stk_size;
> +	return task->thread_info.user_cfi_state.shdw_stk_base;
> +}
> +
> +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
> +{
> +	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
> +}
> +
> +/*
> + * If size is 0, then to be compatible with regular stack we want it to be as big as
> + * regular stack. Else PAGE_ALIGN it and return back
> + */
> +static unsigned long calc_shstk_size(unsigned long size)
> +{
> +	if (size)
> +		return PAGE_ALIGN(size);
> +
> +	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
> +}
> +
>   /*
>    * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
>    * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
> @@ -147,3 +182,89 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
>   
>   	return allocate_shadow_stack(addr, aligned_size, size, set_tok);
>   }
> +
> +/*
> + * This gets called during clone/clone3/fork. And is needed to allocate a shadow stack for
> + * cases where CLONE_VM is specified and thus a different stack is specified by user. We
> + * thus need a separate shadow stack too. How a separate shadow stack is specified by
> + * user is still being debated. Once that's settled, remove this part of the comment.
> + * This function simply returns 0 if shadow stack are not supported or if separate shadow
> + * stack allocation is not needed (like in case of !CLONE_VM)
> + */
> +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
> +					   const struct kernel_clone_args *args)
> +{
> +	unsigned long addr, size;
> +
> +	/* If shadow stack is not supported, return 0 */
> +	if (!cpu_supports_shadow_stack())
> +		return 0;
> +
> +	/*
> +	 * If shadow stack is not enabled on the new thread, skip any
> +	 * switch to a new shadow stack.
> +	 */
> +	if (is_shstk_enabled(tsk))
> +		return 0;
> +
> +	/*
> +	 * For CLONE_VFORK the child will share the parents shadow stack.
> +	 * Set base = 0 and size = 0, this is special means to track this state
> +	 * so the freeing logic run for child knows to leave it alone.
> +	 */
> +	if (args->flags & CLONE_VFORK) {
> +		set_shstk_base(tsk, 0, 0);
> +		return 0;
> +	}
> +
> +	/*
> +	 * For !CLONE_VM the child will use a copy of the parents shadow
> +	 * stack.
> +	 */
> +	if (!(args->flags & CLONE_VM))
> +		return 0;
> +
> +	/*
> +	 * reaching here means, CLONE_VM was specified and thus a separate shadow
> +	 * stack is needed for new cloned thread. Note: below allocation is happening
> +	 * using current mm.
> +	 */
> +	size = calc_shstk_size(args->stack_size);
> +	addr = allocate_shadow_stack(0, size, 0, false);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +
> +	set_shstk_base(tsk, addr, size);
> +
> +	return addr + size;
> +}
> +
> +void shstk_release(struct task_struct *tsk)
> +{
> +	unsigned long base = 0, size = 0;
> +	/* If shadow stack is not supported or not enabled, nothing to release */
> +	if (!cpu_supports_shadow_stack() ||
> +		!is_shstk_enabled(tsk))
> +		return;
> +
> +	/*
> +	 * When fork() with CLONE_VM fails, the child (tsk) already has a
> +	 * shadow stack allocated, and exit_thread() calls this function to
> +	 * free it.  In this case the parent (current) and the child share
> +	 * the same mm struct. Move forward only when they're same.
> +	 */
> +	if (!tsk->mm || tsk->mm != current->mm)
> +		return;
> +
> +	/*
> +	 * We know shadow stack is enabled but if base is NULL, then
> +	 * this task is not managing its own shadow stack (CLONE_VFORK). So
> +	 * skip freeing it.
> +	 */
> +	base = get_shstk_base(tsk, &size);
> +	if (!base)
> +		return;
> +
> +	vm_munmap(base, size);
> +	set_shstk_base(tsk, 0, 0);
> +}
Deepak Gupta May 13, 2024, 5:07 p.m. UTC | #17
On Fri, May 10, 2024 at 01:30:32PM -0700, Charlie Jenkins wrote:
>On Wed, Apr 03, 2024 at 04:35:15PM -0700, Deepak Gupta wrote:
>> Adding documentation on landing pad aka indirect branch tracking on riscv
>> and kernel interfaces exposed so that user tasks can enable it.
>>
>> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>> ---
>>  Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++
>>  1 file changed, 104 insertions(+)
>>  create mode 100644 Documentation/arch/riscv/zicfilp.rst
>>
>> diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst
>> new file mode 100644
>> index 000000000000..3007c81f0465
>> --- /dev/null
>> +++ b/Documentation/arch/riscv/zicfilp.rst
>> @@ -0,0 +1,104 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +:Author: Deepak Gupta <debug@rivosinc.com>
>> +:Date:   12 January 2024
>> +
>> +====================================================
>> +Tracking indirect control transfers on RISC-V Linux
>> +====================================================
>> +
>> +This document briefly describes the interface provided to userspace by Linux
>> +to enable indirect branch tracking for user mode applications on RISC-V
>> +
>> +1. Feature Overview
>> +--------------------
>> +
>> +Memory corruption issues usually result in crashes, however when in the hands of
>> +an adversary and used creatively they can result in a variety of security issues.
>> +
>> +One of those security issues can be code re-use attacks on program where adversary
>> +can use corrupt function pointers and chain them together to perform jump oriented
>> +programming (JOP) or call oriented programming (COP) and thus compromising control
>> +flow integrity (CFI) of the program.
>> +
>> +Function pointers live in read-write memory and thus are susceptible to corruption
>> +and allow an adversary to reach any program counter (PC) in the address space. On
>> +RISC-V, the zicfilp extension enforces a restriction on such indirect control transfers
>> +
>> +	- indirect control transfers must land on a landing pad instruction `lpad`.
>> +	  There are two exceptions to this rule
>> +		- rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are
>
>What is a return that is not a return from a function?
Those would be a jump or a call (depending on the convention, i.e. whether the return address is saved in x1/x5)

>
>> +		  protected using shadow stack (see zicfiss.rst)
>> +
>> +		- rs1 = x7. On RISC-V the compiler usually emits the below sequence to reach a function
>> +		  which is beyond the offset reachable with a J-type instruction.
>> +
>> +			"auipc x7, <imm>"
>> +			"jalr (x7)"
>> +
>> +		  Such forms of indirect control transfer are still immutable and don't rely
>> +		  on memory and thus rs1=x7 is exempted from tracking and considered software
>> +		  guarded jumps.
>> +
>> +`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad`
>
>I think this should say "x0" instead of "rd", or mention that rd=x0.

Yeah I missed that. will fix it.

>
>> +instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7.
>> +If `imm_20bit` == 0, the CPU doesn't perform any comparison with x7. If `imm_20bit` != 0,
>> +then `imm_20bit` must match x7, else the CPU will raise a `software check exception`
>> +(cause=18) with `*tval = 2`.
>> +
>> +Compiler can generate a hash over function signatures and setup them (truncated
>> +to 20bit) in x7 at callsites and function proglogs can have `lpad` with same
>
>"prologues" instead of "proglogs"

Will fix it.

>
>> +function hash. This further reduces number of program counters a call site can
>> +reach.
>> +
>> +2. ELF and psABI
>> +-----------------
>> +
>> +Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property
>> +`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
>> +
>> +3. Linux enabling
>> +------------------
>> +
>> +User space programs can have multiple shared objects loaded in its address space
>> +and it's a difficult task to make sure all the dependencies have been compiled
>> +with support of indirect branch. Thus it's left to dynamic loader to enable
>> +indirect branch tracking for the program.
>> +
>> +4. prctl() enabling
>> +--------------------
>> +
>> +`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` /
>> +`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch
>> +tracking. prctls are arch agnostic and returns -EINVAL on other arches.
>> +
>> +`PR_SET_INDIR_BR_LP_STATUS`: If arg1 `PR_INDIR_BR_LP_ENABLE` and if CPU supports
>> +`zicfilp` then kernel will enable indirect branch tracking for the task.
>> +Dynamic loader can issue this `prctl` once it has determined that all the objects
>> +loaded in address space support indirect branch tracking. Additionally if there is
>> +a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can
>> +issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear)
>> +
>> +`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking.
>> +If enabled it'll return `PR_INDIR_BR_LP_ENABLE`
>> +
>> +`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on
>> +the task. User space may want to run with strict security posture and wouldn't want
>> +loading of objects without `zicfilp` support in it and thus would want to disallow
>> +disabling of indirect branch tracking. In that case user space can use this prctl
>> +to lock current settings.
>> +
>> +5. violations related to indirect branch tracking
>> +--------------------------------------------------
>> +
>> +Pertaining to indirect branch tracking, CPU raises software check exception in
>> +following conditions
>> +	- missing `lpad` after indirect call / jmp
>> +	- `lpad` not on 4 byte boundary
>> +	- `imm_20bit` embedded in `lpad` instruction doesn't match with `x7`
>> +
>> +In all 3 cases, `*tval = 2` is captured and software check exception is raised
>> +(cause=18)
>> +
>> +Linux kernel will treat this as `SIGSEGV` with code = `SEGV_CPERR` and follow
>> +the normal course of signal delivery.
>> --
>> 2.43.2
>>
Deepak Gupta May 13, 2024, 5:10 p.m. UTC | #18
On Sun, May 12, 2024 at 07:05:27PM +0200, Alexandre Ghiti wrote:
>On 04/04/2024 01:35, Deepak Gupta wrote:
>>Userspace specifies VM_CLONE to share address space and spawn new thread.
>
>
>CLONE_VM?

Yes I meant CLONE_VM, will fix it.

>
>
>>`clone` allow userspace to specify a new stack for new thread. However
>>there is no way to specify new shadow stack base address without changing
>>API. This patch allocates a new shadow stack whenever VM_CLONE is given.
>>
>>In case of VM_FORK, parent is suspended until child finishes and thus can
>
>
>You mean CLONE_VFORK here right?

Yes I meant CLONE_VFORK, will fix it.

>
>
>>child use parent shadow stack. In case of !VM_CLONE, COW kicks in because
>>entire address space is copied from parent to child.
>>
>>`clone3` is extensible and can provide mechanisms using which shadow stack
>>as an input parameter can be provided. This is not settled yet and being
>>extensively discussed on mailing list. Once that's settled, this commit
>>will adapt to that.
>>
>>Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>>---
>>  arch/riscv/include/asm/usercfi.h |  39 ++++++++++
>>  arch/riscv/kernel/process.c      |  12 ++-
>>  arch/riscv/kernel/usercfi.c      | 121 +++++++++++++++++++++++++++++++
>>  3 files changed, 171 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
>>index 4fa201b4fc4e..b47574a7a8c9 100644
>>--- a/arch/riscv/include/asm/usercfi.h
>>+++ b/arch/riscv/include/asm/usercfi.h
>>@@ -8,6 +8,9 @@
>>  #ifndef __ASSEMBLY__
>>  #include <linux/types.h>
>>+struct task_struct;
>>+struct kernel_clone_args;
>>+
>>  #ifdef CONFIG_RISCV_USER_CFI
>>  struct cfi_status {
>>  	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
>>@@ -17,6 +20,42 @@ struct cfi_status {
>>  	unsigned long shdw_stk_size; /* size of shadow stack */
>>  };
>>+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
>>+							const struct kernel_clone_args *args);
>>+void shstk_release(struct task_struct *tsk);
>>+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
>>+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
>>+bool is_shstk_enabled(struct task_struct *task);
>>+
>>+#else
>>+
>>+static inline unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
>>+					   const struct kernel_clone_args *args)
>>+{
>>+	return 0;
>>+}
>>+
>>+static inline void shstk_release(struct task_struct *tsk)
>>+{
>>+
>>+}
>>+
>>+static inline void set_shstk_base(struct task_struct *task, unsigned long shstk_addr,
>>+								unsigned long size)
>>+{
>>+
>>+}
>>+
>>+static inline void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
>>+{
>>+
>>+}
>>+
>>+static inline bool is_shstk_enabled(struct task_struct *task)
>>+{
>>+	return false;
>>+}
>>+
>>  #endif /* CONFIG_RISCV_USER_CFI */
>>  #endif /* __ASSEMBLY__ */
>>diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
>>index ce577cdc2af3..ef48a25b0eff 100644
>>--- a/arch/riscv/kernel/process.c
>>+++ b/arch/riscv/kernel/process.c
>>@@ -26,6 +26,7 @@
>>  #include <asm/cpuidle.h>
>>  #include <asm/vector.h>
>>  #include <asm/cpufeature.h>
>>+#include <asm/usercfi.h>
>>  register unsigned long gp_in_global __asm__("gp");
>>@@ -202,7 +203,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>>  void exit_thread(struct task_struct *tsk)
>>  {
>>-
>>+	if (IS_ENABLED(CONFIG_RISCV_USER_CFI))
>>+		shstk_release(tsk);
>>  }
>>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>@@ -210,6 +212,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>  	unsigned long clone_flags = args->flags;
>>  	unsigned long usp = args->stack;
>>  	unsigned long tls = args->tls;
>>+	unsigned long ssp = 0;
>>  	struct pt_regs *childregs = task_pt_regs(p);
>>  	memset(&p->thread.s, 0, sizeof(p->thread.s));
>>@@ -225,11 +228,18 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>  		p->thread.s[0] = (unsigned long)args->fn;
>>  		p->thread.s[1] = (unsigned long)args->fn_arg;
>>  	} else {
>>+		/* allocate new shadow stack if needed. In case of CLONE_VM we have to */
>>+		ssp = shstk_alloc_thread_stack(p, args);
>>+		if (IS_ERR_VALUE(ssp))
>>+			return PTR_ERR((void *)ssp);
>>+
>>  		*childregs = *(current_pt_regs());
>>  		/* Turn off status.VS */
>>  		riscv_v_vstate_off(childregs);
>>  		if (usp) /* User fork */
>>  			childregs->sp = usp;
>>+		if (ssp) /* if needed, set new ssp */
>>+			set_active_shstk(p, ssp);
>>  		if (clone_flags & CLONE_SETTLS)
>>  			childregs->tp = tls;
>>  		childregs->a0 = 0; /* Return value of fork() */
>>diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
>>index c4ed0d4e33d6..11ef7ab925c9 100644
>>--- a/arch/riscv/kernel/usercfi.c
>>+++ b/arch/riscv/kernel/usercfi.c
>>@@ -19,6 +19,41 @@
>>  #define SHSTK_ENTRY_SIZE sizeof(void *)
>>+bool is_shstk_enabled(struct task_struct *task)
>>+{
>>+	return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
>>+}
>>+
>>+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
>>+{
>>+	task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
>>+	task->thread_info.user_cfi_state.shdw_stk_size = size;
>>+}
>>+
>>+unsigned long get_shstk_base(struct task_struct *task, unsigned long *size)
>>+{
>>+	if (size)
>>+		*size = task->thread_info.user_cfi_state.shdw_stk_size;
>>+	return task->thread_info.user_cfi_state.shdw_stk_base;
>>+}
>>+
>>+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
>>+{
>>+	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
>>+}
>>+
>>+/*
>>+ * If size is 0, then to be compatible with regular stack we want it to be as big as
>>+ * regular stack. Else PAGE_ALIGN it and return back
>>+ */
>>+static unsigned long calc_shstk_size(unsigned long size)
>>+{
>>+	if (size)
>>+		return PAGE_ALIGN(size);
>>+
>>+	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
>>+}
>>+
>>  /*
>>   * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
>>   * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
>>@@ -147,3 +182,89 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
>>  	return allocate_shadow_stack(addr, aligned_size, size, set_tok);
>>  }
>>+
>>+/*
>>+ * This gets called during clone/clone3/fork. It is needed to allocate a shadow stack for
>>+ * cases where CLONE_VM is specified and thus a different stack is specified by the user. We
>>+ * thus need a separate shadow stack too. How a separate shadow stack is specified by the
>>+ * user is still being debated. Once that's settled, remove this part of the comment.
>>+ * This function simply returns 0 if shadow stacks are not supported or if a separate shadow
>>+ * stack allocation is not needed (as in the case of !CLONE_VM).
>>+ */
>>+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
>>+					   const struct kernel_clone_args *args)
>>+{
>>+	unsigned long addr, size;
>>+
>>+	/* If shadow stack is not supported, return 0 */
>>+	if (!cpu_supports_shadow_stack())
>>+		return 0;
>>+
>>+	/*
>>+	 * If shadow stack is not enabled on the new thread, skip any
>>+	 * switch to a new shadow stack.
>>+	 */
>>+	if (is_shstk_enabled(tsk))
>>+		return 0;
>>+
>>+	/*
>>+	 * For CLONE_VFORK the child will share the parent's shadow stack.
>>+	 * Set base = 0 and size = 0; this is a special means to track this state
>>+	 * so the freeing logic run for the child knows to leave it alone.
>>+	 */
>>+	if (args->flags & CLONE_VFORK) {
>>+		set_shstk_base(tsk, 0, 0);
>>+		return 0;
>>+	}
>>+
>>+	/*
>>+	 * For !CLONE_VM the child will use a copy of the parent's shadow
>>+	 * stack.
>>+	 */
>>+	if (!(args->flags & CLONE_VM))
>>+		return 0;
>>+
>>+	/*
>>+	 * reaching here means, CLONE_VM was specified and thus a separate shadow
>>+	 * stack is needed for new cloned thread. Note: below allocation is happening
>>+	 * using current mm.
>>+	 */
>>+	size = calc_shstk_size(args->stack_size);
>>+	addr = allocate_shadow_stack(0, size, 0, false);
>>+	if (IS_ERR_VALUE(addr))
>>+		return addr;
>>+
>>+	set_shstk_base(tsk, addr, size);
>>+
>>+	return addr + size;
>>+}
>>+
>>+void shstk_release(struct task_struct *tsk)
>>+{
>>+	unsigned long base = 0, size = 0;
>>+	/* If shadow stack is not supported or not enabled, nothing to release */
>>+	if (!cpu_supports_shadow_stack() ||
>>+		!is_shstk_enabled(tsk))
>>+		return;
>>+
>>+	/*
>>+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
>>+	 * shadow stack allocated, and exit_thread() calls this function to
>>+	 * free it.  In this case the parent (current) and the child share
>>+	 * the same mm struct. Move forward only when they're same.
>>+	 */
>>+	if (!tsk->mm || tsk->mm != current->mm)
>>+		return;
>>+
>>+	/*
>>+	 * We know shadow stack is enabled but if base is NULL, then
>>+	 * this task is not managing its own shadow stack (CLONE_VFORK). So
>>+	 * skip freeing it.
>>+	 */
>>+	base = get_shstk_base(tsk, &size);
>>+	if (!base)
>>+		return;
>>+
>>+	vm_munmap(base, size);
>>+	set_shstk_base(tsk, 0, 0);
>>+}
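
For completeness (not part of the patch), the map_shadow_stack() syscall wired up in this
file can also be exercised directly from userspace. A minimal sketch; the syscall number
and the SHADOW_STACK_SET_TOKEN flag value below are assumptions for headers that don't
carry them yet:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453
#endif

#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN (1UL << 0)	/* ask for a restore token at the top */
#endif

int main(void)
{
	unsigned long size = 4 * 4096;
	long ssp = syscall(__NR_map_shadow_stack, 0UL, size, SHADOW_STACK_SET_TOKEN);

	if (ssp == -1) {
		perror("map_shadow_stack");
		return 1;
	}
	printf("shadow stack mapped at 0x%lx (%lu bytes)\n", (unsigned long)ssp, size);
	return 0;
}
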
Deepak Gupta May 13, 2024, 5:32 p.m. UTC | #19
On Sun, May 12, 2024 at 06:31:24PM +0200, Alexandre Ghiti wrote:
>On 04/04/2024 01:35, Deepak Gupta wrote:
>>`fork` implements copy on write (COW) by making pages readonly in child
>>and parent both.
>>
>>ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE.
>>Assumption is that page is readable and on fault copy on write happens.
>>
>>To implement COW on such pages,
>
>
>I guess you mean "shadow stack pages" here.

Yes I meant shadow stack pages. Will fix the message.

>
>
>>  clearing up W bit makes them XWR = 000.
>>This will result in wrong PTE setting which says no perms but V=1 and PFN
>>field pointing to final page. Instead desired behavior is to turn it into
>>a readable page, take an access (load/store) fault on sspush/sspop
>>(shadow stack) and then perform COW on such pages.
>>This way regular reads
>>would still be allowed and not lead to COW maintaining current behavior
>>of COW on non-shadow stack but writeable memory.
>>
>>On the other hand it doesn't interfere with existing COW for read-write
>>memory. Assumption is always that _PAGE_READ must have been set and thus
>>setting _PAGE_READ is harmless.
>>
>>Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>>---
>>  arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>>diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>index 9b837239d3e8..7a1c2a98d272 100644
>>--- a/arch/riscv/include/asm/pgtable.h
>>+++ b/arch/riscv/include/asm/pgtable.h
>>@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
>>  static inline pte_t pte_wrprotect(pte_t pte)
>>  {
>>-	return __pte(pte_val(pte) & ~(_PAGE_WRITE));
>>+	return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
>>  }
>>  /* static inline pte_t pte_mkread(pte_t pte) */
>>@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>>  static inline void ptep_set_wrprotect(struct mm_struct *mm,
>>  				      unsigned long address, pte_t *ptep)
>>  {
>>-	atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
>>+	volatile pte_t read_pte = *ptep;
>>+	/*
>>+	 * ptep_set_wrprotect can be called for shadow stack ranges too.
>>+	 * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
>>+	 * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
>>+	 * but we don't want this wrong configuration to be set in page tables.
>>+	 */
>>+	atomic_long_set((atomic_long_t *)ptep,
>>+			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
>>  }
>>  #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
>
>
>Doesn't making the shadow stack page readable allow "normal" loads to 
>access the page? If it does, isn't that an issue (security-wise)?

With shadow stack permissions (i.e. R=0, W=1, X=0), the shadow stack is already readable
through "normal" loads. So nothing changes from a page-permissions perspective when it is
converted into a read-only page.

Security-wise it's not a concern. From a threat-modeling perspective, if an attacker already
had read-write primitives (via some bug in the program) to read and write the address space
of the process/task, then return addresses on the normal stack would be available to them
anyway. It's the write primitive that is concerning and must be protected against, and that's
why the shadow stack is not writeable using "normal" stores.

>
Deepak Gupta May 13, 2024, 6:31 p.m. UTC | #20
On Fri, May 10, 2024 at 04:29:19PM -0700, Charlie Jenkins wrote:
>On Wed, Apr 03, 2024 at 04:35:05PM -0700, Deepak Gupta wrote:
>> Three architectures (x86, aarch64, riscv) support the indirect branch
>> tracking feature in a very similar fashion. At a very high level, indirect
>> branch tracking is a CPU feature where the CPU tracks branches which use a
>> memory operand to perform control transfer in a program. As part of this
>> tracking on indirect branches, the CPU goes into a state where it expects a
>> landing pad instr on the target; if not found, the CPU raises some fault
>> (architecture dependent).
>>
>> x86 landing pad instr - `ENDBRANCH`
>> aarch64 landing pad instr - `BTI`
>> riscv landing instr - `lpad`
>>
>> Given that three major arches have support for indirect branch tracking,
>> this patch makes `prctl` for indirect branch tracking arch agnostic.
>>
>> To allow userspace to enable this feature for itself, the following prctls
>> are defined:
>>  - PR_GET_INDIR_BR_LP_STATUS: Gets current configured status for indirect
>>    branch tracking.
>>  - PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch
>>    tracking.
>>    Following status options are allowed
>>        - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on user
>>          thread.
>>        - PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on user
>>          thread.
>>  - PR_LOCK_INDIR_BR_LP_STATUS: Locks configured status for indirect branch
>>    tracking for user thread.
>>
>> Signed-off-by: Deepak Gupta <debug@rivosinc.com>
>> ---
>>  include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
>>  kernel/sys.c               | 30 ++++++++++++++++++++++++++++++
>>  2 files changed, 57 insertions(+)
>>
>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> index 3c66ed8f46d8..b7a8212a068e 100644
>> --- a/include/uapi/linux/prctl.h
>> +++ b/include/uapi/linux/prctl.h
>> @@ -328,4 +328,31 @@ struct prctl_mm_map {
>>   */
>>  #define PR_LOCK_SHADOW_STACK_STATUS      73
>>
>> +/*
>> + * Get the current indirect branch tracking configuration for the current
>> + * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
>> + */
>> +#define PR_GET_INDIR_BR_LP_STATUS      74
>> +
>> +/*
>> + * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
>> + * enable cpu feature for user thread, to track all indirect branches and ensure
>> + * they land on arch defined landing pad instruction.
>> + * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction.
>> + * aarch64 - If enabled, an indirect branch must land on `BTI` instruction.
>> + * riscv - If enabled, an indirect branch must land on `lpad` instruction.
>> + * PR_INDIR_BR_LP_DISABLE will disable the feature for the user thread and indirect
>> + * branches will no longer be tracked by the cpu to land on an arch defined landing pad
>> + * instruction.
>> + */
>> +#define PR_SET_INDIR_BR_LP_STATUS      75
>> +# define PR_INDIR_BR_LP_ENABLE		   (1UL << 0)
>> +
>> +/*
>> + * Prevent further changes to the specified indirect branch tracking
>> + * configuration.  All bits may be locked via this call, including
>> + * undefined bits.
>> + */
>> +#define PR_LOCK_INDIR_BR_LP_STATUS      76
>> +
>>  #endif /* _LINUX_PRCTL_H */
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index 242e9f147791..c770060c3f06 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
>>  	return -EINVAL;
>>  }
>>
>> +int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>> +int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>> +int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>
>These weak references each cause a warning:
>
>kernel/sys.c:2333:12: warning: no previous prototype for 'arch_get_indir_br_lp_status' [-Wmissing-prototypes]
> 2333 | int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>kernel/sys.c:2338:12: warning: no previous prototype for 'arch_set_indir_br_lp_status' [-Wmissing-prototypes]
> 2338 | int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>kernel/sys.c:2343:12: warning: no previous prototype for 'arch_lock_indir_br_lp_status' [-Wmissing-prototypes]
> 2343 | int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
>
>Can the definitions be added to include/linux/mm.h alongside the
>*_shadow_stack_status() definitions?

Noted. Will work on a fix for this.
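
A minimal sketch of the missing declarations, assuming they are added next to the
*_shadow_stack_status() prototypes in include/linux/mm.h as suggested:

int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);
int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);
int arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long __user *status);
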

>
>- Charlie
>
>>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>>
>>  #ifdef CONFIG_ANON_VMA_NAME
>> @@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>>  			return -EINVAL;
>>  		error = arch_lock_shadow_stack_status(me, arg2);
>>  		break;
>> +	case PR_GET_INDIR_BR_LP_STATUS:
>> +		if (arg3 || arg4 || arg5)
>> +			return -EINVAL;
>> +		error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
>> +		break;
>> +	case PR_SET_INDIR_BR_LP_STATUS:
>> +		if (arg3 || arg4 || arg5)
>> +			return -EINVAL;
>> +		error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
>> +		break;
>> +	case PR_LOCK_INDIR_BR_LP_STATUS:
>> +		if (arg3 || arg4 || arg5)
>> +			return -EINVAL;
>> +		error = arch_lock_indir_br_lp_status(me, (unsigned long __user *) arg2);
>> +		break;
>>  	default:
>>  		error = -EINVAL;
>>  		break;
>> --
>> 2.43.2
>>
Alexandre Ghiti May 23, 2024, 2:59 p.m. UTC | #21
Hi Deepak,

On Mon, May 13, 2024 at 7:32 PM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Sun, May 12, 2024 at 06:31:24PM +0200, Alexandre Ghiti wrote:
> >On 04/04/2024 01:35, Deepak Gupta wrote:
> >>`fork` implements copy on write (COW) by making pages readonly in child
> >>and parent both.
> >>
> >>ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE.
> >>Assumption is that page is readable and on fault copy on write happens.
> >>
> >>To implement COW on such pages,
> >
> >
> >I guess you mean "shadow stack pages" here.
>
> Yes I meant shadow stack pages. Will fix the message.
>
> >
> >
> >>  clearing up W bit makes them XWR = 000.
> >>This will result in wrong PTE setting which says no perms but V=1 and PFN
> >>field pointing to final page. Instead desired behavior is to turn it into
> >>a readable page, take an access (load/store) fault on sspush/sspop
> >>(shadow stack) and then perform COW on such pages.
> >>This way regular reads
> >>would still be allowed and not lead to COW maintaining current behavior
> >>of COW on non-shadow stack but writeable memory.
> >>
> >>On the other hand it doesn't interfere with existing COW for read-write
> >>memory. Assumption is always that _PAGE_READ must have been set and thus
> >>setting _PAGE_READ is harmless.
> >>
> >>Signed-off-by: Deepak Gupta <debug@rivosinc.com>
> >>---
> >>  arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> >>index 9b837239d3e8..7a1c2a98d272 100644
> >>--- a/arch/riscv/include/asm/pgtable.h
> >>+++ b/arch/riscv/include/asm/pgtable.h
> >>@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
> >>  static inline pte_t pte_wrprotect(pte_t pte)
> >>  {
> >>-     return __pte(pte_val(pte) & ~(_PAGE_WRITE));
> >>+     return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
> >>  }
> >>  /* static inline pte_t pte_mkread(pte_t pte) */
> >>@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> >>  static inline void ptep_set_wrprotect(struct mm_struct *mm,
> >>                                    unsigned long address, pte_t *ptep)
> >>  {
> >>-     atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
> >>+     volatile pte_t read_pte = *ptep;

Sorry I missed this ^. You need to use ptep_get() to get the value of
a pte. And why do you need the volatile here?

> >>+     /*
> >>+      * ptep_set_wrprotect can be called for shadow stack ranges too.
> >>+      * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
> >>+      * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
> >>+      * but we don't want this wrong configuration to be set in page tables.
> >>+      */
> >>+     atomic_long_set((atomic_long_t *)ptep,
> >>+                     ((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
> >>  }
> >>  #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
> >
> >
> >Doesn't making the shadow stack page readable allow "normal" loads to
> >access the page? If it does, isn't that an issue (security-wise)?
>
> With shadow stack permissions (i.e. R=0, W=1, X=0), the shadow stack is already readable
> through "normal" loads. So nothing changes from a page-permissions perspective when it is
> converted into a read-only page.
>
> Security-wise it's not a concern. From a threat-modeling perspective, if an attacker already
> had read-write primitives (via some bug in the program) to read and write the address space
> of the process/task, then return addresses on the normal stack would be available to them
> anyway. It's the write primitive that is concerning and must be protected against, and that's
> why the shadow stack is not writeable using "normal" stores.
>
> >

Thanks for the explanation!

With the use of ptep_get(), you can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
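
For reference, a minimal sketch of the ptep_get()-based version suggested above (only
illustrating the change, not the final patch):

static inline void ptep_set_wrprotect(struct mm_struct *mm,
				      unsigned long address, pte_t *ptep)
{
	pte_t read_pte = ptep_get(ptep);

	/*
	 * Shadow stack PTEs are XWR = 010; clearing only _PAGE_WRITE would leave
	 * the reserved 000 encoding, so set _PAGE_READ as well.
	 */
	atomic_long_set((atomic_long_t *)ptep,
			((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
}
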