mbox series

[0/9] riscv: improve self-protection

Message ID 20210330022144.150edc6e@xhacker
Headers show
Series riscv: improve self-protection | expand

Message

Jisheng Zhang March 29, 2021, 6:21 p.m. UTC
From: Jisheng Zhang <jszhang@kernel.org>

patch1 is a trivial improvement patch to move some functions to .init
section

Then following patches improve self-protection by:

Marking some variables __ro_after_init
Constifing some variables
Enabling ARCH_HAS_STRICT_MODULE_RWX

Jisheng Zhang (9):
  riscv: add __init section marker to some functions
  riscv: Mark some global variables __ro_after_init
  riscv: Constify sys_call_table
  riscv: Constify sbi_ipi_ops
  riscv: kprobes: Implement alloc_insn_page()
  riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core
  riscv: bpf: Avoid breaking W^X
  riscv: module: Create module allocations without exec permissions
  riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU

 arch/riscv/Kconfig                 |  1 +
 arch/riscv/include/asm/smp.h       |  4 ++--
 arch/riscv/include/asm/syscall.h   |  2 +-
 arch/riscv/kernel/module.c         |  2 +-
 arch/riscv/kernel/probes/kprobes.c |  8 ++++++++
 arch/riscv/kernel/sbi.c            | 10 +++++-----
 arch/riscv/kernel/smp.c            |  6 +++---
 arch/riscv/kernel/syscall_table.c  |  2 +-
 arch/riscv/kernel/time.c           |  2 +-
 arch/riscv/kernel/traps.c          |  2 +-
 arch/riscv/kernel/vdso.c           |  4 ++--
 arch/riscv/mm/init.c               | 12 ++++++------
 arch/riscv/mm/kasan_init.c         |  6 +++---
 arch/riscv/mm/ptdump.c             |  2 +-
 arch/riscv/net/bpf_jit_comp64.c    | 13 -------------
 arch/riscv/net/bpf_jit_core.c      | 14 ++++++++++++++
 16 files changed, 50 insertions(+), 40 deletions(-)

Comments

Palmer Dabbelt April 23, 2021, 1:48 a.m. UTC | #1
On Mon, 29 Mar 2021 11:21:44 PDT (-0700), jszhang3@mail.ustc.edu.cn wrote:
> From: Jisheng Zhang <jszhang@kernel.org>

>

> patch1 is a trivial improvement patch to move some functions to .init

> section

>

> Then following patches improve self-protection by:

>

> Marking some variables __ro_after_init

> Constifing some variables

> Enabling ARCH_HAS_STRICT_MODULE_RWX

>

> Jisheng Zhang (9):

>   riscv: add __init section marker to some functions

>   riscv: Mark some global variables __ro_after_init

>   riscv: Constify sys_call_table

>   riscv: Constify sbi_ipi_ops

>   riscv: kprobes: Implement alloc_insn_page()

>   riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core

>   riscv: bpf: Avoid breaking W^X

>   riscv: module: Create module allocations without exec permissions

>   riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU

>

>  arch/riscv/Kconfig                 |  1 +

>  arch/riscv/include/asm/smp.h       |  4 ++--

>  arch/riscv/include/asm/syscall.h   |  2 +-

>  arch/riscv/kernel/module.c         |  2 +-

>  arch/riscv/kernel/probes/kprobes.c |  8 ++++++++

>  arch/riscv/kernel/sbi.c            | 10 +++++-----

>  arch/riscv/kernel/smp.c            |  6 +++---

>  arch/riscv/kernel/syscall_table.c  |  2 +-

>  arch/riscv/kernel/time.c           |  2 +-

>  arch/riscv/kernel/traps.c          |  2 +-

>  arch/riscv/kernel/vdso.c           |  4 ++--

>  arch/riscv/mm/init.c               | 12 ++++++------

>  arch/riscv/mm/kasan_init.c         |  6 +++---

>  arch/riscv/mm/ptdump.c             |  2 +-

>  arch/riscv/net/bpf_jit_comp64.c    | 13 -------------

>  arch/riscv/net/bpf_jit_core.c      | 14 ++++++++++++++

>  16 files changed, 50 insertions(+), 40 deletions(-)


Thanks.  These are on for-next.  I had to fix up a handful of merge 
conflicts, so LMK if I made any mistakes.
Andreas Schwab June 11, 2021, 2:10 p.m. UTC | #2
On Mär 30 2021, Jisheng Zhang wrote:

> From: Jisheng Zhang <jszhang@kernel.org>

>

> We allocate Non-executable pages, then call bpf_jit_binary_lock_ro()

> to enable executable permission after mapping them read-only. This is

> to prepare for STRICT_MODULE_RWX in following patch.


That breaks booting with
<https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Jisheng Zhang June 11, 2021, 4:23 p.m. UTC | #3
Hi Andreas,

On Fri, 11 Jun 2021 16:10:03 +0200
Andreas Schwab <schwab@linux-m68k.org> wrote:

> On Mär 30 2021, Jisheng Zhang wrote:

> 

> > From: Jisheng Zhang <jszhang@kernel.org>

> >

> > We allocate Non-executable pages, then call bpf_jit_binary_lock_ro()

> > to enable executable permission after mapping them read-only. This is

> > to prepare for STRICT_MODULE_RWX in following patch.  

> 

> That breaks booting with

> <https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>.

> 


Thanks for the bug report.
I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether
this is the issue you saw, I will check.

    0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.167028] pinctrl core: initialized pinctrl subsystem
[    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8
[    0.191361] Oops [#1]
[    0.191509] Modules linked in:
[    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3
[    0.192179] Hardware name: riscv-virtio,qemu (DT)
[    0.192492] epc : __memset+0xc4/0xfc
[    0.192712]  ra : skb_flow_dissector_init+0x22/0x86
[    0.192915] epc : ffffffff803e2700 ra : ffffffff8058f90c sp : ffffffe001a4fda0
[    0.193221]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff81651b10
[    0.193631]  t1 : 0000000000000100 t2 : 00000000000003a8 s0 : ffffffe001a4fdd0
[    0.194034]  s1 : ffffffff80c9e250 a0 : ffffffff81651bd8 a1 : 0000000000000000
[    0.194502]  a2 : 000000000000003c a3 : ffffffff81651c10 a4 : 0000000000000064
[    0.195053]  a5 : ffffffff803e2700 a6 : 0000000000000040 a7 : 0000000000000002
[    0.195436]  s2 : ffffffff81651bd8 s3 : 0000000000000009 s4 : ffffffff8156e0c8
[    0.195723]  s5 : ffffffff8156e050 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.195992]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.196257]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
[    0.196511]  t5 : 00000000000003a9 t6 : 00000000000003ff
[    0.196714] status: 0000000000000120 badaddr: ffffffff81651bd8 cause: 000000000000000f
[    0.197103] [<ffffffff803e2700>] __memset+0xc4/0xfc
[    0.197408] [<ffffffff80831f58>] init_default_flow_dissectors+0x22/0x60
[    0.197693] [<ffffffff800020fc>] do_one_initcall+0x3e/0x168
[    0.197907] [<ffffffff80801438>] kernel_init_freeable+0x25a/0x2c6
[    0.198157] [<ffffffff8070a8a8>] kernel_init+0x12/0x110
[    0.198351] [<ffffffff8000333a>] ret_from_exception+0x0/0xc
[    0.198973] Unable to handle kernel paging request at virtual address ffffffff8164d860
[    0.199242] Oops [#2]
[    0.199336] Modules linked in:
[    0.199514] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           5.13.0-rc5-default+ #3
[    0.199785] Hardware name: riscv-virtio,qemu (DT)
[    0.199940] epc : _raw_spin_lock_irqsave+0x14/0x4e
[    0.200113]  ra : _extract_crng+0x58/0xac
[    0.200264] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4fa70
[    0.200489]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7
[    0.200715]  t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4fa80
[    0.200938]  s1 : ffffffff8164d818 a0 : 0000000000000022 a1 : ffffffe001a4fac8
[    0.201166]  a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860
[    0.201389]  a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffffffffffff
[    0.201612]  s2 : ffffffff8156e1c0 s3 : ffffffe001a4fac8 s4 : ffffffff8164d860
[    0.201836]  s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.202060]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.202295]  s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff
[    0.202519]  t5 : 00000000000003a9 t6 : ffffffe001a4f9b8
[    0.202691] status: 0000000000000100 badaddr: ffffffff8164d860 cause: 000000000000000f
[    0.202940] [<ffffffff807117ae>] _raw_spin_lock_irqsave+0x14/0x4e
[    0.203326] Unable to handle kernel paging request at virtual address ffffffff8164d860
[    0.203574] Oops [#3]
[    0.203664] Modules linked in:
[    0.203784] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           5.13.0-rc5-default+ #3
[    0.204046] Hardware name: riscv-virtio,qemu (DT)
[    0.204201] epc : _raw_spin_lock_irqsave+0x14/0x4e
[    0.204371]  ra : _extract_crng+0x58/0xac
[    0.204519] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4f740
[    0.204819]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7
[    0.205089]  t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4f750
[    0.205330]  s1 : ffffffff8164d818 a0 : 0000000000000102 a1 : ffffffe001a4f798
[    0.205553]  a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860
[    0.205768]  a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffff81408a40
[    0.205981]  s2 : ffffffff8156e1c0 s3 : ffffffe001a4f798 s4 : ffffffff8164d860
[    0.206197]  s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.206411]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.206633]  s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff
[    0.206849]  t5 : 00000000000003a9 t6 : ffffffe001a4f688
Andreas Schwab June 11, 2021, 4:41 p.m. UTC | #4
On Jun 12 2021, Jisheng Zhang wrote:

> I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether

> this is the issue you saw, I will check.

>

>     0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)

> [    0.167028] pinctrl core: initialized pinctrl subsystem

> [    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8

> [    0.191361] Oops [#1]

> [    0.191509] Modules linked in:

> [    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3

> [    0.192179] Hardware name: riscv-virtio,qemu (DT)

> [    0.192492] epc : __memset+0xc4/0xfc

> [    0.192712]  ra : skb_flow_dissector_init+0x22/0x86


Yes, that's the same.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Jisheng Zhang June 13, 2021, 5:05 p.m. UTC | #5
Hi,

On Fri, 11 Jun 2021 18:41:16 +0200
Andreas Schwab <schwab@linux-m68k.org> wrote:

> On Jun 12 2021, Jisheng Zhang wrote:

> 

> > I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether

> > this is the issue you saw, I will check.

> >

> >     0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)

> > [    0.167028] pinctrl core: initialized pinctrl subsystem

> > [    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8

> > [    0.191361] Oops [#1]

> > [    0.191509] Modules linked in:

> > [    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3

> > [    0.192179] Hardware name: riscv-virtio,qemu (DT)

> > [    0.192492] epc : __memset+0xc4/0xfc

> > [    0.192712]  ra : skb_flow_dissector_init+0x22/0x86  

> 

> Yes, that's the same.

> 

> Andreas.

> 


I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping
outside of linear mapping") moves BPF JIT region after the kernel:

#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)

The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region
sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE,
so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog
ROX, we will make part of kernel .data section RO too, so when we write, for example
memset the .data section, MMU will trigger store page fault.

To fix the issue, we need to make the bpf jit region PMD size aligned by either
patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or
something as below patch to move the BPF region before modules region:

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9469f464e71a..997b894edbc2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -31,8 +31,8 @@
 #define BPF_JIT_REGION_SIZE	(SZ_128M)
 #ifdef CONFIG_64BIT
 /* KASLR should leave at least 128MB for BPF after the kernel */
-#define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
-#define BPF_JIT_REGION_END	(BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+#define BPF_JIT_REGION_START	(BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
+#define BPF_JIT_REGION_END	(MODULES_VADDR)
 #else
 #define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
 #define BPF_JIT_REGION_END	(VMALLOC_END)
@@ -40,8 +40,8 @@
 
 /* Modules always live before the kernel */
 #ifdef CONFIG_64BIT
-#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
 #define MODULES_END	(PFN_ALIGN((unsigned long)&_start))
+#define MODULES_VADDR	(MODULES_END - SZ_128M)
 #endif
 
 
can you please try it? Per my test, the issue is fixed.

Thanks
Andreas Schwab June 13, 2021, 7:50 p.m. UTC | #6
On Jun 14 2021, Jisheng Zhang wrote:

> I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping

> outside of linear mapping") moves BPF JIT region after the kernel:

>

> #define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)

>

> The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region

> sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE,

> so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog

> ROX, we will make part of kernel .data section RO too, so when we write, for example

> memset the .data section, MMU will trigger store page fault.

>

> To fix the issue, we need to make the bpf jit region PMD size aligned by either

> patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or

> something as below patch to move the BPF region before modules region:

>

> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h

> index 9469f464e71a..997b894edbc2 100644

> --- a/arch/riscv/include/asm/pgtable.h

> +++ b/arch/riscv/include/asm/pgtable.h

> @@ -31,8 +31,8 @@

>  #define BPF_JIT_REGION_SIZE	(SZ_128M)

>  #ifdef CONFIG_64BIT

>  /* KASLR should leave at least 128MB for BPF after the kernel */

> -#define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)

> -#define BPF_JIT_REGION_END	(BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)

> +#define BPF_JIT_REGION_START	(BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)

> +#define BPF_JIT_REGION_END	(MODULES_VADDR)

>  #else

>  #define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)

>  #define BPF_JIT_REGION_END	(VMALLOC_END)

> @@ -40,8 +40,8 @@

>  

>  /* Modules always live before the kernel */

>  #ifdef CONFIG_64BIT

> -#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)

>  #define MODULES_END	(PFN_ALIGN((unsigned long)&_start))

> +#define MODULES_VADDR	(MODULES_END - SZ_128M)

>  #endif

>  

>  

> can you please try it? Per my test, the issue is fixed.


I can confirm that this fixes the issue.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."