Message ID | 20210330022144.150edc6e@xhacker |
---|---|
Headers | show |
Series | riscv: improve self-protection | expand |
On Mon, 29 Mar 2021 11:21:44 PDT (-0700), jszhang3@mail.ustc.edu.cn wrote: > From: Jisheng Zhang <jszhang@kernel.org> > > patch1 is a trivial improvement patch to move some functions to .init > section > > Then following patches improve self-protection by: > > Marking some variables __ro_after_init > Constifing some variables > Enabling ARCH_HAS_STRICT_MODULE_RWX > > Jisheng Zhang (9): > riscv: add __init section marker to some functions > riscv: Mark some global variables __ro_after_init > riscv: Constify sys_call_table > riscv: Constify sbi_ipi_ops > riscv: kprobes: Implement alloc_insn_page() > riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core > riscv: bpf: Avoid breaking W^X > riscv: module: Create module allocations without exec permissions > riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU > > arch/riscv/Kconfig | 1 + > arch/riscv/include/asm/smp.h | 4 ++-- > arch/riscv/include/asm/syscall.h | 2 +- > arch/riscv/kernel/module.c | 2 +- > arch/riscv/kernel/probes/kprobes.c | 8 ++++++++ > arch/riscv/kernel/sbi.c | 10 +++++----- > arch/riscv/kernel/smp.c | 6 +++--- > arch/riscv/kernel/syscall_table.c | 2 +- > arch/riscv/kernel/time.c | 2 +- > arch/riscv/kernel/traps.c | 2 +- > arch/riscv/kernel/vdso.c | 4 ++-- > arch/riscv/mm/init.c | 12 ++++++------ > arch/riscv/mm/kasan_init.c | 6 +++--- > arch/riscv/mm/ptdump.c | 2 +- > arch/riscv/net/bpf_jit_comp64.c | 13 ------------- > arch/riscv/net/bpf_jit_core.c | 14 ++++++++++++++ > 16 files changed, 50 insertions(+), 40 deletions(-) Thanks. These are on for-next. I had to fix up a handful of merge conflicts, so LMK if I made any mistakes.
On Mär 30 2021, Jisheng Zhang wrote: > From: Jisheng Zhang <jszhang@kernel.org> > > We allocate Non-executable pages, then call bpf_jit_binary_lock_ro() > to enable executable permission after mapping them read-only. This is > to prepare for STRICT_MODULE_RWX in following patch. That breaks booting with <https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Hi Andreas, On Fri, 11 Jun 2021 16:10:03 +0200 Andreas Schwab <schwab@linux-m68k.org> wrote: > On Mär 30 2021, Jisheng Zhang wrote: > > > From: Jisheng Zhang <jszhang@kernel.org> > > > > We allocate Non-executable pages, then call bpf_jit_binary_lock_ro() > > to enable executable permission after mapping them read-only. This is > > to prepare for STRICT_MODULE_RWX in following patch. > > That breaks booting with > <https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>. > Thanks for the bug report. I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether this is the issue you saw, I will check. 0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear) [ 0.167028] pinctrl core: initialized pinctrl subsystem [ 0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8 [ 0.191361] Oops [#1] [ 0.191509] Modules linked in: [ 0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3 [ 0.192179] Hardware name: riscv-virtio,qemu (DT) [ 0.192492] epc : __memset+0xc4/0xfc [ 0.192712] ra : skb_flow_dissector_init+0x22/0x86 [ 0.192915] epc : ffffffff803e2700 ra : ffffffff8058f90c sp : ffffffe001a4fda0 [ 0.193221] gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff81651b10 [ 0.193631] t1 : 0000000000000100 t2 : 00000000000003a8 s0 : ffffffe001a4fdd0 [ 0.194034] s1 : ffffffff80c9e250 a0 : ffffffff81651bd8 a1 : 0000000000000000 [ 0.194502] a2 : 000000000000003c a3 : ffffffff81651c10 a4 : 0000000000000064 [ 0.195053] a5 : ffffffff803e2700 a6 : 0000000000000040 a7 : 0000000000000002 [ 0.195436] s2 : ffffffff81651bd8 s3 : 0000000000000009 s4 : ffffffff8156e0c8 [ 0.195723] s5 : ffffffff8156e050 s6 : ffffffff80a105e0 s7 : ffffffff80a00738 [ 0.195992] s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac [ 0.196257] s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000 [ 0.196511] t5 : 00000000000003a9 t6 : 00000000000003ff [ 0.196714] status: 0000000000000120 badaddr: ffffffff81651bd8 cause: 000000000000000f [ 0.197103] [<ffffffff803e2700>] __memset+0xc4/0xfc [ 0.197408] [<ffffffff80831f58>] init_default_flow_dissectors+0x22/0x60 [ 0.197693] [<ffffffff800020fc>] do_one_initcall+0x3e/0x168 [ 0.197907] [<ffffffff80801438>] kernel_init_freeable+0x25a/0x2c6 [ 0.198157] [<ffffffff8070a8a8>] kernel_init+0x12/0x110 [ 0.198351] [<ffffffff8000333a>] ret_from_exception+0x0/0xc [ 0.198973] Unable to handle kernel paging request at virtual address ffffffff8164d860 [ 0.199242] Oops [#2] [ 0.199336] Modules linked in: [ 0.199514] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 5.13.0-rc5-default+ #3 [ 0.199785] Hardware name: riscv-virtio,qemu (DT) [ 0.199940] epc : _raw_spin_lock_irqsave+0x14/0x4e [ 0.200113] ra : _extract_crng+0x58/0xac [ 0.200264] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4fa70 [ 0.200489] gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7 [ 0.200715] t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4fa80 [ 0.200938] s1 : ffffffff8164d818 a0 : 0000000000000022 a1 : ffffffe001a4fac8 [ 0.201166] a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860 [ 0.201389] a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffffffffffff [ 0.201612] s2 : ffffffff8156e1c0 s3 : ffffffe001a4fac8 s4 : ffffffff8164d860 [ 0.201836] s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738 [ 0.202060] s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac [ 0.202295] s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff [ 0.202519] t5 : 00000000000003a9 t6 : ffffffe001a4f9b8 [ 0.202691] status: 0000000000000100 badaddr: ffffffff8164d860 cause: 000000000000000f [ 0.202940] [<ffffffff807117ae>] _raw_spin_lock_irqsave+0x14/0x4e [ 0.203326] Unable to handle kernel paging request at virtual address ffffffff8164d860 [ 0.203574] Oops [#3] [ 0.203664] Modules linked in: [ 0.203784] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 5.13.0-rc5-default+ #3 [ 0.204046] Hardware name: riscv-virtio,qemu (DT) [ 0.204201] epc : _raw_spin_lock_irqsave+0x14/0x4e [ 0.204371] ra : _extract_crng+0x58/0xac [ 0.204519] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4f740 [ 0.204819] gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7 [ 0.205089] t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4f750 [ 0.205330] s1 : ffffffff8164d818 a0 : 0000000000000102 a1 : ffffffe001a4f798 [ 0.205553] a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860 [ 0.205768] a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffff81408a40 [ 0.205981] s2 : ffffffff8156e1c0 s3 : ffffffe001a4f798 s4 : ffffffff8164d860 [ 0.206197] s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738 [ 0.206411] s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac [ 0.206633] s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff [ 0.206849] t5 : 00000000000003a9 t6 : ffffffe001a4f688
On Jun 12 2021, Jisheng Zhang wrote: > I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether > this is the issue you saw, I will check. > > 0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear) > [ 0.167028] pinctrl core: initialized pinctrl subsystem > [ 0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8 > [ 0.191361] Oops [#1] > [ 0.191509] Modules linked in: > [ 0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3 > [ 0.192179] Hardware name: riscv-virtio,qemu (DT) > [ 0.192492] epc : __memset+0xc4/0xfc > [ 0.192712] ra : skb_flow_dissector_init+0x22/0x86 Yes, that's the same. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Hi, On Fri, 11 Jun 2021 18:41:16 +0200 Andreas Schwab <schwab@linux-m68k.org> wrote: > On Jun 12 2021, Jisheng Zhang wrote: > > > I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether > > this is the issue you saw, I will check. > > > > 0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear) > > [ 0.167028] pinctrl core: initialized pinctrl subsystem > > [ 0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8 > > [ 0.191361] Oops [#1] > > [ 0.191509] Modules linked in: > > [ 0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3 > > [ 0.192179] Hardware name: riscv-virtio,qemu (DT) > > [ 0.192492] epc : __memset+0xc4/0xfc > > [ 0.192712] ra : skb_flow_dissector_init+0x22/0x86 > > Yes, that's the same. > > Andreas. > I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping outside of linear mapping") moves BPF JIT region after the kernel: #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE, so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog ROX, we will make part of kernel .data section RO too, so when we write, for example memset the .data section, MMU will trigger store page fault. To fix the issue, we need to make the bpf jit region PMD size aligned by either patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or something as below patch to move the BPF region before modules region: diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 9469f464e71a..997b894edbc2 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -31,8 +31,8 @@ #define BPF_JIT_REGION_SIZE (SZ_128M) #ifdef CONFIG_64BIT /* KASLR should leave at least 128MB for BPF after the kernel */ -#define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) -#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) +#define BPF_JIT_REGION_START (BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE) +#define BPF_JIT_REGION_END (MODULES_VADDR) #else #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) @@ -40,8 +40,8 @@ /* Modules always live before the kernel */ #ifdef CONFIG_64BIT -#define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G) #define MODULES_END (PFN_ALIGN((unsigned long)&_start)) +#define MODULES_VADDR (MODULES_END - SZ_128M) #endif can you please try it? Per my test, the issue is fixed. Thanks
On Jun 14 2021, Jisheng Zhang wrote: > I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping > outside of linear mapping") moves BPF JIT region after the kernel: > > #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) > > The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region > sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE, > so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog > ROX, we will make part of kernel .data section RO too, so when we write, for example > memset the .data section, MMU will trigger store page fault. > > To fix the issue, we need to make the bpf jit region PMD size aligned by either > patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or > something as below patch to move the BPF region before modules region: > > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h > index 9469f464e71a..997b894edbc2 100644 > --- a/arch/riscv/include/asm/pgtable.h > +++ b/arch/riscv/include/asm/pgtable.h > @@ -31,8 +31,8 @@ > #define BPF_JIT_REGION_SIZE (SZ_128M) > #ifdef CONFIG_64BIT > /* KASLR should leave at least 128MB for BPF after the kernel */ > -#define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) > -#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) > +#define BPF_JIT_REGION_START (BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE) > +#define BPF_JIT_REGION_END (MODULES_VADDR) > #else > #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) > #define BPF_JIT_REGION_END (VMALLOC_END) > @@ -40,8 +40,8 @@ > > /* Modules always live before the kernel */ > #ifdef CONFIG_64BIT > -#define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G) > #define MODULES_END (PFN_ALIGN((unsigned long)&_start)) > +#define MODULES_VADDR (MODULES_END - SZ_128M) > #endif > > > can you please try it? Per my test, the issue is fixed. I can confirm that this fixes the issue. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
From: Jisheng Zhang <jszhang@kernel.org> patch1 is a trivial improvement patch to move some functions to .init section Then following patches improve self-protection by: Marking some variables __ro_after_init Constifing some variables Enabling ARCH_HAS_STRICT_MODULE_RWX Jisheng Zhang (9): riscv: add __init section marker to some functions riscv: Mark some global variables __ro_after_init riscv: Constify sys_call_table riscv: Constify sbi_ipi_ops riscv: kprobes: Implement alloc_insn_page() riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core riscv: bpf: Avoid breaking W^X riscv: module: Create module allocations without exec permissions riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU arch/riscv/Kconfig | 1 + arch/riscv/include/asm/smp.h | 4 ++-- arch/riscv/include/asm/syscall.h | 2 +- arch/riscv/kernel/module.c | 2 +- arch/riscv/kernel/probes/kprobes.c | 8 ++++++++ arch/riscv/kernel/sbi.c | 10 +++++----- arch/riscv/kernel/smp.c | 6 +++--- arch/riscv/kernel/syscall_table.c | 2 +- arch/riscv/kernel/time.c | 2 +- arch/riscv/kernel/traps.c | 2 +- arch/riscv/kernel/vdso.c | 4 ++-- arch/riscv/mm/init.c | 12 ++++++------ arch/riscv/mm/kasan_init.c | 6 +++--- arch/riscv/mm/ptdump.c | 2 +- arch/riscv/net/bpf_jit_comp64.c | 13 ------------- arch/riscv/net/bpf_jit_core.c | 14 ++++++++++++++ 16 files changed, 50 insertions(+), 40 deletions(-)