From patchwork Fri Sep 4 20:31:39 2020
From: Gabriel Krisman Bertazi
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 willy@infradead.org, linux-kselftest@vger.kernel.org, shuah@kernel.org,
 Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH v6 1/9] kernel: Support TIF_SYSCALL_INTERCEPT flag
Date: Fri, 4 Sep 2020 16:31:39 -0400
Message-Id: <20200904203147.2908430-2-krisman@collabora.com>
In-Reply-To: <20200904203147.2908430-1-krisman@collabora.com>
References: <20200904203147.2908430-1-krisman@collabora.com>

Convert TIF_SECCOMP into a generic TI flag for any syscall interception
work being done by the kernel. The actual type of work is recorded in a
new syscall_intercept field outside of thread_info. This ensures that
the syscall_intercept field is only accessed when struct seccomp has to
be accessed anyway, so it doesn't add significant cost to the seccomp
path.

In order to avoid modifying every architecture at once, this patch adds
a transition mechanism: architectures that define TIF_SECCOMP continue
to work by ignoring the syscall_intercept field, as long as they don't
support other syscall interception mechanisms like the future syscall
user dispatch. When migrating from TIF_SECCOMP to TIF_SYSCALL_INTERCEPT,
they should adopt the semantics of checking the syscall_intercept field,
as is done in the common entry syscall code, or, even better, migrate to
the common syscall entry code.

This was tested by running the seccomp selftests. No regressions were
observed, and all tests passed with and without this patch.
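
The bookkeeping done by the set/clear helpers in
include/linux/syscall_intercept.h can be illustrated with a small
userspace model. This is a sketch and not kernel code. struct model_task,
the model_* functions and SYSINT_USER_DISPATCH are hypothetical stand-ins,
and only the value of SYSINT_SECCOMP comes from this patch. A plain bool
replaces the TIF_SYSCALL_INTERCEPT bit, and the siglock serialization is
left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SYSINT_SECCOMP       0x1u /* value taken from this patch */
#define SYSINT_USER_DISPATCH 0x2u /* hypothetical second interceptor */

/* Hypothetical model of the two pieces of state the patch manages. */
struct model_task {
	unsigned int syscall_intercept; /* mask of active interceptors */
	bool tif_syscall_intercept;     /* stands in for TIF_SYSCALL_INTERCEPT */
};

/* Mirrors __set_tsk_syscall_intercept(): add a type, raise the flag. */
static void model_set_intercept(struct model_task *t, unsigned int type)
{
	assert(type != 0);
	t->syscall_intercept |= type;
	if (t->syscall_intercept)
		t->tif_syscall_intercept = true;
}

/* Mirrors __clear_tsk_syscall_intercept(): drop the flag only when empty. */
static void model_clear_intercept(struct model_task *t, unsigned int type)
{
	assert(type != 0);
	t->syscall_intercept &= ~type;
	if (t->syscall_intercept == 0)
		t->tif_syscall_intercept = false;
}

/* Second-level check, consulted only after the single flag is seen set. */
static bool model_wants(const struct model_task *t, unsigned int type)
{
	return t->tif_syscall_intercept && (t->syscall_intercept & type);
}
```

The model shows that clearing one interceptor leaves the single TI flag
set while any other interceptor is still active. As a result, the entry
path only has to test one bit before taking the slow path.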
Signed-off-by: Gabriel Krisman Bertazi
---
 include/linux/sched.h             |  6 ++-
 include/linux/seccomp.h           | 20 ++++++++-
 include/linux/syscall_intercept.h | 70 +++++++++++++++++++++++++++++++
 kernel/fork.c                     | 10 ++++-
 kernel/seccomp.c                  |  7 ++--
 5 files changed, 106 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/syscall_intercept.h

diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..3511c98a7849 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -959,7 +959,11 @@ struct task_struct {
 	kuid_t loginuid;
 	unsigned int sessionid;
 #endif
-	struct seccomp seccomp;
+
+	struct {
+		unsigned int syscall_intercept;
+		struct seccomp seccomp;
+	};
 
 	/* Thread group tracking: */
 	u64 parent_exec_id;
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 02aef2844c38..027dc462cea9 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -20,6 +20,24 @@
 #include
 #include
 
+/*
+ * Some transitional defines to avoid migrating every architecture's code
+ * at once.
+ */
+
+#if defined(TIF_SECCOMP) && defined(TIF_SYSCALL_INTERCEPT)
+# error "TIF_SYSCALL_INTERCEPT and TIF_SECCOMP can't be defined at the same time"
+#endif
+
+/*
+ * If the arch has not transitioned to TIF_SYSCALL_INTERCEPT, this lets
+ * seccomp work with these architectures, as long as no other syscall
+ * intercept features are meant to be supported.
+ */
+#ifdef TIF_SECCOMP
+# define TIF_SYSCALL_INTERCEPT TIF_SECCOMP
+#endif
+
 struct seccomp_filter;
 /**
  * struct seccomp - the state of a seccomp'ed process
@@ -42,7 +60,7 @@ struct seccomp {
 extern int __secure_computing(const struct seccomp_data *sd);
 static inline int secure_computing(void)
 {
-	if (unlikely(test_thread_flag(TIF_SECCOMP)))
+	if (unlikely(test_thread_flag(TIF_SYSCALL_INTERCEPT)))
 		return __secure_computing(NULL);
 	return 0;
 }
diff --git a/include/linux/syscall_intercept.h b/include/linux/syscall_intercept.h
new file mode 100644
index 000000000000..725d157699da
--- /dev/null
+++ b/include/linux/syscall_intercept.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 Collabora Ltd.
+ */
+#ifndef _SYSCALL_INTERCEPT_H
+#define _SYSCALL_INTERCEPT_H
+
+#include
+#include
+#include
+
+#define SYSINT_SECCOMP 0x1
+
+#ifdef TIF_SYSCALL_INTERCEPT
+
+/* seccomp (at least) can modify TIF_SYSCALL_INTERCEPT from a different
+ * thread, which means it can race with itself or with
+ * syscall_user_dispatch. Therefore, TIF_SYSCALL_INTERCEPT and
+ * syscall_intercept are synchronized by tsk->sighand->siglock.
+ */
+
+static inline void __set_tsk_syscall_intercept(struct task_struct *tsk,
+					       unsigned int type)
+{
+	tsk->syscall_intercept |= type;
+
+	if (tsk->syscall_intercept)
+		set_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
+}
+
+static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk,
+						 unsigned int type)
+{
+	tsk->syscall_intercept &= ~type;
+
+	if (tsk->syscall_intercept == 0)
+		clear_tsk_thread_flag(tsk, TIF_SYSCALL_INTERCEPT);
+}
+
+static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+	spin_lock_irq(&tsk->sighand->siglock);
+	__set_tsk_syscall_intercept(tsk, type);
+	spin_unlock_irq(&tsk->sighand->siglock);
+}
+
+static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+	spin_lock_irq(&tsk->sighand->siglock);
+	__clear_tsk_syscall_intercept(tsk, type);
+	spin_unlock_irq(&tsk->sighand->siglock);
+}
+
+#else
+static inline void __set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void set_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void __clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+static inline void clear_tsk_syscall_intercept(struct task_struct *tsk, unsigned int type)
+{
+}
+#endif
+
+#endif
+
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d32190861bd..a39177bed8ea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -49,7 +49,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -898,6 +898,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	 * the usage counts on the error path calling free_task.
 	 */
 	tsk->seccomp.filter = NULL;
+	tsk->syscall_intercept = 0;
 #endif
 
 	setup_thread_stack(tsk, orig);
@@ -1620,9 +1621,14 @@ static void copy_seccomp(struct task_struct *p)
 	 * If the parent gained a seccomp mode after copying thread
 	 * flags and between before we held the sighand lock, we have
 	 * to manually enable the seccomp thread flag here.
+	 *
+	 * In addition, current->sighand->siglock is held (asserted above),
+	 * so it is safe to use the unlocked __set_tsk_syscall_intercept().
 	 */
 	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
-		set_tsk_thread_flag(p, TIF_SECCOMP);
+		__set_tsk_syscall_intercept(p, SYSINT_SECCOMP);
+	else
+		__clear_tsk_syscall_intercept(p, SYSINT_SECCOMP);
 #endif
 }
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 3ee59ce0a323..d0643b500f2e 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
 #include
@@ -352,14 +353,14 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 
 	task->seccomp.mode = seccomp_mode;
 	/*
-	 * Make sure TIF_SECCOMP cannot be set before the mode (and
+	 * Make sure SYSINT_SECCOMP cannot be set before the mode (and
 	 * filter) is set.
 	 */
 	smp_mb__before_atomic();
 	/* Assume default seccomp processes want spec flaw mitigation. */
 	if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
 		arch_seccomp_spec_mitigate(task);
-	set_tsk_thread_flag(task, TIF_SECCOMP);
+	__set_tsk_syscall_intercept(task, SYSINT_SECCOMP);
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
@@ -925,7 +926,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 	/*
 	 * Make sure that any changes to mode from another thread have
-	 * been seen after TIF_SECCOMP was seen.
+	 * been seen after SYSINT_SECCOMP was seen.
 	 */
 	rmb();
 
From patchwork Fri Sep 4 20:31:41 2020
From: Gabriel Krisman Bertazi
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 willy@infradead.org, linux-kselftest@vger.kernel.org, shuah@kernel.org,
 Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH v6 3/9] x86: vdso: Expose sigreturn address on vdso to the kernel
Date: Fri, 4 Sep 2020 16:31:41 -0400
Message-Id: <20200904203147.2908430-4-krisman@collabora.com>
In-Reply-To: <20200904203147.2908430-1-krisman@collabora.com>
References:
 <20200904203147.2908430-1-krisman@collabora.com>

Syscall user redirection requires that the signal trampoline code not be
captured, in order to support returning with a locked selector while
avoiding recursion back into the signal handler. For ia-32, which has
the trampoline in the vDSO, expose the entry points to the kernel so
that it can avoid dispatching syscalls from that region to userspace.

Changes since V1
  - Change return address to bool (Andy)

Suggested-by: Andy Lutomirski
Acked-by: Andy Lutomirski
Signed-off-by: Gabriel Krisman Bertazi
Reviewed-by: Kees Cook
---
 arch/x86/entry/vdso/vdso2c.c           |  2 ++
 arch/x86/entry/vdso/vdso32/sigreturn.S |  2 ++
 arch/x86/entry/vdso/vma.c              | 15 +++++++++++++++
 arch/x86/include/asm/elf.h             |  1 +
 arch/x86/include/asm/vdso.h            |  2 ++
 5 files changed, 22 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 7380908045c7..2d0f3d8bcc25 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -101,6 +101,8 @@ struct vdso_sym required_syms[] = {
 	{"__kernel_sigreturn", true},
 	{"__kernel_rt_sigreturn", true},
 	{"int80_landing_pad", true},
+	{"vdso32_rt_sigreturn_landing_pad", true},
+	{"vdso32_sigreturn_landing_pad", true},
 };
 
 __attribute__((format(printf, 1, 2))) __attribute__((noreturn))
diff --git a/arch/x86/entry/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
index c3233ee98a6b..1bd068f72d4c 100644
--- a/arch/x86/entry/vdso/vdso32/sigreturn.S
+++ b/arch/x86/entry/vdso/vdso32/sigreturn.S
@@ -18,6 +18,7 @@ __kernel_sigreturn:
 	movl $__NR_sigreturn, %eax
 	SYSCALL_ENTER_KERNEL
 .LEND_sigreturn:
+SYM_INNER_LABEL(vdso32_sigreturn_landing_pad, SYM_L_GLOBAL)
 	nop
 	.size __kernel_sigreturn,.-.LSTART_sigreturn
 
@@ -29,6 +30,7 @@ __kernel_rt_sigreturn:
 	movl $__NR_rt_sigreturn, %eax
 	SYSCALL_ENTER_KERNEL
 .LEND_rt_sigreturn:
+SYM_INNER_LABEL(vdso32_rt_sigreturn_landing_pad, SYM_L_GLOBAL)
 	nop
 	.size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
 	.previous
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..3fc323d24824 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -436,6 +436,21 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 }
 #endif
 
+bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+	const struct vdso_image *image = current->mm->context.vdso_image;
+	unsigned long vdso = (unsigned long) current->mm->context.vdso;
+
+	if (in_ia32_syscall() && image == &vdso_image_32) {
+		if (regs->ip == vdso + image->sym_vdso32_sigreturn_landing_pad ||
+		    regs->ip == vdso + image->sym_vdso32_rt_sigreturn_landing_pad)
+			return true;
+	}
+#endif
+	return false;
+}
+
 #ifdef CONFIG_X86_64
 static __init int vdso_setup(char *s)
 {
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b9a5d488f1a5..eb41db289fe6 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -383,6 +383,7 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 				       int uses_interp);
 extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
 					      int uses_interp);
+extern bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs);
 #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
 
 /* Do not change the values. See get_align_mask() */
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index bbcdc7b8f963..589f489dd375 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -27,6 +27,8 @@ struct vdso_image {
 	long sym___kernel_rt_sigreturn;
 	long sym___kernel_vsyscall;
 	long sym_int80_landing_pad;
+	long sym_vdso32_sigreturn_landing_pad;
+	long sym_vdso32_rt_sigreturn_landing_pad;
 };
 
 #ifdef CONFIG_X86_64
From patchwork Fri Sep 4 20:31:42 2020
From: Gabriel Krisman Bertazi
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 willy@infradead.org, linux-kselftest@vger.kernel.org, shuah@kernel.org,
 Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH v6 4/9] signal: Expose SYS_USER_DISPATCH si_code type
Date: Fri, 4 Sep 2020 16:31:42 -0400
Message-Id: <20200904203147.2908430-5-krisman@collabora.com>
In-Reply-To: <20200904203147.2908430-1-krisman@collabora.com>
References: <20200904203147.2908430-1-krisman@collabora.com>

SYS_USER_DISPATCH is the SIGSYS si_code reported when a syscall is sent
to userspace by the Syscall User Dispatch mechanism. This also adjusts
the BUILD_BUG_ON checks around the tree that depend on NSIGSYS.

Signed-off-by: Gabriel Krisman Bertazi
Acked-by: Kees Cook
---
 arch/x86/kernel/signal_compat.c    | 2 +-
 include/uapi/asm-generic/siginfo.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..210aecc6eab9 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -31,7 +31,7 @@ static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
-	BUILD_BUG_ON(NSIGSYS  != 1);
+	BUILD_BUG_ON(NSIGSYS  != 2);
 
 	/* This is part of the ABI and can never change in size: */
 	BUILD_BUG_ON(sizeof(compat_siginfo_t) != 128);
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..37741908b846 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -284,7 +284,8 @@ typedef struct siginfo {
  * SIGSYS si_codes
  */
 #define SYS_SECCOMP	1	/* seccomp triggered */
-#define NSIGSYS	1
+#define SYS_USER_DISPATCH 2	/* syscall user dispatch triggered */
+#define NSIGSYS	2
 
 /*
  * SIGEMT si_codes
From patchwork Fri Sep 4 20:31:45 2020
From: Gabriel Krisman Bertazi
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 willy@infradead.org, linux-kselftest@vger.kernel.org, shuah@kernel.org,
 Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH v6 7/9] x86: Enable Syscall User Dispatch
Date: Fri, 4 Sep 2020 16:31:45 -0400
Message-Id: <20200904203147.2908430-8-krisman@collabora.com>
In-Reply-To:
 <20200904203147.2908430-1-krisman@collabora.com>
References: <20200904203147.2908430-1-krisman@collabora.com>

The requirements of Syscall User Dispatch are fully supported on x86.
This patch flips the switch, marking it as supported.

This was tested against the Syscall User Dispatch selftest.

Signed-off-by: Gabriel Krisman Bertazi
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb20..56ac8de99021 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -150,6 +150,7 @@ config X86
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_SYSCALL_USER_DISPATCH
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_TRACEHOOK
From patchwork Fri Sep 4 20:31:47 2020
From: Gabriel Krisman Bertazi
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 willy@infradead.org, linux-kselftest@vger.kernel.org, shuah@kernel.org,
 Gabriel Krisman Bertazi, kernel@collabora.com
Subject: [PATCH v6 9/9] doc: Document Syscall User Dispatch
Date: Fri, 4 Sep 2020 16:31:47 -0400
Message-Id: <20200904203147.2908430-10-krisman@collabora.com>
In-Reply-To: <20200904203147.2908430-1-krisman@collabora.com>
References: <20200904203147.2908430-1-krisman@collabora.com>

Explain the interface, provide some background and security notes.

Signed-off-by: Gabriel Krisman Bertazi
Reviewed-by: Kees Cook
---
 .../admin-guide/syscall-user-dispatch.rst | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
new file mode 100644
index 000000000000..96616660fded
--- /dev/null
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Syscall User Dispatch
+=====================
+
+Background
+----------
+
+Compatibility layers like Wine need a way to efficiently emulate system
+calls of only a part of their process - the part that has the
+incompatible code - while being able to execute native syscalls without
+a high performance penalty on the native part of the process.  Seccomp
+falls short on this task, since it has limited support for efficiently
+filtering syscalls based on memory regions, and it doesn't support removing
+filters.  Therefore a new mechanism is necessary.
+
+Syscall User Dispatch brings the filtering of the syscall dispatcher
+address back to userspace.  The application is in control of a flip
+switch, indicating the current personality of the process.  A
+multiple-personality application can then flip the switch without
+invoking the kernel, when crossing the compatibility layer API
+boundaries, to enable/disable the syscall redirection and execute
+syscalls directly (disabled) or send them to be emulated in userspace
+through a SIGSYS.
+
+The goal of this design is to provide very quick compatibility layer
+boundary crossings, which is achieved by not executing a syscall to change
+personality every time the compatibility layer executes.  Instead, a
+userspace memory region exposed to the kernel indicates the current
+personality, and the application simply modifies that variable to
+configure the mechanism.
+
+There is a relatively high cost associated with handling signals on most
+architectures, like x86, but at least for Wine, syscalls issued by
+native Windows code are currently not known to be a performance problem,
+since they are quite rare, at least for modern gaming applications.
+
+Since this mechanism is designed to capture syscalls issued by
+non-native applications, it must function on syscalls whose invocation
+ABI is completely unexpected to Linux.  Syscall User Dispatch therefore
+doesn't rely on any of the syscall ABI to do the filtering.  It uses
+only the syscall dispatcher address and the userspace key.
+
+Interface
+---------
+
+A process can set up this mechanism on supported kernels
+(CONFIG_SYSCALL_USER_DISPATCH) by executing the following prctl:
+
+  prctl(PR_SET_SYSCALL_USER_DISPATCH, , , , [selector])
+
+The second argument is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF,
+to enable and disable the mechanism globally for that thread.  When
+PR_SYS_DISPATCH_OFF is used, the other fields must be zero.
+
+The third and fourth arguments delimit a closed memory region interval
+from which syscalls are always executed directly, regardless of the
+userspace selector.  This provides a fast path for the C library, which
+includes the most common syscall dispatchers in native code applications,
+and also provides a way for the signal handler to return without
+triggering a nested SIGSYS on (rt_)sigreturn.  Users of this interface
+should make sure that at least the signal trampoline code is included in
+this region.  In addition, for architectures that implement the signal
+trampoline in the vDSO, that trampoline is never intercepted.
+
+[selector] is a pointer to a char-sized variable in the process memory
+that provides a quick way to enable or disable syscall redirection
+thread-wide, without the need to invoke the kernel directly.  The selector
+can be set to PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF.  Any other
+value should terminate the program with a SIGSYS.
+
+Security Notes
+--------------
+
+Syscall User Dispatch provides functionality for compatibility layers to
+quickly capture system calls issued by a non-native part of the
+application, while not impacting the Linux native regions of the
+process.  It is not a mechanism for sandboxing system calls, and it
+should not be seen as a security mechanism, since it is trivial for a
+malicious application to subvert the mechanism by jumping to an allowed
+dispatcher region prior to executing the syscall, or to discover the
+address and modify the selector value.  If the use case requires any
+kind of security sandboxing, Seccomp should be used instead.
+
+Any fork or exec of the existing process resets the mechanism to
+PR_SYS_DISPATCH_OFF.
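
The filtering rules in the Interface section of the document above can be
summarized as one decision function. The following is a userspace model
written for illustration only, not the kernel implementation. The model_*
names are hypothetical, and the numeric selector values are assumptions.
The model also assumes that when no selector is given, every syscall
outside the allowed region is redirected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed selector values for this model only; the documentation names
 * them PR_SYS_DISPATCH_OFF and PR_SYS_DISPATCH_ON.
 */
#define MODEL_DISPATCH_OFF 0
#define MODEL_DISPATCH_ON  1

enum model_action {
	MODEL_EXEC_DIRECT, /* run the syscall natively */
	MODEL_SEND_SIGSYS, /* redirect to the userspace handler via SIGSYS */
	MODEL_KILL_SIGSYS, /* invalid selector value: terminate with SIGSYS */
};

/* Hypothetical per-thread state mirroring the prctl arguments. */
struct model_dispatch {
	bool enabled;             /* PR_SYS_DISPATCH_ON was requested */
	unsigned long start, end; /* closed interval of always-allowed IPs */
	const char *selector;     /* optional userspace switch, may be NULL */
};

static enum model_action model_decide(const struct model_dispatch *d,
				      unsigned long ip, bool vdso_sigreturn)
{
	if (!d->enabled)
		return MODEL_EXEC_DIRECT;
	assert(d->start <= d->end);

	/* The allowed region bypasses the selector (closed interval). */
	if (ip >= d->start && ip <= d->end)
		return MODEL_EXEC_DIRECT;

	/* A signal trampoline living in the vDSO is never intercepted. */
	if (vdso_sigreturn)
		return MODEL_EXEC_DIRECT;

	if (d->selector != NULL) {
		if (*d->selector == MODEL_DISPATCH_OFF)
			return MODEL_EXEC_DIRECT;
		if (*d->selector != MODEL_DISPATCH_ON)
			return MODEL_KILL_SIGSYS;
	}
	return MODEL_SEND_SIGSYS;
}
```

The model checks the allowed region before reading the selector, as the
document describes. Syscalls from that region run directly whatever the
selector holds, and this is what lets a signal handler return through
(rt_)sigreturn without a nested SIGSYS.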