X-Patchwork-Id: 230065
From: Bernd Edlinger
To: Christian Brauner, Kees Cook
CC: "Eric W. Biederman", Jann Horn, Jonathan Corbet, Alexander Viro, Andrew Morton, Alexey Dobriyan, Thomas Gleixner, Oleg Nesterov, Frederic Weisbecker, Andrei Vagin, Ingo Molnar, "Peter Zijlstra (Intel)", Yuyang Du, David Hildenbrand, Sebastian Andrzej Siewior, Anshuman Khandual, David Howells, James Morris, Greg Kroah-Hartman, Shakeel Butt, Jason Gunthorpe, Christian Kellner, Andrea Arcangeli, Aleksa Sarai, "Dmitry V. Levin", linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org
Subject: [PATCHv5] exec: Fix a deadlock in ptrace
Date: Tue, 3 Mar 2020 13:02:51 +0000
In-Reply-To: <20200303085802.eqn6jbhwxtmz4j2x@wittgenstein>

This fixes a deadlock in the tracer when tracing a multi-threaded
application that calls execve while more than one thread is running.

I observed that when running strace on the gcc test suite, strace always
blocks after a while when expect calls execve, because the other threads
of expect have to be terminated first. Those threads send ptrace events,
but strace is no longer able to respond, since it is blocked in
process_vm_readv(), waiting in mm_access() for the cred_guard_mutex.
The deadlock always happens when strace needs to access the tracee's
memory while another thread in the tracee calls execve, and that execve
cannot continue until the PTRACE_EVENT_EXIT is handled and the WIFEXITED
event is received:

strace          D    0 30614  30584 0x00000000
Call Trace:
 __schedule+0x3ce/0x6e0
 schedule+0x5c/0xd0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.isra.13+0x1ec/0x520
 __mutex_lock_killable_slowpath+0x13/0x20
 mutex_lock_killable+0x28/0x30
 mm_access+0x27/0xa0
 process_vm_rw_core.isra.3+0xff/0x550
 process_vm_rw+0xdd/0xf0
 __x64_sys_process_vm_readv+0x31/0x40
 do_syscall_64+0x64/0x220
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

expect          D    0 31933  30876 0x80004003
Call Trace:
 __schedule+0x3ce/0x6e0
 schedule+0x5c/0xd0
 flush_old_exec+0xc4/0x770
 load_elf_binary+0x35a/0x16c0
 search_binary_handler+0x97/0x1d0
 __do_execve_file.isra.40+0x5d4/0x8a0
 __x64_sys_execve+0x49/0x60
 do_syscall_64+0x64/0x220
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The proposed solution is to take the cred_guard_mutex only in short
critical sections at the beginning and at the end of execve, and to let
PTRACE_ATTACH fail with EAGAIN while execve is not complete. Other
operations that only need the mutex briefly, such as mm_access(), are
allowed to complete normally.

This changes the lifetime of the cred_guard_mutex lock to be:
- during prepare_bprm_creds()
- from flush_old_exec() through install_exec_creds()

Before, cred_guard_mutex was held from prepare_bprm_creds() through
install_exec_creds().

I also took the opportunity to improve the documentation of
prepare_creds(), which was clearly out of sync.
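To make the new locking protocol concrete, here is a minimal userspace model of it, written with POSIX threads rather than kernel primitives. All names are hypothetical: `struct sig_model` stands in for `signal->cred_guard_mutex` plus `cred_locked_in_execve`, and `model_prepare_creds()`, `model_install_creds()`, `model_attach()` and `model_vm_access()` loosely mirror `prepare_bprm_creds()`, `flush_old_exec()` through `install_exec_creds()`, `ptrace_attach()` and `mm_access()`. The sketch ignores killable/interruptible waits, the `free_bprm()` error path and everything else the real code does.

```c
/*
 * Userspace model of the execve locking protocol (all names hypothetical).
 * guard stands in for signal->cred_guard_mutex and locked_in_execve for
 * signal->cred_locked_in_execve; the flag is only read or written with
 * guard held, as the patch requires.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct sig_model {
	pthread_mutex_t guard;
	bool locked_in_execve;
};

/* Like prepare_bprm_creds(): a short critical section that marks execve as in progress. */
static int model_prepare_creds(struct sig_model *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->guard);
	if (s->locked_in_execve)
		ret = -EAGAIN;	/* another execve already owns the flag */
	else
		s->locked_in_execve = true;
	pthread_mutex_unlock(&s->guard);
	return ret;
}

/*
 * Like flush_old_exec() taking the mutex, followed by install_exec_creds()
 * clearing the flag and dropping the mutex again.
 */
static void model_install_creds(struct sig_model *s)
{
	pthread_mutex_lock(&s->guard);
	assert(s->locked_in_execve);
	/* ... the real code switches mm and credentials here ... */
	s->locked_in_execve = false;
	pthread_mutex_unlock(&s->guard);
}

/* Like ptrace_attach(): fail with -EAGAIN instead of waiting for execve. */
static int model_attach(struct sig_model *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->guard);
	if (s->locked_in_execve)
		ret = -EAGAIN;
	pthread_mutex_unlock(&s->guard);
	return ret;
}

/*
 * Like mm_access(): needs only the mutex itself, not the flag. trylock is
 * used so the model reports -EBUSY where the kernel would sleep.
 */
static int model_vm_access(struct sig_model *s)
{
	if (pthread_mutex_trylock(&s->guard) != 0)
		return -EBUSY;
	pthread_mutex_unlock(&s->guard);
	return 0;
}
```

Between `model_prepare_creds()` and `model_install_creds()`, which is the window in which de_thread() waits for the other threads to exit, `model_vm_access()` succeeds and `model_attach()` returns -EAGAIN. Under the old scheme the mutex was still held in that window, so the equivalent of `model_vm_access()` would have had to sleep, which is where strace got stuck.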
Signed-off-by: Bernd Edlinger
---
 Documentation/security/credentials.rst    | 19 +++++----
 fs/exec.c                                 | 41 ++++++++++++++++---
 include/linux/binfmts.h                   |  6 ++-
 include/linux/sched/signal.h              |  2 +
 kernel/cred.c                             |  2 +-
 kernel/ptrace.c                           |  4 ++
 kernel/seccomp.c                          |  3 ++
 mm/process_vm_access.c                    |  2 +-
 tools/testing/selftests/ptrace/Makefile   |  4 +-
 tools/testing/selftests/ptrace/vmaccess.c | 66 +++++++++++++++++++++++++++++++
 10 files changed, 130 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/vmaccess.c

v2: adds a test case which passes when this patch is applied.
v3: fixes the issue without introducing a new mutex.
v4: fixes one comment and a formatting issue found by checkpatch.pl in the test case.
v5: addresses review comments.

diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
index 282e79f..0988798 100644
--- a/Documentation/security/credentials.rst
+++ b/Documentation/security/credentials.rst
@@ -437,9 +437,14 @@ new set of credentials by calling::
 
 	struct cred *prepare_creds(void);
 
-this locks current->cred_replace_mutex and then allocates and constructs a
-duplicate of the current process's credentials, returning with the mutex still
-held if successful. It returns NULL if not successful (out of memory).
+this allocates and constructs a duplicate of the current process's credentials.
+It returns NULL if not successful (out of memory).
+
+If called from __do_execve_file, the mutex current->signal->cred_guard_mutex
+is acquired before this function gets called, and released after setting
+current->signal->cred_locked_in_execve. The same mutex is acquired later,
+while the credentials and the process mmap are actually changed, and
+current->signal->cred_locked_in_execve is reset again.
 
 The mutex prevents ``ptrace()`` from altering the ptrace state of a process
 while security checks on credentials construction and changing is taking place
@@ -466,9 +471,8 @@ by calling::
 
 This will alter various aspects of the credentials and the process, giving the
 LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
-actually commit the new credentials to ``current->cred``, it will release
-``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
-will notify the scheduler and others of the changes.
+actually commit the new credentials to ``current->cred``, and it will notify
+the scheduler and others of the changes.
 
 This function is guaranteed to return 0, so that it can be tail-called at the
 end of such functions as ``sys_setresuid()``.
@@ -486,8 +490,7 @@ invoked::
 
 	void abort_creds(struct cred *new);
 
-This releases the lock on ``current->cred_replace_mutex`` that
-``prepare_creds()`` got and then releases the new credentials.
+This releases the new credentials.
 
 A typical credentials alteration function would look something like this::
 
diff --git a/fs/exec.c b/fs/exec.c
index 74d88da..5fc744e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1266,6 +1266,12 @@ int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	retval = mutex_lock_killable(&current->signal->cred_guard_mutex);
+	if (retval)
+		goto out;
+
+	bprm->called_flush_old_exec = 1;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visibile until then. This also enables the update
@@ -1398,29 +1404,51 @@ void finalize_exec(struct linux_binprm *bprm)
 EXPORT_SYMBOL(finalize_exec);
 
 /*
- * Prepare credentials and lock ->cred_guard_mutex.
+ * Prepare credentials and set ->cred_locked_in_execve.
  * install_exec_creds() commits the new creds and drops the lock.
  * Or, if exec fails before, free_bprm() should release ->cred and
  * and unlock.
  */
 static int prepare_bprm_creds(struct linux_binprm *bprm)
 {
+	int ret;
+
 	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
 		return -ERESTARTNOINTR;
 
+	ret = -EAGAIN;
+	if (unlikely(current->signal->cred_locked_in_execve))
+		goto out;
+
+	ret = -ENOMEM;
 	bprm->cred = prepare_exec_creds();
-	if (likely(bprm->cred))
-		return 0;
+	if (likely(bprm->cred)) {
+		current->signal->cred_locked_in_execve = true;
+		ret = 0;
+	}
 
+out:
 	mutex_unlock(&current->signal->cred_guard_mutex);
-	return -ENOMEM;
+	return ret;
 }
 
 static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
-		mutex_unlock(&current->signal->cred_guard_mutex);
+		/*
+		 * If flush_old_exec did not acquire the cred_guard_mutex,
+		 * try again here, but if that fails, just leave
+		 * cred_locked_in_execve alone, since this means there
+		 * must be a fatal signal pending.
+		 * We don't want to prevent this task from being killed just
+		 * because it is stuck in the middle of execve.
+		 */
+		if (bprm->called_flush_old_exec ||
+		    !mutex_lock_killable(&current->signal->cred_guard_mutex)) {
+			current->signal->cred_locked_in_execve = false;
+			mutex_unlock(&current->signal->cred_guard_mutex);
+		}
 		abort_creds(bprm->cred);
 	}
 	if (bprm->file) {
@@ -1469,13 +1497,14 @@ void install_exec_creds(struct linux_binprm *bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+	current->signal->cred_locked_in_execve = false;
 	mutex_unlock(&current->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(install_exec_creds);
 
 /*
  * determine how safe it is to execute the proposed program
- * - the caller must hold ->cred_guard_mutex to protect against
+ * - the caller must have set ->cred_locked_in_execve to protect against
  *   PTRACE_ATTACH or seccomp thread-sync
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index b40fc63..2930253 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -44,7 +44,11 @@ struct linux_binprm {
 		 * exec has happened. Used to sanitize execution environment
 		 * and to set AT_SECURE auxv for glibc.
 		 */
-		secureexec:1;
+		secureexec:1,
+		/*
+		 * Set by flush_old_exec, when the cred_guard_mutex is taken.
+		 */
+		called_flush_old_exec:1;
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 8805025..8f8e358 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -225,6 +225,8 @@ struct signal_struct {
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
 					 * (notably. ptrace) */
+	bool cred_locked_in_execve;	/* set while in execve, only valid when
+					 * cred_guard_mutex is held */
 } __randomize_layout;
 
 /*
diff --git a/kernel/cred.c b/kernel/cred.c
index 809a985..e4c78de 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -676,7 +676,7 @@ void __init cred_init(void)
  *
  * Returns the new credentials or NULL if out of memory.
  *
- * Does not take, and does not return holding current->cred_replace_mutex.
+ * Does not take, and does not return holding ->cred_guard_mutex.
  */
 struct cred *prepare_kernel_cred(struct task_struct *daemon)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179..0f82bab 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -395,6 +395,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
 		goto out;
 
+	retval = -EAGAIN;
+	if (task->signal->cred_locked_in_execve)
+		goto unlock_creds;
+
 	task_lock(task);
 	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
 	task_unlock(task);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dc..3efa3e5 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -342,6 +342,9 @@ static inline pid_t seccomp_can_sync_threads(void)
 	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
 	assert_spin_locked(&current->sighand->siglock);
 
+	if (current->signal->cred_locked_in_execve)
+		return -EAGAIN;
+
 	/* Validate all threads being eligible for synchronization. */
 	caller = current;
 	for_each_thread(caller, thread) {
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7b..b3e6eb5 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -204,7 +204,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
 	if (!mm || IS_ERR(mm)) {
 		rc = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		/*
-		 * Explicitly map EACCES to EPERM as EPERM is a more a
+		 * Explicitly map EACCES to EPERM as EPERM is a more
 		 * appropriate error code for process_vw_readv/writev
 		 */
 		if (rc == -EACCES)
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index c0b7f89..2f1f532 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
-CFLAGS += -iquote../../../../include/uapi -Wall
+CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
new file mode 100644
index 0000000..6d8a048
--- /dev/null
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2020 Bernd Edlinger
+ * All rights reserved.
+ *
+ * Check whether /proc/$pid/mem can be accessed without causing deadlocks
+ * when de_thread is blocked with ->cred_guard_mutex held.
+ */
+
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <signal.h>
+#include <unistd.h>
+#include <sys/ptrace.h>
+
+static void *thread(void *arg)
+{
+	ptrace(PTRACE_TRACEME, 0, 0L, 0L);
+	return NULL;
+}
+
+TEST(vmaccess)
+{
+	int f, pid = fork();
+	char mm[64];
+
+	if (!pid) {
+		pthread_t pt;
+
+		pthread_create(&pt, NULL, thread, NULL);
+		pthread_join(pt, NULL);
+		execlp("true", "true", NULL);
+	}
+
+	sleep(1);
+	sprintf(mm, "/proc/%d/mem", pid);
+	f = open(mm, O_RDONLY);
+	ASSERT_LE(0, f);
+	close(f);
+	f = kill(pid, SIGCONT);
+	ASSERT_EQ(0, f);
+}
+
+TEST(attach)
+{
+	int f, pid = fork();
+
+	if (!pid) {
+		pthread_t pt;
+
+		pthread_create(&pt, NULL, thread, NULL);
+		pthread_join(pt, NULL);
+		execlp("true", "true", NULL);
+	}
+
+	sleep(1);
+	f = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+	ASSERT_EQ(EAGAIN, errno);
+	ASSERT_EQ(f, -1);
+	f = kill(pid, SIGCONT);
+	ASSERT_EQ(0, f);
+}
+
+TEST_HARNESS_MAIN
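With this patch, PTRACE_ATTACH can fail with EAGAIN while the target is in the middle of execve; the `attach` test above checks exactly that. A tracer that prefers to wait rather than give up could retry. The following is only a sketch under stated assumptions: `attach_with_retry()`, the bounded retry count and the fixed 10 ms pause are hypothetical choices, not something this patch adds or prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: retry PTRACE_ATTACH while the kernel reports
 * EAGAIN (target still in execve). Tries at most max_tries times and
 * returns 0 on success, or -1 with errno from the last ptrace() call.
 */
static long attach_with_retry(pid_t pid, int max_tries)
{
	const struct timespec delay = { 0, 10 * 1000 * 1000 };	/* 10 ms, arbitrary */
	long ret;

	assert(max_tries > 0);
	for (int i = 1; ; i++) {
		ret = ptrace(PTRACE_ATTACH, pid, NULL, NULL);
		/* Stop on success, on any error other than EAGAIN, or when out of tries. */
		if (ret == 0 || errno != EAGAIN || i >= max_tries)
			return ret;
		nanosleep(&delay, NULL);
	}
}
```

Returning before the final sleep keeps errno exactly as the last ptrace() call left it, so the caller can still tell EAGAIN (gave up waiting) apart from EPERM or ESRCH.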