From patchwork Mon Feb 7 12:17:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Michal_Koutn=C3=BD?= X-Patchwork-Id: 540654 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A18D4C43217 for ; Mon, 7 Feb 2022 13:28:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242694AbiBGN1l (ORCPT ); Mon, 7 Feb 2022 08:27:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1445598AbiBGMmS (ORCPT ); Mon, 7 Feb 2022 07:42:18 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D0BBE033DB0; Mon, 7 Feb 2022 04:33:57 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id BBE201F386; Mon, 7 Feb 2022 12:18:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1644236292; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LOAZl+X1cH+/PE8PnJNA97WZ/FEmmtdAOX5J155V/GY=; b=R8vHFitE3/0c2+aGokRHHqIecoBd082bURHw791MAgCrpS+qdVLAID/ocEGvCoGhMecpAz 8vZtgPllVDkkTehtIMbSCysfFSrc9ShUVS4/eJouxWbDJ06PWi1EEGxlqdAAHmc6Siwwl3 6vp3aC0jiKvBudbRtwu0d5OAVauMN5M= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with 
cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 97D9113BBC; Mon, 7 Feb 2022 12:18:12 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 2KeSJAQOAWLMegAAMHmgww (envelope-from ); Mon, 07 Feb 2022 12:18:12 +0000
From: =?utf-8?q?Michal_Koutn=C3=BD?=
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer, Ran Xiaokai, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 2/6] set*uid: Check RLIMIT_NPROC against new credentials
Date: Mon, 7 Feb 2022 13:17:56 +0100
Message-Id: <20220207121800.5079-3-mkoutny@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

The general idea is that not even root or another capable user may push
an unprivileged user over its limit. (For historical and security
reasons the enforcement is postponed from set*uid() to execve().) During
the switch, it is the resource consumption of the target user that has
to be checked.

Commits 905ae01c4ae2 ("Add a reference to ucounts for each cred") and
21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") made the
check in set_user() look at the old user's consumption instead.

This version of the fix simply moves the check to the place where the
actual switch of the accounting structure happens -- set_cred_ucounts().
The other callers are left without the check, but with per-userns
accounting they may newly become subject to it too.

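For readers who want the deferral spelled out, here is a small user-space
model of the intended policy. It is only a sketch, not kernel code: the
names model_setuid_flags(), model_execve() and MODEL_NPROC_EXCEEDED are
made up for illustration, the ">=" comparison assumes that
ucounts_limit_cmp() orders the count against the limit, and the exact
comparison used on the execve() side is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bit; name and value are invented for this model. */
#define MODEL_NPROC_EXCEEDED 0x1u

/*
 * Model of the check at set*uid() time. target_nproc is the process count
 * of the user being switched to (not of the old user), limit is the
 * caller's RLIMIT_NPROC. The switch never fails here; the excess is only
 * recorded in the returned flags.
 */
unsigned int model_setuid_flags(unsigned int flags,
				unsigned long target_nproc,
				unsigned long limit,
				bool target_is_root,
				bool exempt_by_capability)
{
	if (target_nproc >= limit && !target_is_root && !exempt_by_capability)
		return flags | MODEL_NPROC_EXCEEDED;
	return flags & ~MODEL_NPROC_EXCEEDED;
}

/*
 * Model of the deferred enforcement at execve() time: fail with EAGAIN
 * only if the excess was recorded earlier and the user is still over the
 * limit (the strict ">" is an assumption of this sketch).
 */
int model_execve(unsigned int flags, unsigned long nproc_now,
		 unsigned long limit)
{
	if ((flags & MODEL_NPROC_EXCEEDED) && nproc_now > limit)
		return -EAGAIN;
	return 0;
}

/* Walk through the scenario from the commit message. */
void model_selfcheck(void)
{
	unsigned int flags = 0;

	/* The target user already runs 4 processes and the limit is 4. */
	flags = model_setuid_flags(flags, 4, 4, false, false);
	assert(flags & MODEL_NPROC_EXCEEDED);	/* set*uid() still succeeded */

	/* Still over the limit at execve(): the deferred failure fires. */
	assert(model_execve(flags, 5, 4) == -EAGAIN);

	/* The target user dropped below the limit meanwhile: execve() passes. */
	assert(model_execve(flags, 3, 4) == 0);
}
```

The point of the model is the first argument pair: the count that is
compared has to belong to the user being switched to, which is what moving
the check into set_cred_ucounts() is meant to achieve.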
set_cred_ucounts() becomes inconsistent, since the task flags are passed
in by the caller while task_rlimit() is implicitly that of `current`.
This patch is meant to illustrate the issue; a nicer implementation is
possible.

Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: Michal Koutný
---
 fs/exec.c               |  2 +-
 include/linux/cred.h    |  2 +-
 kernel/cred.c           | 24 +++++++++++++++++++++---
 kernel/fork.c           |  2 +-
 kernel/sys.c            | 21 +++------------------
 kernel/user_namespace.c |  2 +-
 6 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fc598c2652b2..e759e42c61da 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1363,7 +1363,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
 	flush_signal_handlers(me, 0);
 
-	retval = set_cred_ucounts(bprm->cred);
+	retval = set_cred_ucounts(bprm->cred, NULL);
 	if (retval < 0)
 		goto out_unlock;
 
diff --git a/include/linux/cred.h b/include/linux/cred.h
index fcbc6885cc09..455525ab380d 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -170,7 +170,7 @@ extern int set_security_override_from_ctx(struct cred *, const char *);
 extern int set_create_files_as(struct cred *, struct inode *);
 extern int cred_fscmp(const struct cred *, const struct cred *);
 extern void __init cred_init(void);
-extern int set_cred_ucounts(struct cred *);
+extern int set_cred_ucounts(struct cred *, unsigned int *);
 
 /*
  * check for validity of credentials
diff --git a/kernel/cred.c b/kernel/cred.c
index 473d17c431f3..791cab70b764 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -370,7 +370,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		ret = create_user_ns(new);
 		if (ret < 0)
 			goto error_put;
-		ret = set_cred_ucounts(new);
+		ret = set_cred_ucounts(new, NULL);
 		if (ret < 0)
 			goto error_put;
 	}
@@ -492,7 +492,7 @@ int commit_creds(struct cred *new)
 
 	/* do it
 	 * RLIMIT_NPROC limits on user->processes have already been checked
-	 * in set_user().
+ * in set_cred_ucounts(). */ alter_cred_subscribers(new, 2); if (new->user != old->user || new->user_ns != old->user_ns) @@ -663,7 +663,7 @@ int cred_fscmp(const struct cred *a, const struct cred *b) } EXPORT_SYMBOL(cred_fscmp); -int set_cred_ucounts(struct cred *new) +int set_cred_ucounts(struct cred *new, unsigned int *nproc_flags) { struct task_struct *task = current; const struct cred *old = task->real_cred; @@ -685,6 +685,24 @@ int set_cred_ucounts(struct cred *new) new->ucounts = new_ucounts; put_ucounts(old_ucounts); + if (!nproc_flags) + return 0; + + /* + * We don't fail in case of NPROC limit excess here because too many + * poorly written programs don't check set*uid() return code, assuming + * it never fails if called by root. We may still enforce NPROC limit + * for programs doing set*uid()+execve() by harmlessly deferring the + * failure to the execve() stage. + */ + if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 && + new->user != INIT_USER && + !security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) && + !security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE)) + *nproc_flags |= PF_NPROC_EXCEEDED; + else + *nproc_flags &= ~PF_NPROC_EXCEEDED; + return 0; } diff --git a/kernel/fork.c b/kernel/fork.c index 7cb21a70737d..a4005c679d29 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3051,7 +3051,7 @@ int ksys_unshare(unsigned long unshare_flags) goto bad_unshare_cleanup_cred; if (new_cred) { - err = set_cred_ucounts(new_cred); + err = set_cred_ucounts(new_cred, NULL); if (err) goto bad_unshare_cleanup_cred; } diff --git a/kernel/sys.c b/kernel/sys.c index 48c90dcceff3..4e4eea30e235 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -472,21 +472,6 @@ static int set_user(struct cred *new) if (!new_user) return -EAGAIN; - /* - * We don't fail in case of NPROC limit excess here because too many - * poorly written programs don't check set*uid() return code, assuming - * it never fails if called by 
root. We may still enforce NPROC limit
-	 * for programs doing set*uid()+execve() by harmlessly deferring the
-	 * failure to the execve() stage.
-	 */
-	if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 &&
-			new_user != INIT_USER &&
-			!security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) &&
-			!security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE))
-		current->flags |= PF_NPROC_EXCEEDED;
-	else
-		current->flags &= ~PF_NPROC_EXCEEDED;
-
 	free_uid(new->user);
 	new->user = new_user;
 	return 0;
@@ -560,7 +545,7 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
@@ -622,7 +607,7 @@ long __sys_setuid(uid_t uid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
@@ -701,7 +686,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..f7eec0b0233b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1344,7 +1344,7 @@ static int userns_install(struct nsset *nsset, struct ns_common *ns)
 	put_user_ns(cred->user_ns);
 	set_cred_user_ns(cred, get_user_ns(user_ns));
 
-	if (set_cred_ucounts(cred) < 0)
+	if (set_cred_ucounts(cred, NULL) < 0)
 		return -EINVAL;
 
 	return 0;

From patchwork Mon Feb 7 12:17:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Michal_Koutn=C3=BD?=
X-Patchwork-Id: 540657
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix)
with ESMTP id 61BB8C43217 for ; Mon, 7 Feb 2022 12:32:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1382630AbiBGM20 (ORCPT ); Mon, 7 Feb 2022 07:28:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60204 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1382754AbiBGMYW (ORCPT ); Mon, 7 Feb 2022 07:24:22 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E0E2C002B4E; Mon, 7 Feb 2022 04:18:14 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 262E8210FA; Mon, 7 Feb 2022 12:18:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1644236293; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WP6Xgo2ws98hPpXKeUU962VqHQ+2ksGNpkLhWTonht4=; b=TGINR8F84BeIkQgoA51gSRY3TQArk02/ttVciZuSUTaE915VzbI2t5jJbBPIy7qcRntcI8 Y+Lgxh5qk9zh+ztIJwYwFqwMHqJpiQIC3MdiFPcJEZbx5mRbfsl2oS0DgxqXR76IRO88o/ 14AYuFnOo3RVUkVt/oFbsuSTrFiN5g8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 0472B13BE6; Mon, 7 Feb 2022 12:18:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id qGCSAAUOAWLMegAAMHmgww (envelope-from ); Mon, 07 Feb 2022 12:18:13 +0000 
From: =?utf-8?q?Michal_Koutn=C3=BD?=
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer, Ran Xiaokai, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 5/6] selftests: Challenge RLIMIT_NPROC in user namespaces
Date: Mon, 7 Feb 2022 13:17:59 +0100
Message-Id: <20220207121800.5079-6-mkoutny@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

The services are started in descendant user namespaces; each of them
should honor the RLIMIT_NPROC that is passed during user namespace
creation.

  main [user_ns_0]
    ` service [user_ns_1]
      ` worker 1
      ` worker 2
      ...
      ` worker k
    ...
    ` service [user_ns_n]
      ` worker 1
      ` worker 2
      ...
      ` worker k

The test uses explicit synchronization to make sure the original
parent's limit does not interfere with the descendants.

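The synchronization mentioned in this commit message can be sketched in
isolation as follows. This is a minimal stand-alone illustration under
stated assumptions, not the selftest itself: run_handshake() is a
hypothetical helper, the unshare(), setrlimit() and execve() steps are
only marked by comments, and unlike the selftest's sync_wait() this
version also verifies which message arrived.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum msg_sync { UNSHARE, RLIMIT_RESTORE };

/* Send one message byte; returns 0 on success, -1 on failure. */
int sync_notify(int fd, enum msg_sync m)
{
	char tmp = (char)m;

	return write(fd, &tmp, 1) == 1 ? 0 : -1;
}

/* Block until one byte arrives and check it is the expected message. */
int sync_wait(int fd, enum msg_sync expected)
{
	char tmp;

	if (read(fd, &tmp, 1) != 1)
		return -1;
	return tmp == (char)expected ? 0 : -1;
}

/* Returns 0 if parent and child completed the handshake in order. */
int run_handshake(void)
{
	int sockets[2], status;
	pid_t child;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets) < 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) {
		close(sockets[0]);
		/* The selftest calls unshare(CLONE_NEWUSER) at this point. */
		if (sync_notify(sockets[1], UNSHARE) < 0)
			_exit(1);
		/* Wait until the parent has restored its own limit. */
		if (sync_wait(sockets[1], RLIMIT_RESTORE) < 0)
			_exit(2);
		/* The selftest would execve() the service here. */
		_exit(0);
	}

	close(sockets[1]);
	if (sync_wait(sockets[0], UNSHARE) < 0)
		return -1;
	/* The selftest restores the parent's RLIMIT_NPROC at this point. */
	if (sync_notify(sockets[0], RLIMIT_RESTORE) < 0)
		return -1;

	if (waitpid(child, &status, 0) != child)
		return -1;
	close(sockets[0]);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller only needs something like assert(run_handshake() == 0). The
ordering is what matters: the child reports that its user namespace
exists before the parent lifts its own limit, so the lowered limit is the
one captured at namespace creation.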
Signed-off-by: Michal Koutný --- .../selftests/rlimits/rlimits-per-userns.c | 154 ++++++++++++++---- 1 file changed, 125 insertions(+), 29 deletions(-) diff --git a/tools/testing/selftests/rlimits/rlimits-per-userns.c b/tools/testing/selftests/rlimits/rlimits-per-userns.c index 26dc949e93ea..54c1b345e42b 100644 --- a/tools/testing/selftests/rlimits/rlimits-per-userns.c +++ b/tools/testing/selftests/rlimits/rlimits-per-userns.c @@ -9,7 +9,9 @@ #include #include #include +#include +#include #include #include #include @@ -21,38 +23,74 @@ #include #include -#define NR_CHILDS 2 +#define THE_LIMIT 4 +#define NR_CHILDREN 5 + +static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for limit-1 children."); static char *service_prog; static uid_t user = 60000; static uid_t group = 60000; +static struct rlimit saved_limit; + +/* Two uses: main and service */ +static pid_t child[NR_CHILDREN]; +static pid_t pid; static void setrlimit_nproc(rlim_t n) { - pid_t pid = getpid(); struct rlimit limit = { .rlim_cur = n, .rlim_max = n }; - - warnx("(pid=%d): Setting RLIMIT_NPROC=%ld", pid, n); + if (getrlimit(RLIMIT_NPROC, &saved_limit) < 0) + err(EXIT_FAILURE, "(pid=%d): getrlimit(RLIMIT_NPROC)", pid); if (setrlimit(RLIMIT_NPROC, &limit) < 0) err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC)", pid); + + warnx("(pid=%d): Set RLIMIT_NPROC=%ld", pid, n); +} + +static void restore_rlimit_nproc(void) +{ + if (setrlimit(RLIMIT_NPROC, &saved_limit) < 0) + err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC, saved)", pid); + warnx("(pid=%d) Restored RLIMIT_NPROC", pid); } -static pid_t fork_child(void) +enum msg_sync { + UNSHARE, + RLIMIT_RESTORE, +}; + +static void sync_notify(int fd, enum msg_sync m) { - pid_t pid = fork(); + char tmp = m; + + if (write(fd, &tmp, 1) < 0) + warnx("(pid=%d): failed sync-write", pid); +} - if (pid < 0) +static void sync_wait(int fd, enum msg_sync m) +{ + char tmp; + + if (read(fd, &tmp, 1) < 0) + warnx("(pid=%d): failed sync-read", pid); +} + +static pid_t 
fork_child(int control_fd) +{ + pid_t new_pid = fork(); + + if (new_pid < 0) err(EXIT_FAILURE, "fork"); - if (pid > 0) - return pid; + if (new_pid > 0) + return new_pid; pid = getpid(); - warnx("(pid=%d): New process starting ...", pid); if (prctl(PR_SET_PDEATHSIG, SIGKILL) < 0) @@ -73,6 +111,9 @@ static pid_t fork_child(void) if (unshare(CLONE_NEWUSER) < 0) err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)"); + sync_notify(control_fd, UNSHARE); + sync_wait(control_fd, RLIMIT_RESTORE); + char *const argv[] = { "service", NULL }; char *const envp[] = { "I_AM_SERVICE=1", NULL }; @@ -82,37 +123,92 @@ static pid_t fork_child(void) err(EXIT_FAILURE, "(pid=%d): execve", pid); } +static void run_service(void) +{ + size_t i; + int ret = EXIT_SUCCESS; + struct rlimit limit; + char user_ns[PATH_MAX]; + + if (getrlimit(RLIMIT_NPROC, &limit) < 0) + err(EXIT_FAILURE, "(pid=%d) failed getrlimit", pid); + if (readlink("/proc/self/ns/user", user_ns, PATH_MAX) < 0) + err(EXIT_FAILURE, "(pid=%d) failed readlink", pid); + + warnx("(pid=%d) Service instance attempts %i children, limit %lu:%lu, ns=%s", + pid, THE_LIMIT, limit.rlim_cur, limit.rlim_max, user_ns); + + /* test rlimit inside the service, effectively THE_LIMIT-1 becaue of service itself */ + for (i = 0; i < THE_LIMIT; i++) { + child[i] = fork(); + if (child[i] == 0) { + /* service child */ + pause(); + exit(EXIT_SUCCESS); + } + if (child[i] < 0) { + warnx("(pid=%d) service fork %lu failed, errno = %i", pid, i+1, errno); + if (!(i == THE_LIMIT-1 && errno == EAGAIN)) + ret = EXIT_FAILURE; + } else if (i == THE_LIMIT-1) { + warnx("(pid=%d) RLIMIT_NPROC not honored", pid); + ret = EXIT_FAILURE; + } + } + + /* service cleanup */ + for (i = 0; i < THE_LIMIT; i++) + if (child[i] > 0) + kill(child[i], SIGUSR1); + + for (i = 0; i < THE_LIMIT; i++) + if (child[i] > 0) + waitpid(child[i], NULL, WNOHANG); + + if (ret) + exit(ret); + pause(); +} + int main(int argc, char **argv) { size_t i; - pid_t child[NR_CHILDS]; - int wstatus[NR_CHILDS]; - 
int childs = NR_CHILDS; - pid_t pid; + int control_fd[NR_CHILDREN]; + int wstatus[NR_CHILDREN]; + int children = NR_CHILDREN; + int sockets[2]; + + pid = getpid(); if (getenv("I_AM_SERVICE")) { - pause(); - exit(EXIT_SUCCESS); + run_service(); + exit(EXIT_FAILURE); } service_prog = argv[0]; - pid = getpid(); warnx("(pid=%d) Starting testcase", pid); - /* - * This rlimit is not a problem for root because it can be exceeded. - */ - setrlimit_nproc(1); - - for (i = 0; i < NR_CHILDS; i++) { - child[i] = fork_child(); + setrlimit_nproc(THE_LIMIT); + for (i = 0; i < NR_CHILDREN; i++) { + if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, sockets) < 0) + err(EXIT_FAILURE, "(pid=%d) socketpair failed", pid); + control_fd[i] = sockets[0]; + child[i] = fork_child(sockets[1]); wstatus[i] = 0; + } + + for (i = 0; i < NR_CHILDREN; i++) + sync_wait(control_fd[i], UNSHARE); + restore_rlimit_nproc(); + + for (i = 0; i < NR_CHILDREN; i++) { + sync_notify(control_fd[i], RLIMIT_RESTORE); usleep(250000); } while (1) { - for (i = 0; i < NR_CHILDS; i++) { + for (i = 0; i < NR_CHILDREN; i++) { if (child[i] <= 0) continue; @@ -126,22 +222,22 @@ int main(int argc, char **argv) warn("(pid=%d): waitpid(%d)", pid, child[i]); child[i] *= -1; - childs -= 1; + children -= 1; } - if (!childs) + if (!children) break; usleep(250000); - for (i = 0; i < NR_CHILDS; i++) { + for (i = 0; i < NR_CHILDREN; i++) { if (child[i] <= 0) continue; kill(child[i], SIGUSR1); } } - for (i = 0; i < NR_CHILDS; i++) { + for (i = 0; i < NR_CHILDREN; i++) { if (WIFEXITED(wstatus[i])) warnx("(pid=%d): pid %d exited, status=%d", pid, -child[i], WEXITSTATUS(wstatus[i])); From patchwork Mon Feb 7 12:18:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Michal_Koutn=C3=BD?= X-Patchwork-Id: 540656 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53BECC433F5 for ; Mon, 7 Feb 2022 13:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239162AbiBGN1k (ORCPT ); Mon, 7 Feb 2022 08:27:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47910 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1445599AbiBGMmS (ORCPT ); Mon, 7 Feb 2022 07:42:18 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D177E033DB1; Mon, 7 Feb 2022 04:33:57 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 46FD41F38F; Mon, 7 Feb 2022 12:18:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1644236293; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aQBjXVV6nPDxhegxhJuo1rX8QnvdXvXCHXEHajMjE/k=; b=F7vAbavv5+DmmcYqc+EmNYdMceHNSmxYSAN9d0tSWb0chC3D3hfAkjuJXtTsRP9K1btMvr A8jT8Uy54seXjkBVVreXC3HuvigVWh/RwAFUv+ReMCNKF7QqlpMsEl79kgMbBBhrkEMZVk oeL3Pc2ZMCI/XWD7PFQ/NPciWtTEzcI= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2581413BBC; Mon, 7 Feb 2022 12:18:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 
wMqjCAUOAWLMegAAMHmgww (envelope-from ); Mon, 07 Feb 2022 12:18:13 +0000
From: =?utf-8?q?Michal_Koutn=C3=BD?=
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer, Ran Xiaokai, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 6/6] selftests: Test RLIMIT_NPROC in clone-created user namespaces
Date: Mon, 7 Feb 2022 13:18:00 +0100
Message-Id: <20220207121800.5079-7-mkoutny@suse.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

Verify that RLIMIT_NPROC is also observed in user namespaces created via
the clone(CLONE_NEWUSER) path. Note that such a user_ns is created by
the privileged user.

Signed-off-by: Michal Koutný
---
 .../selftests/rlimits/rlimits-per-userns.c | 141 +++++++++++++-----
 1 file changed, 101 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/rlimits/rlimits-per-userns.c b/tools/testing/selftests/rlimits/rlimits-per-userns.c
index 54c1b345e42b..46f4cff36b30 100644
--- a/tools/testing/selftests/rlimits/rlimits-per-userns.c
+++ b/tools/testing/selftests/rlimits/rlimits-per-userns.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
  * Author: Alexey Gladkov
+ * Author: Michal Koutný
  */
 #define _GNU_SOURCE
 #include
@@ -25,16 +26,25 @@
 
 #define THE_LIMIT 4
 #define NR_CHILDREN 5
+#define STACK_SIZE (2 * (1<<20))
 
-static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for limit-1 children.");
+static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for THE_LIMIT-1 children.");
 
-static char *service_prog;
 static uid_t user = 60000;
 static uid_t group = 60000;
 static struct rlimit saved_limit;
 
-/* Two uses: main and service */
-static pid_t child[NR_CHILDREN];
+enum userns_mode {
+	UM_UNSHARE, /* setrlimit,clone(0),setuid,unshare,execve */
+	UM_CLONE_NEWUSER, /*
setrlimit,clone(NEWUSER),setuid,execve */ +}; +static struct { + int control_fd; + char *pathname; + enum userns_mode mode; +} child_args; + +/* Cache current pid */ static pid_t pid; static void setrlimit_nproc(rlim_t n) @@ -60,6 +70,7 @@ static void restore_rlimit_nproc(void) } enum msg_sync { + MAP_DEFINE, UNSHARE, RLIMIT_RESTORE, }; @@ -80,15 +91,32 @@ static void sync_wait(int fd, enum msg_sync m) warnx("(pid=%d): failed sync-read", pid); } -static pid_t fork_child(int control_fd) +static int define_maps(pid_t child_pid) { - pid_t new_pid = fork(); + FILE *f; + char filename[PATH_MAX]; - if (new_pid < 0) - err(EXIT_FAILURE, "fork"); + if (child_args.mode != UM_CLONE_NEWUSER) + return 0; + + snprintf(filename, PATH_MAX, "/proc/%i/uid_map", child_pid); + f = fopen(filename, "w"); + if (fprintf(f, "%i %i 1\n", user, user) < 0) + return -1; + fclose(f); + + snprintf(filename, PATH_MAX, "/proc/%i/gid_map", child_pid); + f = fopen(filename, "w"); + if (fprintf(f, "%i %i 1\n", group, group) < 0) + return -1; + fclose(f); + + return 0; +} - if (new_pid > 0) - return new_pid; +static int setup_and_exec(void *arg) +{ + int control_fd = child_args.control_fd; pid = getpid(); warnx("(pid=%d): New process starting ...", pid); @@ -98,6 +126,7 @@ static pid_t fork_child(int control_fd) signal(SIGUSR1, SIG_DFL); + sync_wait(control_fd, RLIMIT_RESTORE); warnx("(pid=%d): Changing to uid=%d, gid=%d", pid, user, group); if (setgid(group) < 0) @@ -107,9 +136,11 @@ static pid_t fork_child(int control_fd) warnx("(pid=%d): Service running ...", pid); - warnx("(pid=%d): Unshare user namespace", pid); - if (unshare(CLONE_NEWUSER) < 0) - err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)"); + if (child_args.mode == UM_UNSHARE) { + warnx("(pid=%d): Unshare user namespace", pid); + if (unshare(CLONE_NEWUSER) < 0) + err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)"); + } sync_notify(control_fd, UNSHARE); sync_wait(control_fd, RLIMIT_RESTORE); @@ -119,14 +150,30 @@ static pid_t fork_child(int control_fd) 
warnx("(pid=%d): Executing real service ...", pid); - execve(service_prog, argv, envp); + execve(child_args.pathname, argv, envp); err(EXIT_FAILURE, "(pid=%d): execve", pid); } -static void run_service(void) +static pid_t start_child(char *pathname, int control_fd) +{ + char *stack = malloc(STACK_SIZE); + int flags = child_args.mode == UM_CLONE_NEWUSER ? CLONE_NEWUSER : 0; + pid_t new_pid; + + child_args.control_fd = control_fd; + child_args.pathname = pathname; + + new_pid = clone(setup_and_exec, stack+STACK_SIZE-1, flags, NULL); + if (new_pid < 0) + err(EXIT_FAILURE, "clone"); + + free(stack); + close(control_fd); + return new_pid; +} + +static void dump_context(size_t n_workers) { - size_t i; - int ret = EXIT_SUCCESS; struct rlimit limit; char user_ns[PATH_MAX]; @@ -135,44 +182,55 @@ static void run_service(void) if (readlink("/proc/self/ns/user", user_ns, PATH_MAX) < 0) err(EXIT_FAILURE, "(pid=%d) failed readlink", pid); - warnx("(pid=%d) Service instance attempts %i children, limit %lu:%lu, ns=%s", - pid, THE_LIMIT, limit.rlim_cur, limit.rlim_max, user_ns); + warnx("(pid=%d) Service instance attempts %lu workers, limit %lu:%lu, ns=%s", + pid, n_workers, limit.rlim_cur, limit.rlim_max, user_ns); +} + +static int run_service(void) +{ + size_t i, n_workers = THE_LIMIT; + pid_t worker[NR_CHILDREN]; + int ret = EXIT_SUCCESS; - /* test rlimit inside the service, effectively THE_LIMIT-1 becaue of service itself */ - for (i = 0; i < THE_LIMIT; i++) { - child[i] = fork(); - if (child[i] == 0) { - /* service child */ + dump_context(n_workers); + + /* test rlimit inside the service, last worker should fail because of service itself */ + for (i = 0; i < n_workers; i++) { + worker[i] = fork(); + if (worker[i] == 0) { + /* service worker */ pause(); exit(EXIT_SUCCESS); } - if (child[i] < 0) { + if (worker[i] < 0) { warnx("(pid=%d) service fork %lu failed, errno = %i", pid, i+1, errno); - if (!(i == THE_LIMIT-1 && errno == EAGAIN)) + if (!(i == n_workers-1 && errno == 
EAGAIN)) ret = EXIT_FAILURE; - } else if (i == THE_LIMIT-1) { + } else if (i == n_workers-1) { warnx("(pid=%d) RLIMIT_NPROC not honored", pid); ret = EXIT_FAILURE; } } /* service cleanup */ - for (i = 0; i < THE_LIMIT; i++) - if (child[i] > 0) - kill(child[i], SIGUSR1); + for (i = 0; i < n_workers; i++) + if (worker[i] > 0) + kill(worker[i], SIGUSR1); - for (i = 0; i < THE_LIMIT; i++) - if (child[i] > 0) - waitpid(child[i], NULL, WNOHANG); + for (i = 0; i < n_workers; i++) + if (worker[i] > 0) + waitpid(worker[i], NULL, WNOHANG); if (ret) - exit(ret); + return ret; pause(); + return EXIT_FAILURE; } int main(int argc, char **argv) { size_t i; + pid_t child[NR_CHILDREN]; int control_fd[NR_CHILDREN]; int wstatus[NR_CHILDREN]; int children = NR_CHILDREN; @@ -180,12 +238,11 @@ int main(int argc, char **argv) pid = getpid(); - if (getenv("I_AM_SERVICE")) { - run_service(); - exit(EXIT_FAILURE); - } + if (getenv("I_AM_SERVICE")) + return run_service(); - service_prog = argv[0]; + if (argc > 1 && *argv[1] == 'c') + child_args.mode = UM_CLONE_NEWUSER; warnx("(pid=%d) Starting testcase", pid); @@ -194,8 +251,12 @@ int main(int argc, char **argv) if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, sockets) < 0) err(EXIT_FAILURE, "(pid=%d) socketpair failed", pid); control_fd[i] = sockets[0]; - child[i] = fork_child(sockets[1]); + child[i] = start_child(argv[0], sockets[1]); wstatus[i] = 0; + + if (define_maps(child[i]) < 0) + err(EXIT_FAILURE, "(pid=%d) user_ns maps definition failed", pid); + sync_notify(control_fd[i], MAP_DEFINE); } for (i = 0; i < NR_CHILDREN; i++)