From patchwork Fri Jan 10 14:55:44 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 856255
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: Lorenzo Stoakes, Cristian Rodríguez
Subject: [PATCH] nptl: Add support for setup guard pages with MADV_GUARD_INSTALL
Date: Fri, 10 Jan 2025 11:55:44 -0300
Message-ID: <20250110145556.520522-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0

Linux 6.13 (commit 662df3e5c3766) added a lightweight way to define guard
pages through the madvise syscall.  Instead of carving out PROT_NONE mmap
regions with mprotect, userland can madvise the region and the kernel
ensures that any access to it triggers a SIGSEGV (as with a PROT_NONE
mapping).  This has the advantage of lower kernel memory consumption for
the process page table (one less VMA per guard area) and slightly less
kernel contention (also because fewer VMAs need to be tracked).

pthread_create allocates a new thread stack in two ways: if a guard area
is requested (the default), it allocates the required memory range with
PROT_NONE and then mprotects the usable stack area with the proper flags.
Otherwise, if no guard page is requested, it allocates the region directly
with the required flags.

With MADV_GUARD_INSTALL support, the stack region is allocated with the
required flags and the guard region is then installed with madvise.  If
the kernel does not support it, the usual PROT_NONE scheme is used instead
(and MADV_GUARD_INSTALL is disabled for future stack creations).

The stack allocation strategy is recorded in the pthread struct so it can
be consulted if the guard region needs to be resized.  To avoid an extra
field, the 'user_stack' member is repurposed and renamed to 'stack_mode'.

This patch also adds a proper test for the pthread guard area.

Checked on x86_64, aarch64, and powerpc64le with kernel 6.13.0-rc4.  I
also checked the new test on hppa with Linux 5.16.0 (although I could not
boot a 6.13 kernel to verify MADV_GUARD_INSTALL).
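For readers unfamiliar with the new madvise flag, the fallback strategy
described above can be illustrated with a small standalone program.  This
is only a sketch of the idea, not the glibc implementation itself; the
setup_guard helper and the main driver are hypothetical names, and the
MADV_GUARD_INSTALL fallback define assumes the value from the Linux 6.13
uapi headers:

    /* Sketch: install a guard region with MADV_GUARD_INSTALL when the
       kernel supports it, otherwise fall back to a PROT_NONE guard.  */
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_GUARD_INSTALL
    # define MADV_GUARD_INSTALL 102   /* Assumed from Linux 6.13 uapi.  */
    #endif

    /* Guard the first GUARDSIZE bytes of the RW mapping MEM.  On older
       kernels madvise fails (EINVAL) and the PROT_NONE path is used.  */
    static int
    setup_guard (char *mem, size_t guardsize)
    {
      if (madvise (mem, guardsize, MADV_GUARD_INSTALL) == 0)
        return 0;                     /* Lightweight guard, no extra VMA.  */
      return mprotect (mem, guardsize, PROT_NONE);
    }

    int
    main (void)
    {
      size_t pagesz = sysconf (_SC_PAGESIZE);
      size_t size = 64 * pagesz;
      char *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (mem == MAP_FAILED || setup_guard (mem, pagesz) != 0)
        return EXIT_FAILURE;
      /* Touching mem[0] would now raise SIGSEGV; the rest stays usable.  */
      mem[pagesz] = 1;
      munmap (mem, size);
      return 0;
    }

Either way the guarded page faults on access; the madvise variant simply
avoids splitting the stack mapping into a separate PROT_NONE VMA.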
--- nptl/Makefile | 1 + nptl/TODO-testing | 4 - nptl/allocatestack.c | 229 +++++++++----- nptl/descr.h | 8 +- nptl/nptl-stack.c | 2 +- nptl/pthread_create.c | 2 +- nptl/tst-guard1.c | 366 ++++++++++++++++++++++ sysdeps/nptl/dl-tls_init_tp.c | 2 +- sysdeps/nptl/fork.h | 2 +- sysdeps/unix/sysv/linux/bits/mman-linux.h | 2 + 10 files changed, 523 insertions(+), 95 deletions(-) create mode 100644 nptl/tst-guard1.c diff --git a/nptl/Makefile b/nptl/Makefile index b7c63999a3..b04e25cd0d 100644 --- a/nptl/Makefile +++ b/nptl/Makefile @@ -289,6 +289,7 @@ tests = \ tst-dlsym1 \ tst-exec4 \ tst-exec5 \ + tst-guard1 \ tst-initializers1 \ tst-initializers1-c11 \ tst-initializers1-c89 \ diff --git a/nptl/TODO-testing b/nptl/TODO-testing index f50d2ceb51..46ebf3bc5c 100644 --- a/nptl/TODO-testing +++ b/nptl/TODO-testing @@ -1,7 +1,3 @@ -pthread_attr_setguardsize - - test effectiveness - pthread_attr_[sg]etschedparam what to test? diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 9c1a72bcf0..acae232765 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -146,10 +146,37 @@ get_cached_stack (size_t *sizep, void **memp) return result; } +/* Assume support for MADV_ADVISE_GUARD, setup_stack_prot will disable it + and fallback to ALLOCATE_GUARD_PROT_NONE if the madvise call fails. */ +static int allocate_stack_mode = ALLOCATE_GUARD_MADV_GUARD; + +static inline int stack_prot (void) +{ + return (PROT_READ | PROT_WRITE + | ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0)); +} + +static void * +allocate_thread_stack (size_t size, size_t guardsize) +{ + /* MADV_ADVISE_GUARD does not require an additional PROT_NONE mapping. */ + int prot = stack_prot (); + + if (atomic_load_relaxed (&allocate_stack_mode) == ALLOCATE_GUARD_PROT_NONE) + /* If a guard page is required, avoid committing memory by first allocate + with PROT_NONE and then reserve with required permission excluding the + guard page. */ + prot = guardsize == 0 ? prot : PROT_NONE; + + return __mmap (NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, + 0); +} + + /* Return the guard page position on allocated stack. */ static inline char * __attribute ((always_inline)) -guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd, +guard_position (void *mem, size_t size, size_t guardsize, const struct pthread *pd, size_t pagesize_m1) { #if _STACK_GROWS_DOWN @@ -159,27 +186,97 @@ guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd, #endif } -/* Based on stack allocated with PROT_NONE, setup the required portions with - 'prot' flags based on the guard page position. */ -static inline int -setup_stack_prot (char *mem, size_t size, char *guard, size_t guardsize, - const int prot) +/* Setup the MEM thread stack of SIZE bytes with the required protection flags + along with a guard area of GUARDSIZE size. It first tries with + MADV_GUARD_INSTALL, and then fallback to setup the guard area using the + extra PROT_NONE mapping. Update PD with the type of guard area setup. 
*/ +static inline bool +setup_stack_prot (char *mem, size_t size, struct pthread *pd, + size_t guardsize, size_t pagesize_m1) { - char *guardend = guard + guardsize; + if (__glibc_unlikely (guardsize == 0)) + return true; + + char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1); + if (atomic_load_relaxed (&allocate_stack_mode) == ALLOCATE_GUARD_MADV_GUARD) + { + if (__madvise (guard, guardsize, MADV_GUARD_INSTALL) == 0) + { + pd->stack_mode = ALLOCATE_GUARD_MADV_GUARD; + return true; + } + + /* If madvise fails it means the kernel does not support the guard + advise (we assume that guard is page-aligned and length is non + negative). The stack has already the expected flags, so it just need + to PROT_NONE the guard area. */ + atomic_store_relaxed (&allocate_stack_mode, ALLOCATE_GUARD_PROT_NONE); + if (__mprotect (guard, guardsize, PROT_NONE) != 0) + return false; + } + else + { + const int prot = stack_prot (); + char *guardend = guard + guardsize; #if _STACK_GROWS_DOWN - /* As defined at guard_position, for architectures with downward stack - the guard page is always at start of the allocated area. */ - if (__mprotect (guardend, size - guardsize, prot) != 0) - return errno; + /* As defined at guard_position, for architectures with downward stack + the guard page is always at start of the allocated area. */ + if (__mprotect (guardend, size - guardsize, prot) != 0) + return false; #else - size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem; - if (__mprotect (mem, mprots1, prot) != 0) - return errno; - size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend; - if (__mprotect (guardend, mprots2, prot) != 0) - return errno; + size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem; + if (__mprotect (mem, mprots1, prot) != 0) + return false; + size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend; + if (__mprotect (guardend, mprots2, prot) != 0) + return false; #endif - return 0; + } + + pd->stack_mode = ALLOCATE_GUARD_PROT_NONE; + return true; +} + +/* Update the guard area of the thread stack MEM of size SIZE with the + new GUARDISZE. It uses the method defined by PD stack_mode. */ +static inline bool +adjust_stack_prot (char *mem, size_t size, const struct pthread *pd, + size_t guardsize, size_t pagesize_m1) +{ + char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1); + /* The required guardsize is larger than the current one. */ + if (guardsize > pd->guardsize) + { + if (pd->stack_mode == ALLOCATE_GUARD_MADV_GUARD) + return __madvise (guard, guardsize, MADV_GUARD_INSTALL) == 0; + else if (pd->stack_mode == ALLOCATE_GUARD_PROT_NONE) + return __mprotect (guard, guardsize, PROT_NONE) == 0; + } + /* The old guard are is too large. */ + else if (pd->guardsize > guardsize) + { + const int prot = stack_prot (); + size_t slacksize = pd->guardsize - guardsize; + if (pd->stack_mode == ALLOCATE_GUARD_MADV_GUARD) + return __madvise (guard + guardsize, slacksize, MADV_GUARD_REMOVE) == 0; + else if (pd->stack_mode == ALLOCATE_GUARD_PROT_NONE) +#if _STACK_GROWS_DOWN + return __mprotect (mem + guardsize, slacksize, prot) == 0; +#else + { + char *new_guard = (char *)(((uintptr_t) pd - guardsize) + & ~pagesize_m1); + char *old_guard = (char *)(((uintptr_t) pd - pd->guardsize) + & ~pagesize_m1); + /* The guard size difference might be > 0, but once rounded + to the nearest page the size difference might be zero. 
*/ + if (new_guard > old_guard + && __mprotect (old_guard, new_guard - old_guard, prot) != 0) + return false; + } +#endif + } + return true; } /* Mark the memory of the stack as usable to the kernel. It frees everything @@ -291,7 +388,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, /* This is a user-provided stack. It will not be queued in the stack cache nor will the memory (except the TLS memory) be freed. */ - pd->user_stack = true; + pd->stack_mode = ALLOCATE_GUARD_USER; /* This is at least the second thread. */ pd->header.multiple_threads = 1; @@ -325,10 +422,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, /* Allocate some anonymous memory. If possible use the cache. */ size_t guardsize; size_t reported_guardsize; - size_t reqsize; void *mem; - const int prot = (PROT_READ | PROT_WRITE - | ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0)); /* Adjust the stack size for alignment. */ size &= ~tls_static_align_m1; @@ -358,16 +452,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, return EINVAL; /* Try to get a stack from the cache. */ - reqsize = size; pd = get_cached_stack (&size, &mem); if (pd == NULL) { - /* If a guard page is required, avoid committing memory by first - allocate with PROT_NONE and then reserve with required permission - excluding the guard page. */ - mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE, - MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); - + mem = allocate_thread_stack (size, guardsize); if (__glibc_unlikely (mem == MAP_FAILED)) return errno; @@ -394,15 +482,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, #endif /* Now mprotect the required region excluding the guard area. */ - if (__glibc_likely (guardsize > 0)) + if (!setup_stack_prot (mem, size, pd, guardsize, pagesize_m1)) { - char *guard = guard_position (mem, size, guardsize, pd, - pagesize_m1); - if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0) - { - __munmap (mem, size); - return errno; - } + __munmap (mem, size); + return errno; } /* Remember the stack-related values. */ @@ -456,59 +539,31 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, which will be read next. */ } - /* Create or resize the guard area if necessary. */ - if (__glibc_unlikely (guardsize > pd->guardsize)) + /* Create or resize the guard area if necessary on an already + allocated stack. */ + if (!adjust_stack_prot (mem, size, pd, guardsize, pagesize_m1)) { - char *guard = guard_position (mem, size, guardsize, pd, - pagesize_m1); - if (__mprotect (guard, guardsize, PROT_NONE) != 0) - { - mprot_error: - lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE); + lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE); - /* Remove the thread from the list. */ - __nptl_stack_list_del (&pd->list); + /* Remove the thread from the list. */ + __nptl_stack_list_del (&pd->list); - lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE); + lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE); - /* Get rid of the TLS block we allocated. */ - _dl_deallocate_tls (TLS_TPADJ (pd), false); + /* Get rid of the TLS block we allocated. */ + _dl_deallocate_tls (TLS_TPADJ (pd), false); - /* Free the stack memory regardless of whether the size - of the cache is over the limit or not. If this piece - of memory caused problems we better do not use it - anymore. Uh, and we ignore possible errors. There - is nothing we could do. 
*/ - (void) __munmap (mem, size); + /* Free the stack memory regardless of whether the size + of the cache is over the limit or not. If this piece + of memory caused problems we better do not use it + anymore. Uh, and we ignore possible errors. There + is nothing we could do. */ + (void) __munmap (mem, size); - return errno; - } - - pd->guardsize = guardsize; + return errno; } - else if (__builtin_expect (pd->guardsize - guardsize > size - reqsize, - 0)) - { - /* The old guard area is too large. */ -#if _STACK_GROWS_DOWN - if (__mprotect ((char *) mem + guardsize, pd->guardsize - guardsize, - prot) != 0) - goto mprot_error; -#elif _STACK_GROWS_UP - char *new_guard = (char *)(((uintptr_t) pd - guardsize) - & ~pagesize_m1); - char *old_guard = (char *)(((uintptr_t) pd - pd->guardsize) - & ~pagesize_m1); - /* The guard size difference might be > 0, but once rounded - to the nearest page the size difference might be zero. */ - if (new_guard > old_guard - && __mprotect (old_guard, new_guard - old_guard, prot) != 0) - goto mprot_error; -#endif - - pd->guardsize = guardsize; - } + pd->guardsize = guardsize; /* The pthread_getattr_np() calls need to get passed the size requested in the attribute, regardless of how large the actually used guardsize is. */ @@ -568,19 +623,21 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, static void name_stack_maps (struct pthread *pd, bool set) { + size_t adjust = pd->stack_mode == ALLOCATE_GUARD_PROT_NONE ? + pd->guardsize : 0; #if _STACK_GROWS_DOWN - void *stack = pd->stackblock + pd->guardsize; + void *stack = pd->stackblock + adjust; #else void *stack = pd->stackblock; #endif - size_t stacksize = pd->stackblock_size - pd->guardsize; + size_t stacksize = pd->stackblock_size - adjust; if (!set) - __set_vma_name (stack, stacksize, NULL); + __set_vma_name (stack, stacksize, " glibc: unused stack"); else { unsigned int tid = pd->tid; - if (pd->user_stack) + if (pd->stack_mode == ALLOCATE_GUARD_USER) SET_STACK_NAME (" glibc: pthread user stack: ", stack, stacksize, tid); else SET_STACK_NAME (" glibc: pthread stack: ", stack, stacksize, tid); diff --git a/nptl/descr.h b/nptl/descr.h index d0d30929e2..9c1ed54c56 100644 --- a/nptl/descr.h +++ b/nptl/descr.h @@ -125,6 +125,12 @@ struct priority_protection_data unsigned int priomap[]; }; +enum allocate_stack_mode_t +{ + ALLOCATE_GUARD_MADV_GUARD = 0, + ALLOCATE_GUARD_PROT_NONE = 1, + ALLOCATE_GUARD_USER = 2, +}; /* Thread descriptor data structure. */ struct pthread @@ -324,7 +330,7 @@ struct pthread bool report_events; /* True if the user provided the stack. */ - bool user_stack; + enum allocate_stack_mode_t stack_mode; /* True if thread must stop at startup time. */ bool stopped_start; diff --git a/nptl/nptl-stack.c b/nptl/nptl-stack.c index 503357f25d..c049c5133c 100644 --- a/nptl/nptl-stack.c +++ b/nptl/nptl-stack.c @@ -120,7 +120,7 @@ __nptl_deallocate_stack (struct pthread *pd) not reset the 'used' flag in the 'tid' field. This is done by the kernel. If no thread has been created yet this field is still zero. */ - if (__glibc_likely (! pd->user_stack)) + if (__glibc_likely (pd->stack_mode != ALLOCATE_GUARD_USER)) (void) queue_stack (pd); else /* Free the memory associated with the ELF TLS. */ diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 01e8a86980..0808f2e628 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -554,7 +554,7 @@ start_thread (void *arg) to avoid creating a new free-state block during thread release. 
*/ __getrandom_vdso_release (pd); - if (!pd->user_stack) + if (pd->stack_mode != ALLOCATE_GUARD_USER) advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, pd->guardsize); diff --git a/nptl/tst-guard1.c b/nptl/tst-guard1.c new file mode 100644 index 0000000000..ec04eeed02 --- /dev/null +++ b/nptl/tst-guard1.c @@ -0,0 +1,366 @@ +/* Basic tests for pthread guard area. + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static long int pagesz; + +/* To check if the guard region is inaccessible, the thread tries read/writes + on it and checks if a SIGSEGV is generated. */ + +static volatile sig_atomic_t signal_jump_set; +static sigjmp_buf signal_jmp_buf; + +static void +sigsegv_handler (int sig) +{ + if (signal_jump_set == 0) + return; + + siglongjmp (signal_jmp_buf, sig); +} + +static bool +try_access_buf (char *ptr, bool write) +{ + signal_jump_set = true; + + bool failed = sigsetjmp (signal_jmp_buf, 0) != 0; + if (!failed) + { + if (write) + *(volatile char *)(ptr) = 'x'; + else + *(volatile char *)(ptr); + } + + signal_jump_set = false; + return !failed; +} + +static bool +try_read_buf (char *ptr) +{ + return try_access_buf (ptr, false); +} + +static bool +try_write_buf (char *ptr) +{ + return try_access_buf (ptr, true); +} + +static bool +try_read_write_buf (char *ptr) +{ + return try_read_buf (ptr) && try_write_buf(ptr); +} + + +/* Return the guard region of the current thread (it only makes sense on + a thread created by pthread_created). */ + +struct stack_t +{ + char *stack; + size_t stacksize; + char *guard; + size_t guardsize; +}; + +static inline size_t +adjust_stacksize (size_t stacksize) +{ + /* For some ABIs, The guard page depends of the thread descriptor, which in + turn rely on the require static TLS. The only supported _STACK_GROWS_UP + ABI, hppa, defines TLS_DTV_AT_TP and it is not straightforward to + calculate the guard region with current pthread APIs. So to get a + correct stack size assumes an extra page after the guard area. */ +#if _STACK_GROWS_DOWN + return stacksize; +#elif _STACK_GROWS_UP + return stacksize - pagesz; +#endif +} + +struct stack_t +get_current_stack_info (void) +{ + pthread_attr_t attr; + TEST_VERIFY_EXIT (pthread_getattr_np (pthread_self (), &attr) == 0); + void *stack; + size_t stacksize; + TEST_VERIFY_EXIT (pthread_attr_getstack (&attr, &stack, &stacksize) == 0); + size_t guardsize; + TEST_VERIFY_EXIT (pthread_attr_getguardsize (&attr, &guardsize) == 0); + /* The guardsize is reported as the current page size, although it might + be adjusted to a larger value (aarch64 for instance). */ + if (guardsize != 0 && guardsize < ARCH_MIN_GUARD_SIZE) + guardsize = ARCH_MIN_GUARD_SIZE; + +#if _STACK_GROWS_DOWN + void *guard = guardsize ? 
stack - guardsize : 0; +#elif _STACK_GROWS_UP + stacksize = adjust_stacksize (stacksize); + void *guard = guardsize ? stack + stacksize : 0; +#endif + + pthread_attr_destroy (&attr); + + return (struct stack_t) { stack, stacksize, guard, guardsize }; +} + +struct thread_args_t +{ + size_t stacksize; + size_t guardsize; +}; + +struct thread_args_t +get_thread_args (const pthread_attr_t *attr) +{ + size_t stacksize; + size_t guardsize; + + TEST_COMPARE (pthread_attr_getstacksize (attr, &stacksize), 0); + TEST_COMPARE (pthread_attr_getguardsize (attr, &guardsize), 0); + if (guardsize < ARCH_MIN_GUARD_SIZE) + guardsize = ARCH_MIN_GUARD_SIZE; + + return (struct thread_args_t) { stacksize, guardsize }; +} + +static void +set_thread_args (pthread_attr_t *attr, const struct thread_args_t *args) +{ + xpthread_attr_setstacksize (attr, args->stacksize); + xpthread_attr_setguardsize (attr, args->guardsize); +} + +static void * +tf (void *closure) +{ + struct thread_args_t *args = closure; + + struct stack_t s = get_current_stack_info (); + if (test_verbose) + printf ("debug: [tid=%jd] stack = { .stack=%p, stacksize=%#zx, guard=%p, " + "guardsize=%#zx }\n", + (intmax_t) gettid (), + s.stack, + s.stacksize, + s.guard, + s.guardsize); + + if (args != NULL) + { + TEST_COMPARE (adjust_stacksize (args->stacksize), s.stacksize); + TEST_COMPARE (args->guardsize, s.guardsize); + } + + /* Ensure we can access the stack area. */ + TEST_COMPARE (try_read_write_buf (s.stack), true); + TEST_COMPARE (try_read_write_buf (&s.stack[s.stacksize / 2]), true); + TEST_COMPARE (try_read_write_buf (&s.stack[s.stacksize - 1]), true); + + /* Check if accessing the guard area results in SIGSEGV. */ + if (s.guardsize > 0) + { + TEST_COMPARE (try_read_write_buf (s.guard), false); + TEST_COMPARE (try_read_write_buf (&s.guard[s.guardsize / 2]), false); + TEST_COMPARE (try_read_write_buf (&s.guard[s.guardsize] - 1), false); + } + + return NULL; +} + +/* Test 1: caller provided stack without guard. */ +static void +do_test1 (void) +{ + pthread_attr_t attr; + xpthread_attr_init (&attr); + + size_t stacksize = support_small_thread_stack_size (); + void *stack = xmmap (0, + stacksize, + PROT_READ | PROT_WRITE, + MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, + -1); + xpthread_attr_setstack (&attr, stack, stacksize); + xpthread_attr_setguardsize (&attr, 0); + + struct thread_args_t args = { stacksize, 0 }; + pthread_t t = xpthread_create (&attr, tf, &args); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); + + xpthread_attr_destroy (&attr); + xmunmap (stack, stacksize); +} + +/* Test 2: same as 1., but with a guard area. */ +static void +do_test2 (void) +{ + pthread_attr_t attr; + xpthread_attr_init (&attr); + + size_t stacksize = support_small_thread_stack_size (); + void *stack = xmmap (0, + stacksize, + PROT_READ | PROT_WRITE, + MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, + -1); + xpthread_attr_setstack (&attr, stack, stacksize); + xpthread_attr_setguardsize (&attr, pagesz); + + struct thread_args_t args = { stacksize, 0 }; + pthread_t t = xpthread_create (&attr, tf, &args); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); + + xpthread_attr_destroy (&attr); + xmunmap (stack, stacksize); +} + +/* Test 3: pthread_create with default values. */ +static void +do_test3 (void) +{ + pthread_t t = xpthread_create (NULL, tf, NULL); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); +} + +/* Test 4: pthread_create without a guard area. 
*/ +static void +do_test4 (void) +{ + pthread_attr_t attr; + xpthread_attr_init (&attr); + struct thread_args_t args = get_thread_args (&attr); + args.stacksize += args.guardsize; + args.guardsize = 0; + set_thread_args (&attr, &args); + + pthread_t t = xpthread_create (&attr, tf, &args); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); + + xpthread_attr_destroy (&attr); +} + +/* Test 5: pthread_create with non default stack and guard size value. */ +static void +do_test5 (void) +{ + pthread_attr_t attr; + xpthread_attr_init (&attr); + struct thread_args_t args = get_thread_args (&attr); + args.guardsize += pagesz; + args.stacksize += pagesz; + set_thread_args (&attr, &args); + + pthread_t t = xpthread_create (&attr, tf, &args); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); + + xpthread_attr_destroy (&attr); +} + +/* Test 6: thread with the required size (stack + guard) that matches the + test 3, but with largert guard area. The pthread_create will need to + increase the guard area. */ +static void +do_test6 (void) +{ + pthread_attr_t attr; + xpthread_attr_init (&attr); + struct thread_args_t args = get_thread_args (&attr); + args.guardsize += pagesz; + args.stacksize -= pagesz; + set_thread_args (&attr, &args); + + pthread_t t = xpthread_create (&attr, tf, &args); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); + + xpthread_attr_destroy (&attr); +} + +/* Test 7: pthread_create with default values, the requires size matches the + one from test 3 and 6 (but with a reduced guard ares). The + pthread_create should use the cached stack from previous tests, but it + would require to reduce the guard area. */ +static void +do_test7 (void) +{ + pthread_t t = xpthread_create (NULL, tf, NULL); + void *status = xpthread_join (t); + TEST_VERIFY (status == 0); +} + +static int +do_test (void) +{ + pagesz = sysconf (_SC_PAGESIZE); + + { + struct sigaction sa = { + .sa_handler = sigsegv_handler, + .sa_flags = SA_NODEFER, + }; + sigemptyset (&sa.sa_mask); + xsigaction (SIGSEGV, &sa, NULL); + } + + static const struct { + const char *descr; + void (*test)(void); + } tests[] = { + { "user provided stack without guard", do_test1 }, + { "user provided stack with guard", do_test2 }, + { "default attribute", do_test3 }, + { "default attribute without guard", do_test4 }, + { "non default stack and guard sizes", do_test5 }, + { "reused stack with larger guard", do_test6 }, + { "reused stack with smaller guard", do_test7 }, + }; + + for (int i = 0; i < array_length (tests); i++) + { + printf ("debug: test%01d: %s\n", i, tests[i].descr); + tests[i].test(); + } + + return 0; +} + +#include diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c index c57738e9f3..20cc9202ec 100644 --- a/sysdeps/nptl/dl-tls_init_tp.c +++ b/sysdeps/nptl/dl-tls_init_tp.c @@ -72,7 +72,7 @@ __tls_init_tp (void) /* Early initialization of the TCB. */ pd->tid = INTERNAL_SYSCALL_CALL (set_tid_address, &pd->tid); THREAD_SETMEM (pd, specific[0], &pd->specific_1stblock[0]); - THREAD_SETMEM (pd, user_stack, true); + THREAD_SETMEM (pd, stack_mode, ALLOCATE_GUARD_USER); /* Before initializing GL (dl_stack_user), the debugger could not find us and had to set __nptl_initial_report_events. 
Propagate diff --git a/sysdeps/nptl/fork.h b/sysdeps/nptl/fork.h index 6156af79e1..3c79179437 100644 --- a/sysdeps/nptl/fork.h +++ b/sysdeps/nptl/fork.h @@ -155,7 +155,7 @@ reclaim_stacks (void) INIT_LIST_HEAD (&GL (dl_stack_used)); INIT_LIST_HEAD (&GL (dl_stack_user)); - if (__glibc_unlikely (THREAD_GETMEM (self, user_stack))) + if (__glibc_unlikely (self->stack_mode == ALLOCATE_GUARD_USER)) list_add (&self->list, &GL (dl_stack_user)); else list_add (&self->list, &GL (dl_stack_used)); diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h index 8e072eb4cd..fe0496d802 100644 --- a/sysdeps/unix/sysv/linux/bits/mman-linux.h +++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h @@ -113,6 +113,8 @@ locked pages too. */ # define MADV_COLLAPSE 25 /* Synchronous hugepage collapse. */ # define MADV_HWPOISON 100 /* Poison a page for testing. */ +# define MADV_GUARD_INSTALL 102 /* Fatal signal on access to range */ +# define MADV_GUARD_REMOVE 103 /* Unguard range */ #endif /* The POSIX people had to invent similar names for the same things. */