diff mbox series

[v2,7/7] elf: Add glibc.mem.decorate_maps tunable

Message ID 20231028195559.390407-8-adhemerval.zanella@linaro.org
State New
Headers show
Series Add a tunable to decorate anonymous memory maps | expand

Commit Message

Adhemerval Zanella Netto Oct. 28, 2023, 7:55 p.m. UTC
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to a
anonymous virtual memory area might prevent that area from being
merged with adjacent virtual memory areas.

For instance, with the following code:

   void *p1 = mmap (NULL,
                    1024 * 4096,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS,
                    -1,
                    0);

   void *p2 = mmap (p1 + (1024 * 4096),
                    1024 * 4096,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS,
                    -1,
                    0);

The kernel will potentially merge both mappings resulting in only one
segment of size 0x800000.  If the segment is names with
PR_SET_VMA_ANON_NAME with different names, it results in two mappings.

Although this will unlikely be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page will result in
a PROT_NONE segment, similar to the alignment requirement for the arena
block), it still might prevent the mmap memory allocated for detail
malloc.

There is also another potential scalability issue, where the prctl
requires
to take the mmap global lock which is still not fully fixed in Linux
[1] (for pthread stacks and arenas, it is mitigated by the stack
cached and the arena reuse).

So this patch disables anonymous mapping annotations as default and
add a new tunable, glibc.mem.decorate_maps, can be used to enable
it.

[1] https://lwn.net/Articles/906852/

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
---
 NEWS                                 |  5 +++++
 elf/Makefile                         |  2 +-
 elf/dl-tunables.list                 |  5 +++++
 manual/tunables.texi                 | 12 ++++++++++++
 sysdeps/unix/sysv/linux/setvmaname.c |  4 ++++
 5 files changed, 27 insertions(+), 1 deletion(-)

Comments

DJ Delorie Nov. 1, 2023, 3:09 a.m. UTC | #1
LGTM with a few optional suggestions
Reviewed-by: DJ Delorie <dj@redhat.com>

Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> diff --git a/NEWS b/NEWS
> index 9432564444..67f6e63b14 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -38,6 +38,11 @@ Major new features:
>    and the wfN format length modifiers for arguments pointing to types
>    int_fastN_t or uint_fastN_t, as specified in draft ISO C2X.
>  
> +* A new tunable, glibc.mem.decorate_maps, can be used to add additional
> +  information on underlying memory allocated by the glibc (for instance,
> +  on thread stack created by pthread_create or memory allocated by
> +  malloc).
> +

Ok.  Could mention /proc/*/maps

> diff --git a/elf/Makefile b/elf/Makefile
> index c3cf63a443..2603b90961 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -3023,5 +3023,5 @@ $(objpfx)tst-dlclose-lazy.out: \
>  $(objpfx)tst-decorate-maps: $(shared-thread-library)
>  
>  tst-decorate-maps-ENV = \
> -  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024
> +  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024:glibc.mem.decorate_maps=1
>  tst-decorate-maps-ARGS = 8

Ok.

> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 695ba7192e..888d2ede04 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -160,6 +160,11 @@ glibc {
>        maxval: 255
>        security_level: SXID_IGNORE
>      }
> +    decorate_maps {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>    }

Ok.

> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 776fd93fd9..2a5ad73e9d 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -653,6 +653,18 @@ support in the kernel if this tunable has any non-zero value.
>  The default value is @samp{0}, which disables all memory tagging.
>  @end deftp
>  
> +@deftp Tunable glibc.mem.decorate_maps
> +If the kernel supports naming anonymous virtual memory areas (since
> +Linux version 5.17), this tunable can be used to control whether
> +@theglibc{} decorates the underlying memory obtained from operating
> +system with a string describing its usage (for instance, on the thread
> +stack created by @code{ptthread_create} or memory allocated by
> +@code{malloc}).
> +
> +This tunable takes a value of 0 and 1, where 1 enables the feature.
> +The default value is @samp{0}, which disables the decoration.
> +@end deftp
> +

Ok.  Again, could mention /proc/*/maps.  Could mention that the kernel
might be built without such support by choice, too.

> diff --git a/sysdeps/unix/sysv/linux/setvmaname.c b/sysdeps/unix/sysv/linux/setvmaname.c
> index 9960ab5917..62237fcf7c 100644
> --- a/sysdeps/unix/sysv/linux/setvmaname.c
> +++ b/sysdeps/unix/sysv/linux/setvmaname.c
> @@ -20,6 +20,7 @@
>  #include <setvmaname.h>
>  #include <sys/prctl.h>
>  #include <sysdep.h>
> +#include <elf/dl-tunables.h>
>  
>  /* If PR_SET_VMA_ANON_NAME is not supported by the kernel, prctl returns
>     EINVAL.  However, it also returns the same error for invalid argument.
> @@ -30,6 +31,9 @@
>  void
>  __set_vma_name (void *start, size_t len, const char *name)
>  {
> +  if (TUNABLE_GET (glibc, mem, decorate_maps, int32_t, NULL) == 0)
> +    return;
> +
>    static int prctl_supported = 1;
>    if (atomic_load_relaxed (&prctl_supported) == 0)
>      return;

Ok, but would be better to check the tunable after the prctl_supported
flag (and set the flag) to avoid calling __tunable_get_val for every
call.  Unless the intention was to allow changing the tunable during
runtime, in which case, ignore this comment ;-)
Adhemerval Zanella Netto Nov. 1, 2023, 12:53 p.m. UTC | #2
On 01/11/23 00:09, DJ Delorie wrote:
> 
> LGTM with a few optional suggestions
> Reviewed-by: DJ Delorie <dj@redhat.com>
> 
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>> diff --git a/NEWS b/NEWS
>> index 9432564444..67f6e63b14 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -38,6 +38,11 @@ Major new features:
>>    and the wfN format length modifiers for arguments pointing to types
>>    int_fastN_t or uint_fastN_t, as specified in draft ISO C2X.
>>  
>> +* A new tunable, glibc.mem.decorate_maps, can be used to add additional
>> +  information on underlying memory allocated by the glibc (for instance,
>> +  on thread stack created by pthread_create or memory allocated by
>> +  malloc).
>> +
> 
> Ok.  Could mention /proc/*/maps
> 
>> diff --git a/elf/Makefile b/elf/Makefile
>> index c3cf63a443..2603b90961 100644
>> --- a/elf/Makefile
>> +++ b/elf/Makefile
>> @@ -3023,5 +3023,5 @@ $(objpfx)tst-dlclose-lazy.out: \
>>  $(objpfx)tst-decorate-maps: $(shared-thread-library)
>>  
>>  tst-decorate-maps-ENV = \
>> -  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024
>> +  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024:glibc.mem.decorate_maps=1
>>  tst-decorate-maps-ARGS = 8
> 
> Ok.
> 
>> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>> index 695ba7192e..888d2ede04 100644
>> --- a/elf/dl-tunables.list
>> +++ b/elf/dl-tunables.list
>> @@ -160,6 +160,11 @@ glibc {
>>        maxval: 255
>>        security_level: SXID_IGNORE
>>      }
>> +    decorate_maps {
>> +      type: INT_32
>> +      minval: 0
>> +      maxval: 1
>> +    }
>>    }
> 
> Ok.
> 
>> diff --git a/manual/tunables.texi b/manual/tunables.texi
>> index 776fd93fd9..2a5ad73e9d 100644
>> --- a/manual/tunables.texi
>> +++ b/manual/tunables.texi
>> @@ -653,6 +653,18 @@ support in the kernel if this tunable has any non-zero value.
>>  The default value is @samp{0}, which disables all memory tagging.
>>  @end deftp
>>  
>> +@deftp Tunable glibc.mem.decorate_maps
>> +If the kernel supports naming anonymous virtual memory areas (since
>> +Linux version 5.17), this tunable can be used to control whether
>> +@theglibc{} decorates the underlying memory obtained from operating
>> +system with a string describing its usage (for instance, on the thread
>> +stack created by @code{ptthread_create} or memory allocated by
>> +@code{malloc}).
>> +
>> +This tunable takes a value of 0 and 1, where 1 enables the feature.
>> +The default value is @samp{0}, which disables the decoration.
>> +@end deftp
>> +
> 
> Ok.  Again, could mention /proc/*/maps.  Could mention that the kernel
> might be built without such support by choice, too.

Ack, I will extend it.

> 
>> diff --git a/sysdeps/unix/sysv/linux/setvmaname.c b/sysdeps/unix/sysv/linux/setvmaname.c
>> index 9960ab5917..62237fcf7c 100644
>> --- a/sysdeps/unix/sysv/linux/setvmaname.c
>> +++ b/sysdeps/unix/sysv/linux/setvmaname.c
>> @@ -20,6 +20,7 @@
>>  #include <setvmaname.h>
>>  #include <sys/prctl.h>
>>  #include <sysdep.h>
>> +#include <elf/dl-tunables.h>
>>  
>>  /* If PR_SET_VMA_ANON_NAME is not supported by the kernel, prctl returns
>>     EINVAL.  However, it also returns the same error for invalid argument.
>> @@ -30,6 +31,9 @@
>>  void
>>  __set_vma_name (void *start, size_t len, const char *name)
>>  {
>> +  if (TUNABLE_GET (glibc, mem, decorate_maps, int32_t, NULL) == 0)
>> +    return;
>> +
>>    static int prctl_supported = 1;
>>    if (atomic_load_relaxed (&prctl_supported) == 0)
>>      return;
> 
> Ok, but would be better to check the tunable after the prctl_supported
> flag (and set the flag) to avoid calling __tunable_get_val for every
> call.  Unless the intention was to allow changing the tunable during
> runtime, in which case, ignore this comment ;-)
> 

Indeed you are right, we don't have the tunable runtime support and I am
not sure if adding this would make sense for this tunable.
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 9432564444..67f6e63b14 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,11 @@  Major new features:
   and the wfN format length modifiers for arguments pointing to types
   int_fastN_t or uint_fastN_t, as specified in draft ISO C2X.
 
+* A new tunable, glibc.mem.decorate_maps, can be used to add additional
+  information on underlying memory allocated by the glibc (for instance,
+  on thread stack created by pthread_create or memory allocated by
+  malloc).
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The ldconfig program now skips file names containing ';' or ending in
diff --git a/elf/Makefile b/elf/Makefile
index c3cf63a443..2603b90961 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -3023,5 +3023,5 @@  $(objpfx)tst-dlclose-lazy.out: \
 $(objpfx)tst-decorate-maps: $(shared-thread-library)
 
 tst-decorate-maps-ENV = \
-  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024
+  GLIBC_TUNABLES=glibc.malloc.arena_max=8:glibc.malloc.mmap_threshold=1024:glibc.mem.decorate_maps=1
 tst-decorate-maps-ARGS = 8
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 695ba7192e..888d2ede04 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -160,6 +160,11 @@  glibc {
       maxval: 255
       security_level: SXID_IGNORE
     }
+    decorate_maps {
+      type: INT_32
+      minval: 0
+      maxval: 1
+    }
   }
 
   rtld {
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 776fd93fd9..2a5ad73e9d 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -653,6 +653,18 @@  support in the kernel if this tunable has any non-zero value.
 The default value is @samp{0}, which disables all memory tagging.
 @end deftp
 
+@deftp Tunable glibc.mem.decorate_maps
+If the kernel supports naming anonymous virtual memory areas (since
+Linux version 5.17), this tunable can be used to control whether
+@theglibc{} decorates the underlying memory obtained from operating
+system with a string describing its usage (for instance, on the thread
+stack created by @code{ptthread_create} or memory allocated by
+@code{malloc}).
+
+This tunable takes a value of 0 and 1, where 1 enables the feature.
+The default value is @samp{0}, which disables the decoration.
+@end deftp
+
 @node gmon Tunables
 @section gmon Tunables
 @cindex gmon tunables
diff --git a/sysdeps/unix/sysv/linux/setvmaname.c b/sysdeps/unix/sysv/linux/setvmaname.c
index 9960ab5917..62237fcf7c 100644
--- a/sysdeps/unix/sysv/linux/setvmaname.c
+++ b/sysdeps/unix/sysv/linux/setvmaname.c
@@ -20,6 +20,7 @@ 
 #include <setvmaname.h>
 #include <sys/prctl.h>
 #include <sysdep.h>
+#include <elf/dl-tunables.h>
 
 /* If PR_SET_VMA_ANON_NAME is not supported by the kernel, prctl returns
    EINVAL.  However, it also returns the same error for invalid argument.
@@ -30,6 +31,9 @@ 
 void
 __set_vma_name (void *start, size_t len, const char *name)
 {
+  if (TUNABLE_GET (glibc, mem, decorate_maps, int32_t, NULL) == 0)
+    return;
+
   static int prctl_supported = 1;
   if (atomic_load_relaxed (&prctl_supported) == 0)
     return;