Message ID | 20220725110035.1273441-1-peter.maydell@linaro.org |
---|---|
State | Superseded |
Headers | show |
Series | linux-user: Don't assume 0 is not a valid host timer_t value | expand |
On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: > For handling guest POSIX timers, we currently use an array > g_posix_timers[], whose entries are a host timer_t value, or 0 for > "this slot is unused". When the guest calls the timer_create syscall > we look through the array for a slot containing 0, and use that for > the new timer. > > This scheme assumes that host timer_t values can never be zero. This > is unfortunately not a valid assumption -- for some host libc > versions, timer_t values are simply indexes starting at 0. When > using this kind of host libc, the effect is that the first and second > timers end up sharing a slot, and so when the guest tries to operate > on the first timer it changes the second timer instead. For sake of historical record, could you mention here which specific libc impl / version highlights the problem. > > Rework the timer allocation code, so that: > * the 'slot in use' indication uses a separate array from the > host timer_t array > * we grab the free slot atomically, to avoid races when multiple > threads call timer_create simultaneously > * releasing an allocated slot is abstracted out into a new > free_host_timer_slot() function called in the correct places > > This fixes: > * problems on hosts where timer_t 0 is valid > * the FIXME in next_free_host_timer() about locking > * bugs in the error paths in timer_create where we forgot to release > the slot we grabbed, or forgot to free the host timer > > Reported-by: Jon Alduan <jon.alduan@gmail.com> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > --- > linux-user/syscall.c | 24 ++++++++++++++++-------- > 1 file changed, 16 insertions(+), 8 deletions(-) With regards, Daniel
On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: > > For handling guest POSIX timers, we currently use an array > > g_posix_timers[], whose entries are a host timer_t value, or 0 for > > "this slot is unused". When the guest calls the timer_create syscall > > we look through the array for a slot containing 0, and use that for > > the new timer. > > > > This scheme assumes that host timer_t values can never be zero. This > > is unfortunately not a valid assumption -- for some host libc > > versions, timer_t values are simply indexes starting at 0. When > > using this kind of host libc, the effect is that the first and second > > timers end up sharing a slot, and so when the guest tries to operate > > on the first timer it changes the second timer instead. > > For sake of historical record, could you mention here which specific > libc impl / version highlights the problem. Jon, which host libc are you seeing this with? thanks -- PMM
Hello Peter, I can say so far, your patch solved the issue! Great thanks for that! Regarding the libc version: From my WSL2 Ubuntu 21.04 x86_64: $ ls -l /lib32/libc* -rwxr-xr-x 1 root root 2042632 Mar 31 2021 /lib32/libc-2.33.so My gcc version 10 does use the same libc version. As already mentioned, I can also reproduce this on a VM with Ubuntu 20.04 and libc-2.31. In addition, originally, this issue was first reproduced with an own buildroot RootFS and containing libc-2.28. As you see, the libcs are not that old. What about the virtual environment? I could not check this hypothesis, but I hope to do so soon. Thank you again and best regards Jon El lun, 25 jul 2022 a las 14:45, Peter Maydell (<peter.maydell@linaro.org>) escribió: > On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé <berrange@redhat.com> > wrote: > > > > On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: > > > For handling guest POSIX timers, we currently use an array > > > g_posix_timers[], whose entries are a host timer_t value, or 0 for > > > "this slot is unused". When the guest calls the timer_create syscall > > > we look through the array for a slot containing 0, and use that for > > > the new timer. > > > > > > This scheme assumes that host timer_t values can never be zero. This > > > is unfortunately not a valid assumption -- for some host libc > > > versions, timer_t values are simply indexes starting at 0. When > > > using this kind of host libc, the effect is that the first and second > > > timers end up sharing a slot, and so when the guest tries to operate > > > on the first timer it changes the second timer instead. > > > > For sake of historical record, could you mention here which specific > > libc impl / version highlights the problem. > > Jon, which host libc are you seeing this with? > > thanks > -- PMM >
On Tue, 26 Jul 2022 at 23:13, Jon Alduan <jon.alduan@gmail.com> wrote: > > Hello Peter, > > I can say so far, your patch solved the issue! Great thanks for that! > > Regarding the libc version: > From my WSL2 Ubuntu 21.04 x86_64: > $ ls -l /lib32/libc* > -rwxr-xr-x 1 root root 2042632 Mar 31 2021 /lib32/libc-2.33.so > > My gcc version 10 does use the same libc version. > As already mentioned, I can also reproduce this on a VM with Ubuntu 20.04 and libc-2.31. > In addition, originally, this issue was first reproduced with an own buildroot RootFS and containing libc-2.28. > > As you see, the libcs are not that old. So, new glibc does have a fallback timer_create() which can return 0 as a timer_id: https://elixir.bootlin.com/glibc/latest/source/sysdeps/unix/sysv/linux/timer_create.c#L150 ...but it should only be using that code if the binary was built against an old libc and then dynamically links with the new one, which is a bit of an odd setup, and I'm not sure why you run into it. -- PMM
On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: > > For handling guest POSIX timers, we currently use an array > > g_posix_timers[], whose entries are a host timer_t value, or 0 for > > "this slot is unused". When the guest calls the timer_create syscall > > we look through the array for a slot containing 0, and use that for > > the new timer. > > > > This scheme assumes that host timer_t values can never be zero. This > > is unfortunately not a valid assumption -- for some host libc > > versions, timer_t values are simply indexes starting at 0. When > > using this kind of host libc, the effect is that the first and second > > timers end up sharing a slot, and so when the guest tries to operate > > on the first timer it changes the second timer instead. > > For sake of historical record, could you mention here which specific > libc impl / version highlights the problem. How about: "This can happen if you are using glibc's backwards-compatible 'timer_t is an integer' compat code for some reason. This happens when a glibc newer than 2.3.3 is used for a program that was linked to work with glibc 2.2 to 2.3.3." Laurent, I'm going to assume you don't need a v2 sending just for a commit message tweak, unless you'd like me to do that. thanks -- PMM
Laurent, ping ? thanks -- PMM On Mon, 1 Aug 2022 at 12:43, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: > > > For handling guest POSIX timers, we currently use an array > > > g_posix_timers[], whose entries are a host timer_t value, or 0 for > > > "this slot is unused". When the guest calls the timer_create syscall > > > we look through the array for a slot containing 0, and use that for > > > the new timer. > > > > > > This scheme assumes that host timer_t values can never be zero. This > > > is unfortunately not a valid assumption -- for some host libc > > > versions, timer_t values are simply indexes starting at 0. When > > > using this kind of host libc, the effect is that the first and second > > > timers end up sharing a slot, and so when the guest tries to operate > > > on the first timer it changes the second timer instead. > > > > For sake of historical record, could you mention here which specific > > libc impl / version highlights the problem. > > How about: > > "This can happen if you are using glibc's backwards-compatible > 'timer_t is an integer' compat code for some reason. This happens > when a glibc newer than 2.3.3 is used for a program that was > linked to work with glibc 2.2 to 2.3.3." > > Laurent, I'm going to assume you don't need a v2 sending just > for a commit message tweak, unless you'd like me to do that. > > thanks > -- PMM
Le 09/08/2022 à 11:51, Peter Maydell a écrit : > Laurent, ping ? Sorry, I didn't see your message. I'm going to apply it if it's ok to go into rc3? Thanks, Laurent > > thanks > -- PMM > > On Mon, 1 Aug 2022 at 12:43, Peter Maydell <peter.maydell@linaro.org> wrote: >> >> On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé <berrange@redhat.com> wrote: >>> >>> On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote: >>>> For handling guest POSIX timers, we currently use an array >>>> g_posix_timers[], whose entries are a host timer_t value, or 0 for >>>> "this slot is unused". When the guest calls the timer_create syscall >>>> we look through the array for a slot containing 0, and use that for >>>> the new timer. >>>> >>>> This scheme assumes that host timer_t values can never be zero. This >>>> is unfortunately not a valid assumption -- for some host libc >>>> versions, timer_t values are simply indexes starting at 0. When >>>> using this kind of host libc, the effect is that the first and second >>>> timers end up sharing a slot, and so when the guest tries to operate >>>> on the first timer it changes the second timer instead. >>> >>> For sake of historical record, could you mention here which specific >>> libc impl / version highlights the problem. >> >> How about: >> >> "This can happen if you are using glibc's backwards-compatible >> 'timer_t is an integer' compat code for some reason. This happens >> when a glibc newer than 2.3.3 is used for a program that was >> linked to work with glibc 2.2 to 2.3.3." >> >> Laurent, I'm going to assume you don't need a v2 sending just >> for a commit message tweak, unless you'd like me to do that. >> >> thanks >> -- PMM >
On Wed, 10 Aug 2022 at 06:59, Laurent Vivier <laurent@vivier.eu> wrote: > > Le 09/08/2022 à 11:51, Peter Maydell a écrit : > > Laurent, ping ? > > Sorry, I didn't see your message. I'm going to apply it if it's ok to go into rc3? Not sure about rc3; I'd have been OK with it in rc2 but I think it could probably use a bit more testing. Since it only shows up in weird configs, probably better to leave it til 7.2 now. thanks -- PMM
Le 25/07/2022 à 13:00, Peter Maydell a écrit : > For handling guest POSIX timers, we currently use an array > g_posix_timers[], whose entries are a host timer_t value, or 0 for > "this slot is unused". When the guest calls the timer_create syscall > we look through the array for a slot containing 0, and use that for > the new timer. > > This scheme assumes that host timer_t values can never be zero. This > is unfortunately not a valid assumption -- for some host libc > versions, timer_t values are simply indexes starting at 0. When > using this kind of host libc, the effect is that the first and second > timers end up sharing a slot, and so when the guest tries to operate > on the first timer it changes the second timer instead. > > Rework the timer allocation code, so that: > * the 'slot in use' indication uses a separate array from the > host timer_t array > * we grab the free slot atomically, to avoid races when multiple > threads call timer_create simultaneously > * releasing an allocated slot is abstracted out into a new > free_host_timer_slot() function called in the correct places > > This fixes: > * problems on hosts where timer_t 0 is valid > * the FIXME in next_free_host_timer() about locking > * bugs in the error paths in timer_create where we forgot to release > the slot we grabbed, or forgot to free the host timer > > Reported-by: Jon Alduan <jon.alduan@gmail.com> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > --- > linux-user/syscall.c | 24 ++++++++++++++++-------- > 1 file changed, 16 insertions(+), 8 deletions(-) > Applied to my linux-user-for-7.2 branch. Thanks, Laurent
diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 991b85e6b4d..b75de1bc8d6 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -497,20 +497,25 @@ _syscall4(int, sys_prlimit64, pid_t, pid, int, resource, #if defined(TARGET_NR_timer_create) /* Maximum of 32 active POSIX timers allowed at any one time. */ -static timer_t g_posix_timers[32] = { 0, } ; +#define GUEST_TIMER_MAX 32 +static timer_t g_posix_timers[GUEST_TIMER_MAX]; +static int g_posix_timer_allocated[GUEST_TIMER_MAX]; static inline int next_free_host_timer(void) { - int k ; - /* FIXME: Does finding the next free slot require a lock? */ - for (k = 0; k < ARRAY_SIZE(g_posix_timers); k++) { - if (g_posix_timers[k] == 0) { - g_posix_timers[k] = (timer_t) 1; + int k; + for (k = 0; k < ARRAY_SIZE(g_posix_timer_allocated); k++) { + if (qatomic_xchg(g_posix_timer_allocated + k, 1) == 0) { return k; } } return -1; } + +static inline void free_host_timer_slot(int id) +{ + qatomic_store_release(g_posix_timer_allocated + id, 0); +} #endif static inline int host_to_target_errno(int host_errno) @@ -12832,15 +12837,18 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1, phost_sevp = &host_sevp; ret = target_to_host_sigevent(phost_sevp, arg2); if (ret != 0) { + free_host_timer_slot(timer_index); return ret; } } ret = get_errno(timer_create(clkid, phost_sevp, phtimer)); if (ret) { - phtimer = NULL; + free_host_timer_slot(timer_index); } else { if (put_user(TIMER_MAGIC | timer_index, arg3, target_timer_t)) { + timer_delete(*phtimer); + free_host_timer_slot(timer_index); return -TARGET_EFAULT; } } @@ -12976,7 +12984,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1, } else { timer_t htimer = g_posix_timers[timerid]; ret = get_errno(timer_delete(htimer)); - g_posix_timers[timerid] = 0; + free_host_timer_slot(timerid); } return ret; }
For handling guest POSIX timers, we currently use an array g_posix_timers[], whose entries are a host timer_t value, or 0 for "this slot is unused". When the guest calls the timer_create syscall we look through the array for a slot containing 0, and use that for the new timer. This scheme assumes that host timer_t values can never be zero. This is unfortunately not a valid assumption -- for some host libc versions, timer_t values are simply indexes starting at 0. When using this kind of host libc, the effect is that the first and second timers end up sharing a slot, and so when the guest tries to operate on the first timer it changes the second timer instead. Rework the timer allocation code, so that: * the 'slot in use' indication uses a separate array from the host timer_t array * we grab the free slot atomically, to avoid races when multiple threads call timer_create simultaneously * releasing an allocated slot is abstracted out into a new free_host_timer_slot() function called in the correct places This fixes: * problems on hosts where timer_t 0 is valid * the FIXME in next_free_host_timer() about locking * bugs in the error paths in timer_create where we forgot to release the slot we grabbed, or forgot to free the host timer Reported-by: Jon Alduan <jon.alduan@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- linux-user/syscall.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)