Message ID | 20230823051615.1297706-1-richard.henderson@linaro.org |
---|---|
Headers | show |
Series | linux-user: Detect and report host crashes | expand |
Ping. On 8/22/23 22:16, Richard Henderson wrote: > More signal cleanups. Mostly tested by temporarily adding an > abort, divide by zero, undefined instruction, null dereference, > within the implementation of a guest syscall to induce an error. > > > r~ > > > Helge Deller (1): > linux-user: Detect and report host crashes > > Richard Henderson (9): > linux-user: Split out die_with_signal > linux-user: Exit not abort in die_with_backtrace > linux-user: Use die_with_signal with abort > linux-user: Only register handlers for core_dump_signal by default > linux-user: Map unsupported signals to an out-of-bounds value > linux-user: Remap SIGPROF when CONFIG_GPROF > linux-user: Simplify signal_init > linux-user: Split out host_sig{segv,bus}_handler > linux-user: Detect and report host SIGILL, SIGFPE, SIGTRAP > > linux-user/signal.c | 400 ++++++++++++++++++++++++++------------------ > 1 file changed, 240 insertions(+), 160 deletions(-) >
On 9/9/23 21:12, Richard Henderson wrote: > On 8/22/23 22:16, Richard Henderson wrote: >> More signal cleanups. Mostly tested by temporarily adding an >> abort, divide by zero, undefined instruction, null dereference, >> within the implementation of a guest syscall to induce an error. Applied on top of git head, linking fails when builing the *static* qemu-user binary: /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in function `abort': (.text.unlikely+0x0): multiple definition of `abort'; libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here Helge >> >> Helge Deller (1): >> linux-user: Detect and report host crashes >> >> Richard Henderson (9): >> linux-user: Split out die_with_signal >> linux-user: Exit not abort in die_with_backtrace >> linux-user: Use die_with_signal with abort >> linux-user: Only register handlers for core_dump_signal by default >> linux-user: Map unsupported signals to an out-of-bounds value >> linux-user: Remap SIGPROF when CONFIG_GPROF >> linux-user: Simplify signal_init >> linux-user: Split out host_sig{segv,bus}_handler >> linux-user: Detect and report host SIGILL, SIGFPE, SIGTRAP >> >> linux-user/signal.c | 400 ++++++++++++++++++++++++++------------------ >> 1 file changed, 240 insertions(+), 160 deletions(-) >> > >
12.09.2023 12:45, Helge Deller: > /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in function `abort': > (.text.unlikely+0x0): multiple definition of `abort'; > libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here [PATCH v4 03/10] linux-user: Use die_with_signal with abort Sigh. I'd be double-cautious when overriding system functions like this, - it's almost always a bad idea. /mjt
On 9/12/23 12:34, Michael Tokarev wrote: > 12.09.2023 12:45, Helge Deller: > >> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in function `abort': >> (.text.unlikely+0x0): multiple definition of `abort'; libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here > > [PATCH v4 03/10] linux-user: Use die_with_signal with abort > > Sigh. > > I'd be double-cautious when overriding system functions like this, - it's > almost always a bad idea. Richard, I'm not sure, but with that change: -void abort(void) +void __attribute__((weak)) abort(void) it will at least successfully link the binary. Not sure which effects it has, but probably not worse than before your patch series... Helge
18.09.2023 17:05, Helge Deller wrote: > On 9/12/23 12:34, Michael Tokarev wrote: >> 12.09.2023 12:45, Helge Deller: >> >>> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in function `abort': >>> (.text.unlikely+0x0): multiple definition of `abort'; >>> libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here >> >> [PATCH v4 03/10] linux-user: Use die_with_signal with abort >> >> Sigh. >> >> I'd be double-cautious when overriding system functions like this, - it's >> almost always a bad idea. > > Richard, I'm not sure, but with that change: > > -void abort(void) > +void __attribute__((weak)) abort(void) > > it will at least successfully link the binary. Not sure which effects it has, > but probably not worse than before your patch series... With weak here, ld should peak abort() from glibc, basically reverting this patch: linux-user: Use die_with_signal with abort +/* + * The system abort() will raise SIGABRT, which will get caught and deferred + * by host_signal_handler. Returning into system abort will try harder. + * Eventually, on x86, it will execute HLT, which raises SIGSEGV. This goes + * back into host_signal_handler, through a different path which may longjmp + * back to the main loop. This often explodes. + */ +void abort(void) +{ + die_with_signal(SIGABRT); +} + I think it's better to revert it now. Probably the right solution is to use qemu_abort() (and qemu_assert() etc), and maybe #define abort(x) qemu_abort(x). Even if some way to redefine abort like the above will work on glibc, it does not mean it will work on *bsd and in other contexts. Yes, providing its own abort() is tempting due to its simplicity. But it is a grey area at best. Thanks, /mjt
On 9/19/23 09:40, Michael Tokarev wrote: > 18.09.2023 17:05, Helge Deller wrote: >> On 9/12/23 12:34, Michael Tokarev wrote: >>> 12.09.2023 12:45, Helge Deller: >>> >>>> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in function `abort': >>>> (.text.unlikely+0x0): multiple definition of `abort'; libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here >>> >>> [PATCH v4 03/10] linux-user: Use die_with_signal with abort >>> >>> Sigh. >>> >>> I'd be double-cautious when overriding system functions like this, - it's >>> almost always a bad idea. >> >> Richard, I'm not sure, but with that change: >> >> -void abort(void) >> +void __attribute__((weak)) abort(void) >> >> it will at least successfully link the binary. Not sure which effects it has, >> but probably not worse than before your patch series... > > With weak here, ld should peak abort() from glibc, basically reverting this > patch: > > linux-user: Use die_with_signal with abort > +/* > + * The system abort() will raise SIGABRT, which will get caught and deferred > + * by host_signal_handler. Returning into system abort will try harder. > + * Eventually, on x86, it will execute HLT, which raises SIGSEGV. This goes > + * back into host_signal_handler, through a different path which may longjmp > + * back to the main loop. This often explodes. > + */ > +void abort(void) > +{ > + die_with_signal(SIGABRT); > +} > + > > I think it's better to revert it now. Maybe. I only did static compile builds, and not even for all platforms. Since I assume Richard did test it in his builds (which probably were linux-user built as dynamic executable) seem to have worked for him, the code probably works on those binaries. If this function is left out on static builds (because libc.a already provides this function), then there is no difference (aka same not-optimal behaviour) as without this patch. So, keeping this patch may help for dynamic linux-user executables at least. > Probably the right solution is to use qemu_abort() (and qemu_assert() etc), > and maybe #define abort(x) qemu_abort(x). Even if some way to redefine > abort like the above will work on glibc, it does not mean it will work > on *bsd and in other contexts. True. That's probably the better solution. > Yes, providing its own abort() is tempting due to its simplicity. > But it is a grey area at best. Helge
19.09.2023 11:00, Helge Deller wrote: .. >> Probably the right solution is to use qemu_abort() (and qemu_assert() etc), >> and maybe #define abort(x) qemu_abort(x). Even if some way to redefine >> abort like the above will work on glibc, it does not mean it will work >> on *bsd and in other contexts. > > True. That's probably the better solution. That wont work, since abort() gets called from a lot of libraries (gilbc has 1000s of calls to it) Sigh. /mjt
On 9/18/23 16:05, Helge Deller wrote: > On 9/12/23 12:34, Michael Tokarev wrote: >> 12.09.2023 12:45, Helge Deller: >> >>> /usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/13/../../../../lib64/libc.a(abort.o): in >>> function `abort': >>> (.text.unlikely+0x0): multiple definition of `abort'; >>> libqemu-aarch64-linux-user.fa.p/linux-user_signal.c.o:/srv/_build/../../home/cvs/qemu/qemu/linux-user/signal.c:723: first defined here >> >> [PATCH v4 03/10] linux-user: Use die_with_signal with abort >> >> Sigh. >> >> I'd be double-cautious when overriding system functions like this, - it's >> almost always a bad idea. > > Richard, I'm not sure, but with that change: > > -void abort(void) > +void __attribute__((weak)) abort(void) > > it will at least successfully link the binary. Not sure which effects it has, > but probably not worse than before your patch series... This won't work, in that it will have no effect, and we continue to have the weird longjmp assertion message after stack corruption. Probably we will have to replace all of the apis that can raise abort at the source level, e.g. void qemu_abort(void) __attribute__((noreturn)); void qemu_abort_msg(const char *) __attribute__((noreturn)); #undef abort #define abort qemu_abort #undef assert #define assert(X) ... #undef g_assert #define g_assert(X) assert(X) etc. r~
On 9/19/23 10:26, Michael Tokarev wrote: > 19.09.2023 11:00, Helge Deller wrote: > .. >>> Probably the right solution is to use qemu_abort() (and qemu_assert() etc), >>> and maybe #define abort(x) qemu_abort(x). Even if some way to redefine >>> abort like the above will work on glibc, it does not mean it will work >>> on *bsd and in other contexts. >> >> True. That's probably the better solution. > > That wont work, since abort() gets called from a lot of libraries > (gilbc has 1000s of calls to it) > > Sigh. > > /mjt A possible solution that occurs to me is to treat SIGABRT like patch 7 of this patch set treats SIGPROF: remap the guest signal to one of the host RT signals. Then we leave the host SIGABRT as SIG_DFL, producing the expected crash when the signal originates from a host abort() (etc). A guest abort() would use a different signal which is caught and emulated. Things do get confusing across processes, but should be no worse than any of the existing signal number swizzling. Thoughts? r~
On 9/19/23 10:38, Richard Henderson wrote: > On 9/19/23 10:26, Michael Tokarev wrote: >> 19.09.2023 11:00, Helge Deller wrote: >> .. >>>> Probably the right solution is to use qemu_abort() (and qemu_assert() etc), >>>> and maybe #define abort(x) qemu_abort(x). Even if some way to redefine >>>> abort like the above will work on glibc, it does not mean it will work >>>> on *bsd and in other contexts. >>> >>> True. That's probably the better solution. >> >> That wont work, since abort() gets called from a lot of libraries >> (gilbc has 1000s of calls to it) >> >> Sigh. >> >> /mjt > > A possible solution that occurs to me is to treat SIGABRT like patch 7 of this patch set treats SIGPROF: remap the guest signal to one of the host RT signals. > > Then we leave the host SIGABRT as SIG_DFL, producing the expected crash when the signal originates from a host abort() (etc). A guest abort() would use a different signal which is caught and emulated. > > Things do get confusing across processes, but should be no worse than any of the existing signal number swizzling. > > Thoughts? Yes, this could work. Helge
FWIW, there's an interesting read (suggested by iam_tj) https://stackoverflow.com/questions/39725138/strange-behaviour-while-wrapping-abort-system-call also https://drewdevault.com/2016/07/19/Using-Wl-wrap-for-mocking-in-C.html /mjt