Message ID | 20180123144807.5618-13-laurent@vivier.eu |
---|---|
State | Accepted |
Commit | 9c4bbee9e3b83544257e82566342c29e15a88637 |
Headers | show |
Series | [PULL,01/13] linux-user: Fix locking order in fork_start() | expand |
Le 23/01/2018 à 15:48, Laurent Vivier a écrit : > From: Peter Maydell <peter.maydell@linaro.org> > > If multiple guest threads in user-mode emulation write to a > page which QEMU has marked read-only because of cached TCG > translations, the threads can race in page_unprotect: > > * threads A & B both try to do a write to a page with code in it at > the same time (ie which we've made non-writeable, so SEGV) > * they race into the signal handler with this faulting address > * thread A happens to get to page_unprotect() first and takes the > mmap lock, so thread B sits waiting for it to be done > * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable > * A can then continue OK (returns from signal handler to retry the > memory access) > * ...but when B gets the mmap lock it finds that the page is already > PAGE_WRITE, and so it exits page_unprotect() via the "not due to > protected translation" code path, and wrongly delivers the signal > to the guest rather than just retrying the access > > In particular, this meant that trying to run 'javac' in user-mode > emulation would fail with a spurious guest SIGSEGV. > > Handle this by making page_unprotect() assume that a call for a page > which is already PAGE_WRITE is due to a race of this sort and return > a "fault handled" indication. > > Since this would cause an infinite loop if we ever called > page_unprotect() for some other kind of fault than "write failed due > to bad access permissions", tighten the condition in > handle_cpu_signal() to check the signal number and si_code, and add a > comment so that if somebody does ever find themselves debugging an > infinite loop of faults they have some clue about why. > > (The trick for identifying the correct setting for > current_tb_invalidated for thread B (needed to handle the precise-SMC > case) is due to Richard Henderson. Paolo Bonzini suggested just > relying on si_code rather than trying anything more complicated.) > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > Message-Id: <1511879725-9576-3-git-send-email-peter.maydell@linaro.org> > Signed-off-by: Laurent Vivier <laurent@vivier.eu> > --- > accel/tcg/translate-all.c | 50 +++++++++++++++++++++++++++++------------------ > accel/tcg/user-exec.c | 13 +++++++++++- > 2 files changed, 43 insertions(+), 20 deletions(-) > It seems this patch breaks something in linux-user mode emulation for m68k (32bit BE) on ppc (32bit BE). What I have: ~/chroot$ sudo QEMU_CPU=m68040 chroot m68k/sid/ I have no name!@localhost:/# ls bin debootstrap etc lib qemu-m68k run sys usr boot dev home proc root sbin tmp var qemu: uncaught target signal 11 (Segmentation fault) - core dumped ~/chroot$ It seems "bash" crashes on "ls" exit. My chroot has been installed with: ARCH=m68k TARGET=sid CHROOT=$HOME/chroot/m68k/sid/ REPOT=http://cdn-fastly.deb.debian.org/debian-ports/ debootstrap --arch=$ARCH --foreign --variant=minbase \ --no-check-gpg $TARGET $CHROOT $REPO I didn't investigate more. Thanks, Laurent
Le 22/03/2018 à 02:52, Laurent Vivier a écrit : > Le 23/01/2018 à 15:48, Laurent Vivier a écrit : >> From: Peter Maydell <peter.maydell@linaro.org> >> >> If multiple guest threads in user-mode emulation write to a >> page which QEMU has marked read-only because of cached TCG >> translations, the threads can race in page_unprotect: >> >> * threads A & B both try to do a write to a page with code in it at >> the same time (ie which we've made non-writeable, so SEGV) >> * they race into the signal handler with this faulting address >> * thread A happens to get to page_unprotect() first and takes the >> mmap lock, so thread B sits waiting for it to be done >> * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable >> * A can then continue OK (returns from signal handler to retry the >> memory access) >> * ...but when B gets the mmap lock it finds that the page is already >> PAGE_WRITE, and so it exits page_unprotect() via the "not due to >> protected translation" code path, and wrongly delivers the signal >> to the guest rather than just retrying the access >> >> In particular, this meant that trying to run 'javac' in user-mode >> emulation would fail with a spurious guest SIGSEGV. >> >> Handle this by making page_unprotect() assume that a call for a page >> which is already PAGE_WRITE is due to a race of this sort and return >> a "fault handled" indication. >> >> Since this would cause an infinite loop if we ever called >> page_unprotect() for some other kind of fault than "write failed due >> to bad access permissions", tighten the condition in >> handle_cpu_signal() to check the signal number and si_code, and add a >> comment so that if somebody does ever find themselves debugging an >> infinite loop of faults they have some clue about why. >> >> (The trick for identifying the correct setting for >> current_tb_invalidated for thread B (needed to handle the precise-SMC >> case) is due to Richard Henderson. Paolo Bonzini suggested just >> relying on si_code rather than trying anything more complicated.) >> >> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> >> Message-Id: <1511879725-9576-3-git-send-email-peter.maydell@linaro.org> >> Signed-off-by: Laurent Vivier <laurent@vivier.eu> >> --- >> accel/tcg/translate-all.c | 50 +++++++++++++++++++++++++++++------------------ >> accel/tcg/user-exec.c | 13 +++++++++++- >> 2 files changed, 43 insertions(+), 20 deletions(-) >> > > It seems this patch breaks something in linux-user mode emulation for > m68k (32bit BE) on ppc (32bit BE). > > What I have: > > ~/chroot$ sudo QEMU_CPU=m68040 chroot m68k/sid/ > I have no name!@localhost:/# ls > bin debootstrap etc lib qemu-m68k run sys usr > boot dev home proc root sbin tmp var > qemu: uncaught target signal 11 (Segmentation fault) - core dumped > ~/chroot$ > > It seems "bash" crashes on "ls" exit. > > My chroot has been installed with: > > ARCH=m68k > TARGET=sid > CHROOT=$HOME/chroot/m68k/sid/ > REPOT=http://cdn-fastly.deb.debian.org/debian-ports/ > debootstrap --arch=$ARCH --foreign --variant=minbase \ > --no-check-gpg $TARGET $CHROOT $REPO > > I didn't investigate more. It goes wrong in this part: + */ + if (is_write && info->si_signo == SIGSEGV && info->si_code == SEGV_ACCERR && + h2g_valid(address)) { Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR (on x86_64, si_code is SEGV_ACCERR as expected) Any idea? Thanks, Laurent
On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote: > Le 22/03/2018 à 02:52, Laurent Vivier a écrit : >> It seems this patch breaks something in linux-user mode emulation for >> m68k (32bit BE) on ppc (32bit BE). >> >> What I have: >> >> ~/chroot$ sudo QEMU_CPU=m68040 chroot m68k/sid/ >> I have no name!@localhost:/# ls >> bin debootstrap etc lib qemu-m68k run sys usr >> boot dev home proc root sbin tmp var >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped >> ~/chroot$ >> >> It seems "bash" crashes on "ls" exit. >> >> My chroot has been installed with: >> >> ARCH=m68k >> TARGET=sid >> CHROOT=$HOME/chroot/m68k/sid/ >> REPOT=http://cdn-fastly.deb.debian.org/debian-ports/ >> debootstrap --arch=$ARCH --foreign --variant=minbase \ >> --no-check-gpg $TARGET $CHROOT $REPO >> >> I didn't investigate more. > > It goes wrong in this part: > > + */ > + if (is_write && info->si_signo == SIGSEGV && info->si_code == > SEGV_ACCERR && > + h2g_valid(address)) { > > Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR > (on x86_64, si_code is SEGV_ACCERR as expected) So on PPC if you have a page mapped, and you access it with the wrong permissions, you get SEGV_MAPERR? This seems like a host kernel bug to me. thanks -- PMM
On 22 March 2018 at 11:05, Peter Maydell <peter.maydell@linaro.org> wrote: > On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote: >> It goes wrong in this part: >> >> + */ >> + if (is_write && info->si_signo == SIGSEGV && info->si_code == >> SEGV_ACCERR && >> + h2g_valid(address)) { >> >> Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR >> (on x86_64, si_code is SEGV_ACCERR as expected) > > So on PPC if you have a page mapped, and you access it with > the wrong permissions, you get SEGV_MAPERR? This seems like > a host kernel bug to me. ...in particular, kernel commit ecb101aed86156e (dated Dec 2017) fixes a regression introduced in commit c3350602e876 that broke the ppc kernels so they started returning SEGV_MAPERR here instead of SEGV_ACCERR. Presumably your host kernel is missing this fix. thanks -- PMM
Le 22/03/2018 à 12:05, Peter Maydell a écrit : > On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote: >> Le 22/03/2018 à 02:52, Laurent Vivier a écrit : >>> It seems this patch breaks something in linux-user mode emulation for >>> m68k (32bit BE) on ppc (32bit BE). >>> >>> What I have: >>> >>> ~/chroot$ sudo QEMU_CPU=m68040 chroot m68k/sid/ >>> I have no name!@localhost:/# ls >>> bin debootstrap etc lib qemu-m68k run sys usr >>> boot dev home proc root sbin tmp var >>> qemu: uncaught target signal 11 (Segmentation fault) - core dumped >>> ~/chroot$ >>> >>> It seems "bash" crashes on "ls" exit. >>> >>> My chroot has been installed with: >>> >>> ARCH=m68k >>> TARGET=sid >>> CHROOT=$HOME/chroot/m68k/sid/ >>> REPOT=http://cdn-fastly.deb.debian.org/debian-ports/ >>> debootstrap --arch=$ARCH --foreign --variant=minbase \ >>> --no-check-gpg $TARGET $CHROOT $REPO >>> >>> I didn't investigate more. >> >> It goes wrong in this part: >> >> + */ >> + if (is_write && info->si_signo == SIGSEGV && info->si_code == >> SEGV_ACCERR && >> + h2g_valid(address)) { >> >> Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR >> (on x86_64, si_code is SEGV_ACCERR as expected) > > So on PPC if you have a page mapped, and you access it with > the wrong permissions, you get SEGV_MAPERR? This seems like > a host kernel bug to me. Are we sure it is mapped? How to know? otherwise yes, it sounds like a kernel bug. Thanks, Laurent
On 22 March 2018 at 11:07, Laurent Vivier <laurent@vivier.eu> wrote: > Le 22/03/2018 à 12:05, Peter Maydell a écrit : >> On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote:re. >>> It goes wrong in this part: >>> >>> + */ >>> + if (is_write && info->si_signo == SIGSEGV && info->si_code == >>> SEGV_ACCERR && >>> + h2g_valid(address)) { >>> >>> Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR >>> (on x86_64, si_code is SEGV_ACCERR as expected) >> >> So on PPC if you have a page mapped, and you access it with >> the wrong permissions, you get SEGV_MAPERR? This seems like >> a host kernel bug to me. > > Are we sure it is mapped? How to know? We know it's mapped because the kernel doesn't give us the SEGV_MAPERR code :-) Access to unmapped pages must be the guest binary's problem -- the thing we're trying to detect here is "is this a write access to a page that we mapped read-only because we have a cache of code translated for it", which is always going to be "mapped but not with the right permissions". thanks -- PMM
Le 22/03/2018 à 12:07, Peter Maydell a écrit : > On 22 March 2018 at 11:05, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote: >>> It goes wrong in this part: >>> >>> + */ >>> + if (is_write && info->si_signo == SIGSEGV && info->si_code == >>> SEGV_ACCERR && >>> + h2g_valid(address)) { >>> >>> Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR >>> (on x86_64, si_code is SEGV_ACCERR as expected) >> >> So on PPC if you have a page mapped, and you access it with >> the wrong permissions, you get SEGV_MAPERR? This seems like >> a host kernel bug to me. > > ...in particular, kernel commit ecb101aed86156e (dated Dec 2017) > fixes a regression introduced in commit c3350602e876 that broke > the ppc kernels so they started returning SEGV_MAPERR here > instead of SEGV_ACCERR. Presumably your host kernel is missing > this fix. Yes, you're right, my kernel is 4.14-rc1 (6e80ecd) with c3350602e876 but without ecb101aed86156e. I'm going to update it. Thanks, Laurent
Le 22/03/2018 à 12:13, Laurent Vivier a écrit : > Le 22/03/2018 à 12:07, Peter Maydell a écrit : >> On 22 March 2018 at 11:05, Peter Maydell <peter.maydell@linaro.org> wrote: >>> On 22 March 2018 at 10:36, Laurent Vivier <laurent@vivier.eu> wrote: >>>> It goes wrong in this part: >>>> >>>> + */ >>>> + if (is_write && info->si_signo == SIGSEGV && info->si_code == >>>> SEGV_ACCERR && >>>> + h2g_valid(address)) { >>>> >>>> Because, on ppc, si_code is SEGV_MAPERR and not SEGV_ACCERR >>>> (on x86_64, si_code is SEGV_ACCERR as expected) >>> >>> So on PPC if you have a page mapped, and you access it with >>> the wrong permissions, you get SEGV_MAPERR? This seems like >>> a host kernel bug to me. >> >> ...in particular, kernel commit ecb101aed86156e (dated Dec 2017) >> fixes a regression introduced in commit c3350602e876 that broke >> the ppc kernels so they started returning SEGV_MAPERR here >> instead of SEGV_ACCERR. Presumably your host kernel is missing >> this fix. > > Yes, you're right, my kernel is 4.14-rc1 (6e80ecd) with > c3350602e876 but without ecb101aed86156e. > > I'm going to update it. Re-tested with 4.16-rc6 on ppc32 and it works fine. Thanks, Laurent
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 7736257085..67795cd78c 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -2181,29 +2181,41 @@ int page_unprotect(target_ulong address, uintptr_t pc) /* if the page was really writable, then we change its protection back to writable */ - if ((p->flags & PAGE_WRITE_ORG) && !(p->flags & PAGE_WRITE)) { - host_start = address & qemu_host_page_mask; - host_end = host_start + qemu_host_page_size; - - prot = 0; + if (p->flags & PAGE_WRITE_ORG) { current_tb_invalidated = false; - for (addr = host_start ; addr < host_end ; addr += TARGET_PAGE_SIZE) { - p = page_find(addr >> TARGET_PAGE_BITS); - p->flags |= PAGE_WRITE; - prot |= p->flags; - - /* and since the content will be modified, we must invalidate - the corresponding translated code. */ - current_tb_invalidated |= tb_invalidate_phys_page(addr, pc); -#ifdef CONFIG_USER_ONLY - if (DEBUG_TB_CHECK_GATE) { - tb_invalidate_check(addr); + if (p->flags & PAGE_WRITE) { + /* If the page is actually marked WRITE then assume this is because + * this thread raced with another one which got here first and + * set the page to PAGE_WRITE and did the TB invalidate for us. + */ +#ifdef TARGET_HAS_PRECISE_SMC + TranslationBlock *current_tb = tb_find_pc(pc); + if (current_tb) { + current_tb_invalidated = tb_cflags(current_tb) & CF_INVALID; } #endif + } else { + host_start = address & qemu_host_page_mask; + host_end = host_start + qemu_host_page_size; + + prot = 0; + for (addr = host_start; addr < host_end; addr += TARGET_PAGE_SIZE) { + p = page_find(addr >> TARGET_PAGE_BITS); + p->flags |= PAGE_WRITE; + prot |= p->flags; + + /* and since the content will be modified, we must invalidate + the corresponding translated code. */ + current_tb_invalidated |= tb_invalidate_phys_page(addr, pc); +#ifdef CONFIG_USER_ONLY + if (DEBUG_TB_CHECK_GATE) { + tb_invalidate_check(addr); + } +#endif + } + mprotect((void *)g2h(host_start), qemu_host_page_size, + prot & PAGE_BITS); } - mprotect((void *)g2h(host_start), qemu_host_page_size, - prot & PAGE_BITS); - mmap_unlock(); /* If current TB was invalidated return to main loop */ return current_tb_invalidated ? 2 : 1; diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c index e8f26ff0cb..c973752562 100644 --- a/accel/tcg/user-exec.c +++ b/accel/tcg/user-exec.c @@ -104,7 +104,18 @@ static inline int handle_cpu_signal(uintptr_t pc, siginfo_t *info, pc, address, is_write, *(unsigned long *)old_set); #endif /* XXX: locking issue */ - if (is_write && h2g_valid(address)) { + /* Note that it is important that we don't call page_unprotect() unless + * this is really a "write to nonwriteable page" fault, because + * page_unprotect() assumes that if it is called for an access to + * a page that's writeable this means we had two threads racing and + * another thread got there first and already made the page writeable; + * so we will retry the access. If we were to call page_unprotect() + * for some other kind of fault that should really be passed to the + * guest, we'd end up in an infinite loop of retrying the faulting + * access. + */ + if (is_write && info->si_signo == SIGSEGV && info->si_code == SEGV_ACCERR && + h2g_valid(address)) { switch (page_unprotect(h2g(address), pc)) { case 0: /* Fault not caused by a page marked unwritable to protect