Message ID | 20230716170150.22398-1-richard.henderson@linaro.org |
---|---|
State | New |
Headers | show |
Series | [for-8.1] accel/tcg: Take mmap_lock in load_atomic16_or_exit | expand |
On Sun, 16 Jul 2023 at 18:03, Richard Henderson <richard.henderson@linaro.org> wrote: > > For user-only, the probe for page writability may race with another > thread's mprotect. Take the mmap_lock around the operation. This > is still faster than the start/end_exclusive fallback. > > Remove the write probe in load_atomic8_or_exit. There we don't have > the same machinery for testing the existance of an 8-byte cmpxchg. "existence" > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > --- > accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++------------------- > 1 file changed, 24 insertions(+), 30 deletions(-) > > diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc > index 4de0a80492..e7170f8ba2 100644 > --- a/accel/tcg/ldst_atomicity.c.inc > +++ b/accel/tcg/ldst_atomicity.c.inc > @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv) > return load_atomic8(pv); > } > > -#ifdef CONFIG_USER_ONLY > - /* > - * If the page is not writable, then assume the value is immutable > - * and requires no locking. This ignores the case of MAP_SHARED with > - * another process, because the fallback start_exclusive solution > - * provides no protection across processes. > - */ > - if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) { > - uint64_t *p = __builtin_assume_aligned(pv, 8); > - return *p; > - } > -#endif I don't really understand the comment in the commit message: why would it be wrong to wrap this "test writeability and do the operation" in the mmap-lock, the same way we do for the 16-byte case? thanks -- PMM
Richard Henderson <richard.henderson@linaro.org> writes: > For user-only, the probe for page writability may race with another > thread's mprotect. Take the mmap_lock around the operation. This > is still faster than the start/end_exclusive fallback. Did we have a bug report or replication for this? > > Remove the write probe in load_atomic8_or_exit. There we don't have > the same machinery for testing the existance of an 8-byte cmpxchg. > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Otherwise makes sense to me: Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
On 7/17/23 11:12, Peter Maydell wrote: > On Sun, 16 Jul 2023 at 18:03, Richard Henderson > <richard.henderson@linaro.org> wrote: >> >> For user-only, the probe for page writability may race with another >> thread's mprotect. Take the mmap_lock around the operation. This >> is still faster than the start/end_exclusive fallback. >> >> Remove the write probe in load_atomic8_or_exit. There we don't have >> the same machinery for testing the existance of an 8-byte cmpxchg. > > "existence" > >> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> --- >> accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++------------------- >> 1 file changed, 24 insertions(+), 30 deletions(-) >> >> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc >> index 4de0a80492..e7170f8ba2 100644 >> --- a/accel/tcg/ldst_atomicity.c.inc >> +++ b/accel/tcg/ldst_atomicity.c.inc >> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv) >> return load_atomic8(pv); >> } >> >> -#ifdef CONFIG_USER_ONLY >> - /* >> - * If the page is not writable, then assume the value is immutable >> - * and requires no locking. This ignores the case of MAP_SHARED with >> - * another process, because the fallback start_exclusive solution >> - * provides no protection across processes. >> - */ >> - if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) { >> - uint64_t *p = __builtin_assume_aligned(pv, 8); >> - return *p; >> - } >> -#endif > > I don't really understand the comment in the commit message: > why would it be wrong to wrap this "test writeability and > do the operation" in the mmap-lock, the same way we do for the > 16-byte case? It would not be wrong. I was just thinking of the cmpxchg8 part, for which we do not have a configure probe, and for which I *think* there's no call, because there are no 32-bit hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64. r~
On 7/17/23 11:40, Alex Bennée wrote: > > Richard Henderson <richard.henderson@linaro.org> writes: > >> For user-only, the probe for page writability may race with another >> thread's mprotect. Take the mmap_lock around the operation. This >> is still faster than the start/end_exclusive fallback. > > Did we have a bug report or replication for this? See the comment: > + * We must take mmap_lock so that the query remains valid until > + * the write is complete -- tests/tcg/multiarch/munmap-pthread.c > + * is an example that can race. r~
Richard Henderson <richard.henderson@linaro.org> writes: > On 7/17/23 11:40, Alex Bennée wrote: >> Richard Henderson <richard.henderson@linaro.org> writes: >> >>> For user-only, the probe for page writability may race with another >>> thread's mprotect. Take the mmap_lock around the operation. This >>> is still faster than the start/end_exclusive fallback. >> Did we have a bug report or replication for this? > > See the comment: > >> + * We must take mmap_lock so that the query remains valid until >> + * the write is complete -- tests/tcg/multiarch/munmap-pthread.c >> + * is an example that can race. doh... > > > r~
On Mon, 17 Jul 2023 at 19:25, Richard Henderson <richard.henderson@linaro.org> wrote: > > On 7/17/23 11:12, Peter Maydell wrote: > > On Sun, 16 Jul 2023 at 18:03, Richard Henderson > > <richard.henderson@linaro.org> wrote: > >> > >> For user-only, the probe for page writability may race with another > >> thread's mprotect. Take the mmap_lock around the operation. This > >> is still faster than the start/end_exclusive fallback. > >> > >> Remove the write probe in load_atomic8_or_exit. There we don't have > >> the same machinery for testing the existance of an 8-byte cmpxchg. > > > > "existence" > > > >> > >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >> --- > >> accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++------------------- > >> 1 file changed, 24 insertions(+), 30 deletions(-) > >> > >> diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc > >> index 4de0a80492..e7170f8ba2 100644 > >> --- a/accel/tcg/ldst_atomicity.c.inc > >> +++ b/accel/tcg/ldst_atomicity.c.inc > >> @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv) > >> return load_atomic8(pv); > >> } > >> > >> -#ifdef CONFIG_USER_ONLY > >> - /* > >> - * If the page is not writable, then assume the value is immutable > >> - * and requires no locking. This ignores the case of MAP_SHARED with > >> - * another process, because the fallback start_exclusive solution > >> - * provides no protection across processes. > >> - */ > >> - if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) { > >> - uint64_t *p = __builtin_assume_aligned(pv, 8); > >> - return *p; > >> - } > >> -#endif > > > > I don't really understand the comment in the commit message: > > why would it be wrong to wrap this "test writeability and > > do the operation" in the mmap-lock, the same way we do for the > > 16-byte case? > > It would not be wrong. I was just thinking of the cmpxchg8 part, for which we do not have > a configure probe, and for which I *think* there's no call, because there are no 32-bit > hosts that have cmpxchg8 but not the full CONFIG_ATOMIC64. But this piece of code the patch deletes isn't doing a cmpxchg8, it just does a plain load. If "take the lock and do the operation" is faster than always using the fallback code for the atomic16 case, why don't we make the same tradeoff choice in atomic8 ? thanks -- PMM
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc index 4de0a80492..e7170f8ba2 100644 --- a/accel/tcg/ldst_atomicity.c.inc +++ b/accel/tcg/ldst_atomicity.c.inc @@ -152,19 +152,6 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv) return load_atomic8(pv); } -#ifdef CONFIG_USER_ONLY - /* - * If the page is not writable, then assume the value is immutable - * and requires no locking. This ignores the case of MAP_SHARED with - * another process, because the fallback start_exclusive solution - * provides no protection across processes. - */ - if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) { - uint64_t *p = __builtin_assume_aligned(pv, 8); - return *p; - } -#endif - /* Ultimate fallback: re-execute in serial context. */ cpu_loop_exit_atomic(env_cpu(env), ra); } @@ -186,25 +173,32 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv) return atomic16_read_ro(p); } -#ifdef CONFIG_USER_ONLY - /* - * We can only use cmpxchg to emulate a load if the page is writable. - * If the page is not writable, then assume the value is immutable - * and requires no locking. This ignores the case of MAP_SHARED with - * another process, because the fallback start_exclusive solution - * provides no protection across processes. - */ - if (page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) { - return *p; - } -#endif - - /* - * In system mode all guest pages are writable, and for user-only - * we have just checked writability. Try cmpxchg. - */ + /* We can only use cmpxchg to emulate a load if the page is writable. */ if (HAVE_ATOMIC128_RW) { +#ifdef CONFIG_USER_ONLY + /* + * If the page is not writable, then assume the value is immutable + * and requires no locking. This ignores the case of MAP_SHARED with + * another process, because the fallback start_exclusive solution + * provides no protection across processes. + * We must take mmap_lock so that the query remains valid until + * the write is complete -- tests/tcg/multiarch/munmap-pthread.c + * is an example that can race. + */ + Int128 r; + + mmap_lock(); + if (page_get_flags(h2g(p)) & PAGE_WRITE_ORG) { + r = atomic16_read_rw(p); + } else { + r = *p; + } + mmap_unlock(); + return r; +#else + /* In system mode all guest pages are host writable. */ return atomic16_read_rw(p); +#endif } /* Ultimate fallback: re-execute in serial context. */
For user-only, the probe for page writability may race with another thread's mprotect. Take the mmap_lock around the operation. This is still faster than the start/end_exclusive fallback. Remove the write probe in load_atomic8_or_exit. There we don't have the same machinery for testing the existance of an 8-byte cmpxchg. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- accel/tcg/ldst_atomicity.c.inc | 54 +++++++++++++++------------------- 1 file changed, 24 insertions(+), 30 deletions(-)