Message ID | 20161202173454.19179-1-alex.bennee@linaro.org |
---|---|
State | Superseded |
Alex Bennée <alex.bennee@linaro.org> writes:

> While testing rth's latest TCG patches with risu I found ldaxp was
> broken. Investigating further I found it was broken by 1dd089d0 when
> the cmpxchg atomic work was merged.

CC'ing Paolo/Richard

Do you guys have any final TCG fixes planned for 2.8 that can take this
fix for the ldaxp regression?

> As part of that change the code
> attempted to be clever by doing a single 64 bit load and then shuffle
> the data around to set the two 32 bit registers.
>
> As I couldn't quite follow the endian magic I've simply partially
> reverted the change to the original gen_load_exclusive code. This
> doesn't affect the cmpxchg functionality, as that is all done in the
> gen_store_exclusive part, which is untouched.
>
> I've also restored the comment that was removed (with a slight tweak
> to mention cmpxchg).
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  target-arm/translate-a64.c | 42 +++++++++++++++++++-----------------------
>  1 file changed, 19 insertions(+), 23 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index de48747..6dc27a6 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -1839,41 +1839,37 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
>      }
>  }
>
> +/*
> + * Load/Store exclusive instructions are implemented by remembering
> + * the value/address loaded, and seeing if these are the same
> + * when the store is performed. This is not actually the architecturally
> + * mandated semantics, but it works for typical guest code sequences
> + * and avoids having to monitor regular stores.
> + *
> + * The store exclusive uses the atomic cmpxchg primitives to avoid
> + * races in multi-threaded linux-user and when MTTCG softmmu is
> + * enabled.
> + */
>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>                                 TCGv_i64 addr, int size, bool is_pair)
>  {
>      TCGv_i64 tmp = tcg_temp_new_i64();
> -    TCGMemOp be = s->be_data;
> +    TCGMemOp memop = s->be_data + size;
>
>      g_assert(size <= 3);
> +    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
> +
>      if (is_pair) {
> +        TCGv_i64 addr2 = tcg_temp_new_i64();
>          TCGv_i64 hitmp = tcg_temp_new_i64();
>
> -        if (size == 3) {
> -            TCGv_i64 addr2 = tcg_temp_new_i64();
> -
> -            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
> -                                MO_64 | MO_ALIGN_16 | be);
> -            tcg_gen_addi_i64(addr2, addr, 8);
> -            tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
> -                                MO_64 | MO_ALIGN | be);
> -            tcg_temp_free_i64(addr2);
> -        } else {
> -            g_assert(size == 2);
> -            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
> -                                MO_64 | MO_ALIGN | be);
> -            if (be == MO_LE) {
> -                tcg_gen_extr32_i64(tmp, hitmp, tmp);
> -            } else {
> -                tcg_gen_extr32_i64(hitmp, tmp, tmp);
> -            }
> -        }
> -
> +        g_assert(size >= 2);
> +        tcg_gen_addi_i64(addr2, addr, 1 << size);
> +        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
> +        tcg_temp_free_i64(addr2);
>          tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
>          tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
>          tcg_temp_free_i64(hitmp);
> -    } else {
> -        tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
>      }
>
>      tcg_gen_mov_i64(cpu_exclusive_val, tmp);

--
Alex Bennée
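[For readers who, like Alex, find the extr32 shuffle hard to follow: the
following is a minimal stand-alone C sketch, not QEMU code, of what the
removed size == 2 pair path was expressing. The register values are
invented for illustration.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Value produced by the single 64-bit load for LDXP Wt, Wt2, [Xn]
     * with a little-endian memop. */
    uint64_t v = 0x1413121104030201ULL;

    /* tcg_gen_extr32_i64(tmp, hitmp, tmp) in the LE case: the first
     * register gets the low 32 bits, the second the high 32 bits. */
    uint32_t rt  = (uint32_t)v;         /* 0x04030201 */
    uint32_t rt2 = (uint32_t)(v >> 32); /* 0x14131211 */

    /* tcg_gen_extr32_i64(hitmp, tmp, tmp) in the BE case swaps the
     * destinations: the first register comes from the high half. */
    uint32_t be_rt  = (uint32_t)(v >> 32);
    uint32_t be_rt2 = (uint32_t)v;

    printf("LE: rt=%#010x rt2=%#010x\n", rt, rt2);
    printf("BE: rt=%#010x rt2=%#010x\n", be_rt, be_rt2);
    return 0;
}

[The point of doing it this way, as Richard explains below, is that one
64-bit load of the pair is atomic where two 32-bit loads are not.]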
On 12/05/2016 03:09 AM, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> While testing rth's latest TCG patches with risu I found ldaxp was
>> broken. Investigating further I found it was broken by 1dd089d0 when
>> the cmpxchg atomic work was merged.
>
> CC'ing Paolo/Richard
>
> Do you guys have any final TCG fixes planned for 2.8 that can take this
> fix for the ldaxp regression?

I don't have any pending patches for 2.8, no.

>> As part of that change the code
>> attempted to be clever by doing a single 64 bit load and then shuffle
>> the data around to set the two 32 bit registers.

It's not trying to be "clever", it's trying to be correct, giving an
atomic 64-bit load.

The fact that we didn't attempt an atomic 128-bit load here for the
64-bit pair is a bug. We have support for that in
helper_atomic_ldo_le_mmu; I'm not sure why I didn't use that here.

>> As I couldn't quite follow the endian magic I've simply partially
>> reverted the change to the original gen_load_exclusive code. This
>> doesn't affect the cmpxchg functionality, as that is all done in the
>> gen_store_exclusive part, which is untouched.

We'll have to fix it properly eventually. But perhaps 2.9 is soon
enough, since that's when MTTCG will go in.

r~
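[A note on the 128-bit atomic load Richard mentions: on hosts that only
provide a 16-byte compare-and-swap (e.g. x86-64 built with -mcx16), an
atomic 128-bit load is commonly synthesized from the CAS itself. A
minimal sketch of the trick, hypothetical code rather than QEMU's
actual helper_atomic_ldo_le_mmu:]

#include <stdint.h>

typedef unsigned __int128 u128;

/* CAS(ptr, 0, 0): if *ptr happens to be 0 it stores 0 back, which is
 * no visible change; either way the returned old value is an atomic
 * snapshot of the 16 bytes. One caveat of this approach is that the
 * hardware CAS needs the page to be writable. */
static inline u128 atomic_load_16(u128 *ptr)
{
    return __sync_val_compare_and_swap(ptr, 0, 0);
}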
On 5 December 2016 at 15:42, Richard Henderson <rth@twiddle.net> wrote:
> We'll have to fix it properly eventually. But perhaps 2.9 is soon
> enough, since that's when MTTCG will go in.

Could you provide a reviewed-by (or an explicit nak) for Alex's patch,
please? I didn't look at this area of the code when it went in, so I'm
not really in a position to review it even if it goes via my tree.

thanks
-- PMM
Richard Henderson <rth@twiddle.net> writes:

> On 12/05/2016 03:09 AM, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> While testing rth's latest TCG patches with risu I found ldaxp was
>>> broken. Investigating further I found it was broken by 1dd089d0 when
>>> the cmpxchg atomic work was merged.
>>
>> CC'ing Paolo/Richard
>>
>> Do you guys have any final TCG fixes planned for 2.8 that can take this
>> fix for the ldaxp regression?
>
> I don't have any pending patches for 2.8, no.
>
>>> As part of that change the code
>>> attempted to be clever by doing a single 64 bit load and then shuffle
>>> the data around to set the two 32 bit registers.
>
> It's not trying to be "clever", it's trying to be correct, giving an
> atomic 64-bit load.

Ahh right, I see. What happens if the backend is 32-bit, will it issue
two loads anyway?

> The fact that we didn't attempt an atomic 128-bit load here for the
> 64-bit pair is a bug. We have support for that in
> helper_atomic_ldo_le_mmu; I'm not sure why I didn't use that here.
>
>>> As I couldn't quite follow the endian magic I've simply partially
>>> reverted the change to the original gen_load_exclusive code. This
>>> doesn't affect the cmpxchg functionality, as that is all done in the
>>> gen_store_exclusive part, which is untouched.
>
> We'll have to fix it properly eventually. But perhaps 2.9 is soon
> enough, since that's when MTTCG will go in.

I guess the easiest way would be to punt these paired loads to a
helper? However, regardless of the atomicity, the actual values loaded
into the registers are wrong, so that behaviour is a regression vs 2.7.

I don't know how often these load-exclusive paired operations are used
in real code, but I think we should at least fix it so the values are
loaded properly for 2.8 and do the proper atomic fix for 2.9.

> r~

--
Alex Bennée
On 12/05/2016 09:04 AM, Alex Bennée wrote:
>> It's not trying to be "clever", it's trying to be correct, giving an
>> atomic 64-bit load.
>
> Ahh right, I see. What happens if the backend is 32-bit, will it issue
> two loads anyway?

Yes. I did bring this up when the atomics patch set went in.

In principle there's no reason we can't handle this like the 128-bit
path, with a special helper for a 32-bit host using __atomic_load_64 if
available, which it would be for i686 (via fpu) and armv7 (via ldrexd),
or gen_helper_exit_atomic if not. But it's nasty and slow, and work
that's unnecessary if we simply decide that 32-bit hosts cannot run
MTTCG.

> I don't know how often these load-exclusive paired operations are used
> in real code, but I think we should at least fix it so the values are
> loaded properly for 2.8 and do the proper atomic fix for 2.9.

Yep.

r~
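[The 32-bit-host approach Richard sketches can be illustrated as
follows; the helper name is invented for the example, and it assumes
the GCC/Clang __atomic builtins:]

#include <stdint.h>

/* On a 32-bit host, a plain C read of a uint64_t may compile to two
 * separate 32-bit loads, which another thread's store can land
 * between. The atomic builtin forces a single indivisible access:
 * GCC can lower it via the x87 fild/fistp pair on i686 or ldrexd on
 * ARMv7, and otherwise falls back to a libatomic call. */
static inline uint64_t atomic_load_8(const uint64_t *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}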
On 12/05/2016 03:09 AM, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> While testing rth's latest TCG patches with risu I found ldaxp was
>> broken. Investigating further I found it was broken by 1dd089d0 when
>> the cmpxchg atomic work was merged.
>
> CC'ing Paolo/Richard
>
> Do you guys have any final TCG fixes planned for 2.8 that can take this
> fix for the ldaxp regression?
>
>> As part of that change the code
>> attempted to be clever by doing a single 64 bit load and then shuffle
>> the data around to set the two 32 bit registers.
>>
>> As I couldn't quite follow the endian magic I've simply partially
>> reverted the change to the original gen_load_exclusive code. This
>> doesn't affect the cmpxchg functionality, as that is all done in the
>> gen_store_exclusive part, which is untouched.
>>
>> I've also restored the comment that was removed (with a slight tweak
>> to mention cmpxchg).
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  target-arm/translate-a64.c | 42 +++++++++++++++++++-----------------------
>>  1 file changed, 19 insertions(+), 23 deletions(-)
>>
>> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
>> index de48747..6dc27a6 100644
>> --- a/target-arm/translate-a64.c
>> +++ b/target-arm/translate-a64.c
>> @@ -1839,41 +1839,37 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
>>      }
>>  }
>>
>> +/*
>> + * Load/Store exclusive instructions are implemented by remembering
>> + * the value/address loaded, and seeing if these are the same
>> + * when the store is performed. This is not actually the architecturally
>> + * mandated semantics, but it works for typical guest code sequences
>> + * and avoids having to monitor regular stores.
>> + *
>> + * The store exclusive uses the atomic cmpxchg primitives to avoid
>> + * races in multi-threaded linux-user and when MTTCG softmmu is
>> + * enabled.
>> + */
>>  static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>>                                 TCGv_i64 addr, int size, bool is_pair)
>>  {
>>      TCGv_i64 tmp = tcg_temp_new_i64();
>> -    TCGMemOp be = s->be_data;
>> +    TCGMemOp memop = s->be_data + size;
>>
>>      g_assert(size <= 3);
>> +    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
>> +
>>      if (is_pair) {
>> +        TCGv_i64 addr2 = tcg_temp_new_i64();
>>          TCGv_i64 hitmp = tcg_temp_new_i64();
>>
>> -        if (size == 3) {
>> -            TCGv_i64 addr2 = tcg_temp_new_i64();
>> -
>> -            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
>> -                                MO_64 | MO_ALIGN_16 | be);
>> -            tcg_gen_addi_i64(addr2, addr, 8);
>> -            tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
>> -                                MO_64 | MO_ALIGN | be);
>> -            tcg_temp_free_i64(addr2);
>> -        } else {
>> -            g_assert(size == 2);
>> -            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
>> -                                MO_64 | MO_ALIGN | be);
>> -            if (be == MO_LE) {
>> -                tcg_gen_extr32_i64(tmp, hitmp, tmp);
>> -            } else {
>> -                tcg_gen_extr32_i64(hitmp, tmp, tmp);
>> -            }
>> -        }
>> -
>> +        g_assert(size >= 2);
>> +        tcg_gen_addi_i64(addr2, addr, 1 << size);
>> +        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
>> +        tcg_temp_free_i64(addr2);
>>          tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
>>          tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
>>          tcg_temp_free_i64(hitmp);
>> -    } else {
>> -        tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
>>      }
>>
>>      tcg_gen_mov_i64(cpu_exclusive_val, tmp);

Acked-by: Richard Henderson <rth@twiddle.net>

r~
On 5 December 2016 at 17:22, Richard Henderson <rth@twiddle.net> wrote:
> Acked-by: Richard Henderson <rth@twiddle.net>
Thanks. I've put this into a one-patch pull request which I've just
sent out and which will hopefully make -rc3.
thanks
-- PMM
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index de48747..6dc27a6 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1839,41 +1839,37 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
     }
 }
 
+/*
+ * Load/Store exclusive instructions are implemented by remembering
+ * the value/address loaded, and seeing if these are the same
+ * when the store is performed. This is not actually the architecturally
+ * mandated semantics, but it works for typical guest code sequences
+ * and avoids having to monitor regular stores.
+ *
+ * The store exclusive uses the atomic cmpxchg primitives to avoid
+ * races in multi-threaded linux-user and when MTTCG softmmu is
+ * enabled.
+ */
 static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i64 addr, int size, bool is_pair)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    TCGMemOp be = s->be_data;
+    TCGMemOp memop = s->be_data + size;
 
     g_assert(size <= 3);
+    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
+
     if (is_pair) {
+        TCGv_i64 addr2 = tcg_temp_new_i64();
         TCGv_i64 hitmp = tcg_temp_new_i64();
 
-        if (size == 3) {
-            TCGv_i64 addr2 = tcg_temp_new_i64();
-
-            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
-                                MO_64 | MO_ALIGN_16 | be);
-            tcg_gen_addi_i64(addr2, addr, 8);
-            tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
-                                MO_64 | MO_ALIGN | be);
-            tcg_temp_free_i64(addr2);
-        } else {
-            g_assert(size == 2);
-            tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
-                                MO_64 | MO_ALIGN | be);
-            if (be == MO_LE) {
-                tcg_gen_extr32_i64(tmp, hitmp, tmp);
-            } else {
-                tcg_gen_extr32_i64(hitmp, tmp, tmp);
-            }
-        }
-
+        g_assert(size >= 2);
+        tcg_gen_addi_i64(addr2, addr, 1 << size);
+        tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
+        tcg_temp_free_i64(addr2);
         tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
         tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
         tcg_temp_free_i64(hitmp);
-    } else {
-        tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
     }
 
     tcg_gen_mov_i64(cpu_exclusive_val, tmp);
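[The comment restored by this patch describes how QEMU emulates the
exclusive monitor. As a rough conceptual model, in plain C rather than
the actual TCG implementation, with invented names standing in for
cpu_exclusive_addr/cpu_exclusive_val:]

#include <stdint.h>

static uintptr_t exclusive_addr;
static uint64_t  exclusive_val;

static void load_exclusive(uint64_t *addr)
{
    exclusive_val  = *addr;            /* remember the value loaded */
    exclusive_addr = (uintptr_t)addr;  /* ... and the address       */
}

/* Returns 0 on success, like the status register written by STXR. */
static int store_exclusive(uint64_t *addr, uint64_t newval)
{
    if ((uintptr_t)addr != exclusive_addr) {
        return 1;  /* different address: the exclusive store fails */
    }
    /* cmpxchg: store newval only if memory still holds the value we
     * loaded; doing the check and store as one atomic operation is
     * what closes the race in multi-threaded linux-user and MTTCG. */
    uint64_t seen = __sync_val_compare_and_swap(addr, exclusive_val, newval);
    return seen != exclusive_val;
}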
While testing rth's latest TCG patches with risu I found ldaxp was
broken. Investigating further I found it was broken by 1dd089d0 when
the cmpxchg atomic work was merged. As part of that change the code
attempted to be clever by doing a single 64 bit load and then shuffle
the data around to set the two 32 bit registers.

As I couldn't quite follow the endian magic I've simply partially
reverted the change to the original gen_load_exclusive code. This
doesn't affect the cmpxchg functionality, as that is all done in the
gen_store_exclusive part, which is untouched.

I've also restored the comment that was removed (with a slight tweak
to mention cmpxchg).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 target-arm/translate-a64.c | 42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

--
2.10.2