Message ID | 20200522160755.886-19-robert.foley@linaro.org |
---|---|
State | New |
Headers | show |
Series | Add Thread Sanitizer support to QEMU | expand |
On Fri, 22 May 2020 at 17:15, Robert Foley <robert.foley@linaro.org> wrote: > > For example: > WARNING: ThreadSanitizer: data race (pid=11134) > Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875): > #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84) > #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90) > #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55) > > Previous read of size 4 at 0x7bbc0000e0ac by thread T7: > #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba) > #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e) > > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Richard Henderson <richard.henderson@linaro.org> > Signed-off-by: Robert Foley <robert.foley@linaro.org> > --- > target/arm/cpu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/arm/cpu.c b/target/arm/cpu.c > index 32bec156f2..cdb90582ee 100644 > --- a/target/arm/cpu.c > +++ b/target/arm/cpu.c > @@ -75,7 +75,7 @@ static bool arm_cpu_has_work(CPUState *cs) > ARMCPU *cpu = ARM_CPU(cs); > > return (cpu->power_state != PSCI_OFF) > - && cs->interrupt_request & > + && atomic_read(&cs->interrupt_request) & > (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD > | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ > | CPU_INTERRUPT_EXITTB); Every target's has_work function seems to access cs->interrupt_request without using atomic_read() : why does Arm need to do something special here? More generally, the only place that currently uses atomic_read() on the interrupt_request field is cpu_handle_interrupt(), so if this field needs special precautions to access then a lot of code needs updating. thanks -- PMM
On Fri, 22 May 2020 at 13:44, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Fri, 22 May 2020 at 17:15, Robert Foley <robert.foley@linaro.org> wrote: > > > > For example: > > WARNING: ThreadSanitizer: data race (pid=11134) > > Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875): > > #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84) > > #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90) > > #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55) > > > > Previous read of size 4 at 0x7bbc0000e0ac by thread T7: > > #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba) > > #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e) > > > > Cc: Peter Maydell <peter.maydell@linaro.org> > > Cc: Richard Henderson <richard.henderson@linaro.org> > > Signed-off-by: Robert Foley <robert.foley@linaro.org> > > --- > > target/arm/cpu.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/target/arm/cpu.c b/target/arm/cpu.c > > index 32bec156f2..cdb90582ee 100644 > > --- a/target/arm/cpu.c > > +++ b/target/arm/cpu.c > > @@ -75,7 +75,7 @@ static bool arm_cpu_has_work(CPUState *cs) > > ARMCPU *cpu = ARM_CPU(cs); > > > > return (cpu->power_state != PSCI_OFF) > > - && cs->interrupt_request & > > + && atomic_read(&cs->interrupt_request) & > > (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD > > | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ > > | CPU_INTERRUPT_EXITTB); > > Every target's has_work function seems to access > cs->interrupt_request without using atomic_read() : > why does Arm need to do something special here? > > More generally, the only place that currently > uses atomic_read() on the interrupt_request field > is cpu_handle_interrupt(), so if this field needs > special precautions to access then a lot of code > needs updating. TSan flagged this case as a potential data race. It does not mean necessarily that there is an issue here, just that two threads were accessing the data without TSan detecting the synchronization. TSan gives a few options to silence the warning, such as changing the locking, making it atomic, or adding various types of annotations to tell TSan to ignore it. So in this case we had a few options, such as to change it to atomic or to simply annotate it and silence it. We started our TSan testing using arm, and have been working to iron out the TSan warnings there, and there alone initially. Assuming that we are OK with making this particular change, to silence the TSan warning, then certainly it is a good point that we need to consider changing the other places that access this field, since they will all see similar TSan warnings. Of course if we are not OK with these changes to silence the TSan tool, that's OK too :). In that case we can certainly just add an annotation either in the code or via our suppressions/blacklist and leave the code functionally unchanged. Thanks & Regards, -Rob > > thanks > -- PMM
On Fri, 22 May 2020 at 22:33, Robert Foley <robert.foley@linaro.org> wrote: > On Fri, 22 May 2020 at 13:44, Peter Maydell <peter.maydell@linaro.org> wrote: > > Every target's has_work function seems to access > > cs->interrupt_request without using atomic_read() : > > why does Arm need to do something special here? > > > > More generally, the only place that currently > > uses atomic_read() on the interrupt_request field > > is cpu_handle_interrupt(), so if this field needs > > special precautions to access then a lot of code > > needs updating. > > TSan flagged this case as a potential data race. It does not mean > necessarily that there is an issue here, just that two threads were > accessing the data > without TSan detecting the synchronization. TSan gives a few options > to silence the > warning, such as changing the locking, making it atomic, or adding > various types > of annotations to tell TSan to ignore it. So in this case we had a > few options, such as > to change it to atomic or to simply annotate it and silence it. > > We started our TSan testing using arm, and have been working to iron out the > TSan warnings there, and there alone initially. Assuming that we are OK > with making this particular change, to silence the TSan warning, > then certainly it is a good point that we need to consider changing the > other places that access this field, since they will all see similar > TSan warnings. So is this: (a) a TSan false positive, because we've analysed the use of this struct field and know it's not a race because [details], but which we're choosing to silence in this way (b) an actual race for which the correct fix is to make the accesses atomic because [details] ? Either way, the important part is the analysis which fills in the "[details]" part, which should be in the commit message... thanks -- PMM
On Fri, May 22, 2020 at 23:36:18 +0100, Peter Maydell wrote: > So is this: > (a) a TSan false positive, because we've analysed the use > of this struct field and know it's not a race because > [details], but which we're choosing to silence in this way > (b) an actual race for which the correct fix is to make the > accesses atomic because [details] > > ? It is (b), as shown in the per-cpu lock series. In particular, see these two patches: - [PATCH v9 33/74] cpu: define cpu_interrupt_request helpers https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06322.html - [PATCH v9 39/74] arm: convert to cpu_interrupt_request https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06328.html Since a more thorough fix is included in that other series, I think this patch should be dropped from this series -- I'll post a reply to patch 00/19 with more details. Thanks, Emilio
On Sat, 23 May 2020 at 13:18, Emilio G. Cota <cota@braap.org> wrote: > > On Fri, May 22, 2020 at 23:36:18 +0100, Peter Maydell wrote: > > So is this: > > (a) a TSan false positive, because we've analysed the use > > of this struct field and know it's not a race because > > [details], but which we're choosing to silence in this way > > (b) an actual race for which the correct fix is to make the > > accesses atomic because [details] > > > > ? > > It is (b), as shown in the per-cpu lock series. In particular, > see these two patches: > - [PATCH v9 33/74] cpu: define cpu_interrupt_request helpers > https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06322.html > - [PATCH v9 39/74] arm: convert to cpu_interrupt_request > https://lists.nongnu.org/archive/html/qemu-devel/2020-05/msg06328.html > > Since a more thorough fix is included in that other series, I think this > patch should be dropped from this series -- I'll post a reply to patch > 00/19 with more details. I agree, we will re-focus the patch series a bit and drop this patch from the series. Thanks & Regards, -Rob > Thanks, > > Emilio
diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 32bec156f2..cdb90582ee 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -75,7 +75,7 @@ static bool arm_cpu_has_work(CPUState *cs) ARMCPU *cpu = ARM_CPU(cs); return (cpu->power_state != PSCI_OFF) - && cs->interrupt_request & + && atomic_read(&cs->interrupt_request) & (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_EXITTB);
For example: WARNING: ThreadSanitizer: data race (pid=11134) Atomic write of size 4 at 0x7bbc0000e0ac by main thread (mutexes: write M875): #0 __tsan_atomic32_store <null> (qemu-system-aarch64+0x394d84) #1 cpu_reset_interrupt hw/core/cpu.c:107:5 (qemu-system-aarch64+0x842f90) #2 arm_cpu_set_irq target/arm/cpu.c (qemu-system-aarch64+0x615a55) Previous read of size 4 at 0x7bbc0000e0ac by thread T7: #0 arm_cpu_has_work target/arm/cpu.c:78:16 (qemu-system-aarch64+0x6178ba) #1 cpu_has_work include/hw/core/cpu.h:700:12 (qemu-system-aarch64+0x68be2e) Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Robert Foley <robert.foley@linaro.org> --- target/arm/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.17.1