[ARM,2/7,ping3] Adapt atomic and exclusive load and store to ARMv8-M Baseline

Message ID 0610c815-204b-bd03-a4bf-e456577c6fae@foss.arm.com
State New

Commit Message

Thomas Preudhomme Oct. 24, 2016, 8:04 a.m. UTC
Ping?

Best regards,

Thomas

On 14/10/16 14:48, Thomas Preudhomme wrote:
> Ping?
>
> Best regards,
>
> Thomas
>
> On 03/10/16 17:42, Thomas Preudhomme wrote:
>> Ping?
>>
>> Best regards,
>>
>> Thomas
>>
>> On 22/09/16 14:41, Thomas Preudhomme wrote:
>>> Hi,
>>>
>>> This patch is part of a patch series to add support for atomic operations on
>>> ARMv8-M Baseline targets in GCC. This specific patch adapts atomic and exclusive
>>> load and store patterns to the constraints of ARMv8-M Baseline. It consists of
>>> two sets of changes:
>>>
>>> - adding non-predicated output templates, because ARMv8-M Baseline does not
>>>   have the IT instruction;
>>> - using low registers for ldr/str.
>>>
>>> Together, these changes require creating two new alternatives for atomic_load
>>> and atomic_store: (i) one for the relaxed, consume and release memory models,
>>> where ldr/str are used and low registers must therefore be used, and (ii)
>>> another for the remaining memory models (matched by the new Pf constraint),
>>> where lda/stl are used and any register is allowed. These are separate from
>>> the alternative for 32-bit targets, whose output templates expect predication.
>>>
>>> ChangeLog entry is as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2016-07-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>
>>>         * config/arm/constraints.md (Q constraint): Document its use for
>>>         Thumb-1.
>>>         (Pf constraint): New constraint for memory models other than
>>>         relaxed, consume or release.
>>>         * config/arm/sync.md (atomic_load<mode>): Add new ARMv8-M Baseline only
>>>         alternatives to allow any register when memory model matches Pf and
>>>         thus lda is used, but only low registers otherwise.  Use unpredicated
>>>         output template for Thumb-1 targets.
>>>         (atomic_store<mode>): Likewise for stl.
>>>         (arm_load_exclusive<mode>): Add new ARMv8-M Baseline only alternative
>>>         whose output template does not have predication.
>>>         (arm_load_acquire_exclusive<mode>): Likewise.
>>>         (arm_load_exclusivesi): Likewise.
>>>         (arm_load_acquire_exclusivesi): Likewise.
>>>         (arm_store_release_exclusive<mode>): Likewise.
>>>         (arm_store_exclusive<mode>): Use unpredicated output template for
>>>         Thumb-1 targets.
>>>
>>> Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on all
>>> atomic and synchronization testcases in the testsuite [2]. The patchset was also
>>> bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at
>>> optimization level -O1 and above [1], without any regression in the testsuite and
>>> with no code generation difference in libitm and libgomp.
>>>
>>> Code generation for ARMv8-M Baseline has been manually examined and compared
>>> against ARMv8-A Thumb-2 for the following configurations without finding any
>>> issue:
>>>
>>> gcc.dg/atomic-op-2.c at -Os
>>> gcc.dg/atomic-compare-exchange-2.c at -Os
>>> gcc.dg/atomic-compare-exchange-3.c at -O3
>>>
>>> Is this ok for trunk?
>>>
>>> Best regards,
>>>
>>> Thomas
>>>
>>> [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and
>>> undefined ("-O2 -g")
>>> [2] The exact list is:
>>>
>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c
>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c
>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c
>>> gcc/testsuite/gcc.dg/atomic-exchange-1.c
>>> gcc/testsuite/gcc.dg/atomic-exchange-2.c
>>> gcc/testsuite/gcc.dg/atomic-exchange-3.c
>>> gcc/testsuite/gcc.dg/atomic-fence.c
>>> gcc/testsuite/gcc.dg/atomic-flag.c
>>> gcc/testsuite/gcc.dg/atomic-generic.c
>>> gcc/testsuite/gcc.dg/atomic-generic-aux.c
>>> gcc/testsuite/gcc.dg/atomic-invalid-2.c
>>> gcc/testsuite/gcc.dg/atomic-load-1.c
>>> gcc/testsuite/gcc.dg/atomic-load-2.c
>>> gcc/testsuite/gcc.dg/atomic-load-3.c
>>> gcc/testsuite/gcc.dg/atomic-lockfree.c
>>> gcc/testsuite/gcc.dg/atomic-lockfree-aux.c
>>> gcc/testsuite/gcc.dg/atomic-noinline.c
>>> gcc/testsuite/gcc.dg/atomic-noinline-aux.c
>>> gcc/testsuite/gcc.dg/atomic-op-1.c
>>> gcc/testsuite/gcc.dg/atomic-op-2.c
>>> gcc/testsuite/gcc.dg/atomic-op-3.c
>>> gcc/testsuite/gcc.dg/atomic-op-6.c
>>> gcc/testsuite/gcc.dg/atomic-store-1.c
>>> gcc/testsuite/gcc.dg/atomic-store-2.c
>>> gcc/testsuite/gcc.dg/atomic-store-3.c
>>> gcc/testsuite/g++.dg/ext/atomic-1.C
>>> gcc/testsuite/g++.dg/ext/atomic-2.C
>>> gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-acquire.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-char.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-consume.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-int.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-release.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c
>>> gcc/testsuite/gcc.target/arm/atomic-op-short.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c
>>> gcc/testsuite/gcc.target/arm/sync-1.c
>>> gcc/testsuite/gcc.target/arm/synchronize.c
>>> gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
>>> libstdc++-v3/testsuite/29_atomics/atomic/60658.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/64658.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/70766.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/default.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/direct_list.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/single_value.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/user_pod.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/51811.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/56011.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_assignment.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_conversion.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/base_classes.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/compare_exchange_lowering.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/explicit_instantiation/1.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/1.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/56012.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/aggregate.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/default.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/standard_layout.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/trivial.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/60940.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/65147.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/constexpr.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/copy_list.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/default.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/direct_list.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/single_value.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/bitwise.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/decrement.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/increment.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_assignment.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_conversion.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/standard_layout.cc
>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/trivial.cc
>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/functions_std_c++0x.cc
>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x.cc
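The way the three atomic_load<mode> alternatives split the memory models on ARMv8-M Baseline can be summarised with a small C sketch. It is an illustration written for this summary, not GCC code: it assumes the __ATOMIC_* macros predefined by GCC and Clang as stand-ins for GCC's internal enum memmodel, and the names matches_pf, baseline_load_insn and baseline_load_regs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ARMv8-M Baseline choices in atomic_load<mode>.
   GCC-internal memmodel values are replaced by the __ATOMIC_* macros that
   GCC and Clang predefine (an assumption made for this sketch).  */

enum load_insn { INSN_LDR, INSN_LDA };   /* instruction emitted */
enum reg_class { REGS_LOW, REGS_ANY };   /* "l" (r0-r7) versus "r" */

/* Mirrors the new "Pf" constraint: every memory model except relaxed,
   consume and release.  */
static bool matches_pf (int model)
{
  assert (model >= __ATOMIC_RELAXED && model <= __ATOMIC_SEQ_CST);
  return model != __ATOMIC_RELAXED
         && model != __ATOMIC_CONSUME
         && model != __ATOMIC_RELEASE;
}

/* The output template picks the instruction from the model alone:
   LDA for Pf models, plain LDR otherwise.  */
static enum load_insn baseline_load_insn (int model)
{
  return matches_pf (model) ? INSN_LDA : INSN_LDR;
}

/* On ARMv8-M Baseline the "32" alternative is disabled, so Pf models may
   use the "v8mb" alternative ("=r", any register), while the remaining
   models only match the "any" alternative ("=l", low registers).  */
static enum reg_class baseline_load_regs (int model)
{
  return matches_pf (model) ? REGS_ANY : REGS_LOW;
}
```

The instruction is chosen by the output template from the model alone; the alternatives only decide which registers the allocator may use. That is why ldr, which the patch description restricts to low registers, and lda end up with different register classes.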

Comments

Kyrill Tkachov Oct. 24, 2016, 4:40 p.m. UTC | #1
Hi Thomas,

On 24/10/16 09:04, Thomas Preudhomme wrote:
<snip>

      else
-      return \"lda<sync_sfx>%?\\t%0, %1\";
+      {
+	if (TARGET_THUMB1)
+	  return \"lda<sync_sfx>\\t%0, %1\";
+	else
+	  return \"lda<sync_sfx>%?\\t%0, %1\";
+      }
    }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
     (set_attr "predicable_short_it" "no")])


Please set the predicable attribute to "no" for the v8mb alternative.
It wouldn't change any functionality as the ifcvt pass for conditional execution
won't run for ARMv8-M Baseline but it's better to be explicit for documentation purposes.
Same for the other patterns where you add new v8mb alternatives.

Ok with that change.
Sorry for the delay,
Kyrill
Thomas Preudhomme Oct. 24, 2016, 5:01 p.m. UTC | #2
Hi Kyrill,

On 24/10/16 17:40, Kyrill Tkachov wrote:
> Hi Thomas,
>
> On 24/10/16 09:04, Thomas Preudhomme wrote:
> <snip>
>

> Please set the predicable attribute to "no" for the v8mb alternative.
> It wouldn't change any functionality as the ifcvt pass for conditional execution
> won't run for ARMv8-M Baseline but it's better to be explicit for documentation
> purposes.
> Same for the other patterns where you add new v8mb alternatives.

The predicable attribute cannot be set per alternative, and hence not per
architecture, which is why I kept it this way. See the SET_ATTR_ALTERNATIVE case
in the is_predicable function in gensupport.c.

Best regards,

Thomas

>
> Ok with that change.

Ok without that then?

Best regards,

Thomas
Kyrylo Tkachov Oct. 24, 2016, 8:18 p.m. UTC | #3
> Ok without that then?

You're right.
This is ok then,
Kyrill
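To reproduce the kind of manual inspection described in the patch, a translation unit along these lines could be compiled for ARMv8-M Baseline (for instance with an arm-none-eabi GCC and -march=armv8-m.base -O2 -S), and its assembly compared against an ARMv8-A Thumb-2 build. The function names are hypothetical, and the instruction noted in each comment is what the patterns in this patch are expected to select, not output captured from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: comments give the instruction the atomic_load and
   atomic_store patterns are expected to select on ARMv8-M Baseline.  */
static uint32_t shared_word;

uint32_t load_relaxed (void)
{
  return __atomic_load_n (&shared_word, __ATOMIC_RELAXED);  /* ldr, low register */
}

uint32_t load_acquire (void)
{
  return __atomic_load_n (&shared_word, __ATOMIC_ACQUIRE);  /* lda, any register */
}

uint32_t load_seq_cst (void)
{
  return __atomic_load_n (&shared_word, __ATOMIC_SEQ_CST);  /* lda, any register */
}

void store_relaxed (uint32_t value)
{
  __atomic_store_n (&shared_word, value, __ATOMIC_RELAXED);  /* str */
}

void store_release (uint32_t value)
{
  __atomic_store_n (&shared_word, value, __ATOMIC_RELEASE);  /* stl */
}

/* Host-side sanity check that stores and loads round-trip a value.  */
void check_round_trip (void)
{
  store_release (42u);
  assert (load_acquire () == 42u);
  store_relaxed (7u);
  assert (load_relaxed () == 7u && load_seq_cst () == 7u);
}
```

Built for a host, this only checks that the builtins behave as expected; the instruction selection itself has to be read from the generated assembly.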

Patch

diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 4ece5f013c92adee04157b5c909e1d47c894c994..65098ceeb1a66174b345bcfb0688152f9f137150 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -34,11 +34,13 @@ 
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dp, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in all states: Pf
 
 ;; The following memory constraints have been used:
-;; in ARM/Thumb-2 state: Q, Uh, Ut, Uv, Uy, Un, Um, Us
+;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
 ;; in Thumb state: Uu, Uw
+;; in all states: Q
 
 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
@@ -180,6 +182,13 @@ 
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
 
+(define_constraint "Pf"
+  "Memory models except relaxed, consume or release ones."
+  (and (match_code "const_int")
+       (match_test "!is_mm_relaxed (memmodel_from_int (ival))
+		    && !is_mm_consume (memmodel_from_int (ival))
+		    && !is_mm_release (memmodel_from_int (ival))")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
@@ -407,7 +416,7 @@ 
 
 (define_memory_constraint "Q"
  "@internal
-  In ARM/Thumb-2 state an address that is a single base register."
+  An address that is a single base register."
  (and (match_code "mem")
       (match_test "REG_P (XEXP (op, 0))")))
 
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index d10ede4175f94e627a23bf32d19d2b5f3de76771..d36c24f76f670d7602f766d7172286504faa7af5 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -63,37 +63,59 @@ 
    (set_attr "predicable" "no")])
 
 (define_insn "atomic_load<mode>"
-  [(set (match_operand:QHSI 0 "register_operand" "=r")
+  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q")
-       (match_operand:SI 2 "const_int_operand")]		;; model
+      [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
       VUNSPEC_LDA))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model))
-      return \"ldr<sync_sfx>%?\\t%0, %1\";
+      {
+	if (TARGET_THUMB1)
+	  return \"ldr<sync_sfx>\\t%0, %1\";
+	else
+	  return \"ldr<sync_sfx>%?\\t%0, %1\";
+      }
     else
-      return \"lda<sync_sfx>%?\\t%0, %1\";
+      {
+	if (TARGET_THUMB1)
+	  return \"lda<sync_sfx>\\t%0, %1\";
+	else
+	  return \"lda<sync_sfx>%?\\t%0, %1\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "atomic_store<mode>"
-  [(set (match_operand:QHSI 0 "memory_operand" "=Q")
+  [(set (match_operand:QHSI 0 "memory_operand" "=Q,Q,Q")
     (unspec_volatile:QHSI
-      [(match_operand:QHSI 1 "general_operand" "r")
-       (match_operand:SI 2 "const_int_operand")]		;; model
+      [(match_operand:QHSI 1 "general_operand" "r,r,l")
+       (match_operand:SI 2 "const_int_operand" "n,Pf,n")]	;; model
       VUNSPEC_STL))]
   "TARGET_HAVE_LDACQ"
   {
     enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
     if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model))
-      return \"str<sync_sfx>%?\t%1, %0\";
+      {
+	if (TARGET_THUMB1)
+	  return \"str<sync_sfx>\t%1, %0\";
+	else
+	  return \"str<sync_sfx>%?\t%1, %0\";
+      }
     else
-      return \"stl<sync_sfx>%?\t%1, %0\";
+      {
+	if (TARGET_THUMB1)
+	  return \"stl<sync_sfx>\t%1, %0\";
+	else
+	  return \"stl<sync_sfx>%?\t%1, %0\";
+      }
   }
-  [(set_attr "predicable" "yes")
+  [(set_attr "arch" "32,v8mb,any")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 ;; An LDRD instruction usable by the atomic_loaddi expander on LPAE targets
@@ -380,45 +402,57 @@ 
   })
 
 (define_insn "arm_load_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
 	    VUNSPEC_LL)))]
   "TARGET_HAVE_LDREXBH"
-  "ldrex<sync_sfx>%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex<sync_sfx>%?\t%0, %C1
+   ldrex<sync_sfx>\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
         (zero_extend:SI
 	  (unspec_volatile:NARROW
-	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua")]
+	    [(match_operand:NARROW 1 "mem_noofs_operand" "Ua,Ua")]
 	    VUNSPEC_LAX)))]
   "TARGET_HAVE_LDACQ"
-  "ldaex<sync_sfx>%?\\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex<sync_sfx>%?\\t%0, %C1
+   ldaex<sync_sfx>\\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
 	  VUNSPEC_LL))]
   "TARGET_HAVE_LDREX"
-  "ldrex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldrex%?\t%0, %C1
+   ldrex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_acquire_exclusivesi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(unspec_volatile:SI
-	  [(match_operand:SI 1 "mem_noofs_operand" "Ua")]
+	  [(match_operand:SI 1 "mem_noofs_operand" "Ua,Ua")]
 	  VUNSPEC_LAX))]
   "TARGET_HAVE_LDACQ"
-  "ldaex%?\t%0, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   ldaex%?\t%0, %C1
+   ldaex\t%0, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_load_exclusivedi"
@@ -460,7 +494,10 @@ 
 	gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
 	return "strexd%?\t%0, %2, %H2, %C1";
       }
-    return "strex<sync_sfx>%?\t%0, %2, %C1";
+    if (TARGET_THUMB1)
+      return "strex<sync_sfx>\t%0, %2, %C1";
+    else
+      return "strex<sync_sfx>%?\t%0, %2, %C1";
   }
   [(set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
@@ -482,13 +519,16 @@ 
    (set_attr "predicable_short_it" "no")])
 
 (define_insn "arm_store_release_exclusive<mode>"
-  [(set (match_operand:SI 0 "s_register_operand" "=&r")
+  [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
 	(unspec_volatile:SI [(const_int 0)] VUNSPEC_SLX))
-   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua")
+   (set (match_operand:QHSI 1 "mem_noofs_operand" "=Ua,Ua")
 	(unspec_volatile:QHSI
-	  [(match_operand:QHSI 2 "s_register_operand" "r")]
+	  [(match_operand:QHSI 2 "s_register_operand" "r,r")]
 	  VUNSPEC_SLX))]
   "TARGET_HAVE_LDACQ"
-  "stlex<sync_sfx>%?\t%0, %2, %C1"
-  [(set_attr "predicable" "yes")
+  "@
+   stlex<sync_sfx>%?\t%0, %2, %C1
+   stlex<sync_sfx>\t%0, %2, %C1"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
    (set_attr "predicable_short_it" "no")])
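The decision logic in the new `atomic_load<mode>` output code and the `Pf` constraint can be modelled in plain C. This is a hypothetical sketch, not GCC code. `pf_matches`, `atomic_load_template`, `template_uses_lda` and the `MM_*` enumerators are illustrative names, and the enumerators reuse the `__ATOMIC_*` values that GCC predefines. The sketch shows two things. First, `Pf` accepts exactly the memory models for which the load pattern emits `lda`. Second, the Thumb-1 templates drop the `%?` condition marker because ARMv8-M Baseline has no IT instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative memory models; values reuse GCC's predefined __ATOMIC_* macros.  */
enum mm_sketch
{
  MM_RELAXED = __ATOMIC_RELAXED,
  MM_CONSUME = __ATOMIC_CONSUME,
  MM_ACQUIRE = __ATOMIC_ACQUIRE,
  MM_RELEASE = __ATOMIC_RELEASE,
  MM_ACQ_REL = __ATOMIC_ACQ_REL,
  MM_SEQ_CST = __ATOMIC_SEQ_CST
};

/* Mirrors the Pf constraint: every model except relaxed, consume and release.  */
static bool pf_matches (enum mm_sketch m)
{
  assert (m >= MM_RELAXED && m <= MM_SEQ_CST);
  return m != MM_RELAXED && m != MM_CONSUME && m != MM_RELEASE;
}

/* Mirrors the atomic_load<mode> output code: ldr for models outside Pf,
   lda otherwise; the "%?" predication marker is omitted for Thumb-1.  */
static const char *atomic_load_template (enum mm_sketch m, bool thumb1)
{
  if (!pf_matches (m))
    return thumb1 ? "ldr<sync_sfx>\t%0, %1" : "ldr<sync_sfx>%?\t%0, %1";
  return thumb1 ? "lda<sync_sfx>\t%0, %1" : "lda<sync_sfx>%?\t%0, %1";
}

/* True when a template selects the load-acquire instruction.  */
static bool template_uses_lda (const char *tmpl)
{
  return strncmp (tmpl, "lda", 3) == 0;
}
```

`atomic_store<mode>` has the same shape with `str`/`stl`. The one difference is that its `str` branch covers relaxed, consume and acquire, whereas the load's `ldr` branch covers relaxed, consume and release.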