From patchwork Thu Sep 18 19:38:29 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 37608 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-ee0-f70.google.com (mail-ee0-f70.google.com [74.125.83.70]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 428D22054D for ; Thu, 18 Sep 2014 19:42:09 +0000 (UTC) Received: by mail-ee0-f70.google.com with SMTP id c41sf1008883eek.9 for ; Thu, 18 Sep 2014 12:42:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:delivered-to:mailing-list :precedence:list-id:list-unsubscribe:list-archive:list-post :list-help:sender:delivered-to:from:to:subject:date:message-id :in-reply-to:references:x-original-sender :x-original-authentication-results; bh=DIvle3+S2iRXDajUGXm0zkWxnHzr1BHWUa/nMi4zZsE=; b=UTTfVmrdQk0tU9HYe9I5764Kwr9eCk1BF/dcXH0WWNS37+fl8tUC0+Rvp3jULu20F5 zvRxswul8JsLbK/AdFD2MBS1V/lc0v9Q7n+0jVcYqHIJq7f/KtMkvEUUcsDj/BpnQHa4 6UnsX7DgmXh8iEDXHLzyPvozTK0T/PeG+Yj5z7xFqv6V6gUFeegcgaEot4XXb9ReH71o +A7EjexPNRNhU2jge5uo5jNlSEDyQk8uOe0kkitEg7FxhorNDSvB+QroPqjcdI8+eGTM 0F0HDQMsZyiY9KojB2/4nTX4sr6stwiz20w5qhLLj1XhKacBrBnOg6SJJAIRuKvF6ZiJ 0PdA== X-Gm-Message-State: ALoCoQkbVnUrbOsz5qjZmSHnJQGJEiIUgxRM8ciNfKRpAtMlKvSEY23WWs6w+VmGsVXKOhvB6XVY X-Received: by 10.152.4.4 with SMTP id g4mr1169963lag.2.1411069328320; Thu, 18 Sep 2014 12:42:08 -0700 (PDT) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.203.136 with SMTP id kq8ls243854lac.94.gmail; Thu, 18 Sep 2014 12:42:08 -0700 (PDT) X-Received: by 10.112.163.103 with SMTP id yh7mr1737505lbb.73.1411069328088; Thu, 18 Sep 2014 12:42:08 -0700 (PDT) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [2a00:1450:4010:c03::231]) by mx.google.com with ESMTPS id q8si34210256laj.55.2014.09.18.12.42.08 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 18 Sep 2014 12:42:08 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c03::231 as permitted sender) client-ip=2a00:1450:4010:c03::231; Received: by mail-la0-f49.google.com with SMTP id pn19so1818591lab.22 for ; Thu, 18 Sep 2014 12:42:07 -0700 (PDT) X-Received: by 10.112.75.233 with SMTP id f9mr1871422lbw.102.1411069327920; Thu, 18 Sep 2014 12:42:07 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.130.169 with SMTP id of9csp820932lbb; Thu, 18 Sep 2014 12:42:07 -0700 (PDT) X-Received: by 10.68.106.66 with SMTP id gs2mr8959230pbb.141.1411069326304; Thu, 18 Sep 2014 12:42:06 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id rb5si41264257pab.183.2014.09.18.12.42.05 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Sep 2014 12:42:06 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-378066-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 30706 invoked by alias); 18 Sep 2014 19:41:02 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 30660 invoked by uid 89); 18 Sep 2014 19:41:02 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-pa0-f44.google.com Received: from mail-pa0-f44.google.com (HELO mail-pa0-f44.google.com) (209.85.220.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 18 Sep 2014 19:40:57 +0000 Received: by mail-pa0-f44.google.com with SMTP id bj1so2235312pad.3 for ; Thu, 18 Sep 2014 12:40:55 -0700 (PDT) X-Received: by 10.68.249.101 with SMTP id yt5mr7920935pbc.156.1411069255868; Thu, 18 Sep 2014 12:40:55 -0700 (PDT) Received: from sale.swisscom.com (70-35-38-154.static.wiline.com. [70.35.38.154]) by mx.google.com with ESMTPSA id f12sm20996103pat.36.2014.09.18.12.40.52 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 18 Sep 2014 12:40:54 -0700 (PDT) From: Charles Baylis To: marcus.shawcroft@arm.com, rearnsha@arm.com, gcc-patches@gcc.gnu.org Subject: [PATCH 4/4] [AARCH64, NEON] Fix unnecessary moves in vst[234]q_* intrinsics Date: Thu, 18 Sep 2014 20:38:29 +0100 Message-Id: <1411069109-31425-5-git-send-email-charles.baylis@linaro.org> In-Reply-To: <1411069109-31425-1-git-send-email-charles.baylis@linaro.org> References: <1411069109-31425-1-git-send-email-charles.baylis@linaro.org> X-IsSubscribed: yes X-Original-Sender: charles.baylis@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c03::231 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 This patch improves code generation of vst[234]q_* intrinsics by avoiding use of the __builtin_aarch64_set_qreg_* builtins to generate a temporary __builtin_aarch64_simd_XX variable. Instead, a union is used for type-punning, which avoids generation of some unnecessary move instructions. This idiom is already used in several other intrinsics. This patch is independent of the previous patches in the series. Tested (with the rest of the patch series) with make check on aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON intrinsics tests. Charles Baylis * config/aarch64/arm_neon.h (vst2q_s8, vst2q_p8, vst2q_s16, vst2q_p16, vst2q_s32, vst2q_s64, vst2q_u8, vst2q_u16, vst2q_u32, vst2q_u64, vst2q_f32, vst2q_f64, vst3q_s8, vst3q_p8, vst3q_s16, vst3q_p16, vst3q_s32, vst3q_s64, vst3q_u8, vst3q_u16, vst3q_u32, vst3q_u64, vst3q_f32, vst3q_f64, vst4q_s8, vst4q_p8, vst4q_s16, vst4q_p16, vst4q_s32, vst4q_s64, vst4q_u8, vst4q_u16, vst4q_u32, vst4q_u64, vst4q_f32, vst4q_f64): Use type-punning to convert between NEON intrinsic types and __builtin_aarch64_simd* types. Change-Id: I789c68fc8d9458638eb00a15ffa28073bdc969a8 --- gcc/config/aarch64/arm_neon.h | 288 ++++++++++++++++-------------------------- 1 file changed, 108 insertions(+), 180 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 87e3baf..3292ce0 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -22493,109 +22493,97 @@ vst2_f32 (float32_t * __a, float32x2x2_t val) __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_s8 (int8_t * __a, int8x16x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { int8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_p8 (poly8_t * __a, poly8x16x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { poly8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_s16 (int16_t * __a, int16x8x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { int16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_p16 (poly16_t * __a, poly16x8x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { poly16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_s32 (int32_t * __a, int32x4x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1); - __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { int32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_s64 (int64_t * __a, int64x2x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1); - __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { int64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_u8 (uint8_t * __a, uint8x16x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); - __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { uint8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_u16 (uint16_t * __a, uint16x8x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); - __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { uint16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_u32 (uint32_t * __a, uint32x4x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1); - __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { uint32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_u64 (uint64_t * __a, uint64x2x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1); - __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { uint64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_f32 (float32_t * __a, float32x4x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[1], 1); - __builtin_aarch64_st2v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + union { float32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v4sf ((__builtin_aarch64_simd_sf *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst2q_f64 (float64_t * __a, float64x2x2_t val) { - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[1], 1); - __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __o); + union { float64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp = { val }; + __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __temp.__o); } __extension__ static __inline void @@ -22769,121 +22757,97 @@ vst3_f32 (float32_t * __a, float32x2x3_t val) __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_s8 (int8_t * __a, int8x16x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { int8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_p8 (poly8_t * __a, poly8x16x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { poly8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_s16 (int16_t * __a, int16x8x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { int16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_p16 (poly16_t * __a, poly16x8x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { poly16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_s32 (int32_t * __a, int32x4x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2); - __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { int32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_s64 (int64_t * __a, int64x2x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2); - __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { int64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_u8 (uint8_t * __a, uint8x16x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); - __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { uint8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_u16 (uint16_t * __a, uint16x8x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); - __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { uint16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_u32 (uint32_t * __a, uint32x4x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2); - __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { uint32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_u64 (uint64_t * __a, uint64x2x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2); - __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { uint64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_f32 (float32_t * __a, float32x4x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[2], 2); - __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + union { float32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst3q_f64 (float64_t * __a, float64x2x3_t val) { - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[2], 2); - __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __o); + union { float64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp = { val }; + __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __temp.__o); } __extension__ static __inline void @@ -23081,133 +23045,97 @@ vst4_f32 (float32_t * __a, float32x2x4_t val) __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_s8 (int8_t * __a, int8x16x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { int8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_p8 (poly8_t * __a, poly8x16x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { poly8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_s16 (int16_t * __a, int16x8x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { int16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_p16 (poly16_t * __a, poly16x8x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { poly16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_s32 (int32_t * __a, int32x4x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3); - __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { int32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_s64 (int64_t * __a, int64x2x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3); - __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { int64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_u8 (uint8_t * __a, uint8x16x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3); - __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o); + union { uint8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_u16 (uint16_t * __a, uint16x8x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3); - __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o); + union { uint16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_u32 (uint32_t * __a, uint32x4x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3); - __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o); + union { uint32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_u64 (uint64_t * __a, uint64x2x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3); - __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o); + union { uint64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_f32 (float32_t * __a, float32x4x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[3], 3); - __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __o); + union { float32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __temp.__o); } __extension__ static __inline void __attribute__ ((__always_inline__)) vst4q_f64 (float64_t * __a, float64x2x4_t val) { - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[0], 0); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[1], 1); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[2], 2); - __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[3], 3); - __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __o); + union { float64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp = { val }; + __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __temp.__o); } /* vsub */