From patchwork Thu Sep 18 19:38:28 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 37607 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-ee0-f71.google.com (mail-ee0-f71.google.com [74.125.83.71]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id E1F2D2054D for ; Thu, 18 Sep 2014 19:41:51 +0000 (UTC) Received: by mail-ee0-f71.google.com with SMTP id e53sf1010783eek.2 for ; Thu, 18 Sep 2014 12:41:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:delivered-to:mailing-list :precedence:list-id:list-unsubscribe:list-archive:list-post :list-help:sender:delivered-to:from:to:subject:date:message-id :in-reply-to:references:x-original-sender :x-original-authentication-results; bh=IOD3Xb0xPeDVd/eQo8zH1n0eEpl/ZIRoERnYC+jirLI=; b=FVM7t56u0YtDcAcB2vv4GM0M5/DYHwz1HITTOpzad+RDCKxNJF+rLkaHY5a/fr8QP0 C7zzZE/Sdc9xGObgyEwaUhrwVWtxI9+kRYq003pF9RTLxk+fUHe4JJRWq1TFJyCKxkaA vclK2ZZiYI7QUkvRUxwi9gqlBX6NwQb1sVQK+JvQX29hfzxdbQakkrBd2ZPNx+UQOqSI Q/oHfo50F4Y9+XQ0Q8psRMThZdNsfMuVNtyDFxVmVNgx/PNWVi2Y/f+S6jJKYmsqw5y7 HK5gsfRgjwOs4/wGa4HdhnFbqi6EsbgCxGS6u3Gd3mTGDtqqP4FDSUY1V7gxexbw38HX S4BQ== X-Gm-Message-State: ALoCoQldNnFhBsqc+LJw1QBCdG0EsRkKiu+sxvksuNgwy+TSS6B4/Fn4YuWEcM8IV+MJeugelDpp X-Received: by 10.180.81.226 with SMTP id d2mr9233030wiy.5.1411069311027; Thu, 18 Sep 2014 12:41:51 -0700 (PDT) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.22.130 with SMTP id d2ls221867laf.98.gmail; Thu, 18 Sep 2014 12:41:50 -0700 (PDT) X-Received: by 10.152.203.167 with SMTP id kr7mr1968063lac.9.1411069310870; Thu, 18 Sep 2014 12:41:50 -0700 (PDT) Received: from mail-lb0-x22e.google.com (mail-lb0-x22e.google.com [2a00:1450:4010:c04::22e]) by mx.google.com with ESMTPS id 7si25077146lai.94.2014.09.18.12.41.50 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 18 Sep 2014 12:41:50 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c04::22e as permitted sender) client-ip=2a00:1450:4010:c04::22e; Received: by mail-lb0-f174.google.com with SMTP id l4so1845739lbv.19 for ; Thu, 18 Sep 2014 12:41:50 -0700 (PDT) X-Received: by 10.152.204.231 with SMTP id lb7mr2038690lac.44.1411069310800; Thu, 18 Sep 2014 12:41:50 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.130.169 with SMTP id of9csp820896lbb; Thu, 18 Sep 2014 12:41:49 -0700 (PDT) X-Received: by 10.68.183.68 with SMTP id ek4mr9433916pbc.54.1411069308984; Thu, 18 Sep 2014 12:41:48 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id qs7si41272514pbc.118.2014.09.18.12.41.48 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Sep 2014 12:41:48 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-378065-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 30370 invoked by alias); 18 Sep 2014 19:40:59 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 30303 invoked by uid 89); 18 Sep 2014 19:40:58 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-pd0-f180.google.com Received: from mail-pd0-f180.google.com (HELO mail-pd0-f180.google.com) (209.85.192.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 18 Sep 2014 19:40:54 +0000 Received: by mail-pd0-f180.google.com with SMTP id ft15so2024716pdb.25 for ; Thu, 18 Sep 2014 12:40:52 -0700 (PDT) X-Received: by 10.68.194.194 with SMTP id hy2mr8970778pbc.149.1411069252553; Thu, 18 Sep 2014 12:40:52 -0700 (PDT) Received: from sale.swisscom.com (70-35-38-154.static.wiline.com. [70.35.38.154]) by mx.google.com with ESMTPSA id f12sm20996103pat.36.2014.09.18.12.40.49 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 18 Sep 2014 12:40:51 -0700 (PDT) From: Charles Baylis To: marcus.shawcroft@arm.com, rearnsha@arm.com, gcc-patches@gcc.gnu.org Subject: [PATCH 3/4] [AARCH64, NEON] Fix unnecessary moves in vld[234]q_* intrinsics Date: Thu, 18 Sep 2014 20:38:28 +0100 Message-Id: <1411069109-31425-4-git-send-email-charles.baylis@linaro.org> In-Reply-To: <1411069109-31425-1-git-send-email-charles.baylis@linaro.org> References: <1411069109-31425-1-git-send-email-charles.baylis@linaro.org> X-IsSubscribed: yes X-Original-Sender: charles.baylis@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c04::22e as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 This patch improves code generation of vld[234]q_* intrinsics by avoiding use of the __builtin_aarch64_get_qreg_* builtins to generate a temporary result variable. Instead, a union is used for type-punning, which avoids generation of some unnecessary move instructions. This idiom is already used in several other intrinsics. This patch is independent of the previous patches in the series. Tested (with the rest of the patch series) with make check on aarch64-oe-linux with qemu, and also causes no regressions in clyon's NEON intrinsics tests. Charles Baylis * config/aarch64/arm_neon.h (vld2q_s8, vld2q_p8, vld2q_s16, vld2q_p16, vld2q_s32, vld2q_s64, vld2q_u8, vld2q_u16, vld2q_u32, vld2q_u64, vld2q_f32, vld2q_f64, vld3q_s8, vld3q_p8, vld3q_s16, vld3q_p16, vld3q_s32, vld3q_s64, vld3q_u8, vld3q_u16, vld3q_u32, vld3q_u64, vld3q_f32, vld3q_f64, vld4q_s8, vld4q_p8, vld4q_s16, vld4q_p16, vld4q_s32, vld4q_s64, vld4q_u8, vld4q_u16, vld4q_u32, vld4q_u64, vld4q_f32, vld4q_f64): Use type-punning to convert between NEON intrinsic types and __builtin_aarch64_simd* types. Change-Id: I61efa29138b13c7a83679885343211d604a73b15 --- gcc/config/aarch64/arm_neon.h | 396 +++++++++++++++--------------------------- 1 file changed, 144 insertions(+), 252 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index c1fcb47..87e3baf 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -16969,133 +16969,109 @@ vld2_f32 (const float32_t * __a) __extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__)) vld2q_s8 (const int8_t * __a) { - int8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + union { int8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__)) vld2q_p8 (const poly8_t * __a) { - poly8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + union { poly8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) vld2q_s16 (const int16_t * __a) { - int16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + union { int16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) vld2q_p16 (const poly16_t * __a) { - poly16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + union { poly16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) vld2q_s32 (const int32_t * __a) { - int32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + union { int32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline int64x2x2_t __attribute__ ((__always_inline__)) vld2q_s64 (const int64_t * __a) { - int64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + union { int64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__)) vld2q_u8 (const uint8_t * __a) { - uint8x16x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1); - return ret; + union { uint8x16x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) vld2q_u16 (const uint16_t * __a) { - uint16x8x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1); - return ret; + union { uint16x8x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) vld2q_u32 (const uint32_t * __a) { - uint32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1); - return ret; + union { uint32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline uint64x2x2_t __attribute__ ((__always_inline__)) vld2q_u64 (const uint64_t * __a) { - uint64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1); - return ret; + union { uint64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) vld2q_f32 (const float32_t * __a) { - float32x4x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1); - return ret; + union { float32x4x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a); + return __temp.__i; } __extension__ static __inline float64x2x2_t __attribute__ ((__always_inline__)) vld2q_f64 (const float64_t * __a) { - float64x2x2_t ret; - __builtin_aarch64_simd_oi __o; - __o = __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1); - return ret; + union { float64x2x2_t __i; + __builtin_aarch64_simd_oi __o; } __temp; + __temp.__o = __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a); + return __temp.__i; } __extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__)) @@ -17245,145 +17221,109 @@ vld3_f32 (const float32_t * __a) __extension__ static __inline int8x16x3_t __attribute__ ((__always_inline__)) vld3q_s8 (const int8_t * __a) { - int8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + union { int8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline poly8x16x3_t __attribute__ ((__always_inline__)) vld3q_p8 (const poly8_t * __a) { - poly8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + union { poly8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline int16x8x3_t __attribute__ ((__always_inline__)) vld3q_s16 (const int16_t * __a) { - int16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + union { int16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline poly16x8x3_t __attribute__ ((__always_inline__)) vld3q_p16 (const poly16_t * __a) { - poly16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + union { poly16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline int32x4x3_t __attribute__ ((__always_inline__)) vld3q_s32 (const int32_t * __a) { - int32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + union { int32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline int64x2x3_t __attribute__ ((__always_inline__)) vld3q_s64 (const int64_t * __a) { - int64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + union { int64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline uint8x16x3_t __attribute__ ((__always_inline__)) vld3q_u8 (const uint8_t * __a) { - uint8x16x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); - return ret; + union { uint8x16x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline uint16x8x3_t __attribute__ ((__always_inline__)) vld3q_u16 (const uint16_t * __a) { - uint16x8x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); - return ret; + union { uint16x8x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline uint32x4x3_t __attribute__ ((__always_inline__)) vld3q_u32 (const uint32_t * __a) { - uint32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); - return ret; + union { uint32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline uint64x2x3_t __attribute__ ((__always_inline__)) vld3q_u64 (const uint64_t * __a) { - uint64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); - return ret; + union { uint64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__)) vld3q_f32 (const float32_t * __a) { - float32x4x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2); - return ret; + union { float32x4x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v4sf ((const __builtin_aarch64_simd_sf *) __a); + return __temp.__i; } __extension__ static __inline float64x2x3_t __attribute__ ((__always_inline__)) vld3q_f64 (const float64_t * __a) { - float64x2x3_t ret; - __builtin_aarch64_simd_ci __o; - __o = __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2); - return ret; + union { float64x2x3_t __i; + __builtin_aarch64_simd_ci __o; } __temp; + __temp.__o = __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a); + return __temp.__i; } __extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__)) @@ -17545,157 +17485,109 @@ vld4_f32 (const float32_t * __a) __extension__ static __inline int8x16x4_t __attribute__ ((__always_inline__)) vld4q_s8 (const int8_t * __a) { - int8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + union { int8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline poly8x16x4_t __attribute__ ((__always_inline__)) vld4q_p8 (const poly8_t * __a) { - poly8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + union { poly8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline int16x8x4_t __attribute__ ((__always_inline__)) vld4q_s16 (const int16_t * __a) { - int16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + union { int16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline poly16x8x4_t __attribute__ ((__always_inline__)) vld4q_p16 (const poly16_t * __a) { - poly16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + union { poly16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline int32x4x4_t __attribute__ ((__always_inline__)) vld4q_s32 (const int32_t * __a) { - int32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + union { int32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline int64x2x4_t __attribute__ ((__always_inline__)) vld4q_s64 (const int64_t * __a) { - int64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + union { int64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline uint8x16x4_t __attribute__ ((__always_inline__)) vld4q_u8 (const uint8_t * __a) { - uint8x16x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); - ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0); - ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1); - ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2); - ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3); - return ret; + union { uint8x16x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a); + return __temp.__i; } __extension__ static __inline uint16x8x4_t __attribute__ ((__always_inline__)) vld4q_u16 (const uint16_t * __a) { - uint16x8x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); - ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0); - ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1); - ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2); - ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3); - return ret; + union { uint16x8x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a); + return __temp.__i; } __extension__ static __inline uint32x4x4_t __attribute__ ((__always_inline__)) vld4q_u32 (const uint32_t * __a) { - uint32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); - ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0); - ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1); - ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2); - ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3); - return ret; + union { uint32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a); + return __temp.__i; } __extension__ static __inline uint64x2x4_t __attribute__ ((__always_inline__)) vld4q_u64 (const uint64_t * __a) { - uint64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); - ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0); - ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1); - ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2); - ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3); - return ret; + union { uint64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a); + return __temp.__i; } __extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__)) vld4q_f32 (const float32_t * __a) { - float32x4x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a); - ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0); - ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1); - ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2); - ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3); - return ret; + union { float32x4x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a); + return __temp.__i; } __extension__ static __inline float64x2x4_t __attribute__ ((__always_inline__)) vld4q_f64 (const float64_t * __a) { - float64x2x4_t ret; - __builtin_aarch64_simd_xi __o; - __o = __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a); - ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0); - ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1); - ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2); - ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3); - return ret; + union { float64x2x4_t __i; + __builtin_aarch64_simd_xi __o; } __temp; + __temp.__o = __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a); + return __temp.__i; } /* vmax */