From patchwork Tue Sep 15 16:25:25 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Lyon X-Patchwork-Id: 53687 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-lb0-f198.google.com (mail-lb0-f198.google.com [209.85.217.198]) by patches.linaro.org (Postfix) with ESMTPS id C796D22DE5 for ; Tue, 15 Sep 2015 16:26:01 +0000 (UTC) Received: by lbbmp1 with SMTP id mp1sf58997429lbb.2 for ; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:sender :delivered-to:mime-version:date:message-id:subject:from:to :content-type:x-original-sender:x-original-authentication-results; bh=51xzAfQ7QDNskQ4B5PiozWX9FgIVKYnWjI11uiczUUM=; b=K68kK3uMl6QKGtkrwVqUtl0rLwfqvTbaqegBnkjeNJfDKrEqDtXjDhZXFRHZLmfruy gvoARukRYxm32LSJseUofwVdB5ReKQMsQB4AarV2WtegekURFfBHA1rBQgtHW8Qh8ZFV d+QJ19EEaWC+NHnnPn54P9GWIm+wA+y4DuV5/xW3r0xHWhNtZfXzveEbmKcR+UZP2cMH 5DQnCoG0hQBynpkPBCd6Sn7WN7C0YLSdEMLHqngY2fJT+FwcOH7BM4Q4Xwq2yoFkTzX2 0nVzqmbXgD1lpBCLqw7A84cRkDO7ZkPZMftKG6tm9HBb8I51UonBEFSg1qZ/6+FKaV89 qo4A== X-Gm-Message-State: ALoCoQl6EEJF8xzsnLr2Cts0oOULGsARzE0R26CgMDgH6c02i0IGMSgn2j5mc/ujfONXYo2z77sq X-Received: by 10.112.199.5 with SMTP id jg5mr4460195lbc.14.1442334360788; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.28.230 with SMTP id e6ls662788lah.66.gmail; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) X-Received: by 10.152.23.199 with SMTP id o7mr21096583laf.76.1442334360504; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) Received: from mail-la0-x22b.google.com (mail-la0-x22b.google.com. [2a00:1450:4010:c03::22b]) by mx.google.com with ESMTPS id ap9si14682761lbc.133.2015.09.15.09.26.00 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Sep 2015 09:26:00 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c03::22b as permitted sender) client-ip=2a00:1450:4010:c03::22b; Received: by lahg1 with SMTP id g1so81952011lah.1 for ; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) X-Received: by 10.152.18.130 with SMTP id w2mr13240807lad.88.1442334360014; Tue, 15 Sep 2015 09:26:00 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.59.35 with SMTP id w3csp1905608lbq; Tue, 15 Sep 2015 09:25:59 -0700 (PDT) X-Received: by 10.50.4.42 with SMTP id h10mr7563388igh.37.1442334358903; Tue, 15 Sep 2015 09:25:58 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id f14si8981619igt.88.2015.09.15.09.25.58 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Sep 2015 09:25:58 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-407472-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 13440 invoked by alias); 15 Sep 2015 16:25:34 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 13363 invoked by uid 89); 15 Sep 2015 16:25:34 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-qg0-f54.google.com Received: from mail-qg0-f54.google.com (HELO mail-qg0-f54.google.com) (209.85.192.54) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 15 Sep 2015 16:25:28 +0000 Received: by qgev79 with SMTP id v79so147264683qge.0 for ; Tue, 15 Sep 2015 09:25:25 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.140.130.72 with SMTP id 69mr35168172qhc.32.1442334325518; Tue, 15 Sep 2015 09:25:25 -0700 (PDT) Received: by 10.140.96.71 with HTTP; Tue, 15 Sep 2015 09:25:25 -0700 (PDT) Date: Tue, 15 Sep 2015 18:25:25 +0200 Message-ID: Subject: [AArch64_be] Fix vtbl[34] and vtbx4 From: Christophe Lyon To: "gcc-patches@gcc.gnu.org" X-IsSubscribed: yes X-Original-Sender: christophe.lyon@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c03::22b as permitted sender) smtp.mailfrom=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 This patch re-implements vtbl[34] and vtbx4 AdvSIMD intrinsics using existing builtins, and fixes the behaviour on aarch64_be. Tested on aarch64_be-none-elf and aarch64-none-elf using the Foundation Model. OK? Christophe. 2015-09-15 Christophe Lyon * config/aarch64/aarch64-builtins.c (aarch64_types_tbl_qualifiers): New static data. (TYPES_TBL): Define. * config/aarch64/aarch64-simd-builtins.def: Update builtins tables. * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): New. * config/aarch64/arm_neon.h (vtbl3_s8, vtbl3_u8, vtbl3_p8) (vtbl4_s8, vtbl4_u8, vtbl4_p8): Rewrite using builtin functions. (vtbx4_s8, vtbx4_u8, vtbx4_p8): Emulate behaviour using other intrinsics. * config/aarch64/iterators.md (V8Q): New. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 0f4f2b9..7ca3917 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -253,6 +253,11 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_none, qualifier_struct_load_store_lane_index }; #define TYPES_STORESTRUCT_LANE (aarch64_types_storestruct_lane_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_tbl_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none }; +#define TYPES_TBL (aarch64_types_tbl_qualifiers) + #define CF0(N, X) CODE_FOR_aarch64_##N##X #define CF1(N, X) CODE_FOR_##N##X##1 #define CF2(N, X) CODE_FOR_##N##X##2 diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index d0f298a..62f1b13 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -405,3 +405,5 @@ VAR1 (BINOPP, crypto_pmull, 0, di) VAR1 (BINOPP, crypto_pmull, 0, v2di) + /* Implemented by aarch64_tbl3v8qi. */ + BUILTIN_V8Q (TBL, tbl3, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 9777418..84a61d5 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4716,6 +4714,16 @@ [(set_attr "type" "neon_tbl2_q")] ) +(define_insn "aarch64_tbl3v8qi" + [(set (match_operand:V8QI 0 "register_operand" "=w") + (unspec:V8QI [(match_operand:OI 1 "register_operand" "w") + (match_operand:V8QI 2 "register_operand" "w")] + UNSPEC_TBL))] + "TARGET_SIMD" + "tbl\\t%S0.8b, {%S1.16b - %T1.16b}, %S2.8b" + [(set_attr "type" "neon_tbl3")] +) + (define_insn_and_split "aarch64_combinev16qi" [(set (match_operand:OI 0 "register_operand" "=w") (unspec:OI [(match_operand:V16QI 1 "register_operand" "w") diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 87bbf6e..91704de 100644 diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 6dfebe7..e8ee318 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -10902,13 +10902,14 @@ vtbl3_s8 (int8x8x3_t tab, int8x8_t idx) { int8x8_t result; int8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_s8 (tab.val[2], vcreate_s8 (__AARCH64_UINT64_C (0x0))); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = __builtin_aarch64_tbl3v8qi (__o, idx); return result; } @@ -10917,13 +10918,14 @@ vtbl3_u8 (uint8x8x3_t tab, uint8x8_t idx) { uint8x8_t result; uint8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_u8 (tab.val[2], vcreate_u8 (__AARCH64_UINT64_C (0x0))); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -10932,13 +10934,14 @@ vtbl3_p8 (poly8x8x3_t tab, uint8x8_t idx) { poly8x8_t result; poly8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_p8 (tab.val[2], vcreate_p8 (__AARCH64_UINT64_C (0x0))); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -10947,13 +10950,14 @@ vtbl4_s8 (int8x8x4_t tab, int8x8_t idx) { int8x8_t result; int8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = __builtin_aarch64_tbl3v8qi (__o, idx); return result; } @@ -10962,13 +10966,14 @@ vtbl4_u8 (uint8x8x4_t tab, uint8x8_t idx) { uint8x8_t result; uint8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -10977,13 +10982,14 @@ vtbl4_p8 (poly8x8x4_t tab, uint8x8_t idx) { poly8x8_t result; poly8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -11023,51 +11029,6 @@ vtbx2_p8 (poly8x8_t r, poly8x8x2_t tab, uint8x8_t idx) return result; } -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vtbx4_s8 (int8x8_t r, int8x8x4_t tab, int8x8_t idx) -{ - int8x8_t result = r; - int8x16x2_t temp; - temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vtbx4_u8 (uint8x8_t r, uint8x8x4_t tab, uint8x8_t idx) -{ - uint8x8_t result = r; - uint8x16x2_t temp; - temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - -__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) -vtbx4_p8 (poly8x8_t r, poly8x8x4_t tab, uint8x8_t idx) -{ - poly8x8_t result = r; - poly8x16x2_t temp; - temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - /* End of temporary inline asm. */ /* Start of optimal implementations in approved order. */ @@ -23221,6 +23182,36 @@ vtbx3_p8 (poly8x8_t __r, poly8x8x3_t __tab, uint8x8_t __idx) return vbsl_p8 (__mask, __tbl, __r); } +/* vtbx4 */ + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vtbx4_s8 (int8x8_t __r, int8x8x4_t __tab, int8x8_t __idx) +{ + uint8x8_t __mask = vclt_u8 (vreinterpret_u8_s8 (__idx), + vmov_n_u8 (32)); + int8x8_t __tbl = vtbl4_s8 (__tab, __idx); + + return vbsl_s8 (__mask, __tbl, __r); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vtbx4_u8 (uint8x8_t __r, uint8x8x4_t __tab, uint8x8_t __idx) +{ + uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (32)); + uint8x8_t __tbl = vtbl4_u8 (__tab, __idx); + + return vbsl_u8 (__mask, __tbl, __r); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx) +{ + uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (32)); + poly8x8_t __tbl = vtbl4_p8 (__tab, __idx); + + return vbsl_p8 (__mask, __tbl, __r); +} + /* vtrn */ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index b8a45d1..dfbd9cd 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -100,6 +100,8 @@ ;; All modes. (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF]) +(define_mode_iterator V8Q [V8QI]) + ;; All vector modes and DI. (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI])