From patchwork Tue Jan 26 15:18:00 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Lyon X-Patchwork-Id: 60463 Delivered-To: patch@linaro.org Received: by 10.112.130.2 with SMTP id oa2csp2022410lbb; Tue, 26 Jan 2016 07:18:28 -0800 (PST) X-Received: by 10.67.6.195 with SMTP id cw3mr34766716pad.88.1453821508290; Tue, 26 Jan 2016 07:18:28 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id y63si2559380pfi.175.2016.01.26.07.18.27 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 26 Jan 2016 07:18:28 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org; dkim=pass header.i=@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=dVroE5wsc+1Ba51zsG Ri/Cj2byFHs/tEmTF/u5nzmI1J2PEn0wj6y98PCJbf/B6Q0H+C6oFhkHflcG8hQu KASuMY4E2D8DcJjr7AzE8cXRD9/Q2InXBe43qFMP+C/Bb8+356rqfDKyM4bJHZZo SShMoCyE/B8+8TPzaYLQ5bdW8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=A9fU9RzHJ6vbk+A5oqhHB9Yz 8jM=; b=aOM/CILnbPHcJUm/M7iQAaEbgbxHXYaAlch1Zakg6R5jtGjYD2P4MRZy vFSJld1fpA5/oK1O9935rDxYqLo/1QT25SU2BtQdlYmHQCsltr3TtHlffVcBhkvV Qmi2wIAAx6u477n2H/8fnOgnw++1gZBDikYkZ1+FtLOc0YdXR+A= Received: (qmail 120734 invoked by alias); 26 Jan 2016 15:18:08 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 119735 invoked by uid 89); 26 Jan 2016 15:18:07 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_50, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=no version=3.3.2 spammy=4157, ii, 415, 7, rt X-HELO: mail-qg0-f53.google.com Received: from mail-qg0-f53.google.com (HELO mail-qg0-f53.google.com) (209.85.192.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 26 Jan 2016 15:18:03 +0000 Received: by mail-qg0-f53.google.com with SMTP id b35so140524912qge.0 for ; Tue, 26 Jan 2016 07:18:02 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=6o9/rGtu5xe3Utg5ERTU1iST2/LaYfwiIRxduGRv1W8=; b=LD/KMu69ixZn2dop4Ofic7V6s5kZ6bhwS7qXtjJOvOuURVEHAlP3YbUkXKaA0LbVhm LrDr1RdzsNuV2FvzB6OoZUDoQKo5YDaIBQEzs62JNrZDrec1IiZNAnK4ZnVmKNZPLG0b cPKjzk3kip025TBUFarEHys3IQ+6W7YsMHPd5sXL/RwfGp9zFNUdDg53TprmQk2VdUDL SGiAVmHqRnsTs3noOCxznHBQEbDZcYTT+X9EPIp+GvveVt3uF6R6oGvTUYU3io4Jby+z mi6vqqfVyX9fNWhjR0QBDe+wfxObu8bcyAdeJBZbbvlA1Ky8xvTxB+7Ytb+ey6YLH//d YKlg== X-Gm-Message-State: AG10YOTQq7B6+hgj3anR+qooNFpIGPwcytm6kQ7oEyS0w/u2K9HO6Yu2WVJ6pXJs23hsukZ2LRDuwBPnffy3WvF8 MIME-Version: 1.0 X-Received: by 10.140.171.5 with SMTP id r5mr31182001qhr.51.1453821480532; Tue, 26 Jan 2016 07:18:00 -0800 (PST) Received: by 10.140.90.84 with HTTP; Tue, 26 Jan 2016 07:18:00 -0800 (PST) In-Reply-To: <56A77289.30308@foss.arm.com> References: <1453143711-21320-1-git-send-email-alan.lawrence@arm.com> <569E4D77.6000809@foss.arm.com> <56A77289.30308@foss.arm.com> Date: Tue, 26 Jan 2016 16:18:00 +0100 Message-ID: Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb) From: Christophe Lyon To: Kyrill Tkachov Cc: Alan Lawrence , "gcc-patches@gcc.gnu.org" X-IsSubscribed: yes On 26 January 2016 at 14:20, Kyrill Tkachov wrote: > Hi Christophe, > > On 20/01/16 21:10, Christophe Lyon wrote: >> >> On 19 January 2016 at 15:51, Alan Lawrence >> wrote: >>> >>> On 19/01/16 11:15, Christophe Lyon wrote: >>> >>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and >>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought >>>>>> it was not desirable to impact neon_vrev32. >>>>> >>>>> >>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF >>>>> just >>>>> as >>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To >>>>> gain the >>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, >>>>> though. >>>>> >>>> Since this is more intrusive, I'd rather leave that part for later. OK? >>> >>> >>> Sure. >>> >>>>>> +#ifdef __ARM_BIG_ENDIAN >>>>>> + /* Here, 3 is (4-1) where 4 is the number of lanes. This is also >>>>>> the >>>>>> + right value for vectors with 8 lanes. */ >>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3) >>>>>> +#else >>>>>> +#define __arm_lane(__vec, __idx) __idx >>>>>> +#endif >>>>>> + >>>>> >>>>> >>>>> Looks right, but sounds... my concern here is that I'm hoping at some >>>>> point we >>>>> will move the *other* vget/set_lane intrinsics to use GCC vector >>>>> extensions >>>>> too. At which time (unlike __aarch64_lane which can be used everywhere) >>>>> this >>>>> will be the wrong formula. Can we name (and/or comment) it to avoid >>>>> misleading >>>>> anyone? The key characteristic seems to be that it is for vectors of >>>>> 16-bit >>>>> elements only. >>>>> >>>> I'm not to follow, here. Looking at the patterns for >>>> neon_vget_lane_*internal in neon.md, >>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses >>>> "halfelts". >>>> >>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq), >>>> that would be similar to the aarch64 ones (by computing the number of >>>> lanes of the input vector), but the "q" one would use half the total >>>> number of lanes instead? >>> >>> >>> That works for me! Sthg like: >>> >>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx >>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) + >>> (NUM_LANES(__vec)/2 - __idx) >>> //or similarly >>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1)) >>> >>> Alternatively I'd been thinking >>> >>> #define __arm_lane_32xN(__idx) __idx ^ 1 >>> #define __arm_lane_16xN(__idx) __idx ^ 3 >>> #define __arm_lane_8xN(__idx) __idx ^ 7 >>> >>> Bear in mind PR64893 that we had on AArch64 :-( >>> >> Here is a new version, based on the comments above. >> I've also removed the addition of arm_fp_ok effective target since I >> added that in my other testsuite patch. >> >> OK now? >> >> Thanks, >> >> Christophe >> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 3588b83..b1f408c 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode, > int inverse, > if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE > (el0))) > return -1; > + /* FP16 vectors cannot be represented. */ > + if (innersize == 2) > + return -1; > + > r0 = CONST_DOUBLE_REAL_VALUE (el0); > > > I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)" > > +(define_expand "movv4hf" > + [(set (match_operand:V4HF 0 "s_register_operand") > + (match_operand:V4HF 1 "s_register_operand"))] > + "TARGET_NEON && TARGET_FP16" > +{ > + if (can_create_pseudo_p ()) > + { > + if (!REG_P (operands[0])) > + operands[1] = force_reg (V4HFmode, operands[1]); > + } > +}) > > Can you please add a comment saying why you need the force_reg here? > IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an > ICE during expand with subregs. > > I've tried this patch out and it does indeed fix the ICE on armeb. > So ok for trunk with the changes above. > Thanks, > Kyrill > > OK thanks, here is what I have committed (r232832). Christophe. Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c (revision 232831) +++ gcc/config/arm/arm.c (working copy) @@ -12381,6 +12381,10 @@ if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0))) return -1; + /* FP16 vectors cannot be represented. */ + if (GET_MODE_INNER (mode) == HFmode) + return -1; + r0 = CONST_DOUBLE_REAL_VALUE (el0); for (i = 1; i < n_elts; i++) Index: gcc/config/arm/arm_neon.h =================================================================== --- gcc/config/arm/arm_neon.h (revision 232831) +++ gcc/config/arm/arm_neon.h (working copy) @@ -5302,16 +5302,28 @@ were marked always-inline so there were no call sites, the declaration would nonetheless raise an error. Hence, we must use a macro instead. */ -#define vget_lane_f16(__v, __idx) \ - __extension__ \ - ({ \ - float16x4_t __vec = (__v); \ - __builtin_arm_lane_check (4, __idx); \ - float16_t __res = __vec[__idx]; \ - __res; \ - }) + /* For big-endian, GCC's vector indices are reversed within each 64 + bits compared to the architectural lane indices used by Neon + intrinsics. */ +#ifdef __ARM_BIG_ENDIAN +#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) +#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1)) +#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) +#else +#define __arm_lane(__vec, __idx) __idx +#define __arm_laneq(__vec, __idx) __idx #endif +#define vget_lane_f16(__v, __idx) \ + __extension__ \ + ({ \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + float16_t __res = __vec[__arm_lane(__vec, __idx)]; \ + __res; \ + }) +#endif + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vget_lane_f32 (float32x2_t __a, const int __b) { @@ -5379,14 +5391,14 @@ } #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) -#define vgetq_lane_f16(__v, __idx) \ - __extension__ \ - ({ \ - float16x8_t __vec = (__v); \ - __builtin_arm_lane_check (8, __idx); \ - float16_t __res = __vec[__idx]; \ - __res; \ - }) +#define vgetq_lane_f16(__v, __idx) \ + __extension__ \ + ({ \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + float16_t __res = __vec[__arm_laneq(__vec, __idx)]; \ + __res; \ + }) #endif __extension__ static __inline float32_t __attribute__ ((__always_inline__)) @@ -5458,13 +5470,13 @@ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) #define vset_lane_f16(__e, __v, __idx) \ __extension__ \ - ({ \ - float16_t __elem = (__e); \ - float16x4_t __vec = (__v); \ - __builtin_arm_lane_check (4, __idx); \ - __vec[__idx] = __elem; \ - __vec; \ - }) + ({ \ + float16_t __elem = (__e); \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + __vec[__arm_lane (__vec, __idx)] = __elem; \ + __vec; \ + }) #endif __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) @@ -5536,13 +5548,13 @@ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) #define vsetq_lane_f16(__e, __v, __idx) \ __extension__ \ - ({ \ - float16_t __elem = (__e); \ - float16x8_t __vec = (__v); \ - __builtin_arm_lane_check (8, __idx); \ - __vec[__idx] = __elem; \ - __vec; \ - }) + ({ \ + float16_t __elem = (__e); \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + __vec[__arm_laneq (__vec, __idx)] = __elem; \ + __vec; \ + }) #endif __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) Index: gcc/config/arm/iterators.md =================================================================== --- gcc/config/arm/iterators.md (revision 232831) +++ gcc/config/arm/iterators.md (working copy) @@ -99,7 +99,7 @@ (define_mode_iterator VQI [V16QI V8HI V4SI]) ;; Quad-width vector modes, with TImode added, for moves. -(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI]) +(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI]) ;; Opaque structure types wider than TImode. (define_mode_iterator VSTRUCT [EI OI CI XI]) @@ -114,7 +114,7 @@ (define_mode_iterator VN [V8HI V4SI V2DI]) ;; All supported vector modes (except singleton DImode). -(define_mode_iterator VDQ [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DI]) +(define_mode_iterator VDQ [V8QI V16QI V4HI V8HI V2SI V4SI V4HF V8HF V2SF V4SF V2DI]) ;; All supported vector modes (except those with 64-bit integer elements). (define_mode_iterator VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF]) @@ -428,6 +428,7 @@ ;; Register width from element mode (define_mode_attr V_reg [(V8QI "P") (V16QI "q") (V4HI "P") (V8HI "q") + (V4HF "P") (V8HF "q") (V2SI "P") (V4SI "q") (V2SF "P") (V4SF "q") (DI "P") (V2DI "q") @@ -576,6 +577,7 @@ (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false") (V4HI "false") (V8HI "false") (V2SI "false") (V4SI "false") + (V4HF "true") (V8HF "true") (V2SF "true") (V4SF "true") (DI "false") (V2DI "false")]) Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md (revision 232831) +++ gcc/config/arm/neon.md (working copy) @@ -137,6 +137,36 @@ } }) +(define_expand "movv4hf" + [(set (match_operand:V4HF 0 "s_register_operand") + (match_operand:V4HF 1 "s_register_operand"))] + "TARGET_NEON && TARGET_FP16" +{ + /* We need to use force_reg to avoid CANNOT_CHANGE_MODE_CLASS + causing an ICE on big-endian because it cannot extract subregs in + this case. */ + if (can_create_pseudo_p ()) + { + if (!REG_P (operands[0])) + operands[1] = force_reg (V4HFmode, operands[1]); + } +}) + +(define_expand "movv8hf" + [(set (match_operand:V8HF 0 "") + (match_operand:V8HF 1 ""))] + "TARGET_NEON && TARGET_FP16" +{ + /* We need to use force_reg to avoid CANNOT_CHANGE_MODE_CLASS + causing an ICE on big-endian because it cannot extract subregs in + this case. */ + if (can_create_pseudo_p ()) + { + if (!REG_P (operands[0])) + operands[1] = force_reg (V8HFmode, operands[1]); + } +}) + (define_insn "*neon_mov" [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "=w,Ut,w") (match_operand:VSTRUCT 1 "general_operand" " w,w, Ut"))] @@ -299,11 +329,11 @@ [(set_attr "type" "neon_load1_1reg")]) (define_insn "vec_set_internal" - [(set (match_operand:VD 0 "s_register_operand" "=w,w") - (vec_merge:VD - (vec_duplicate:VD + [(set (match_operand:VD_LANE 0 "s_register_operand" "=w,w") + (vec_merge:VD_LANE + (vec_duplicate:VD_LANE (match_operand: 1 "nonimmediate_operand" "Um,r")) - (match_operand:VD 3 "s_register_operand" "0,0") + (match_operand:VD_LANE 3 "s_register_operand" "0,0") (match_operand:SI 2 "immediate_operand" "i,i")))] "TARGET_NEON" { @@ -385,7 +415,7 @@ (define_insn "vec_extract" [(set (match_operand: 0 "nonimmediate_operand" "=Um,r") (vec_select: - (match_operand:VD 1 "s_register_operand" "w,w") + (match_operand:VD_LANE 1 "s_register_operand" "w,w") (parallel [(match_operand:SI 2 "immediate_operand" "i,i")])))] "TARGET_NEON" { @@ -2829,6 +2859,22 @@ [(set_attr "type" "neon_from_gp")] ) +(define_insn "neon_vdup_nv4hf" + [(set (match_operand:V4HF 0 "s_register_operand" "=w") + (vec_duplicate:V4HF (match_operand:HF 1 "s_register_operand" "r")))] + "TARGET_NEON" + "vdup.16\t%P0, %1" + [(set_attr "type" "neon_from_gp")] +) + +(define_insn "neon_vdup_nv8hf" + [(set (match_operand:V8HF 0 "s_register_operand" "=w") + (vec_duplicate:V8HF (match_operand:HF 1 "s_register_operand" "r")))] + "TARGET_NEON" + "vdup.16\t%q0, %1" + [(set_attr "type" "neon_from_gp_q")] +) + (define_insn "neon_vdup_n" [(set (match_operand:V32 0 "s_register_operand" "=w,w") (vec_duplicate:V32 (match_operand: 1 "s_register_operand" "r,t")))] @@ -4361,8 +4407,8 @@ ) (define_insn "neon_vld1_dup" - [(set (match_operand:VD 0 "s_register_operand" "=w") - (vec_duplicate:VD (match_operand: 1 "neon_struct_operand" "Um")))] + [(set (match_operand:VD_LANE 0 "s_register_operand" "=w") + (vec_duplicate:VD_LANE (match_operand: 1 "neon_struct_operand" "Um")))] "TARGET_NEON" "vld1.\t{%P0[]}, %A1" [(set_attr "type" "neon_load1_all_lanes")] @@ -4378,8 +4424,8 @@ ) (define_insn "neon_vld1_dup" - [(set (match_operand:VQ 0 "s_register_operand" "=w") - (vec_duplicate:VQ (match_operand: 1 "neon_struct_operand" "Um")))] + [(set (match_operand:VQ2 0 "s_register_operand" "=w") + (vec_duplicate:VQ2 (match_operand: 1 "neon_struct_operand" "Um")))] "TARGET_NEON" { return "vld1.\t{%e0[], %f0[]}, %A1"; Index: gcc/testsuite/gcc.target/arm/pr68620.c =================================================================== --- gcc/testsuite/gcc.target/arm/pr68620.c (revision 0) +++ gcc/testsuite/gcc.target/arm/pr68620.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_fp_ok } */ +/* { dg-options "-mfp16-format=ieee" } */ +/* { dg-add-options arm_fp } */ + +#include "arm_neon.h" + +float16x4_t __attribute__((target("fpu=neon-fp16"))) +foo (float32x4_t arg) +{ + return vcvt_f16_f32 (arg); +}