From patchwork Tue Jan 26 15:18:00 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon <christophe.lyon@linaro.org>
X-Patchwork-Id: 60463
Delivered-To: patch@linaro.org
Received: by 10.112.130.2 with SMTP id oa2csp2022410lbb;
 Tue, 26 Jan 2016 07:18:28 -0800 (PST)
X-Received: by 10.67.6.195 with SMTP id cw3mr34766716pad.88.1453821508290;
 Tue, 26 Jan 2016 07:18:28 -0800 (PST)
Return-Path: <gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 y63si2559380pfi.175.2016.01.26.07.18.27 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 26 Jan 2016 07:18:28 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
 gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-420061-patch=linaro.org@gcc.gnu.org;
 dkim=pass header.i=@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type; q=dns; s=default; b=dVroE5wsc+1Ba51zsG
 Ri/Cj2byFHs/tEmTF/u5nzmI1J2PEn0wj6y98PCJbf/B6Q0H+C6oFhkHflcG8hQu
 KASuMY4E2D8DcJjr7AzE8cXRD9/Q2InXBe43qFMP+C/Bb8+356rqfDKyM4bJHZZo
 SShMoCyE/B8+8TPzaYLQ5bdW8=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type; s=default; bh=A9fU9RzHJ6vbk+A5oqhHB9Yz
 8jM=; b=aOM/CILnbPHcJUm/M7iQAaEbgbxHXYaAlch1Zakg6R5jtGjYD2P4MRZy
 vFSJld1fpA5/oK1O9935rDxYqLo/1QT25SU2BtQdlYmHQCsltr3TtHlffVcBhkvV
 Qmi2wIAAx6u477n2H/8fnOgnw++1gZBDikYkZ1+FtLOc0YdXR+A=
Received: (qmail 120734 invoked by alias); 26 Jan 2016 15:18:08 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 119735 invoked by uid 89); 26 Jan 2016 15:18:07 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_50,
 KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=no version=3.3.2 spammy=4157, ii, 415, 7,
 rt
X-HELO: mail-qg0-f53.google.com
Received: from mail-qg0-f53.google.com (HELO mail-qg0-f53.google.com)
 (209.85.192.53) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Tue, 26 Jan 2016 15:18:03 +0000
Received: by mail-qg0-f53.google.com with SMTP id b35so140524912qge.0 for
 <gcc-patches@gcc.gnu.org>; Tue, 26 Jan 2016 07:18:02 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=6o9/rGtu5xe3Utg5ERTU1iST2/LaYfwiIRxduGRv1W8=;
 b=LD/KMu69ixZn2dop4Ofic7V6s5kZ6bhwS7qXtjJOvOuURVEHAlP3YbUkXKaA0LbVhm
 LrDr1RdzsNuV2FvzB6OoZUDoQKo5YDaIBQEzs62JNrZDrec1IiZNAnK4ZnVmKNZPLG0b
 cPKjzk3kip025TBUFarEHys3IQ+6W7YsMHPd5sXL/RwfGp9zFNUdDg53TprmQk2VdUDL
 SGiAVmHqRnsTs3noOCxznHBQEbDZcYTT+X9EPIp+GvveVt3uF6R6oGvTUYU3io4Jby+z
 mi6vqqfVyX9fNWhjR0QBDe+wfxObu8bcyAdeJBZbbvlA1Ky8xvTxB+7Ytb+ey6YLH//d
 YKlg==
X-Gm-Message-State: AG10YOTQq7B6+hgj3anR+qooNFpIGPwcytm6kQ7oEyS0w/u2K9HO6Yu2WVJ6pXJs23hsukZ2LRDuwBPnffy3WvF8
MIME-Version: 1.0
X-Received: by 10.140.171.5 with SMTP id r5mr31182001qhr.51.1453821480532;
 Tue, 26 Jan 2016 07:18:00 -0800 (PST)
Received: by 10.140.90.84 with HTTP; Tue, 26 Jan 2016 07:18:00 -0800 (PST)
In-Reply-To: <56A77289.30308@foss.arm.com>
References: <CAKdteOZbgTw44UarkL9zmmZBd3_gGPof9v0m8mPH-xu9=8yqqQ@mail.gmail.com>	<1453143711-21320-1-git-send-email-alan.lawrence@arm.com>	<CAKdteOaNmfhgeFVeaEgddvQ-FntWudtQSowYgPtW=xi8Ert60Q@mail.gmail.com>	<569E4D77.6000809@foss.arm.com>	<CAKdteOa+mqbODEs7jWb-Cj4H+OX93jDPwH1uauY2=0Tn-S=LpA@mail.gmail.com>	<56A77289.30308@foss.arm.com>
Date: Tue, 26 Jan 2016 16:18:00 +0100
Message-ID: <CAKdteObAiKtuDG3sV3wtGA-iqtHycdKBsSdGpvDaeTKSJzFRyw@mail.gmail.com>
Subject: Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)
From: Christophe Lyon <christophe.lyon@linaro.org>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Alan Lawrence <alan.lawrence@foss.arm.com>,
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
X-IsSubscribed: yes

On 26 January 2016 at 14:20, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi Christophe,
>
> On 20/01/16 21:10, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawrence@foss.arm.com>
>> wrote:
>>>
>>> On 19/01/16 11:15, Christophe Lyon wrote:
>>>
>>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>>
>>>>>
>>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF
>>>>> just
>>>>> as
>>>>> well as vectors of HI, so I think I'd argue that's harmless enough. To
>>>>> gain the
>>>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
>>>>> though.
>>>>>
>>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>>
>>>
>>> Sure.
>>>
>>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also
>>>>>> the
>>>>>> +     right value for vectors with 8 lanes.  */
>>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>>> +#else
>>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>>> +#endif
>>>>>> +
>>>>>
>>>>>
>>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>>> point we
>>>>> will move the *other* vget/set_lane intrinsics to use GCC vector
>>>>> extensions
>>>>> too. At which time (unlike __aarch64_lane which can be used everywhere)
>>>>> this
>>>>> will be the wrong formula. Can we name (and/or comment) it to avoid
>>>>> misleading
>>>>> anyone? The key characteristic seems to be that it is for vectors of
>>>>> 16-bit
>>>>> elements only.
>>>>>
>>>> I'm not to follow, here. Looking at the patterns for
>>>> neon_vget_lane<mode>_*internal in neon.md,
>>>> I can see 2 flavours: one for VD, one for VQ2. The latter uses
>>>> "halfelts".
>>>>
>>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>>>> that would be similar to the aarch64 ones (by computing the number of
>>>> lanes of the input vector), but the "q" one would use half the total
>>>> number of lanes instead?
>>>
>>>
>>> That works for me! Sthg like:
>>>
>>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>>> (NUM_LANES(__vec)/2 - __idx)
>>> //or similarly
>>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>>
>>> Alternatively I'd been thinking
>>>
>>> #define __arm_lane_32xN(__idx) __idx ^ 1
>>> #define __arm_lane_16xN(__idx) __idx ^ 3
>>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>>
>>> Bear in mind PR64893 that we had on AArch64 :-(
>>>
>> Here is a new version, based on the comments above.
>> I've also removed the addition of arm_fp_ok effective target since I
>> added that in my other testsuite patch.
>>
>> OK now?
>>
>> Thanks,
>>
>> Christophe
>>
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3588b83..b1f408c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode,
> int inverse,
>        if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE
> (el0)))
>          return -1;
>  +      /* FP16 vectors cannot be represented.  */
> +      if (innersize == 2)
> +       return -1;
> +
>        r0 = CONST_DOUBLE_REAL_VALUE (el0);
>
>
> I think it'd be clearer to write "if (GET_MODE_INNER (mode) == HFmode)"
>
> +(define_expand "movv4hf"
> +  [(set (match_operand:V4HF 0 "s_register_operand")
> +       (match_operand:V4HF 1 "s_register_operand"))]
> +  "TARGET_NEON && TARGET_FP16"
> +{
> +  if (can_create_pseudo_p ())
> +    {
> +      if (!REG_P (operands[0]))
> +       operands[1] = force_reg (V4HFmode, operands[1]);
> +    }
> +})
>
> Can you please add a comment saying why you need the force_reg here?
> IIRC it's because of CANNOT_CHANGE_MODE_CLASS on big-endian that causes an
> ICE during expand with subregs.
>
> I've tried this patch out and it does indeed fix the ICE on armeb.
> So ok for trunk with the changes above.
> Thanks,
> Kyrill
>
>

OK thanks, here is what I have committed (r232832).


Christophe.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 232831)
+++ gcc/config/arm/arm.c	(working copy)
@@ -12381,6 +12381,10 @@
       if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
         return -1;
 
+      /* FP16 vectors cannot be represented.  */
+      if (GET_MODE_INNER (mode) == HFmode)
+	return -1;
+
       r0 = CONST_DOUBLE_REAL_VALUE (el0);
 
       for (i = 1; i < n_elts; i++)
Index: gcc/config/arm/arm_neon.h
===================================================================
--- gcc/config/arm/arm_neon.h	(revision 232831)
+++ gcc/config/arm/arm_neon.h	(working copy)
@@ -5302,16 +5302,28 @@
    were marked always-inline so there were no call sites, the declaration
    would nonetheless raise an error.  Hence, we must use a macro instead.  */
 
-#define vget_lane_f16(__v, __idx)		\
-  __extension__					\
-    ({						\
-      float16x4_t __vec = (__v);		\
-      __builtin_arm_lane_check (4, __idx);	\
-      float16_t __res = __vec[__idx];		\
-      __res;					\
-    })
+  /* For big-endian, GCC's vector indices are reversed within each 64
+     bits compared to the architectural lane indices used by Neon
+     intrinsics.  */
+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
+#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
 #endif
 
+#define vget_lane_f16(__v, __idx)			\
+  __extension__						\
+  ({							\
+    float16x4_t __vec = (__v);				\
+    __builtin_arm_lane_check (4, __idx);		\
+    float16_t __res = __vec[__arm_lane(__vec, __idx)];	\
+    __res;						\
+  })
+#endif
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5379,14 +5391,14 @@
 }
 
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-#define vgetq_lane_f16(__v, __idx)		\
-  __extension__					\
-    ({						\
-      float16x8_t __vec = (__v);		\
-      __builtin_arm_lane_check (8, __idx);	\
-      float16_t __res = __vec[__idx];		\
-      __res;					\
-    })
+#define vgetq_lane_f16(__v, __idx)			\
+  __extension__						\
+  ({							\
+    float16x8_t __vec = (__v);				\
+    __builtin_arm_lane_check (8, __idx);		\
+    float16_t __res = __vec[__arm_laneq(__vec, __idx)];	\
+    __res;						\
+  })
 #endif
 
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
@@ -5458,13 +5470,13 @@
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 #define vset_lane_f16(__e, __v, __idx)		\
   __extension__					\
-    ({						\
-      float16_t __elem = (__e);			\
-      float16x4_t __vec = (__v);		\
-      __builtin_arm_lane_check (4, __idx);	\
-      __vec[__idx] = __elem;			\
-      __vec;					\
-    })
+  ({						\
+    float16_t __elem = (__e);			\
+    float16x4_t __vec = (__v);			\
+    __builtin_arm_lane_check (4, __idx);	\
+    __vec[__arm_lane (__vec, __idx)] = __elem;	\
+    __vec;					\
+  })
 #endif
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
@@ -5536,13 +5548,13 @@
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 #define vsetq_lane_f16(__e, __v, __idx)		\
   __extension__					\
-    ({						\
-      float16_t __elem = (__e);			\
-      float16x8_t __vec = (__v);		\
-      __builtin_arm_lane_check (8, __idx);	\
-      __vec[__idx] = __elem;			\
-      __vec;					\
-    })
+  ({						\
+    float16_t __elem = (__e);			\
+    float16x8_t __vec = (__v);			\
+    __builtin_arm_lane_check (8, __idx);	\
+    __vec[__arm_laneq (__vec, __idx)] = __elem;	\
+    __vec;					\
+  })
 #endif
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
Index: gcc/config/arm/iterators.md
===================================================================
--- gcc/config/arm/iterators.md	(revision 232831)
+++ gcc/config/arm/iterators.md	(working copy)
@@ -99,7 +99,7 @@
 (define_mode_iterator VQI [V16QI V8HI V4SI])
 
 ;; Quad-width vector modes, with TImode added, for moves.
-(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI])
+(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
 
 ;; Opaque structure types wider than TImode.
 (define_mode_iterator VSTRUCT [EI OI CI XI])
@@ -114,7 +114,7 @@
 (define_mode_iterator VN [V8HI V4SI V2DI])
 
 ;; All supported vector modes (except singleton DImode).
-(define_mode_iterator VDQ [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DI])
+(define_mode_iterator VDQ [V8QI V16QI V4HI V8HI V2SI V4SI V4HF V8HF V2SF V4SF V2DI])
 
 ;; All supported vector modes (except those with 64-bit integer elements).
 (define_mode_iterator VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF])
@@ -428,6 +428,7 @@
 ;; Register width from element mode
 (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
                          (V4HI "P") (V8HI  "q")
+                         (V4HF "P") (V8HF  "q")
                          (V2SI "P") (V4SI  "q")
                          (V2SF "P") (V4SF  "q")
                          (DI   "P") (V2DI  "q")
@@ -576,6 +577,7 @@
 (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
                  (V4HI "false") (V8HI "false")
                  (V2SI "false") (V4SI "false")
+                 (V4HF "true") (V8HF "true")
                  (V2SF "true") (V4SF "true")
                  (DI "false") (V2DI "false")])
 
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 232831)
+++ gcc/config/arm/neon.md	(working copy)
@@ -137,6 +137,36 @@
     }
 })
 
+(define_expand "movv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand")
+	(match_operand:V4HF 1 "s_register_operand"))]
+  "TARGET_NEON && TARGET_FP16"
+{
+  /* We need to use force_reg to avoid CANNOT_CHANGE_MODE_CLASS
+     causing an ICE on big-endian because it cannot extract subregs in
+     this case.  */
+  if (can_create_pseudo_p ())
+    {
+      if (!REG_P (operands[0]))
+	operands[1] = force_reg (V4HFmode, operands[1]);
+    }
+})
+
+(define_expand "movv8hf"
+  [(set (match_operand:V8HF 0 "")
+	(match_operand:V8HF 1 ""))]
+  "TARGET_NEON && TARGET_FP16"
+{ 
+  /* We need to use force_reg to avoid CANNOT_CHANGE_MODE_CLASS
+     causing an ICE on big-endian because it cannot extract subregs in
+     this case.  */
+  if (can_create_pseudo_p ())
+    {
+      if (!REG_P (operands[0]))
+	operands[1] = force_reg (V8HFmode, operands[1]);
+    }
+})
+
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VSTRUCT 0 "nonimmediate_operand"	"=w,Ut,w")
 	(match_operand:VSTRUCT 1 "general_operand"	" w,w, Ut"))]
@@ -299,11 +329,11 @@
   [(set_attr "type" "neon_load1_1reg<q>")])
 
 (define_insn "vec_set<mode>_internal"
-  [(set (match_operand:VD 0 "s_register_operand" "=w,w")
-        (vec_merge:VD
-          (vec_duplicate:VD
+  [(set (match_operand:VD_LANE 0 "s_register_operand" "=w,w")
+        (vec_merge:VD_LANE
+          (vec_duplicate:VD_LANE
             (match_operand:<V_elem> 1 "nonimmediate_operand" "Um,r"))
-          (match_operand:VD 3 "s_register_operand" "0,0")
+          (match_operand:VD_LANE 3 "s_register_operand" "0,0")
           (match_operand:SI 2 "immediate_operand" "i,i")))]
   "TARGET_NEON"
 {
@@ -385,7 +415,7 @@
 (define_insn "vec_extract<mode>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
         (vec_select:<V_elem>
-          (match_operand:VD 1 "s_register_operand" "w,w")
+          (match_operand:VD_LANE 1 "s_register_operand" "w,w")
           (parallel [(match_operand:SI 2 "immediate_operand" "i,i")])))]
   "TARGET_NEON"
 {
@@ -2829,6 +2859,22 @@
   [(set_attr "type" "neon_from_gp<q>")]
 )
 
+(define_insn "neon_vdup_nv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand" "=w")
+        (vec_duplicate:V4HF (match_operand:HF 1 "s_register_operand" "r")))]
+  "TARGET_NEON"
+  "vdup.16\t%P0, %1"
+  [(set_attr "type" "neon_from_gp")]
+)
+
+(define_insn "neon_vdup_nv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=w")
+        (vec_duplicate:V8HF (match_operand:HF 1 "s_register_operand" "r")))]
+  "TARGET_NEON"
+  "vdup.16\t%q0, %1"
+  [(set_attr "type" "neon_from_gp_q")]
+)
+
 (define_insn "neon_vdup_n<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w,w")
         (vec_duplicate:V32 (match_operand:<V_elem> 1 "s_register_operand" "r,t")))]
@@ -4361,8 +4407,8 @@
 )
 
 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VD 0 "s_register_operand" "=w")
-        (vec_duplicate:VD (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
+  [(set (match_operand:VD_LANE 0 "s_register_operand" "=w")
+        (vec_duplicate:VD_LANE (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
   "vld1.<V_sz_elem>\t{%P0[]}, %A1"
   [(set_attr "type" "neon_load1_all_lanes<q>")]
@@ -4378,8 +4424,8 @@
 )
 
 (define_insn "neon_vld1_dup<mode>"
-  [(set (match_operand:VQ 0 "s_register_operand" "=w")
-        (vec_duplicate:VQ (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
+  [(set (match_operand:VQ2 0 "s_register_operand" "=w")
+        (vec_duplicate:VQ2 (match_operand:<V_elem> 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
 {
   return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
Index: gcc/testsuite/gcc.target/arm/pr68620.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68620.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68620.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-options "-mfp16-format=ieee" } */
+/* { dg-add-options arm_fp } */
+
+#include "arm_neon.h"
+
+float16x4_t __attribute__((target("fpu=neon-fp16")))
+foo (float32x4_t arg)
+{
+    return vcvt_f16_f32 (arg);
+}