From patchwork Tue Oct 25 04:31:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
X-Patchwork-Id: 79119
To: Richard Henderson <rth@twiddle.net>, qemu-ppc@nongnu.org,
 david@gibson.dropbear.id.au
References: <1475041518-9757-1-git-send-email-raji@linux.vnet.ibm.com>
 <1475041518-9757-3-git-send-email-raji@linux.vnet.ibm.com>
 <443643e4-26c4-d049-c521-fc8a15da663f@twiddle.net>
 <897d84d4-9b85-4d03-a117-555da740b48c@linux.vnet.ibm.com>
From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Date: Tue, 25 Oct 2016 10:01:21 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <897d84d4-9b85-4d03-a117-555da740b48c@linux.vnet.ibm.com>
Message-Id: <e205d4b4-b287-d86d-62b0-c1c82e1241c3@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 2/6] target-ppc: add vextu[bhw]lx instructions
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: qemu-devel@nongnu.org, nikunj@linux.vnet.ibm.com,
 Avinesh Kumar <avinesku@linux.vnet.ibm.com>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

On 10/05/2016 10:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>
>
> On 09/28/2016 10:24 PM, Richard Henderson wrote:
>> On 09/27/2016 10:45 PM, Rajalakshmi Srinivasaraghavan wrote:
>>> +#if defined(HOST_WORDS_BIGENDIAN)
>>> +#define VEXTULX_DO(name, elem)                                  \
>>> +target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
>>> +{                                                               \
>>> +    target_ulong r = 0;                                         \
>>> +    int i;                                                      \
>>> +    int index = a & 0xf;                                        \
>>> +    for (i = 0; i < elem; i++) {                                \
>>> +        r = r << 8;                                             \
>>> +        if (index + i <= 15) {                                  \
>>> +            r = r | b->u8[index + i];                           \
>>> +        }                                                       \
>>> +    }                                                           \
>>> +    return r;                                                   \
>>> +}
>>> +#else
>>> +#define VEXTULX_DO(name, elem)                                  \
>>> +target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
>>> +{                                                               \
>>> +    target_ulong r = 0;                                         \
>>> +    int i;                                                      \
>>> +    int index = 15 - (a & 0xf);                                 \
>>> +    for (i = 0; i < elem; i++) {                                \
>>> +        r = r << 8;                                             \
>>> +        if (index - i >= 0) {                                   \
>>> +            r = r | b->u8[index - i];                           \
>>> +        }                                                       \
>>> +    }                                                           \
>>> +    return r;                                                   \
>>> +}
>>> +#endif
>>> +
>>> +VEXTULX_DO(vextublx, 1)
>>> +VEXTULX_DO(vextuhlx, 2)
>>> +VEXTULX_DO(vextuwlx, 4)
>>> +#undef VEXTULX_DO
>> Ew.
>>
>> This should be one 128-bit shift and one and.
>>
>> Since the shift amount is a multiple of 8, the 128-bit shift for
>> vextub[lr]x does not need to cross a double-word boundary, and so can
>> be decomposed into one 64-bit shift of (count & 64 ? hi : lo).
>>
>> For vextu[hw][lr]x, you'd need to do the whole left-shift,
>> right-shift, OR thing.
>>
>> But still, fantastically better than a loop.
> Ack. Will send an updated patch.
The updated patch is attached below.
>>
>>
>> r~
>>
>>
>
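
As a rough illustration of the byte case described above (one 64-bit
shift of whichever half holds the byte), here is a standalone sketch.
It is not the code in the attached patch. The vec128 struct and the
extract_byte_left() name are hypothetical stand-ins, and the sketch
assumes that hi holds vector bytes 0..7 and lo holds bytes 8..15, with
byte 0 the most significant.

```c
#include <stdint.h>

/* Hypothetical stand-in for ppc_avr_t (assumption, not QEMU's type):
 * hi holds vector bytes 0..7, lo holds bytes 8..15, and byte 0 is the
 * most significant byte of the 128-bit value. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} vec128;

/* Extract the byte at index (a & 0xf), counted from the left. */
static uint64_t extract_byte_left(uint64_t a, const vec128 *b)
{
    /* Right-shift count, in bits, measured from the low end of the
     * 128-bit value. It is always a multiple of 8. */
    unsigned count = (unsigned)((15 - (a & 0xf)) * 8);

    /* Bit 6 of the count selects the half; a single byte never
     * straddles the double-word boundary. */
    uint64_t half = (count & 64) ? b->hi : b->lo;

    return (half >> (count & 63)) & 0xff;
}
```

With hi = 0x0001020304050607 and lo = 0x08090a0b0c0d0e0f, index 7 picks
the low byte of hi and index 8 picks the high byte of lo.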

-- 
Thanks
Rajalakshmi S

>From 59b96e11dd4c649ba9dbf0435439f717b931530f Mon Sep 17 00:00:00 2001
From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Date: Mon, 24 Oct 2016 11:36:33 +0530
Subject: [PATCH 1/2] target-ppc: add vextu[bhw]lx instructions

vextublx:  Vector Extract Unsigned Byte Left
vextuhlx:  Vector Extract Unsigned Halfword Left
vextuwlx:  Vector Extract Unsigned Word Left

Signed-off-by: Avinesh Kumar <avinesku@linux.vnet.ibm.com>
Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |    3 ++
 target-ppc/int_helper.c             |   63 +++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.inc.c |   18 ++++++++++
 target-ppc/translate/vmx-ops.inc.c  |    4 ++-
 4 files changed, 87 insertions(+), 1 deletions(-)
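
For hosts without a native 128-bit type, the halfword and word cases
could in principle also be done with shifts instead of a byte loop.
The following is only a sketch under stated assumptions, not what this
patch implements: shr128_low() and extract_left() are hypothetical
names, the vector is passed as two 64-bit halves (hi = bytes 0..7,
lo = bytes 8..15, byte 0 most significant), and elements that run past
byte 15 are zero-padded on the right to mirror the loop fallback.

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit value (hi:lo) shifted right by count
 * bits, for count in 0..127. */
static uint64_t shr128_low(uint64_t hi, uint64_t lo, unsigned count)
{
    if (count == 0) {
        return lo;
    }
    if (count >= 64) {
        return hi >> (count - 64);
    }
    /* Left-shift of hi, right-shift of lo, then OR the two parts. */
    return (lo >> count) | (hi << (64 - count));
}

/* Extract size bytes (1, 2 or 4) starting at byte (a & 0xf), counted
 * from the left. */
static uint64_t extract_left(uint64_t a, uint64_t hi, uint64_t lo,
                             unsigned size)
{
    unsigned first = (unsigned)(a & 0xf);
    unsigned end = first + size;            /* one past the last byte */
    uint64_t mask = (UINT64_C(1) << (size * 8)) - 1;

    if (end > 16) {
        /* Assumption: bytes beyond the vector read as zero, which is
         * what the byte-loop fallback produces. */
        unsigned pad = (end - 16) * 8;
        return (lo << pad) & mask;
    }
    return shr128_low(hi, lo, 128 - end * 8) & mask;
}
```

For example, with hi = 0x0001020304050607 and lo = 0x08090a0b0c0d0e0f,
a word extract at index 6 straddles both halves and takes bytes 6..9.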

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 04c6421..8551568 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -357,6 +357,9 @@ DEF_HELPER_3(vpmsumb, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumh, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumw, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumd, void, avr, avr, avr)
+DEF_HELPER_2(vextublx, tl, tl, avr)
+DEF_HELPER_2(vextuhlx, tl, tl, avr)
+DEF_HELPER_2(vextuwlx, tl, tl, avr)
 
 DEF_HELPER_2(vsbox, void, avr, avr)
 DEF_HELPER_3(vcipher, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 5aee0a8..2b28848 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1742,6 +1742,69 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
     }
 }
 
+#define EXTRACT128(value, start, length)        \
+    (((value) >> (start)) & (~(__uint128_t)0 >> (128 - (length))))
+
+#if defined(HOST_WORDS_BIGENDIAN)
+# if defined (CONFIG_INT128)
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int index = (a & 0xf) * 8;                                  \
+    r = EXTRACT128(b->u128, index, elem * 8);                   \
+    return r;                                                   \
+}
+# else
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int i;                                                      \
+    int index = a & 0xf;                                        \
+    for (i = 0; i < elem; i++) {                                \
+        r = r << 8;                                             \
+        if (index + i <= 15) {                                  \
+            r = r | b->u8[index + i];                           \
+        }                                                       \
+    }                                                           \
+    return r;                                                   \
+}
+# endif
+#else
+# if defined (CONFIG_INT128)
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int size = elem * 8;                                        \
+    int index = (15 - (a & 0xf) + 1) * 8;                       \
+    r = EXTRACT128(b->u128, (index - size), size);              \
+    return r;                                                   \
+}
+# else
+#  define VEXTULX_DO(name, elem)                                \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b)  \
+{                                                               \
+    target_ulong r = 0;                                         \
+    int i;                                                      \
+    int index = 15 - (a & 0xf);                                 \
+    for (i = 0; i < elem; i++) {                                \
+        r = r << 8;                                             \
+        if (index - i >= 0) {                                   \
+            r = r | b->u8[index - i];                           \
+        }                                                       \
+    }                                                           \
+    return r;                                                   \
+}
+# endif
+#endif
+
+VEXTULX_DO(vextublx, 1)
+VEXTULX_DO(vextuhlx, 2)
+VEXTULX_DO(vextuwlx, 4)
+#undef VEXTULX_DO
+
 /* The specification says that the results are undefined if all of the
  * shift counts are not identical.  We check to make sure that they are
  * to conform to what real hardware appears to do.  */
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index c8998f3..0a9d609 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -276,6 +276,19 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx)             \
     }                                                                  \
 }
 
+#define GEN_VXFORM_HETRO(name, opc2, opc3)                              \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+{                                                                       \
+    TCGv_ptr rb;                                                        \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
+    rb = gen_avr_ptr(rB(ctx->opcode));                                  \
+    gen_helper_##name(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)], rb); \
+    tcg_temp_free_ptr(rb);                                              \
+}
+
 GEN_VXFORM(vaddubm, 0, 0);
 GEN_VXFORM(vadduhm, 0, 1);
 GEN_VXFORM(vadduwm, 0, 2);
@@ -441,6 +454,11 @@ GEN_VXFORM_ENV(vaddfp, 5, 0);
 GEN_VXFORM_ENV(vsubfp, 5, 1);
 GEN_VXFORM_ENV(vmaxfp, 5, 16);
 GEN_VXFORM_ENV(vminfp, 5, 17);
+GEN_VXFORM_HETRO(vextublx, 6, 24)
+GEN_VXFORM_HETRO(vextuhlx, 6, 25)
+GEN_VXFORM_HETRO(vextuwlx, 6, 26)
+GEN_VXFORM_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207,
+                vextuwlx, PPC_NONE, PPC2_ISA300)
 
 #define GEN_VXRFORM1(opname, name, str, opc2, opc3)                     \
 static void glue(gen_, name)(DisasContext *ctx)                         \
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index 68cba3e..70dc250 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -91,8 +91,10 @@ GEN_VXFORM(vmrghw, 6, 2),
 GEN_VXFORM(vmrglb, 6, 4),
 GEN_VXFORM(vmrglh, 6, 5),
 GEN_VXFORM(vmrglw, 6, 6),
+GEN_VXFORM_300(vextublx, 6, 24),
+GEN_VXFORM_300(vextuhlx, 6, 25),
+GEN_VXFORM_DUAL(vmrgow, vextuwlx, 6, 26, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmrgew, 6, 30),
-GEN_VXFORM_207(vmrgow, 6, 26),
 GEN_VXFORM(vmuloub, 4, 0),
 GEN_VXFORM(vmulouh, 4, 1),
 GEN_VXFORM_DUAL(vmulouw, vmuluwm, 4, 2, PPC_ALTIVEC, PPC_NONE),
-- 
1.7.1