From patchwork Tue Oct 25 04:31:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rajalakshmi Srinivasaraghavan
X-Patchwork-Id: 79119
To: Richard Henderson, qemu-ppc@nongnu.org, david@gibson.dropbear.id.au
Cc: qemu-devel@nongnu.org, nikunj@linux.vnet.ibm.com, Avinesh Kumar
From: Rajalakshmi Srinivasaraghavan
Date: Tue, 25 Oct 2016 10:01:21 +0530
Subject: Re: [Qemu-devel] [PATCH 2/6] target-ppc: add vextu[bhw]lx instructions
References: <1475041518-9757-1-git-send-email-raji@linux.vnet.ibm.com>
 <1475041518-9757-3-git-send-email-raji@linux.vnet.ibm.com>
 <443643e4-26c4-d049-c521-fc8a15da663f@twiddle.net>
 <897d84d4-9b85-4d03-a117-555da740b48c@linux.vnet.ibm.com>
In-Reply-To: <897d84d4-9b85-4d03-a117-555da740b48c@linux.vnet.ibm.com>

On 10/05/2016 10:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>
>
> On 09/28/2016 10:24 PM, Richard Henderson wrote:
>> On 09/27/2016 10:45 PM, Rajalakshmi Srinivasaraghavan wrote:
>>> +#if defined(HOST_WORDS_BIGENDIAN)
>>> +#define VEXTULX_DO(name, elem) \
>>> +target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
>>> +{ \
>>> +    target_ulong r = 0; \
>>> +    int i; \
>>> +    int index = a & 0xf; \
>>> +    for (i = 0; i < elem; i++) { \
>>> +        r = r << 8; \
>>> +        if (index + i <= 15) { \
>>> +            r = r | b->u8[index + i]; \
>>> +        } \
>>> +    } \
>>> +    return r; \
>>> +}
>>> +#else
>>> +#define VEXTULX_DO(name, elem) \
>>> +target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
>>> +{ \
>>> +    target_ulong r = 0; \
>>> +    int i; \
>>> +    int index = 15 - (a & 0xf); \
>>> +    for (i = 0; i < elem; i++) { \
>>> +        r = r << 8; \
>>> +        if (index - i >= 0) { \
>>> +            r = r | b->u8[index - i]; \
>>> +        } \
>>> +    } \
>>> +    return r; \
>>> +}
>>> +#endif
>>> +
>>> +VEXTULX_DO(vextublx, 1)
>>> +VEXTULX_DO(vextuhlx, 2)
>>> +VEXTULX_DO(vextuwlx, 4)
>>> +#undef VEXTULX_DO
>> Ew.
>>
>> This should be one 128-bit shift and one and.
>>
>> Since the shift amount is a multiple of 8, the 128-bit shift for vextub[lr]x
>> does not need to cross a double-word boundary, and so can be decomposed into
>> one 64-bit shift of (count & 64 ? hi : lo).
>>
>> For vextu[hw][lr]x, you'd need to do the whole left-shift, right-shift, or
>> thing.
>>
>> But still, fantastically better than a loop.
> Ack. Will send an updated patch.

Attached updated patch.

>>
>>
>> r~
>>
>>
>

-- 
Thanks
Rajalakshmi S

>From 59b96e11dd4c649ba9dbf0435439f717b931530f Mon Sep 17 00:00:00 2001
From: Rajalakshmi Srinivasaraghavan
Date: Mon, 24 Oct 2016 11:36:33 +0530
Subject: [PATCH 1/2] target-ppc: add vextu[bhw]lx instructions

vextublx: Vector Extract Unsigned Byte Left
vextuhlx: Vector Extract Unsigned Halfword Left
vextuwlx: Vector Extract Unsigned Word Left

Signed-off-by: Avinesh Kumar
Signed-off-by: Rajalakshmi Srinivasaraghavan
---
 target-ppc/helper.h                 |    3 ++
 target-ppc/int_helper.c             |   63 +++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.inc.c |   18 ++++++++++
 target-ppc/translate/vmx-ops.inc.c  |    4 ++-
 4 files changed, 87 insertions(+), 1 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 04c6421..8551568 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -357,6 +357,9 @@ DEF_HELPER_3(vpmsumb, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumh, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumw, void, avr, avr, avr)
 DEF_HELPER_3(vpmsumd, void, avr, avr, avr)
+DEF_HELPER_2(vextublx, tl, tl, avr)
+DEF_HELPER_2(vextuhlx, tl, tl, avr)
+DEF_HELPER_2(vextuwlx, tl, tl, avr)
 DEF_HELPER_2(vsbox, void, avr, avr)
 DEF_HELPER_3(vcipher, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 5aee0a8..2b28848 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1742,6 +1742,69 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
     }
 }
 
+#define EXTRACT128(value, start, length) \
+    (((value) >> (start)) & (~(__uint128_t)0 >> (128 - (length))))
+
+#if defined(HOST_WORDS_BIGENDIAN)
+# if defined (CONFIG_INT128)
+# define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int index = (a & 0xf) * 8; \
+    r = EXTRACT128(b->u128, index, elem * 8); \
+    return r; \
+}
+# else
+# define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int i; \
+    int index = a & 0xf; \
+    for (i = 0; i < elem; i++) { \
+        r = r << 8; \
+        if (index + i <= 15) { \
+            r = r | b->u8[index + i]; \
+        } \
+    } \
+    return r; \
+}
+# endif
+#else
+# if defined (CONFIG_INT128)
+# define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int size = elem * 8; \
+    int index = (15 - (a & 0xf) + 1) * 8; \
+    r = EXTRACT128(b->u128, (index - size), size); \
+    return r; \
+}
+# else
+# define VEXTULX_DO(name, elem) \
+target_ulong glue(helper_, name)(target_ulong a, ppc_avr_t *b) \
+{ \
+    target_ulong r = 0; \
+    int i; \
+    int index = 15 - (a & 0xf); \
+    for (i = 0; i < elem; i++) { \
+        r = r << 8; \
+        if (index - i >= 0) { \
+            r = r | b->u8[index - i]; \
+        } \
+    } \
+    return r; \
+}
+# endif
+#endif
+
+VEXTULX_DO(vextublx, 1)
+VEXTULX_DO(vextuhlx, 2)
+VEXTULX_DO(vextuwlx, 4)
+#undef VEXTULX_DO
+
 /* The specification says that the results are undefined if all of the
  * shift counts are not identical.  We check to make sure that they are
  * to conform to what real hardware appears to do.  */
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index c8998f3..0a9d609 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -276,6 +276,19 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \
     } \
 }
 
+#define GEN_VXFORM_HETRO(name, opc2, opc3) \
+static void glue(gen_, name)(DisasContext *ctx) \
+{ \
+    TCGv_ptr rb; \
+    if (unlikely(!ctx->altivec_enabled)) { \
+        gen_exception(ctx, POWERPC_EXCP_VPU); \
+        return; \
+    } \
+    rb = gen_avr_ptr(rB(ctx->opcode)); \
+    gen_helper_##name(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)], rb); \
+    tcg_temp_free_ptr(rb); \
+}
+
 GEN_VXFORM(vaddubm, 0, 0);
 GEN_VXFORM(vadduhm, 0, 1);
 GEN_VXFORM(vadduwm, 0, 2);
@@ -441,6 +454,11 @@ GEN_VXFORM_ENV(vaddfp, 5, 0);
 GEN_VXFORM_ENV(vsubfp, 5, 1);
 GEN_VXFORM_ENV(vmaxfp, 5, 16);
 GEN_VXFORM_ENV(vminfp, 5, 17);
+GEN_VXFORM_HETRO(vextublx, 6, 24)
+GEN_VXFORM_HETRO(vextuhlx, 6, 25)
+GEN_VXFORM_HETRO(vextuwlx, 6, 26)
+GEN_VXFORM_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207,
+                vextuwlx, PPC_NONE, PPC2_ISA300)
 
 #define GEN_VXRFORM1(opname, name, str, opc2, opc3) \
 static void glue(gen_, name)(DisasContext *ctx) \
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index 68cba3e..70dc250 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -91,8 +91,10 @@ GEN_VXFORM(vmrghw, 6, 2),
 GEN_VXFORM(vmrglb, 6, 4),
 GEN_VXFORM(vmrglh, 6, 5),
 GEN_VXFORM(vmrglw, 6, 6),
+GEN_VXFORM_300(vextublx, 6, 24),
+GEN_VXFORM_300(vextuhlx, 6, 25),
+GEN_VXFORM_DUAL(vmrgow, vextuwlx, 6, 26, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmrgew, 6, 30),
-GEN_VXFORM_207(vmrgow, 6, 26),
 GEN_VXFORM(vmuloub, 4, 0),
 GEN_VXFORM(vmulouh, 4, 1),
 GEN_VXFORM_DUAL(vmulouw, vmuluwm, 4, 2, PPC_ALTIVEC, PPC_NONE),
-- 
1.7.1
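
For reference, the review suggestion (replace the per-byte loop with a shift and a mask) can also be sketched in portable C for hosts without __uint128_t. This is only an illustration under stated assumptions, not code from the patch: vec128, shl128_hi, extract_left and extract_left_loop are hypothetical names, and the layout (hi holds bytes 0..7, with byte 0 most significant) is an assumption chosen to match the left-to-right byte numbering used by vextu[bhw]lx. The loop version mirrors the fallback branch of the patch and is kept only so the two can be compared.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register (not QEMU's
 * ppc_avr_t): hi holds bytes 0..7, lo holds bytes 8..15, and byte 0 is
 * the most significant byte of hi. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} vec128;

/* Return the high 64 bits of (v << sh), for 0 <= sh < 128. */
static uint64_t shl128_hi(vec128 v, unsigned sh)
{
    if (sh >= 64) {
        return v.lo << (sh - 64);
    }
    if (sh == 0) {
        return v.hi; /* avoids an undefined 64-bit shift by 64 below */
    }
    return (v.hi << sh) | (v.lo >> (64 - sh));
}

/* Extract nbytes (1..8) starting at byte (index & 0xf), counted from the
 * left.  Bytes past byte 15 read as zero, as in the loop version. */
static uint64_t extract_left(vec128 v, unsigned index, unsigned nbytes)
{
    uint64_t top = shl128_hi(v, (index & 0xf) * 8);

    return top >> (64 - nbytes * 8);
}

/* Loop-based reference with the same semantics as the fallback branch. */
static uint64_t extract_left_loop(vec128 v, unsigned index, unsigned nbytes)
{
    uint64_t r = 0;
    unsigned i;

    index &= 0xf;
    for (i = 0; i < nbytes; i++) {
        unsigned pos = index + i;

        r <<= 8;
        if (pos <= 15) {
            uint64_t half = pos < 8 ? v.hi : v.lo;

            r |= (half >> (56 - 8 * (pos & 7))) & 0xff;
        }
    }
    return r;
}
```

With v = { 0x0011223344556677, 0x8899aabbccddeeff }, extract_left(v, 7, 2) crosses the double-word boundary and yields 0x7788, while extract_left(v, 15, 4) yields 0xff000000 because the three bytes past the end of the register read as zero.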