From patchwork Fri Oct 11 15:32:47 2019
X-Patchwork-Submitter: Joel
X-Patchwork-Id: 175979
From: Joel Hutton <Joel.Hutton@arm.com>
To: GCC Patches, "rguenther@suse.de"
CC: nd
Subject: [SLP] SLP vectorization: vectorize vector constructors
Date: Fri, 11 Oct 2019 15:32:47 +0000
Message-ID: <5edb0b00-4ae2-41c0-80ec-76de15d0b110@arm.com>

Hi Richard,

Thanks for your help, I've reworked my SLP RFC based on your feedback.

> I think a better place for the loop searching for CONSTRUCTORs is
> vect_slp_analyze_bb_1 where I'd put it before the check you remove,
> and I'd simply append found CONSTRUCTORs to the grouped_stores
> array

I've moved this check into a separate function and called it from
vect_slp_analyze_bb_1.

> The fixup you do in vectorizable_operation doesn't
> belong there either, I'd add a new field to the SLP instance
> structure refering to the CONSTRUCTOR stmt and do the fixup
> in vect_schedule_slp_instance instead where you can simply
> replace the CONSTRUCTOR with the vectorized SSA name then.

Done.

> +           /* Check that the constructor elements are unique.  */
> +           FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
> +             {
> +               tree prev_val;
> +               int j;
> +               FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, prev_val)
> +               {
> +                 if (val == prev_val && i!=j)
>
> why's that necessary? (it looks incomplete, also doesn't catch
> [duplicate] constants)

The thinking was that there was no benefit in vectorizing a constructor of
duplicates, or a vector of constants, although now you mention it that
thinking may be flawed. I've removed it.

> You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
> (we can have omitted trailing zeros).

...

> What happens if you have a vector constructor that is twice
> as large as the machine supports?
> The vectorizer will happily
> produce a two vector SSA name vectorized result but your
> CONSTRUCTOR replacement doesn't work here.  I think this should
> be made work correctly (not give up on that case).

I've reworked the patch to account for this, by checking that the
vectorized version has one vectorized stmt at the root of the SLP tree.
I'm not sure how to handle a vector constructor twice as large as the
machine supports: as far as I can see, when only analyzing within a
basic block, the SSA name of the constructor has to be maintained.

Currently SLP vectorization can build SLP trees starting from reductions or
from grouped stores. This patch adds a third starting point: vector
constructors.

For the following test case (compiled with -O3):

char g_d[1024], g_s1[1024], g_s2[1024];
void test_loop(void)
{
  char *d = g_d, *s1 = g_s1, *s2 = g_s2;
  for ( int y = 0; y < 128; y++ )
  {
    for ( int x = 0; x < 16; x++ )
      d[x] = s1[x] + s2[x];
    d += 16;
  }
}

before patch:
test_loop:
.LFB0:
        .cfi_startproc
        adrp    x0, g_s1
        adrp    x2, g_s2
        add     x3, x0, :lo12:g_s1
        add     x4, x2, :lo12:g_s2
        ldrb    w7, [x2, #:lo12:g_s2]
        ldrb    w1, [x0, #:lo12:g_s1]
        adrp    x0, g_d
        ldrb    w6, [x4, 1]
        add     x0, x0, :lo12:g_d
        ldrb    w5, [x3, 1]
        add     w1, w1, w7
        fmov    s0, w1
        ldrb    w7, [x4, 2]
        add     w5, w5, w6
        ldrb    w1, [x3, 2]
        ldrb    w6, [x4, 3]
        add     x2, x0, 2048
        ins     v0.b[1], w5
        add     w1, w1, w7
        ldrb    w7, [x3, 3]
        ldrb    w5, [x4, 4]
        add     w7, w7, w6
        ldrb    w6, [x3, 4]
        ins     v0.b[2], w1
        ldrb    w8, [x4, 5]
        add     w6, w6, w5
        ldrb    w5, [x3, 5]
        ldrb    w9, [x4, 6]
        add     w5, w5, w8
        ldrb    w1, [x3, 6]
        ins     v0.b[3], w7
        ldrb    w8, [x4, 7]
        add     w1, w1, w9
        ldrb    w11, [x3, 7]
        ldrb    w7, [x4, 8]
        add     w11, w11, w8
        ldrb    w10, [x3, 8]
        ins     v0.b[4], w6
        ldrb    w8, [x4, 9]
        add     w10, w10, w7
        ldrb    w9, [x3, 9]
        ldrb    w7, [x4, 10]
        add     w9, w9, w8
        ldrb    w8, [x3, 10]
        ins     v0.b[5], w5
        ldrb    w6, [x4, 11]
        add     w8, w8, w7
        ldrb    w7, [x3, 11]
        ldrb    w5, [x4, 12]
        add     w7, w7, w6
        ldrb    w6, [x3, 12]
        ins     v0.b[6], w1
        ldrb    w12, [x4, 13]
        add     w6, w6, w5
        ldrb    w5, [x3, 13]
        ldrb    w1, [x3, 14]
        add     w5, w5, w12
        ldrb    w13, [x4, 14]
        ins     v0.b[7], w11
        ldrb    w12, [x4, 15]
        add     w4, w1, w13
        ldrb    w1, [x3, 15]
        add     w1, w1, w12
        ins     v0.b[8], w10
        ins     v0.b[9], w9
        ins     v0.b[10], w8
        ins     v0.b[11], w7
        ins     v0.b[12], w6
        ins     v0.b[13], w5
        ins     v0.b[14], w4
        ins     v0.b[15], w1
        .p2align 3,,7
.L2:
        str     q0, [x0], 16
        cmp     x2, x0
        bne     .L2
        ret
        .cfi_endproc
.LFE0:

After patch:
test_loop:
.LFB0:
        .cfi_startproc
        adrp    x3, g_s1
        adrp    x2, g_s2
        add     x3, x3, :lo12:g_s1
        add     x2, x2, :lo12:g_s2
        adrp    x0, g_d
        add     x0, x0, :lo12:g_d
        add     x1, x0, 2048
        ldr     q1, [x2]
        ldr     q0, [x3]
        add     v0.16b, v0.16b, v1.16b
        .p2align 3,,7
.L2:
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L2
        ret
        .cfi_endproc
.LFE0:

2019-10-11  Joel Hutton  <Joel.Hutton@arm.com>

    * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
    (vect_bb_slp_scalar_cost): Likewise.
    (vect_ssa_use_outside_bb): New function.
    (vect_slp_check_for_constructors): New function.
    (vect_slp_analyze_bb_1): Add check for vector constructors.
    (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
    * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-10-11  Joel Hutton  <Joel.Hutton@arm.com>

    * gcc.dg/vect/bb-slp-40.c: New test.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

>From 2bc57c17faa1dd494ed3898298e9fbe91f8a8675 Mon Sep 17 00:00:00 2001
From: Joel Hutton <Joel.Hutton@arm.com>
Date: Wed, 2 Oct 2019 17:38:53 +0100
Subject: [PATCH] SLP Vectorization: Vectorize Vector Constructors

---
 gcc/testsuite/gcc.dg/vect/bb-slp-40.c |  33 +++++++
 gcc/tree-vect-slp.c                   | 127 ++++++++++++++++++++++++++
 gcc/tree-vectorizer.h                 |   5 +
 3 files changed, 165 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
new file mode 100644
index 0000000000000000000000000000000000000000..51566b716bcda2fe82f50c50e9e9685cb3eb10ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp-all" } */
+/* { dg-require-effective-target vect_int } */
+
+char g_d[1024], g_s1[1024], g_s2[1024];
+void foo(void)
+{
+  char *d = g_d, *s1 = g_s1, *s2 = g_s2;
+
+  for ( int y = 0; y < 128; y++ )
+  {
+    d[0 ] = s1[0 ] + s2[0 ];
+    d[1 ] = s1[1 ] + s2[1 ];
+    d[2 ] = s1[2 ] + s2[2 ];
+    d[3 ] = s1[3 ] + s2[3 ];
+    d[4 ] = s1[4 ] + s2[4 ];
+    d[5 ] = s1[5 ] + s2[5 ];
+    d[6 ] = s1[6 ] + s2[6 ];
+    d[7 ] = s1[7 ] + s2[7 ];
+    d[8 ] = s1[8 ] + s2[8 ];
+    d[9 ] = s1[9 ] + s2[9 ];
+    d[10] = s1[10] + s2[10];
+    d[11] = s1[11] + s2[11];
+    d[12] = s1[12] + s2[12];
+    d[13] = s1[13] + s2[13];
+    d[14] = s1[14] + s2[14];
+    d[15] = s1[15] + s2[15];
+    d += 16;
+  }
+}
+
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9b86b67734ad3e3506e9cee6a532b68decf24ae6..c4d452e3dfd46acdaa94dc047e4c6114d8295458 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1922,6 +1922,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
   unsigned int i;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   vec<stmt_vec_info> scalar_stmts;
+  bool constructor = false;
 
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
@@ -1935,6 +1936,13 @@ vect_analyze_slp_instance (vec_info *vinfo,
       vectype = STMT_VINFO_VECTYPE (stmt_info);
       group_size = REDUC_GROUP_SIZE (stmt_info);
     }
+  else if (is_gimple_assign (stmt_info->stmt)
+           && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR)
+    {
+      vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt));
+      group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt));
+      constructor = true;
+    }
   else
     {
       gcc_assert (is_a <loop_vec_info> (vinfo));
@@ -1981,6 +1989,25 @@ vect_analyze_slp_instance (vec_info *vinfo,
       STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
         = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
     }
+  else if (constructor)
+    {
+      tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+      tree val;
+      FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+        {
+          if (TREE_CODE (val) == SSA_NAME)
+            {
+              gimple* def = SSA_NAME_DEF_STMT (val);
+              stmt_vec_info def_info = vinfo->lookup_stmt (def);
+              /* Value is defined in another basic block.  */
+              if (!def_info)
+                return false;
+              scalar_stmts.safe_push (def_info);
+            }
+          else
+            return false;
+        }
+    }
   else
     {
       /* Collect reduction statements.  */
@@ -2038,6 +2065,14 @@ vect_analyze_slp_instance (vec_info *vinfo,
           SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
           SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
           SLP_INSTANCE_LOADS (new_instance) = vNULL;
+
+          if (constructor)
+            {
+              SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
+            }
+          else
+            SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
+
           vect_gather_slp_loads (new_instance, node);
           if (dump_enabled_p ())
             dump_printf_loc (MSG_NOTE, vect_location,
@@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
               stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
               if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
                 {
+                  /* Check this is not a constructor that will be vectorized
+                     away.  */
+                  if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info))
+                      continue;
                   (*life)[i] = true;
                   BREAK_FROM_IMM_USE_STMT (use_iter);
                 }
@@ -2836,6 +2875,72 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
   return true;
 }
 
+static bool
+vect_ssa_use_outside_bb (tree ssa)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  bool use_outside_of_block = false;
+  gcc_checking_assert (TREE_CODE (ssa) == SSA_NAME);
+  gimple* def = SSA_NAME_DEF_STMT (ssa);
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, ssa)
+    {
+      if (use_stmt->bb != def->bb)
+        {
+          use_outside_of_block = true;
+          BREAK_FROM_IMM_USE_STMT (use_iter);
+        }
+      /* In the following pattern, we consider _1 and vect_1
+         equivalent.
+         _1 = {a,b,c}
+         vect_1 = _1  */
+      else if (is_gimple_assign (use_stmt)
+               && gimple_assign_rhs_code (use_stmt) == SSA_NAME
+               && TREE_CODE (gimple_assign_lhs (use_stmt)) == SSA_NAME)
+        {
+          use_outside_of_block = vect_ssa_use_outside_bb (gimple_assign_lhs (use_stmt));
+          BREAK_FROM_IMM_USE_STMT (use_iter);
+        }
+      else
+        BREAK_FROM_IMM_USE_STMT (use_iter);
+    }
+  return use_outside_of_block;
+}
+
+static void
+vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = bb_vinfo->region_begin;
+       gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      bool vectorizable = true;
+
+      if (is_gimple_assign (stmt)
+          && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
+          && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
+          && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE)
+        {
+          tree rhs = gimple_assign_rhs1 (stmt);
+
+          if (CONSTRUCTOR_NELTS (rhs) == 0)
+            vectorizable = false;
+
+          if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
+            vectorizable = false;
+
+          if (vectorizable)
+            {
+              stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
+              BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
+            }
+        }
+    }
+}
+
 /* Check if the basic block can be vectorized.  Returns a bb_vec_info
    if so and sets fatal to true if failure is independent of
    current_vector_size.  */
@@ -2908,6 +3013,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
       return NULL;
     }
 
+  vect_slp_check_for_constructors (bb_vinfo);
+
   /* If there are no grouped stores in the region there is no need
      to continue with pattern recog as vect_analyze_slp will fail
      anyway.  */
@@ -4053,6 +4160,26 @@ vect_schedule_slp_instance (slp_tree node, slp_instance instance,
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, child_stmt_info)
         STMT_VINFO_DEF_TYPE (child_stmt_info) = vect_internal_def;
     }
+
+  /* For vector constructors, the same SSA name must be used to maintain data
+     flow into other basic blocks.  */
+  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
+      && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
+      && SLP_TREE_VEC_STMTS (node).exists ())
+    {
+      stmt_vec_info child_stmt_info;
+      int j;
+      FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info)
+        {
+          gassign *rstmt
+            = gimple_build_assign (gimple_get_lhs (instance->root_stmt),
+                                   gimple_get_lhs (child_stmt_info->stmt));
+          gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt);
+          gsi_replace (&rgsi, rstmt, true);
+          break;
+        }
+    }
+
 }
 
 /* Replace scalar calls from SLP node NODE with setting of their lhs to zero.
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 837fb5ab52537cdf95a413557335b30704f9dc26..906579f0bc0efce955a1cde177fe1404ea8ce843 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -149,6 +149,10 @@ public:
   /* The root of SLP tree.  */
   slp_tree root;
 
+  /* For vector constructors, the constructor stmt that the SLP tree is built
+     from, NULL otherwise.  */
+  gimple *root_stmt;
+
   /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s.  */
   unsigned int group_size;
 
@@ -168,6 +172,7 @@ public:
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
 #define SLP_INSTANCE_LOADS(S)                    (S)->loads
+#define SLP_INSTANCE_ROOT_STMT(S)                (S)->root_stmt
 
 #define SLP_TREE_CHILDREN(S)                     (S)->children
 #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
-- 
2.17.1
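
For reference, a vector constructor of the kind this patch uses as an SLP
starting point can also be written directly with the GNU C vector extension.
The sketch below is illustrative only and is not part of the patch: the names
v16qu, add_by_constructor and add_by_vector_op are made up for the example,
and it uses unsigned char rather than the char of the test case so that lane
arithmetic wraps in a well-defined way. Whether GCC would actually pick the
constructor in add_by_constructor as an SLP root depends on the checks in
vect_slp_check_for_constructors, for instance the requirement that the
constructed value be used outside its basic block.

```c
#include <string.h>

/* 16 x unsigned char vector via the GNU C vector extension
   (supported by GCC and Clang).  Hypothetical name.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));

/* Builds the result lane by lane.  The brace initializer with
   non-constant elements is what GIMPLE represents as a CONSTRUCTOR
   whose elements are sixteen scalar additions.  */
v16qu
add_by_constructor (const unsigned char *s1, const unsigned char *s2)
{
  v16qu v = {
    (unsigned char) (s1[0] + s2[0]),   (unsigned char) (s1[1] + s2[1]),
    (unsigned char) (s1[2] + s2[2]),   (unsigned char) (s1[3] + s2[3]),
    (unsigned char) (s1[4] + s2[4]),   (unsigned char) (s1[5] + s2[5]),
    (unsigned char) (s1[6] + s2[6]),   (unsigned char) (s1[7] + s2[7]),
    (unsigned char) (s1[8] + s2[8]),   (unsigned char) (s1[9] + s2[9]),
    (unsigned char) (s1[10] + s2[10]), (unsigned char) (s1[11] + s2[11]),
    (unsigned char) (s1[12] + s2[12]), (unsigned char) (s1[13] + s2[13]),
    (unsigned char) (s1[14] + s2[14]), (unsigned char) (s1[15] + s2[15])
  };
  return v;
}

/* The form the "After patch" assembly corresponds to: two 16-byte
   loads followed by a single vector add.  */
v16qu
add_by_vector_op (const unsigned char *s1, const unsigned char *s2)
{
  v16qu a, b;
  memcpy (&a, s1, sizeof a);
  memcpy (&b, s2, sizeof b);
  return a + b;
}
```

Both functions compute the same sixteen lanes; the first spells out the
scalar work that feeds the constructor, the second is the shape the
vectorizer is expected to arrive at.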