From patchwork Fri Nov 17 15:33:09 2017
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 119167
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Allow single-element interleaving for non-power-of-2 strides
Date: Fri, 17 Nov 2017 15:33:09 +0000
Message-ID:
 <87o9o0yl6y.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0

This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu. OK to install?

Richard


2017-11-17  Richard Sandiford
            Alan Hayward
            David Sherwood

gcc/
        * tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
        single-element interleaving even if the size is not a power of 2.
        * tree-vect-stmts.c (get_load_store_type): Disallow elementwise
        accesses for single-element interleaving if the group size is
        not a power of 2.

gcc/testsuite/
        * gcc.target/aarch64/sve_struct_vect_18.c: New test.
        * gcc.target/aarch64/sve_struct_vect_18_run.c: Likewise.
        * gcc.target/aarch64/sve_struct_vect_19.c: Likewise.
        * gcc.target/aarch64/sve_struct_vect_19_run.c: Likewise.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c 2017-11-17 15:32:12.513242384 +0000
+++ gcc/tree-vect-data-refs.c 2017-11-17 15:32:12.696843097 +0000
@@ -2440,11 +2440,10 @@ vect_analyze_group_access_1 (struct data
          element of the group that is accessed in the loop.  */
 
       /* Gaps are supported only for loads. STEP must be a multiple of the type
-         size.  The size of the group must be a power of 2.  */
+         size.  */
       if (DR_IS_READ (dr)
           && (dr_step % type_size) == 0
-          && groupsize > 0
-          && pow2p_hwi (groupsize))
+          && groupsize > 0)
         {
           GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = stmt;
           GROUP_SIZE (vinfo_for_stmt (stmt)) = groupsize;
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c 2017-11-17 15:32:12.513242384 +0000
+++ gcc/tree-vect-stmts.c 2017-11-17 15:32:12.697756534 +0000
@@ -2208,7 +2208,10 @@ get_load_store_type (gimple *stmt, tree
      cost of using elementwise accesses.  This check preserves the
      traditional behavior until that can be fixed.  */
   if (*memory_access_type == VMAT_ELEMENTWISE
-      && !STMT_VINFO_STRIDED_P (stmt_info))
+      && !STMT_VINFO_STRIDED_P (stmt_info)
+      && !(stmt == GROUP_FIRST_ELEMENT (stmt_info)
+           && !GROUP_NEXT_ELEMENT (stmt_info)
+           && !pow2p_hwi (GROUP_SIZE (stmt_info))))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_18.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_18.c 2017-11-17 15:32:12.695929661 +0000
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#define N 2000
+
+#define TEST_LOOP(NAME, TYPE) \
+  void __attribute__ ((noinline, noclone)) \
+  NAME (TYPE *restrict dest, TYPE *restrict src) \
+  { \
+    for (int i = 0; i < N; ++i) \
+      dest[i] += src[i * 3]; \
+  }
+
+#define TEST(NAME) \
+  TEST_LOOP (NAME##_i8, signed char) \
+  TEST_LOOP (NAME##_i16, unsigned short) \
+  TEST_LOOP (NAME##_f32, float) \
+  TEST_LOOP (NAME##_f64, double)
+
+TEST (test)
+
+/* Check the vectorized loop.  */
+/* { dg-final { scan-assembler-times {\tld1b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\t} 1 } } */
+
+/* Check the scalar tail.  */
+/* { dg-final { scan-assembler-times {\tldrb\tw} 2 } } */
+/* { dg-final { scan-assembler-times {\tstrb\tw} 1 } } */
+/* { dg-final { scan-assembler-times {\tldrh\tw} 2 } } */
+/* { dg-final { scan-assembler-times {\tstrh\tw} 1 } } */
+/* { dg-final { scan-assembler-times {\tldr\ts} 2 } } */
+/* { dg-final { scan-assembler-times {\tstr\ts} 1 } } */
+/* { dg-final { scan-assembler-times {\tldr\td} 2 } } */
+/* { dg-final { scan-assembler-times {\tstr\td} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_18_run.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_18_run.c 2017-11-17 15:32:12.695929661 +0000
@@ -0,0 +1,36 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_struct_vect_18.c"
+
+#undef TEST_LOOP
+#define TEST_LOOP(NAME, TYPE) \
+  { \
+    TYPE out[N]; \
+    TYPE in[N * 3]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        out[i] = i * 7 / 2; \
+        asm volatile ("" ::: "memory"); \
+      } \
+    for (int i = 0; i < N * 3; ++i) \
+      { \
+        in[i] = i * 9 / 2; \
+        asm volatile ("" ::: "memory"); \
+      } \
+    NAME (out, in); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        TYPE expected = i * 7 / 2 + in[i * 3]; \
+        if (out[i] != expected) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST (test);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_19.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_19.c 2017-11-17 15:32:12.695929661 +0000
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, TYPE) \
+  void __attribute__ ((noinline, noclone)) \
+  NAME (TYPE *restrict dest, TYPE *restrict src, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      dest[i] += src[i * 3]; \
+  }
+
+#define TEST(NAME) \
+  TEST_LOOP (NAME##_i8, signed char) \
+  TEST_LOOP (NAME##_i16, unsigned short) \
+  TEST_LOOP (NAME##_f32, float) \
+  TEST_LOOP (NAME##_f64, double)
+
+TEST (test)
+
+/* Check the vectorized loop.  */
+/* { dg-final { scan-assembler-times {\tld1b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1b\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1h\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\t} 1 } } */
+
+/* Check the scalar tail.  */
+/* { dg-final { scan-assembler-times {\tldrb\tw} 2 } } */
+/* { dg-final { scan-assembler-times {\tstrb\tw} 1 } } */
+/* { dg-final { scan-assembler-times {\tldrh\tw} 2 } } */
+/* { dg-final { scan-assembler-times {\tstrh\tw} 1 } } */
+/* { dg-final { scan-assembler-times {\tldr\ts} 2 } } */
+/* { dg-final { scan-assembler-times {\tstr\ts} 1 } } */
+/* { dg-final { scan-assembler-times {\tldr\td} 2 } } */
+/* { dg-final { scan-assembler-times {\tstr\td} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_19_run.c
===================================================================
--- /dev/null 2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_19_run.c 2017-11-17 15:32:12.695929661 +0000
@@ -0,0 +1,45 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_struct_vect_19.c"
+
+#define N 1000
+
+#undef TEST_LOOP
+#define TEST_LOOP(NAME, TYPE) \
+  { \
+    TYPE out[N]; \
+    TYPE in[N * 3]; \
+    int counts[] = { 0, 1, N - 1 }; \
+    for (int j = 0; j < 3; ++j) \
+      { \
+        int count = counts[j]; \
+        for (int i = 0; i < N; ++i) \
+          { \
+            out[i] = i * 7 / 2; \
+            asm volatile ("" ::: "memory"); \
+          } \
+        for (int i = 0; i < N * 3; ++i) \
+          { \
+            in[i] = i * 9 / 2; \
+            asm volatile ("" ::: "memory"); \
+          } \
+        NAME (out, in, count); \
+        for (int i = 0; i < N; ++i) \
+          { \
+            TYPE expected = i * 7 / 2; \
+            if (i < count) \
+              expected += in[i * 3]; \
+            if (out[i] != expected) \
+              __builtin_abort (); \
+            asm volatile ("" ::: "memory"); \
+          } \
+      } \
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST (test);
+  return 0;
+}
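
For reference, here is a minimal scalar sketch of the access pattern the patch targets. It is not part of the patch and is not how the vectorizer implements anything; it only models the idea that a three-way de-interleaving load (as LD3 performs) can serve a lone src[i * 3] access by keeping the first of the three streams and discarding the other two. The names add_stride3_ref, load3_model and add_stride3_chunked and the chunk size VL are hypothetical, chosen for illustration, and src is assumed to have 3 * n readable elements.

```c
#include <stddef.h>

/* Hypothetical fixed chunk size standing in for the vector length.  */
#define VL 4

/* Reference loop: the single-element, stride-3 access pattern.  */
void
add_stride3_ref (int *restrict dest, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dest[i] += src[i * 3];
}

/* Scalar model of a three-way de-interleaving load: read 3 * VL
   consecutive elements and split them into three VL-element streams.  */
static void
load3_model (const int *src, int v0[VL], int v1[VL], int v2[VL])
{
  for (size_t j = 0; j < VL; ++j)
    {
      v0[j] = src[j * 3];
      v1[j] = src[j * 3 + 1];
      v2[j] = src[j * 3 + 2];
    }
}

/* Same result as add_stride3_ref, computed one chunk at a time.  Only
   the first stream is used; the other two are the gap in the group.
   Assumes SRC has 3 * N readable elements.  */
void
add_stride3_chunked (int *restrict dest, const int *restrict src, size_t n)
{
  size_t i = 0;
  for (; i + VL <= n; i += VL)
    {
      int v0[VL], v1[VL], v2[VL];
      load3_model (src + i * 3, v0, v1, v2);
      (void) v1;
      (void) v2;
      for (size_t j = 0; j < VL; ++j)
        dest[i + j] += v0[j];
    }
  /* Scalar tail for the remaining n % VL elements.  */
  for (; i < n; ++i)
    dest[i] += src[i * 3];
}
```

Both functions should leave dest with identical contents for any n, which is the same property the _run.c tests above check against the vectorized code.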