From patchwork Fri Jun 15 13:13:19 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ulrich Weigand
X-Patchwork-Id: 9320
Message-Id: <201206151313.q5FDDK22032184@d06av02.portsmouth.uk.ibm.com>
Subject: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)
To: gcc-patches@gcc.gnu.org
Date: Fri, 15 Jun 2012 15:13:19 +0200 (CEST)
From: "Ulrich Weigand" <uweigand@de.ibm.com>
Cc: patches@linaro.org
X-Mailer: ELM [version 2.5 PL2]
Hello,

PR tree-optimization/53636 is a crash due to an invalid unaligned access generated by the vectorizer.

The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO, as computed by the default data-ref analysis, to decide whether an access is sufficiently aligned for the vectorizer. However, that analysis computes the scalar evolution relative to the innermost loop in which the access takes place; DR_ALIGNED_TO only reflects the known alignment of the *base* address according to that evolution.

Now, if we're actually about to vectorize this particular loop, then just checking the alignment of the base is exactly what we need to do (subsequent accesses will usually be misaligned, but that's OK since we're transforming those into a vector access). However, if we're actually currently vectorizing something else, this test is not sufficient.

The code currently already checks for the case where we're performing outer-loop vectorization. In that case, we need to check the alignment of the access on *every* pass through the inner loop, as the comment states:

  /* In case the dataref is in an inner-loop of the loop that is being
     vectorized (LOOP), we use the base and misalignment information
     relative to the outer-loop (LOOP).  This is ok only if the
     misalignment stays the same throughout the execution of the
     inner-loop, which is why we have to check that the stride of the
     dataref in the inner-loop evenly divides by the vector size.  */

However, there is a second case where we need to check every pass: if we're not actually vectorizing any loop, but are performing basic-block SLP. In this case we need the same check as described in the comment above, i.e. to verify that the stride is a multiple of the vector size.
The patch below adds this check, and this indeed fixes the invalid access I was seeing in the test case (in the final assembler, we now get a vld1.16 instead of a vldr).

Tested on arm-linux-gnueabi with no regressions. OK for mainline?

Bye,
Ulrich


ChangeLog:

gcc/
	PR tree-optimization/53636
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify
	stride when doing basic-block vectorization.

gcc/testsuite/
	PR tree-optimization/53636
	* gcc.target/arm/pr53636.c: New test.


=== added file 'gcc/testsuite/gcc.target/arm/pr53636.c'
--- gcc/testsuite/gcc.target/arm/pr53636.c	1970-01-01 00:00:00 +0000
+++ gcc/testsuite/gcc.target/arm/pr53636.c	2012-06-11 17:31:41 +0000
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O -ftree-vectorize" } */
+/* { dg-add-options arm_neon } */
+
+void fill (short *buf) __attribute__ ((noinline));
+void fill (short *buf)
+{
+  int i;
+
+  for (i = 0; i < 11 * 8; i++)
+    buf[i] = i;
+}
+
+void test (unsigned char *dst) __attribute__ ((noinline));
+void test (unsigned char *dst)
+{
+  short tmp[11 * 8], *tptr;
+  int i;
+
+  fill (tmp);
+
+  tptr = tmp;
+  for (i = 0; i < 8; i++)
+    {
+      dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7;
+      dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7;
+      dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7;
+      dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7;
+      dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7;
+      dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7;
+      dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7;
+      dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7;
+
+      dst += 8;
+      tptr += 11;
+    }
+}
+
+int main (void)
+{
+  char buf [8 * 8];
+
+  test (buf);
+
+  return 0;
+}

=== modified file 'gcc/tree-vect-data-refs.c'
--- gcc/tree-vect-data-refs.c	2012-05-31 08:46:10 +0000
+++ gcc/tree-vect-data-refs.c	2012-06-11 17:31:41 +0000
@@ -845,6 +845,24 @@
 	}
     }
 
+  /* Similarly, if we're doing basic-block vectorization, we can only use
+     base and misalignment information relative to an innermost loop if the
+     misalignment stays the same throughout the execution of the loop.
+     As above, this is the case if the stride of the dataref evenly divides
+     by the vector size.  */
+  if (!loop)
+    {
+      tree step = DR_STEP (dr);
+      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+
+      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+	{
+	  if (vect_print_dump_info (REPORT_ALIGNMENT))
+	    fprintf (vect_dump, "SLP: step doesn't divide the vector-size.");
+	  misalign = NULL_TREE;
+	}
+    }
+
   base = build_fold_indirect_ref (base_addr);
   alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);