From patchwork Fri Nov 13 16:01:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Charles Baylis <charles.baylis@linaro.org>
X-Patchwork-Id: 56521
Delivered-To: patch@linaro.org
Received: by 10.112.155.196 with SMTP id vy4csp1124498lbb;
 Fri, 13 Nov 2015 08:01:59 -0800 (PST)
X-Received: by 10.66.158.97 with SMTP id wt1mr33585441pab.155.1447430517439; 
 Fri, 13 Nov 2015 08:01:57 -0800 (PST)
Return-Path: <gcc-patches-return-414011-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 wx8si28184995pab.113.2015.11.13.08.01.56 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 13 Nov 2015 08:01:57 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-414011-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
 gcc-patches-return-414011-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-414011-patch=linaro.org@gcc.gnu.org;
 dkim=pass header.i=@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:date:message-id:subject:from:to:cc:content-type;
 q=dns; s=default; b=i8Cyg6o9/8E5LgfhX/BigA8EvDbuv6tn8yrBj8IE7kB
 q1ZW5tkGNRoszQQjW3NXOPMnNiZ2mYrbitphn/1PZ5IxjxCMhe/lIM2re2LxwJtO
 3mRGdU4xGRfyRDVCPii+UyrlXOvQz7ObdwWz9sR9Dq2QpLH9P3dDemfUi/WPP6Kc
 =
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:date:message-id:subject:from:to:cc:content-type;
 s=default; bh=UTDgRjf1zzqMZk7g+OzaN7W0Apc=; b=Nv9eLIKyaR7Wo85WV
 9Cr2RjSwgAFPS65bHaCLipA/m9j+JTSaESw20pIBchVIMMCp6Gm15rhVTYWEl8nP
 6bbszekXsWPYRqTW3vup/Up6kdBK8EPascFsmdt1VcORRhi9X4kbhTY0GDQ37naH
 6P+aztTXxr5WRw7WNyEVO/VlcE=
Received: (qmail 20017 invoked by alias); 13 Nov 2015 16:01:44 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 20002 invoked by uid 89); 13 Nov 2015 16:01:44 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00,
 KAM_LOTSOFHASH, RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-oi0-f47.google.com
Received: from mail-oi0-f47.google.com (HELO mail-oi0-f47.google.com)
 (209.85.218.47) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Fri, 13 Nov 2015 16:01:41 +0000
Received: by oige206 with SMTP id e206so51856224oig.2 for
 <gcc-patches@gcc.gnu.org>; Fri, 13 Nov 2015 08:01:39 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20130820;
 h=x-gm-message-state:mime-version:date:message-id:subject:from:to:cc
 :content-type; bh=jugvQzK5fMvFz5eSvMOKnc6oVgRf0kEILvs0uQV8lZQ=;
 b=Gc5Q1gOGfP7k0qBXNFgfdgm9ftNuygmEBpSVHgoVeUvDTz8+4Tysbgz7JKWMRMKGv6
 D0+3PWjUYRgg6kGq6pcMdKxQ4wRNVctq8yWaLBCLBlwZQTnCTIORkxdWXowTthB+PgzT
 rhnJIcvQEtOsEI/H6k7xnseb0riotNer3zEfcbgqBCeO8avzcSAwpTfk+DuLJujSPI/x
 dOrXTyO7FICyijkZshYafHWiiXKoro0Z6+Ff1rdDLPw7IxP2F6rKTJ+6oa6CilxMaib9
 AGyQNOJtlFZooH0ruwrU3PmDmOm+/HOAwAPaHB2KPhqsLtcLnfPBuKY7d4UCJbVZyTXT
 jjgw==
X-Gm-Message-State: ALoCoQkjZ5uGzCAjqBHZt4yCYxfl2KhF4/QsmkBpGPawY6am1TBu3PumvWvIPZZZseSFrkm+SCts
MIME-Version: 1.0
X-Received: by 10.202.181.133 with SMTP id e127mr12324076oif.16.1447430499116;
 Fri, 13 Nov 2015 08:01:39 -0800 (PST)
Received: by 10.202.215.215 with HTTP; Fri, 13 Nov 2015 08:01:39 -0800 (PST)
Date: Fri, 13 Nov 2015 16:01:39 +0000
Message-ID: <CADnVucBwhJ3n59iMy9amFiQ=kgre64kn99MW7AwBr7FLxTSczQ@mail.gmail.com>
Subject: [PATCH v2] [ARM] PR61551 RFC: Improve costs for NEON addressing modes
From: Charles Baylis <charles.baylis@linaro.org>
To: Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
 Kyrylo Tkachov <kyrylo.tkachov@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
X-IsSubscribed: yes

Hi

Following on from previous discussion:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03464.html
and IRC.

I'm going to try once more to make the case for fixing the worst
problem for GCC 6, pending a rewrite of the address_cost
infrastructure for GCC 7. I think the rewrite you're describing is
overkill for this problem. There is one specific problem which I would
like to fix for GCC6, and that is the failure of the ARM backend to
allow use of post-indexed addressing for some vector modes.

Test program:
 #include <arm_neon.h>
char *f(char *p, int8x8x4_t v, int r)
{ vst4_s8(p, v); p+=32; return p; }

Desired code:
f:
        vst4.8  {d0-d3}, [r0]!
        bx      lr

Currently generated code:
f:
        mov     r3, r0
        adds    r0, r0, #32
        vst4.8  {d0-d3}, [r3]
        bx      lr

The auto-inc-dec phase does not apply in this case, because the costs
for RTXs which use POST_INC are wrong. Using gdb to poke at this, we
can see:

$ arm-unknown-linux-gnueabihf-gcc -mfpu=neon -O3 -S /tmp/foo.c
-wrapper gdb,--args
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
<snip>
Reading symbols from
/home/charles.baylis/tools/tools-arm-unknown-linux-gnueabihf-git/bin/../libexec/gcc/arm-unknown-linux-gnueabihf/6.0.0/cc1...done.
(gdb) b auto-inc-dec.c:473
Breakpoint 1 at 0x102c253: file
/home/charles.baylis/srcarea/gcc/gcc-git/gcc/auto-inc-dec.c, line 473.
(gdb) r
<snip, gdb has an assertion failure here, but it works to continue>
(gdb) print debug_rtx(mem)
(mem:OI (reg/v/f:SI 112 [ p ]) [0 MEM[(signed char[32] *)p_2(D)]+0 S32 A8])
$1 = void
(gdb) print rtx_cost(mem, V16QImode, SET, 1, false)
$2 = 4
(gdb) print debug_rtx(mem_tmp)
(mem:OI (post_inc:SI (reg/f:SI 115 [ p ])) [0  S32 A64])
$3 = void
(gdb) print rtx_cost(mem_tmp, V16QImode, SET, 1, false)
$4 = 32

So, the cost of
     (mem:OI (reg/v/f:SI 112 [ p ]))
is 4, while the cost of
    (mem:OI (post_inc:SI (reg/f:SI 115 [ p ])))
is 32.

That is a difference equivalent to 7 insns, which has no basis in
reality. It is just a bug.

Addressing some specific review points from the previous version.

> > +    {
> > +      0,
> > +      COSTS_N_INSNS (15),
> > +      COSTS_N_INSNS (15),
> > +      COSTS_N_INSNS (15),
> > +      COSTS_N_INSNS (15)
> > +    } /* vec512 */
> >    }
> >  };
>
> I'm curious as to the numbers here - The costs should reflect the relative costs of the
> addressing modes not the costs of the loads and stores - thus having high numbers
> here for vector modes may just prevent this from even triggering in auto-inc-dec
> code ? In my experience with GCC I've never satisfactorily answered the question
> whether these should be comparable to rtx_costs or not. In an ideal world they should
> be but I'm never sure. IOW I'm not sure if using COSTS_N_INSNS or plain numbers
> here is appropriate.

That's the point of the patch. These numbers give the same behaviour
as the current arm_rtx_costs code, and they are obviously wrong.

> 17:45 < ramana> My problem is that the mid-end in a number of other places
>         compares the cost coming out of rtx_cost and address_cost and if the 2
>         are not in sync we get funny values.

There is already no correspondence at all between the two at present.
My patch doesn't address this, but I think it must at least make it
better.

However, I don't really understand this comment - as you point out
above, address_cost and rtx_cost return values measured in different
units. I don't see how they can be made to correspond, given that.

> Right, but this does not change arm_address_costs - so how is this going to work?
> I would like this moved into a new function aarch_address_costs and that replacing
> arm_address_costs only to be called from here.

I could do that, but if I did, I would have to resubmit the patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00387.html along with a
reimplemention of arm_address_costs which used a table without
changing its numerical results (pending subsequent tuning). Since the
former would already solve my problem, and the latter would then be a
pure code clean up of a separate function, why not accept the '387
patch as is, and leave the clean up until GCC 7?

Alternatively, this is an updated patch series which changes the costs
for MEMs in arm_rtx_costs using the table. Passes make check with no
regressions for arm-unknown-linux-gnueabihf on qemu.

>From 68f4318327e75a709f6a3bea327915c0558127df Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Fri, 13 Nov 2015 12:24:08 +0000
Subject: [PATCH 4/4] Use integer costs for soft float

If using soft float, then costs of accessing FP values is actually the same
as the cost of accessing integers of the same size.

Change-Id: Icb672b2b599ea4e433bc0b29c228e9f910aeb3ee
---
 gcc/config/arm/arm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 101ff28..726a385 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9579,15 +9579,15 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		  : ARM_NUM_REGS (mode) <= 12 ? extra_cost->extra_mem->vec384[op]
 		  : extra_cost->extra_mem->vec512[op]);
 	      }
-	    else if (FLOAT_MODE_P (mode))
+	    else if (mode == BLKmode)
+	      *cost += extra_cost->extra_mem->blk[op];
+	    else if (FLOAT_MODE_P (mode) && TARGET_HARD_FLOAT)
 	      {
 		*cost +=
 		  (ARM_NUM_REGS (mode) <= 1 ? extra_cost->extra_mem->sf[op]
 		  : ARM_NUM_REGS (mode) <= 2 ? extra_cost->extra_mem->df[op]
 					     : extra_cost->extra_mem->cdf[op]);
 	      }
-	    else if (mode == BLKmode)
-	      *cost += extra_cost->extra_mem->blk[op];
             else
 	      { /* integer modes */
 		*cost +=
-- 
1.9.1