From patchwork Fri Sep  9 17:04:18 2011
Subject: [patch, arm] Fix PR target/50305 (arm_legitimize_reload_address problem)
To: gcc-patches@gcc.gnu.org
Date: Fri, 9 Sep 2011 19:04:18 +0200 (CEST)
From: "Ulrich Weigand" <uweigand@de.ibm.com>
Cc: rearnsha@arm.com, ramana.radhakrishnan@linaro.org, patches@linaro.org

Hello,

the problem in PR 50305 turned out to be caused by the ARM back-end
LEGITIMIZE_RELOAD_ADDRESS implementation.

One of the arguments to the inline asm ("+Qo" (perf_event_id)) has the form

(mem/c/i:DI (plus:SI (reg/f:SI 152)
        (const_int 1200 [0x4b0])) [5 perf_event_id+0 S8 A64])

before reload, where reg 152 holds the section anchor:

(insn 23 21 29 3 (set (reg/f:SI 152)
        (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) pr50305.c:36 176 {*arm_movsi_insn}
     (expr_list:REG_EQUAL (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])
        (nil)))

The displacement is considered out of range for a DImode MEM, and reload
therefore attempts to reload the address.
The ARM LEGITIMIZE_RELOAD_ADDRESS routine then attempts to optimize this
by converting the address to:

(mem/c/i:DI (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
            (const_int 1024 [0x400]))
        (const_int 176 [0xb0])) [5 perf_event_id+0 S8 A64])

and pushing reloads:

Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 1024 [0x400]))
	CORE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2)
	reload_in_reg: (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 1024 [0x400]))
Reload 1: reload_in (SI) = (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 1024 [0x400]))
	CORE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 5)
	reload_in_reg: (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 1024 [0x400]))
Reload 2: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 1024 [0x400]))
                                    (const_int 176 [0xb0]))
	CORE_REGS, RELOAD_FOR_INPUT (opnum = 2), inc by 8
	reload_in_reg: (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 1024 [0x400]))
                                    (const_int 176 [0xb0]))
Reload 3: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 1024 [0x400]))
                                    (const_int 176 [0xb0]))
	CORE_REGS, RELOAD_FOR_INPUT (opnum = 5), inc by 8
	reload_in_reg: (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 1024 [0x400]))
                                    (const_int 176 [0xb0]))

(Note that the duplicate reloads appear because the "+" operand has been
implicitly converted into an input operand and an output operand.
Reloads 2/3 are there because reload is not sure that the result of
LEGITIMIZE_RELOAD_ADDRESS is offsettable, and therefore reloads the
whole address anyway.)

Now the problem is that some of the asm's other arguments do not fit
into registers, and we therefore get a second pass through find_reloads.
At this point, the insn stream has already been modified, so
LEGITIMIZE_RELOAD_ADDRESS this time around sees the RTL it has itself
generated during the first pass.
However, it is not able to recognize this RTL, and therefore does not
re-generate the required reloads; instead, generic code attempts to
handle the nested plus and creates somewhat unfortunate reloads:

Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 176 [0xb0]))
	CORE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2)
	reload_in_reg: (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 176 [0xb0]))
Reload 1: reload_in (SI) = (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 176 [0xb0]))
	CORE_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 5)
	reload_in_reg: (plus:SI (reg/f:SI 3 r3 [152])
                                    (const_int 176 [0xb0]))
Reload 2: reload_out (SI) = (reg:SI 151 [ tmp ])
	GENERAL_REGS, RELOAD_FOR_INSN (opnum = 1)
	reload_out_reg: (reg:SI 151 [ tmp ])
Reload 3: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 176 [0xb0]))
                                    (const_int 1024 [0x400]))
	CORE_REGS, RELOAD_FOR_INPUT (opnum = 2), inc by 8
	reload_in_reg: (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 176 [0xb0]))
                                    (const_int 1024 [0x400]))
Reload 4: reload_in (SI) = (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 176 [0xb0]))
                                    (const_int 1024 [0x400]))
	CORE_REGS, RELOAD_FOR_INPUT (opnum = 5), inc by 8
	reload_in_reg: (plus:SI (plus:SI (reg/f:SI 3 r3 [152])
                                             (const_int 176 [0xb0]))
                                    (const_int 1024 [0x400]))

This can be fixed by having LEGITIMIZE_RELOAD_ADDRESS recognize RTL it
has generated itself in a prior pass (note that several other back-ends
already have a corresponding fix).

However, even then, the test case still fails.  This is because we then
emit reloads that compute (plus:SI (reg 152) (const_int 1024)).  But reg
152 was marked as equivalent to a constant (the LANCHOR address), and
reload, needing another register, decided to spill reg 152 from its
preliminary home in r3.  Since the register is equivalent to a constant,
it is not spilled to the stack; instead, reload assumes that all uses
will be reloaded to a rematerialization of the equivalent constant.
Such a rematerialization is in fact what reload itself will perform; but
it does not happen if the reload is generated by
LEGITIMIZE_RELOAD_ADDRESS.  In theory, LEGITIMIZE_RELOAD_ADDRESS could
attempt to handle such base registers by substituting the equivalent
constant and then reloading the result.  However, this might require
additional steps (pushing to the constant pool, reloading the constant
pool address, ...), which would lead to significant duplication of code
from core reload.  This doesn't seem worthwhile at this point ...

Therefore, the patch below fixes this second issue by simply not
handling addresses based on a register equivalent to a constant at all
in LEGITIMIZE_RELOAD_ADDRESS.  In general, common code should do a good
enough job for those anyway ...

Tested on arm-linux-gnueabi with no regressions; the patch fixes the
testcase.  OK for mainline?

Bye,
Ulrich


ChangeLog:

gcc/
	PR target/50305
	* config/arm/arm.c (arm_legitimize_reload_address): Recognize
	output of a previous pass through legitimize_reload_address.
	Do not attempt to optimize addresses if the base register is
	equivalent to a constant.

gcc/testsuite/
	PR target/50305
	* gcc.target/arm/pr50305.c: New test.

Index: gcc/testsuite/gcc.target/arm/pr50305.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr50305.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr50305.c	(revision 0)
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer -marm -march=armv7-a" } */
+
+struct event {
+  unsigned long long id;
+  unsigned int flag;
+};
+
+void dummy(void)
+{
+  /* This is here to ensure that the offset of perf_event_id below
+     relative to the LANCHOR symbol exceeds the allowed displacement.
+   */
+  static int __warned[300];
+  __warned[0] = 1;
+}
+
+extern void *kmem_cache_alloc_trace (void *cachep);
+extern void *cs_cachep;
+extern int nr_cpu_ids;
+
+struct event *
+event_alloc (int cpu)
+{
+  static unsigned long long __attribute__((aligned(8))) perf_event_id;
+  struct event *event;
+  unsigned long long result;
+  unsigned long tmp;
+
+  if (cpu >= nr_cpu_ids)
+    return 0;
+
+  event = kmem_cache_alloc_trace (cs_cachep);
+
+  __asm__ __volatile__ ("dmb" : : : "memory");
+
+  __asm__ __volatile__("@ atomic64_add_return\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	adds	%0, %0, %4\n"
+"	adc	%H0, %H0, %H4\n"
+"	strexd	%1, %0, %H0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (perf_event_id)
+	: "r" (&perf_event_id), "r" (1LL)
+	: "cc");
+
+  __asm__ __volatile__ ("dmb" : : : "memory");
+
+  event->id = result;
+
+  if (cpu)
+    event->flag = 1;
+
+  for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+    kmem_cache_alloc_trace (cs_cachep);
+
+  return event;
+}
+
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 178487)
+++ gcc/config/arm/arm.c	(working copy)
@@ -6528,9 +6528,26 @@
 		 int opnum, int type,
 		 int ind_levels ATTRIBUTE_UNUSED)
 {
+  /* We must recognize output that we have already generated ourselves.  */
   if (GET_CODE (*p) == PLUS
+      && GET_CODE (XEXP (*p, 0)) == PLUS
+      && GET_CODE (XEXP (XEXP (*p, 0), 0)) == REG
+      && GET_CODE (XEXP (XEXP (*p, 0), 1)) == CONST_INT
+      && GET_CODE (XEXP (*p, 1)) == CONST_INT)
+    {
+      push_reload (XEXP (*p, 0), NULL_RTX, &XEXP (*p, 0), NULL,
+		   MODE_BASE_REG_CLASS (mode), GET_MODE (*p),
+		   VOIDmode, 0, 0, opnum, (enum reload_type) type);
+      return true;
+    }
+
+  if (GET_CODE (*p) == PLUS
       && GET_CODE (XEXP (*p, 0)) == REG
       && ARM_REGNO_OK_FOR_BASE_P (REGNO (XEXP (*p, 0)))
+      /* If the base register is equivalent to a constant, let the generic
+	 code handle it.  Otherwise we will run into problems if a future
+	 reload pass decides to rematerialize the constant.
+	 */
+      && !reg_equiv_constant (ORIGINAL_REGNO (XEXP (*p, 0)))
       && GET_CODE (XEXP (*p, 1)) == CONST_INT)
     {
       HOST_WIDE_INT val = INTVAL (XEXP (*p, 1));