Message ID | 0065c16a-d7c3-1237-9deb-73400a9cd0c8@codesourcery.com |
---|---|
State | New |
Headers | show |
Hi Andrew, Thanks for the patch and looking into this. On Mon, Nov 14, 2016 at 04:57:58PM +0000, Andrew Stubbs wrote: > The testcase powerpc/fusion3.c causes an ICE when compiled with > -msoft-float. > Basically, the problem is that the peephole optimization tries to create > a Power9 Fusion instruction, but those do not support SF values in > integer registers (AFAICT). The peepholes do not support it, or maybe the define_insns do not either. The machine of course will not care. > So, presumably, I need to adjust either the predicate or the condition > of the peephole rules. Yes. > The predicate used is "toc_fusion_or_p9_reg_operand", and this might be > the root cause, but I don't know the architecture well enough to be > sure. This fusion is quite simple really. Offset addressing insns can only access a 16-bit range, but instead of writing e.g. lwz 0,0x12345678(3) you can do addis 4,3,0x1234 lwz 0,0x5678(4) and the processor will execute it faster, essentially like if it was written as the first example. See 2.1.1 in Power ISA 3.0. > The predicate code seems to suggest that "toc_fusion", whatever > that is, should be able to do this, but the insn produced by the > peephole uses only UNSPEC_FUSION_P9, which does not. Perhaps this > predicate is inappropriate for the P9 Fusion peephole, or perhaps it > needs to be taught about this corner case? One of those yes. > In any case, I don't want to change the predicate without being sure > what it does (here and elsewhere), so the attached patch solves the > problem by changing the condition. > > Is this OK, or do I need to do something less blunt? We can have floats in GPRs even without TARGET_SOFT_FLOAT, so this does not even work? And yes this is blunt :-) Segher
On 15/11/16 12:29, Segher Boessenkool wrote: > The peepholes do not support it, or maybe the define_insns do not either. > The machine of course will not care. Oh, OK, so probably the bug is not in the peephole at all, but in the define_insn, or lack thereof. More investigation required. Thanks Andrew
On Mon, Nov 14, 2016 at 04:57:58PM +0000, Andrew Stubbs wrote: > The testcase powerpc/fusion3.c causes an ICE when compiled with > -msoft-float. > > The key line in the testcase looks fairly harmless: > > void fusion_float_write (float *p, float f){ p[LARGE] = f; } LARGE is large enough that it won't fit in the offset field of a single instruction. > The error message look like this: > > .../gcc.target/powerpc/fusion3.c: In function 'fusion_float_write': > .../gcc.target/powerpc/fusion3.c:12:1: error: unrecognizable insn: > (insn 18 4 14 2 (parallel [ > (set (mem:SF (plus:SI (plus:SI (reg:SI 3 3 [ p ]) > (const_int 327680 [0x50000])) > (const_int -29420 [0xffffffffffff8d14])) [1 > MEM[(float *)p_1(D) + 298260B]+0 S4 A32]) > (unspec:SF [ > (reg:SF 4 4 [ f ]) > ] UNSPEC_FUSION_P9)) > (clobber (reg/f:SI 3 3 [157])) > ]) "/scratch/astubbs/fsf/src/gcc-mainline/gcc/testsuite/gcc.target/powerpc/fusion3.c":12 > -1 > (nil)) When I wrote the basic fusion stuff, I was assuming nobody would do -msoft-float -mcpu=power7. By the time the code had been written, the soft-float libraries were no longer being built on the 64-bit Linux systems, due to the Linux distributions dropping support for them. However, while we can make this particular failure go away by making powerpc_p9vector_ok (and probably some of the other targets needing VSX features) false if -msoft-float it is still a problem, since SFmode can go in GPRs. This is the same basic failure (PR 78101) that I saw in building the next generation of the Spec benchmark suite, except that it is a DFmode instead of SFmode. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101 Both are trying to store a value from a GPR > Basically, the problem is that the peephole optimization tries to > create a Power9 Fusion instruction, but those do not support SF > values in integer registers (AFAICT). > > So, presumably, I need to adjust either the predicate or the > condition of the peephole rules. Now, that I have a little time, I can look into this, to at least make predicate and peepholes match. There is some other stuff (support for the new load/store that were added to the compiler after that we should also tackle). -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
On 15/11/16 21:06, Michael Meissner wrote: > Now, that I have a little time, I can look into this, to at least make > predicate and peepholes match. There is some other stuff (support for the new > load/store that were added to the compiler after that we should also tackle). I've been investigating this today, and I've found that the insn does not match because the "fusion_addis_mem_combo_store" predicate requires TARGET_SF_FPR is true, which in turn requires TARGET_HARD_FLOAT is true. So basically the fusion stuff is disabled in soft-float mode regardless of where the value is stored. Anyway, I'm at end-of-day now, so let me know if you come up with anything. Thanks Andrew
On Tue, Nov 15, 2016 at 09:11:56PM +0000, Andrew Stubbs wrote: > On 15/11/16 21:06, Michael Meissner wrote: > >Now, that I have a little time, I can look into this, to at least make > >predicate and peepholes match. There is some other stuff (support for the new > >load/store that were added to the compiler after that we should also tackle). > > I've been investigating this today, and I've found that the insn > does not match because the "fusion_addis_mem_combo_store" predicate > requires TARGET_SF_FPR is true, which in turn requires > TARGET_HARD_FLOAT is true. > > So basically the fusion stuff is disabled in soft-float mode > regardless of where the value is stored. > > Anyway, I'm at end-of-day now, so let me know if you come up with anything. Yeah, SFmode and DFmode should not have the TARGET_{S,D}F_FPR checks. But a secondary problem is the early clobber in the match_scratch. Before the peephole2 we have: (insn 7 4 8 2 (set (reg/f:DI 3 3 [157]) (plus:DI (reg:DI 3 3 [ p ]) (const_int 327680 [0x50000]))) "fusion3.c":12 75 {*adddi3} (nil)) (insn 8 7 14 2 (set (mem:SF (plus:DI (reg/f:DI 3 3 [157]) (const_int -29420 [0xffffffffffff8d14])) [1 MEM[(float *)p_1(D) + 298260B]+0 S4 A32]) (reg:SF 4 4 [ f ])) "fusion3.c":12 484 {*movsf_softfloat} (expr_list:REG_DEAD (reg:SF 4 4 [ f ]) (expr_list:REG_DEAD (reg/f:DI 3 3 [157]) (nil)))) This gets combined into: (insn 17 4 14 2 (parallel [ (set (mem:SF (plus:DI (plus:DI (reg:DI 3 3 [ p ]) (const_int 327680 [0x50000])) (const_int -29420 [0xffffffffffff8d14])) [1 MEM[(float *)p_1(D) + 298260B]+0 S4 A32]) (unspec:SF [ (reg:SF 4 4 [ f ]) ] UNSPEC_FUSION_P9)) (clobber (reg/f:DI 3 3 [157])) ]) "fusion3.c":12 -1 (nil)) where (reg:DI 3) is the base register temporary. Since it was used as part of the original address, the early clobber check in constrain_operands will flag it. This is what is causing PR 78101. Thus there are two issues here. In the scope of the ISA 3.0/power9 work, there is also the issue that the new d-form load and stores should be folded into this as well. This is due to me doing the original power9/ISA 3.0 fusion first, and then putting it on the shelf, and then months later working on the new address forms, and never getting back to fusion. As it is currently written, there was an expectation that the fusion stuff would eventually move from being a peephole2 to being part of the addressing handled before register allocation. Then the secondary reload hook would call the appropriate fusion helper function, which is why we have int_ and gpr_ fusion insn functions, each based on a separate register class. The int_ fusion form allows SFmode/DFmode, but as you point out the combo recognizer doesn't allow them in GPRs. I'm not as sure that we will have the will to move fused addresses early before register allocation, and whether we need to keep around the separate int_ and fpr_ forms. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
On 16/11/16 13:10, Michael Meissner wrote: > Yeah, SFmode and DFmode should not have the TARGET_{S,D}F_FPR checks. So, I can safely resolve my initial problem by simply removing them? And that wouldn't break the other use of that predicate? > But a secondary problem is the early clobber in the match_scratch. So, the FPR_FUSION insn works because operands 1 and 2 cannot conflict, which means the early-clobber is not necessary, but the GPR_FUSION insn cannot work because there's no way to ensure that operands 1 and 2 don't conflict without also specifying that operands 0 and 2 don't conflict, which they commonly do. We could fix it, for now, by adding new patterns that fit both cases (given that the register numbers are known at peephole time). Or, we could disable the peephole in the case where this would occur (as my original patch does, albeit bluntly). Or, something else? Andrew
On 16/11/16 17:05, Michael Meissner wrote: > I'm starting to test this patch right now (it's on LE power8 stage3 right now, > and I need to build BE power8 and BE power7 versions when I get into the office > shortly, and build spec 2017 with it for PR 78101): Did the testing go OK? Andrew
On Wed, Nov 23, 2016 at 11:52:01AM +0000, Andrew Stubbs wrote: > On 16/11/16 17:05, Michael Meissner wrote: > >I'm starting to test this patch right now (it's on LE power8 stage3 right now, > >and I need to build BE power8 and BE power7 versions when I get into the office > >shortly, and build spec 2017 with it for PR 78101): > > Did the testing go OK? Yes, the testing worked fine, and I checked in the fix. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
2016-11-11 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/rs6000/rs6000.md: Disable P9-fusion peepholes when TARGET_SOFT_FLOAT is set. gcc/testsuite/ * gcc.target/powerpc/fusion3.c: Skip on -msoft-float. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index de959c9..28d8174 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -13024,7 +13024,8 @@ (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "") (match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))] "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0]) - && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])" + && fusion_p9_p (operands[0], operands[1], operands[2], operands[3]) + && !TARGET_SOFT_FLOAT" [(const_int 0)] { expand_fusion_p9_load (operands); @@ -13037,7 +13038,8 @@ (set (match_operand:SFDF 2 "offsettable_mem_operand" "") (match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))] "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0]) - && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])" + && fusion_p9_p (operands[0], operands[1], operands[2], operands[3]) + && !TARGET_SOFT_FLOAT" [(const_int 0)] { expand_fusion_p9_store (operands); @@ -13050,7 +13052,8 @@ (set (match_dup 0) (ior:SDI (match_dup 0) (match_operand:SDI 2 "u_short_cint_operand" "")))] - "TARGET_P9_FUSION" + "TARGET_P9_FUSION + && !TARGET_SOFT_FLOAT" [(set (match_dup 0) (unspec:SDI [(match_dup 1) (match_dup 2)] UNSPEC_FUSION_P9))]) @@ -13063,7 +13066,8 @@ (match_operand:SDI 3 "u_short_cint_operand" "")))] "TARGET_P9_FUSION && !rtx_equal_p (operands[0], operands[2]) - && peep2_reg_dead_p (2, operands[0])" + && peep2_reg_dead_p (2, operands[0]) + && !TARGET_SOFT_FLOAT" [(set (match_dup 2) (unspec:SDI [(match_dup 1) (match_dup 3)] UNSPEC_FUSION_P9))]) diff --git a/gcc/testsuite/gcc.target/powerpc/fusion3.c b/gcc/testsuite/gcc.target/powerpc/fusion3.c index 8eca640..20992d0 100644 --- a/gcc/testsuite/gcc.target/powerpc/fusion3.c +++ b/gcc/testsuite/gcc.target/powerpc/fusion3.c @@ -2,6 +2,7 @@ /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ /* { dg-require-effective-target powerpc_p9vector_ok } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ +/* { dg-skip-if "-mpower9-fusion and -msoft-float are incompatible" { powerpc*-*-* } { "-msoft-float" } {} } */ /* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */ #define LARGE 0x12345