Message ID | 55E5790A.2090205@arm.com |
---|---|
State | New |
Headers | show |
Ping. https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00020.html Thanks, Kyrill On 01/09/15 11:08, Kyrill Tkachov wrote: > Hi all, > > The ARMv8-A reference manual says: > "CNEG <Wd>, <Wn>, <cond> > is equivalent to > CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>) > and is the preferred disassembly when Rn == Rm && cond != '111x'." > > That is, when the two input registers are the same we can use the shorter CNEG mnemonic > with the inverse condition instead of the longer CSNEG instruction. Similarly for the > CSINV and CSINC instructions, they have shorter CINV and CINC forms. > This patch adjusts the output templates to emit the preferred shorter sequences when possible. > > The new mnemonics are just aliases, they map down to the same instruction in the end, so there > are no performance or behaviour implications. But it does make the assembly a bit more readable > IMO, since: > "cneg w27, w9, le" > can be simply read as "if the condition is less or equal negate w9" instead of the previous: > "csneg w27, w9, w9, gt" where you have to remember which of the input registers is negated. > > > Bootstrapped and tested on aarch64-linux-gnu. > Ok for trunk? > > Thanks, > Kyrill > > 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC > mnemonic when possible. > (*csinv3<mode>_insn): Use CINV mnemonic when possible. > (csneg3<mode>_insn): USE CNEG mnemonic when possible. > > 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/abs_1.c: Update scan-assembler checks > to allow cneg. > * gcc.target/aarch64/cond_op_imm_1.c: Likewise. Likewise for cinv. > * gcc.target/aarch64/mod_2.c: Likewise.
On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote: > Hi all, > > The ARMv8-A reference manual says: > "CNEG <Wd>, <Wn>, <cond> > is equivalent to > CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>) > and is the preferred disassembly when Rn == Rm && cond != '111x'." > > That is, when the two input registers are the same we can use the shorter CNEG mnemonic > with the inverse condition instead of the longer CSNEG instruction. Similarly for the > CSINV and CSINC instructions, they have shorter CINV and CINC forms. > This patch adjusts the output templates to emit the preferred shorter sequences when possible. > > The new mnemonics are just aliases, they map down to the same instruction in the end, so there > are no performance or behaviour implications. But it does make the assembly a bit more readable > IMO, since: > "cneg w27, w9, le" > can be simply read as "if the condition is less or equal negate w9" instead of the previous: > "csneg w27, w9, w9, gt" where you have to remember which of the input registers is negated. > > > Bootstrapped and tested on aarch64-linux-gnu. > Ok for trunk? > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 77bc7cd..2e4b26c 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn" > (const_int 1)) > (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] > "" > - "csinc\\t%<w>0, %<w>3, %<w>2, %M1" > + { > + if (rtx_equal_p (operands[2], operands[3])) > + return "cinc\\t%<w>0, %<w>2, %m1"; > + else > + return "csinc\\t%<w>0, %<w>3, %<w>2, %M1"; > + } > [(set_attr "type" "csel")] > ) I guess you do it this way rather than just adding a new alternative in the pattern to avoid any chance of constraining the register allocator, but would this not be more natural to read as an {r, r, r, 2} alternative, or similar? If you've given that some thought and decided it doesn't work for you, then this is OK for trunk. Thanks, James
On 11/09/15 16:31, James Greenhalgh wrote: > On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote: >> Hi all, >> >> The ARMv8-A reference manual says: >> "CNEG <Wd>, <Wn>, <cond> >> is equivalent to >> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>) >> and is the preferred disassembly when Rn == Rm && cond != '111x'." >> >> That is, when the two input registers are the same we can use the shorter CNEG mnemonic >> with the inverse condition instead of the longer CSNEG instruction. Similarly for the >> CSINV and CSINC instructions, they have shorter CINV and CINC forms. >> This patch adjusts the output templates to emit the preferred shorter sequences when possible. >> >> The new mnemonics are just aliases, they map down to the same instruction in the end, so there >> are no performance or behaviour implications. But it does make the assembly a bit more readable >> IMO, since: >> "cneg w27, w9, le" >> can be simply read as "if the condition is less or equal negate w9" instead of the previous: >> "csneg w27, w9, w9, gt" where you have to remember which of the input registers is negated. >> >> >> Bootstrapped and tested on aarch64-linux-gnu. >> Ok for trunk? >> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index 77bc7cd..2e4b26c 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn" >> (const_int 1)) >> (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] >> "" >> - "csinc\\t%<w>0, %<w>3, %<w>2, %M1" >> + { >> + if (rtx_equal_p (operands[2], operands[3])) >> + return "cinc\\t%<w>0, %<w>2, %m1"; >> + else >> + return "csinc\\t%<w>0, %<w>3, %<w>2, %M1"; >> + } >> [(set_attr "type" "csel")] >> ) > I guess you do it this way rather than just adding a new alternative in > the pattern to avoid any chance of constraining the register allocator, but > would this not be more natural to read as an {r, r, r, 2} alternative, or > similar? I had not considered this approach and I'm a bit sceptical on how feasible it is. If we put the {r,r,r,2} as a second alternative then it will be a purely more restrictive version of the first alternative and so will never match. If, however, we put it as the first alternative we'll be expressing some preference for allocating the same register for operands 2 and 3, which is not something we want to do. > > If you've given that some thought and decided it doesn't work for you, > then this is OK for trunk. Given the above I'll commit this version next week if there are no objections. Thanks, Kyrill > > Thanks, > James >
On Tue, Sep 1, 2015 at 6:08 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > Hi all, > > The ARMv8-A reference manual says: > "CNEG <Wd>, <Wn>, <cond> > is equivalent to > CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>) > and is the preferred disassembly when Rn == Rm && cond != '111x'." > > That is, when the two input registers are the same we can use the shorter > CNEG mnemonic > with the inverse condition instead of the longer CSNEG instruction. > Similarly for the > CSINV and CSINC instructions, they have shorter CINV and CINC forms. > This patch adjusts the output templates to emit the preferred shorter > sequences when possible. > > The new mnemonics are just aliases, they map down to the same instruction in > the end, so there > are no performance or behaviour implications. But it does make the assembly > a bit more readable > IMO, since: > "cneg w27, w9, le" > can be simply read as "if the condition is less or equal negate w9" instead > of the previous: > "csneg w27, w9, w9, gt" where you have to remember which of the input > registers is negated. > > > Bootstrapped and tested on aarch64-linux-gnu. > Ok for trunk? I really think this kind of special casing is not correct and does not belong in the compiler. The main reason it complicates the back-end more than the benefit of easier of reading the assembly code. Thanks, Andrew Pinski > > Thanks, > Kyrill > > 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC > mnemonic when possible. > (*csinv3<mode>_insn): Use CINV mnemonic when possible. > (csneg3<mode>_insn): USE CNEG mnemonic when possible. > > 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/abs_1.c: Update scan-assembler checks > to allow cneg. > * gcc.target/aarch64/cond_op_imm_1.c: Likewise. Likewise for cinv. > * gcc.target/aarch64/mod_2.c: Likewise.
Hi Andrew, On 12/09/15 02:15, Andrew Pinski wrote: > On Tue, Sep 1, 2015 at 6:08 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: >> Hi all, >> >> The ARMv8-A reference manual says: >> "CNEG <Wd>, <Wn>, <cond> >> is equivalent to >> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>) >> and is the preferred disassembly when Rn == Rm && cond != '111x'." >> >> That is, when the two input registers are the same we can use the shorter >> CNEG mnemonic >> with the inverse condition instead of the longer CSNEG instruction. >> Similarly for the >> CSINV and CSINC instructions, they have shorter CINV and CINC forms. >> This patch adjusts the output templates to emit the preferred shorter >> sequences when possible. >> >> The new mnemonics are just aliases, they map down to the same instruction in >> the end, so there >> are no performance or behaviour implications. But it does make the assembly >> a bit more readable >> IMO, since: >> "cneg w27, w9, le" >> can be simply read as "if the condition is less or equal negate w9" instead >> of the previous: >> "csneg w27, w9, w9, gt" where you have to remember which of the input >> registers is negated. >> >> >> Bootstrapped and tested on aarch64-linux-gnu. >> Ok for trunk? > I really think this kind of special casing is not correct and does not > belong in the compiler. The main reason it complicates the back-end > more than the benefit of easier of reading the assembly code. <sorry for the delay>. The complication is an extra if-else statement with explicit strings on each arm i.e. there's no snprintf trickery. I tend to read a lot of the generated assembly when performing assembly comparisons while working on performance patches and I find that having the cneg from with two register operands and the negate condition is quicker to parse than the full csneg form where I need to remember that extra bit of info that the condition there must be inverted to get the negation condition. If you feel very strongly against this I can withdraw this patch, but I'd rather have it in. FWIW, clang also emits the CNEG when it can AFAICS, though I admit that's not necessarily a strong argument for this change. Kyrill > > Thanks, > Andrew Pinski > >> Thanks, >> Kyrill >> >> 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC >> mnemonic when possible. >> (*csinv3<mode>_insn): Use CINV mnemonic when possible. >> (csneg3<mode>_insn): USE CNEG mnemonic when possible. >> >> 2015-09-01 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * gcc.target/aarch64/abs_1.c: Update scan-assembler checks >> to allow cneg. >> * gcc.target/aarch64/cond_op_imm_1.c: Likewise. Likewise for cinv. >> * gcc.target/aarch64/mod_2.c: Likewise.
commit 5f2598ffa7e0d7db92163cc5e8f4f26f7d2aff5a Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Fri Aug 21 14:51:55 2015 +0100 [AArch64] Use preferred aliases for CSNEG, CSINC, CSINV diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 77bc7cd..2e4b26c 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn" (const_int 1)) (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] "" - "csinc\\t%<w>0, %<w>3, %<w>2, %M1" + { + if (rtx_equal_p (operands[2], operands[3])) + return "cinc\\t%<w>0, %<w>2, %m1"; + else + return "csinc\\t%<w>0, %<w>3, %<w>2, %M1"; + } [(set_attr "type" "csel")] ) @@ -3101,7 +3106,12 @@ (define_insn "*csinv3<mode>_insn" (not:GPI (match_operand:GPI 2 "register_operand" "r")) (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] "" - "csinv\\t%<w>0, %<w>3, %<w>2, %M1" + { + if (rtx_equal_p (operands[2], operands[3])) + return "cinv\\t%<w>0, %<w>2, %m1"; + else + return "csinv\\t%<w>0, %<w>3, %<w>2, %M1"; + } [(set_attr "type" "csel")] ) @@ -3112,7 +3122,12 @@ (define_insn "csneg3<mode>_insn" (neg:GPI (match_operand:GPI 2 "register_operand" "r")) (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] "" - "csneg\\t%<w>0, %<w>3, %<w>2, %M1" + { + if (rtx_equal_p (operands[2], operands[3])) + return "cneg\\t%<w>0, %<w>2, %m1"; + else + return "csneg\\t%<w>0, %<w>3, %<w>2, %M1"; + } [(set_attr "type" "csel")] ) diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c index 39364f4..84996a42 100644 --- a/gcc/testsuite/gcc.target/aarch64/abs_1.c +++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c @@ -7,14 +7,14 @@ extern void abort (void); long long abs64 (long long a) { - /* { dg-final { scan-assembler "csneg\t" } } */ + /* { dg-final { scan-assembler "cs?neg\t" } } */ return llabs (a); } long long abs64_in_dreg (long long a) { - /* { dg-final { scan-assembler "csneg\t" } } */ + /* { dg-final { scan-assembler "cs?neg\t" } } */ register long long x asm ("d8") = a; register long long y asm ("d9"); asm volatile ("" : : "w" (x)); diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c b/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c index e93a693..a5394cc 100644 --- a/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c +++ b/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c @@ -12,7 +12,7 @@ foonegsi (int a) return a ? N : -N; } -/* { dg-final { scan-assembler "csneg\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler "cs?neg\tw\[0-9\]*.*" } } */ int @@ -21,7 +21,7 @@ fooinvsi (int a) return a ? N : ~N; } -/* { dg-final { scan-assembler "csinv\tw\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler "cs?inv\tw\[0-9\]*.*" } } */ long long @@ -36,7 +36,7 @@ largefooneg (long long a) return a ? M : -M; } -/* { dg-final { scan-assembler "csneg\tx\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler "cs?neg\tx\[0-9\]*.*" } } */ long long fooinvdi (long long a) @@ -50,7 +50,7 @@ largefooinv (long long a) return a ? M : ~M; } -/* { dg-final { scan-assembler "csinv\tx\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler "cs?inv\tx\[0-9\]*.*" } } */ int diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gcc.target/aarch64/mod_2.c index 2645c18..a49783d 100644 --- a/gcc/testsuite/gcc.target/aarch64/mod_2.c +++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c @@ -3,5 +3,5 @@ #include "mod_2.x" -/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */ +/* { dg-final { scan-assembler "cs?neg\t\[wx\]\[0-9\]*" } } */ /* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */