diff mbox

Reduce stack usage in sha512 (PR target/77308)

Message ID AM4PR0701MB2162BAB5D4B1D2548BC519F1E4D30@AM4PR0701MB2162.eurprd07.prod.outlook.com
State New
Headers show

Commit Message

Bernd Edlinger Oct. 18, 2016, 2:45 p.m. UTC
On 10/18/16 10:36, Christophe Lyon wrote:
>

> I am seeing a lot of regressions since this patch was committed:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>

> (you can click on "REGRESSED" to see the list of regressions, "sum"

> and "log" to download

> the corresponding .sum/.log)

>

> Thanks,

>

> Christophe

>


Oh, sorry, I have completely missed that.
Unfortunately I have tested one of the good combinations.

I have not considered the case that in and out could be
the same register.  But that happens here.


This should solve it.

Can you give it a try?



Thanks
Bernd.

Comments

Christophe Lyon Oct. 18, 2016, 3:35 p.m. UTC | #1
On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> On 10/18/16 10:36, Christophe Lyon wrote:

>>

>> I am seeing a lot of regressions since this patch was committed:

>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>

>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>> and "log" to download

>> the corresponding .sum/.log)

>>

>> Thanks,

>>

>> Christophe

>>

>

> Oh, sorry, I have completely missed that.

> Unfortunately I have tested one of the good combinations.

>

> I have not considered the case that in and out could be

> the same register.  But that happens here.

>

>

> This should solve it.

>

> Can you give it a try?

>


Sure, I have started the validations, it will take a few hours.

>

>

> Thanks

> Bernd.
Christophe Lyon Oct. 19, 2016, 6:55 a.m. UTC | #2
On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:

>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>

>>> I am seeing a lot of regressions since this patch was committed:

>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>

>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>> and "log" to download

>>> the corresponding .sum/.log)

>>>

>>> Thanks,

>>>

>>> Christophe

>>>

>>

>> Oh, sorry, I have completely missed that.

>> Unfortunately I have tested one of the good combinations.

>>

>> I have not considered the case that in and out could be

>> the same register.  But that happens here.

>>

>>

>> This should solve it.

>>

>> Can you give it a try?

>>

>

> Sure, I have started the validations, it will take a few hours.

>


It looks OK, thanks.

>>

>>

>> Thanks

>> Bernd.
Kyrill Tkachov Oct. 19, 2016, 8:34 a.m. UTC | #3
On 19/10/16 07:55, Christophe Lyon wrote:
> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org> wrote:

>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:

>>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>> I am seeing a lot of regressions since this patch was committed:

>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>>

>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>>> and "log" to download

>>>> the corresponding .sum/.log)

>>>>

>>>> Thanks,

>>>>

>>>> Christophe

>>>>

>>> Oh, sorry, I have completely missed that.

>>> Unfortunately I have tested one of the good combinations.

>>>

>>> I have not considered the case that in and out could be

>>> the same register.  But that happens here.

>>>

>>>

>>> This should solve it.

>>>

>>> Can you give it a try?

>>>

>> Sure, I have started the validations, it will take a few hours.

>>

> It looks OK, thanks.


Ok. Thanks for testing Christophe.
Sorry for not catching it in review.
Kyrill

>>>

>>> Thanks

>>> Bernd.
Christophe Lyon Oct. 19, 2016, 10:44 a.m. UTC | #4
On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>

> On 19/10/16 07:55, Christophe Lyon wrote:

>>

>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>

>> wrote:

>>>

>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>

>>> wrote:

>>>>

>>>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>>>

>>>>> I am seeing a lot of regressions since this patch was committed:

>>>>>

>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>>>

>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>>>> and "log" to download

>>>>> the corresponding .sum/.log)

>>>>>

>>>>> Thanks,

>>>>>

>>>>> Christophe

>>>>>

>>>> Oh, sorry, I have completely missed that.

>>>> Unfortunately I have tested one of the good combinations.

>>>>

>>>> I have not considered the case that in and out could be

>>>> the same register.  But that happens here.

>>>>

>>>>

>>>> This should solve it.

>>>>

>>>> Can you give it a try?

>>>>

>>> Sure, I have started the validations, it will take a few hours.

>>>

>> It looks OK, thanks.

>

>

> Ok. Thanks for testing Christophe.

> Sorry for not catching it in review.

> Kyrill

>


Since it was painful to check that this 2nd patch fixed all the regressions
observed in all the configurations, I ran another validation with
the 2 patches combined, on top of r241272 to check the effect of the two
over the previous reference.

It turns out there is still a failure:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html
gcc.c-torture/execute/pr34971.c   -O0  execution test
now fails on arm-none-linux-gnueabihf when using thumb mode, expect
when targeting cortex-a5.

Christophe

>>>>

>>>> Thanks

>>>> Bernd.

>

>
Bernd Edlinger Oct. 19, 2016, 4:06 p.m. UTC | #5
On 10/19/16 12:44, Christophe Lyon wrote:
> On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:

>>

>> On 19/10/16 07:55, Christophe Lyon wrote:

>>>

>>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>

>>> wrote:

>>>>

>>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>

>>>> wrote:

>>>>>

>>>>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>>>>

>>>>>> I am seeing a lot of regressions since this patch was committed:

>>>>>>

>>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>>>>

>>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>>>>> and "log" to download

>>>>>> the corresponding .sum/.log)

>>>>>>

>>>>>> Thanks,

>>>>>>

>>>>>> Christophe

>>>>>>

>>>>> Oh, sorry, I have completely missed that.

>>>>> Unfortunately I have tested one of the good combinations.

>>>>>

>>>>> I have not considered the case that in and out could be

>>>>> the same register.  But that happens here.

>>>>>

>>>>>

>>>>> This should solve it.

>>>>>

>>>>> Can you give it a try?

>>>>>

>>>> Sure, I have started the validations, it will take a few hours.

>>>>

>>> It looks OK, thanks.

>>

>>

>> Ok. Thanks for testing Christophe.

>> Sorry for not catching it in review.

>> Kyrill

>>

>

> Since it was painful to check that this 2nd patch fixed all the regressions

> observed in all the configurations, I ran another validation with

> the 2 patches combined, on top of r241272 to check the effect of the two

> over the previous reference.

>

> It turns out there is still a failure:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html

> gcc.c-torture/execute/pr34971.c   -O0  execution test

> now fails on arm-none-linux-gnueabihf when using thumb mode, expect

> when targeting cortex-a5.

>


Thanks Christophe, I looked in the test case,
and I think that is a pre-existing issue.

I can make the test case fail before my patch:

struct foo
{
   unsigned long long b:40;
} x;

extern void abort (void);

void test1(unsigned long long res)
{
   /* Build a rotate expression on a 40 bit argument.  */
   if ((x.b<<9) + (x.b>>31) != res)
     abort ();
}

int main()
{
   x.b = 0x0100000001;
   test1(0x0000000202);
   x.b = 0x0100000000;
   test1(0x0000000002);
   return 0;
}


fails with -mcpu=cortex-a9 -mthumb -mfpu=neon-fp16 -O0
even before my patch.

before reload we have this insn:
(insn 19 18 20 2 (parallel [
             (set (reg:DI 113 [ _4 ])
                 (lshiftrt:DI (reg:DI 112 [ _3 ])
                     (const_int 31 [0x1f])))
             (clobber (scratch:SI))
             (clobber (scratch:SI))
             (clobber (scratch:DI))
             (clobber (reg:CC 100 cc))
         ]) "pr34971.c":11 1232 {lshrdi3_neon}


which is replaced to this insn:
(insn 19 18 20 2 (parallel [
             (set (reg:DI 1 r1 [orig:113 _4 ] [113])
                 (lshiftrt:DI (reg:DI 0 r0 [orig:112 _3 ] [112])
                     (const_int 31 [0x1f])))
             (clobber (scratch:SI))
             (clobber (scratch:SI))
             (clobber (scratch:DI))
             (clobber (reg:CC 100 cc))
         ]) "pr34971.c":11 1232 {lshrdi3_neon}
      (nil))

And now that is not supposed to happen, when this code
is executed for shifts by a constant less than 32:

           emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
           emit_insn (SET (out_down,
                           ORR (REV_LSHIFT (code, in_up, reverse_amount),
                                out_down)));
           emit_insn (SET (out_up, SHIFT (code, in_up, amount)));


out_down may not be the same hard register than in_up for this to work.

I opened a new BZ for this:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041

I am not sure if this is a LRA issue or if it can be handled in the
shift expansion.

Is the second patch good for trunk as it is?


Thanks
Bernd.
Kyrill Tkachov Oct. 19, 2016, 4:46 p.m. UTC | #6
On 19/10/16 17:06, Bernd Edlinger wrote:
> On 10/19/16 12:44, Christophe Lyon wrote:

>> On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:

>>> On 19/10/16 07:55, Christophe Lyon wrote:

>>>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>

>>>> wrote:

>>>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>

>>>>> wrote:

>>>>>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>>>>> I am seeing a lot of regressions since this patch was committed:

>>>>>>>

>>>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>>>>>

>>>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>>>>>> and "log" to download

>>>>>>> the corresponding .sum/.log)

>>>>>>>

>>>>>>> Thanks,

>>>>>>>

>>>>>>> Christophe

>>>>>>>

>>>>>> Oh, sorry, I have completely missed that.

>>>>>> Unfortunately I have tested one of the good combinations.

>>>>>>

>>>>>> I have not considered the case that in and out could be

>>>>>> the same register.  But that happens here.

>>>>>>

>>>>>>

>>>>>> This should solve it.

>>>>>>

>>>>>> Can you give it a try?

>>>>>>

>>>>> Sure, I have started the validations, it will take a few hours.

>>>>>

>>>> It looks OK, thanks.

>>>

>>> Ok. Thanks for testing Christophe.

>>> Sorry for not catching it in review.

>>> Kyrill

>>>

>> Since it was painful to check that this 2nd patch fixed all the regressions

>> observed in all the configurations, I ran another validation with

>> the 2 patches combined, on top of r241272 to check the effect of the two

>> over the previous reference.

>>

>> It turns out there is still a failure:

>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html

>> gcc.c-torture/execute/pr34971.c   -O0  execution test

>> now fails on arm-none-linux-gnueabihf when using thumb mode, expect

>> when targeting cortex-a5.

>>

> Thanks Christophe, I looked in the test case,

> and I think that is a pre-existing issue.

>

> I can make the test case fail before my patch:

>

> struct foo

> {

>     unsigned long long b:40;

> } x;

>

> extern void abort (void);

>

> void test1(unsigned long long res)

> {

>     /* Build a rotate expression on a 40 bit argument.  */

>     if ((x.b<<9) + (x.b>>31) != res)

>       abort ();

> }

>

> int main()

> {

>     x.b = 0x0100000001;

>     test1(0x0000000202);

>     x.b = 0x0100000000;

>     test1(0x0000000002);

>     return 0;

> }

>

>

> fails with -mcpu=cortex-a9 -mthumb -mfpu=neon-fp16 -O0

> even before my patch.

>

> before reload we have this insn:

> (insn 19 18 20 2 (parallel [

>               (set (reg:DI 113 [ _4 ])

>                   (lshiftrt:DI (reg:DI 112 [ _3 ])

>                       (const_int 31 [0x1f])))

>               (clobber (scratch:SI))

>               (clobber (scratch:SI))

>               (clobber (scratch:DI))

>               (clobber (reg:CC 100 cc))

>           ]) "pr34971.c":11 1232 {lshrdi3_neon}

>

>

> which is replaced to this insn:

> (insn 19 18 20 2 (parallel [

>               (set (reg:DI 1 r1 [orig:113 _4 ] [113])

>                   (lshiftrt:DI (reg:DI 0 r0 [orig:112 _3 ] [112])

>                       (const_int 31 [0x1f])))

>               (clobber (scratch:SI))

>               (clobber (scratch:SI))

>               (clobber (scratch:DI))

>               (clobber (reg:CC 100 cc))

>           ]) "pr34971.c":11 1232 {lshrdi3_neon}

>        (nil))

>

> And now that is not supposed to happen, when this code

> is executed for shifts by a constant less than 32:

>

>             emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));

>             emit_insn (SET (out_down,

>                             ORR (REV_LSHIFT (code, in_up, reverse_amount),

>                                  out_down)));

>             emit_insn (SET (out_up, SHIFT (code, in_up, amount)));

>

>

> out_down may not be the same hard register than in_up for this to work.

>

> I opened a new BZ for this:

>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041

>

> I am not sure if this is a LRA issue or if it can be handled in the

> shift expansion.

>

> Is the second patch good for trunk as it is?


Thanks for the investigation and the bug report.
The patch is ok.
Kyrill

>

> Thanks

> Bernd.
diff mbox

Patch

2016-10-18  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Clear the result
	register only if "in" and "out" are different registers.

--- gcc/config/arm/arm.c.orig	2016-10-17 11:00:34.045673223 +0200
+++ gcc/config/arm/arm.c	2016-10-18 14:53:06.710101327 +0200
@@ -29218,8 +29218,10 @@  arm_emit_coreregs_64bit_shift (enum rtx_
 
 	  /* Clearing the out register in DImode first avoids lots
 	     of spilling and results in less stack usage.
-	     Later this redundant insn is completely removed.  */
-	  emit_insn (SET (out, const0_rtx));
+	     Later this redundant insn is completely removed.
+	     Do that only if "in" and "out" are different registers.  */
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	    emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
 	  emit_insn (SET (out_down,
 			  ORR (REV_LSHIFT (code, in_up, reverse_amount),
@@ -29231,11 +29233,14 @@  arm_emit_coreregs_64bit_shift (enum rtx_
 	  /* Shifts by a constant greater than 31.  */
 	  rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
 
-	  emit_insn (SET (out, const0_rtx));
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	    emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
 	  if (code == ASHIFTRT)
 	    emit_insn (gen_ashrsi3 (out_up, in_up,
 				    GEN_INT (31)));
+	  else
+	    emit_insn (SET (out_up, const0_rtx));
 	}
     }
   else