diff mbox

[RFC,AArch64] Define TARGET_SPILL_CLASS

Message ID 537D3A51.2090006@linaro.org
State New
Headers show

Commit Message

Kugan Vivekanandarajah May 21, 2014, 11:44 p.m. UTC
Compiling some applications with -mgeneral-regs-only produces better
code (runs faster) compared to not using it. The difference here is that
when -mgeneral-regs-only is not used, floating point register are also
used in register allocation. Then IRA/LRA has to move them to core
registers before performing operations.

I experimented with TARGET_SPILL_CLASS (as in attached patch) to make
floating point register class as just spill class for integer pseudos.
Though this benefits the application which had this issue. Overall
performance with speck2k is neutral (some of the benchmarks benefits a
lot but others regress). I am looking to see if I can make it perform
better overall. Any suggestions welcome.

Attached experimental patch passes regression but 168.wupwise and
187.facerec miscompares now. I am looking at fixing this as well.

Thanks,
Kugan

gcc/
2014-05-22  Kugan Vivekanandarajah  <kuganv@linaro.org>

	* config/aarch64/aarch64.c (generic_regmove_cost) : Adjust GP2FP and
	FP2GP costs.
	(aarch64_spill_class) : New function.
	(TARGET_SHIFT_TRUNCATION_MASK) : Define.

Comments

Andrew Pinski May 21, 2014, 11:52 p.m. UTC | #1
On Wed, May 21, 2014 at 4:44 PM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:
> Compiling some applications with -mgeneral-regs-only produces better
> code (runs faster) compared to not using it. The difference here is that
> when -mgeneral-regs-only is not used, floating point register are also
> used in register allocation. Then IRA/LRA has to move them to core
> registers before performing operations.
>
> I experimented with TARGET_SPILL_CLASS (as in attached patch) to make
> floating point register class as just spill class for integer pseudos.
> Though this benefits the application which had this issue. Overall
> performance with speck2k is neutral (some of the benchmarks benefits a
> lot but others regress). I am looking to see if I can make it perform
> better overall. Any suggestions welcome.
>
> Attached experimental patch passes regression but 168.wupwise and
> 187.facerec miscompares now. I am looking at fixing this as well.
>
> Thanks,
> Kugan
>
> gcc/
> 2014-05-22  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>         * config/aarch64/aarch64.c (generic_regmove_cost) : Adjust GP2FP and
>         FP2GP costs.
>         (aarch64_spill_class) : New function.


>         (TARGET_SHIFT_TRUNCATION_MASK) : Define.

No you are not defining TARGET_SHIFT_TRUNCATION_MASK, remove it from
the changelog.

Thanks,
Andrew Pinski
Richard Earnshaw May 22, 2014, 12:18 p.m. UTC | #2
On 22/05/14 00:44, Kugan wrote:
> Compiling some applications with -mgeneral-regs-only produces better
> code (runs faster) compared to not using it. The difference here is that
> when -mgeneral-regs-only is not used, floating point register are also
> used in register allocation. Then IRA/LRA has to move them to core
> registers before performing operations.
> 
> I experimented with TARGET_SPILL_CLASS (as in attached patch) to make
> floating point register class as just spill class for integer pseudos.
> Though this benefits the application which had this issue. Overall
> performance with speck2k is neutral (some of the benchmarks benefits a
> lot but others regress). I am looking to see if I can make it perform
> better overall. Any suggestions welcome.
> 
> Attached experimental patch passes regression but 168.wupwise and
> 187.facerec miscompares now. I am looking at fixing this as well.
> 

While I'll be the first to admit that the generic costs are currently
little more than estimates, I'm worried about changing them based on the
results of one benchmark run on a single (unspecified) implementation.
The generic costs are intended to be a "best blend" of costs across a
large range of implementations.  Generically tuned code will not
necessarily be optimial on every processor -- what's more important is
that it's not severely sub-optimal on any processor.

I think the way we should proceed here is to add processor-specific
tuning tables for the cores that are being benchmarked.  Then, once we
have a reasonable selection of costing tables to base an assessment on
we should then re-review what the generic costs should be.

R.

> Thanks,
> Kugan
> 
> gcc/
> 2014-05-22  Kugan Vivekanandarajah  <kuganv@linaro.org>
> 
> 	* config/aarch64/aarch64.c (generic_regmove_cost) : Adjust GP2FP and
> 	FP2GP costs.
> 	(aarch64_spill_class) : New function.
> 	(TARGET_SHIFT_TRUNCATION_MASK) : Define.
> 
> 
> p.txt
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a3147ee..16d1b51 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -184,8 +184,8 @@ __extension__
>  static const struct cpu_regmove_cost generic_regmove_cost =
>  {
>    NAMED_PARAM (GP2GP, 1),
> -  NAMED_PARAM (GP2FP, 2),
> -  NAMED_PARAM (FP2GP, 2),
> +  NAMED_PARAM (GP2FP, 5),
> +  NAMED_PARAM (FP2GP, 5),
>    /* We currently do not provide direct support for TFmode Q->Q move.
>       Therefore we need to raise the cost above 2 in order to have
>       reload handle the situation.  */
> @@ -4882,6 +4882,18 @@ aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
>    return regmove_cost->FP2FP;
>  }
>  
> +/* Return class of registers which could be used for pseudo of MODE
> +   and of class RCLASS for spilling instead of memory.  */
> +static reg_class_t
> +aarch64_spill_class (reg_class_t rclass, enum machine_mode mode)
> +{
> +  if ((GET_MODE_CLASS (mode) == MODE_INT)
> +       && reg_class_subset_p (rclass, GENERAL_REGS))
> +    return FP_REGS;
> +  return NO_REGS;
> +}
> +
> +
>  static int
>  aarch64_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
>  			  reg_class_t rclass ATTRIBUTE_UNUSED,
> @@ -8431,6 +8443,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
>  #undef TARGET_SECONDARY_RELOAD
>  #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload
>  
> +#undef TARGET_SPILL_CLASS
> +#define TARGET_SPILL_CLASS aarch64_spill_class
> +
>  #undef TARGET_SHIFT_TRUNCATION_MASK
>  #define TARGET_SHIFT_TRUNCATION_MASK aarch64_shift_truncation_mask
>  
>
Kugan Vivekanandarajah May 28, 2014, 6:18 a.m. UTC | #3
On 22/05/14 22:18, Richard Earnshaw wrote:
> On 22/05/14 00:44, Kugan wrote:
>> Compiling some applications with -mgeneral-regs-only produces better
>> code (runs faster) compared to not using it. The difference here is that
>> when -mgeneral-regs-only is not used, floating point register are also
>> used in register allocation. Then IRA/LRA has to move them to core
>> registers before performing operations.
>>
>> I experimented with TARGET_SPILL_CLASS (as in attached patch) to make
>> floating point register class as just spill class for integer pseudos.
>> Though this benefits the application which had this issue. Overall
>> performance with speck2k is neutral (some of the benchmarks benefits a
>> lot but others regress). I am looking to see if I can make it perform
>> better overall. Any suggestions welcome.
>>
>> Attached experimental patch passes regression but 168.wupwise and
>> 187.facerec miscompares now. I am looking at fixing this as well.
>>
> 
> While I'll be the first to admit that the generic costs are currently
> little more than estimates, I'm worried about changing them based on the
> results of one benchmark run on a single (unspecified) implementation.
> The generic costs are intended to be a "best blend" of costs across a
> large range of implementations.  Generically tuned code will not
> necessarily be optimial on every processor -- what's more important is
> that it's not severely sub-optimal on any processor.

Thanks Richard for the comments. My primary intention here is to use
TARGET_SPILL_CLASS to make FP_REGS as spill registers. Do you think
AArch64 can benefit from TARGET_SPILL_CLASS hook. I agree that just
increasing GP2FP and FP2GP for all the modes as I am doing is not the
right think to do.

Thanks,
Kugan
Ramana Radhakrishnan June 5, 2014, 8:29 a.m. UTC | #4
>
> Thanks Richard for the comments. My primary intention here is to use
> TARGET_SPILL_CLASS to make FP_REGS as spill registers.

> Do you think
> AArch64 can benefit from TARGET_SPILL_CLASS hook. I agree that just
> increasing GP2FP and FP2GP for all the modes as I am doing is not the
> right think to do.
>

I suspect TARGET_SPILL_CLASS again needs to be a per-core decision,
the cost of moving between FP and Integer registers really depends on
the implementation and having this spill all the time to FP register
may not be good enough. So a default definition of TARGET_SPILL_CLASS
doesn't sound to me prima-facie.

I don't think increasing GP2FP and FP2GP costs is a bad thing. In a
number of benchmarks we've seen increased moves between FP and integer
registers and having this fix appears to help some of them. However
moving this to generic model needs more benchmarking across a variety
of cores before it can safely be applied there.

regards
Ramana


> Thanks,
> Kugan
Marcus Shawcroft June 5, 2014, 10:19 a.m. UTC | #5
On 5 June 2014 09:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>
>> Thanks Richard for the comments. My primary intention here is to use
>> TARGET_SPILL_CLASS to make FP_REGS as spill registers.
>
>> Do you think
>> AArch64 can benefit from TARGET_SPILL_CLASS hook. I agree that just
>> increasing GP2FP and FP2GP for all the modes as I am doing is not the
>> right think to do.
>>
>
> I suspect TARGET_SPILL_CLASS again needs to be a per-core decision,
> the cost of moving between FP and Integer registers really depends on
> the implementation and having this spill all the time to FP register
> may not be good enough. So a default definition of TARGET_SPILL_CLASS
> doesn't sound to me prima-facie.
>
> I don't think increasing GP2FP and FP2GP costs is a bad thing. In a
> number of benchmarks we've seen increased moves between FP and integer
> registers and having this fix appears to help some of them. However
> moving this to generic model needs more benchmarking across a variety
> of cores before it can safely be applied there.

I'm aligned with Richards earlier comment on this topic.  Specifically
I'd like to see numbers in processor specific tuning tables, then we
can take a view on how to adjust the generic numbers.

/Marcus
Ramana Radhakrishnan June 5, 2014, 10:20 a.m. UTC | #6
>> I don't think increasing GP2FP and FP2GP costs is a bad thing. In a
>> number of benchmarks we've seen increased moves between FP and integer
>> registers and having this fix appears to help some of them. However
>> moving this to generic model needs more benchmarking across a variety
>> of cores before it can safely be applied there.
>
> I'm aligned with Richards earlier comment on this topic.  Specifically
> I'd like to see numbers in processor specific tuning tables, then we
> can take a view on how to adjust the generic numbers.

Agreed - that's what I wanted to say but probably wasn't clear enough.

Ramana

>
> /Marcus
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a3147ee..16d1b51 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -184,8 +184,8 @@  __extension__
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   NAMED_PARAM (GP2GP, 1),
-  NAMED_PARAM (GP2FP, 2),
-  NAMED_PARAM (FP2GP, 2),
+  NAMED_PARAM (GP2FP, 5),
+  NAMED_PARAM (FP2GP, 5),
   /* We currently do not provide direct support for TFmode Q->Q move.
      Therefore we need to raise the cost above 2 in order to have
      reload handle the situation.  */
@@ -4882,6 +4882,18 @@  aarch64_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
   return regmove_cost->FP2FP;
 }
 
+/* Return class of registers which could be used for pseudo of MODE
+   and of class RCLASS for spilling instead of memory.  */
+static reg_class_t
+aarch64_spill_class (reg_class_t rclass, enum machine_mode mode)
+{
+  if ((GET_MODE_CLASS (mode) == MODE_INT)
+       && reg_class_subset_p (rclass, GENERAL_REGS))
+    return FP_REGS;
+  return NO_REGS;
+}
+
+
 static int
 aarch64_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
 			  reg_class_t rclass ATTRIBUTE_UNUSED,
@@ -8431,6 +8443,9 @@  aarch64_cannot_change_mode_class (enum machine_mode from,
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload
 
+#undef TARGET_SPILL_CLASS
+#define TARGET_SPILL_CLASS aarch64_spill_class
+
 #undef TARGET_SHIFT_TRUNCATION_MASK
 #define TARGET_SHIFT_TRUNCATION_MASK aarch64_shift_truncation_mask