Message ID | CO2PR07MB26944CDE12E84FD22A41F68583870@CO2PR07MB2694.namprd07.prod.outlook.com |
---|---|
State | Superseded |
Hi Naveen,

On 25/01/17 06:16, Hurugalawadi, Naveen wrote:
> Hi,
>
> Please find attached the patch that adds AES and CMP_BRANCH
> fusion for Thunderx2t99.
>
> Bootstrapped and Regression tested on aarch64-thunderx2t99.
> Please review the patch and let us know if its okay?

Code looks ok (it's quite simple), but I can't approve.
but there are a couple of issues with the ChangeLog

> 2017-1-25 Naveen H.S <Naveen.Hurugalawadi@cavium.com>

2017-01-25. Also, two spaces between name and email

>
> gcc
> * config/aarch64/aarch64.c (thunderx2t99_tunings):
> Improve vector initialization code gen.

This doesn't fit the code in the patch

Cheers,
Kyrill

Hi Kyrill,
Thanks for the review and comments.
>> but there are a couple of issues with the ChangeLog
2017-02-02 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
* config/aarch64/aarch64.c (thunderx2t99_tunings): Enable AES and
cmp_branch fusion.
Thanks,
Naveen

On Thu, Feb 02, 2017 at 05:03:13AM +0000, Hurugalawadi, Naveen wrote:
> Hi Kyrill,
>
> Thanks for the review and comments.
>
> >> but there are a couple of issues with the ChangeLog
>
> 2017-02-02 Naveen H.S <Naveen.Hurugalawadi@cavium.com>
>
> * config/aarch64/aarch64.c (thunderx2t99_tunings): Enable AES and
> cmp_branch fusion.
>

OK.

Thanks,
James

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e87831f..da5b6fa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11609,11 +11609,54 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       aarch64_expand_vector_init (target, copy);
     }
 
-  /* Insert the variable lanes directly.  */
-
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
+  /* If there is only varables, try to optimize
+     the inseration using dup for the most common element
+     followed by insertations.  */
+  if (n_var == n_elts && n_elts <= 16)
+    {
+      int matches[16][2];
+      int nummatches = 0;
+      memset (matches, 0, sizeof(matches));
+      for(int i = 0; i < n_elts; i++)
+        {
+          for (int j = 0; j <= i; j++)
+            {
+              if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+                {
+                  matches[i][0] = j;
+                  matches[j][1]++;
+                  if (i != j)
+                    nummatches++;
+                  break;
+                }
+            }
+        }
+      int maxelement = 0;
+      int maxv = 0;
+      for (int i = 0; i < n_elts; i++)
+        if (matches[i][1] > maxv)
+          maxelement = i, maxv = matches[i][1];
+
+      /* Create a duplicate of the most common element.  */
+      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+      /* Insert the rest.  */
+      for (int i = 0; i < n_elts; i++)
+        {
+          rtx x = XVECEXP (vals, 0, i);
+          if (matches[i][0] == maxelement)
+            continue;
+          x = copy_to_mode_reg (inner_mode, x);
+          emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+        }
+      return;
+    }
+
+  /* Insert the variable lanes directly.  */
+
   for (int i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
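
The hunk above handles the all-variable case by finding the most common element, broadcasting it to every lane with a dup, and then inserting only the lanes that differ. The following is a minimal standalone sketch of that selection logic on plain `int` arrays, not the RTL code from the patch. The names `init_vector_dup_then_insert`, `dup_all` and `MAX_LANES` are hypothetical, and integer equality stands in for `rtx_equal_p`.

```c
#include <assert.h>
#include <string.h>

#define MAX_LANES 16  /* assumed cap, mirroring the n_elts <= 16 check */

/* Hypothetical stand-in for the vector dup: fill every lane with VALUE.  */
static void
dup_all (int *target, int n_elts, int value)
{
  for (int i = 0; i < n_elts; i++)
    target[i] = value;
}

/* Initialise TARGET from VALS by broadcasting the most common value and
   then overwriting only the lanes that differ.  Returns the number of
   per-lane inserts that were needed.  */
static int
init_vector_dup_then_insert (int *target, const int *vals, int n_elts)
{
  assert (n_elts > 0 && n_elts <= MAX_LANES);

  int first[MAX_LANES];  /* first[i]: lowest lane index equal to lane i.  */
  int count[MAX_LANES];  /* count[j]: lanes whose first match is lane j.  */
  memset (count, 0, sizeof (count));

  /* Each lane always matches itself at j == i, so first[i] is always set.  */
  for (int i = 0; i < n_elts; i++)
    for (int j = 0; j <= i; j++)
      if (vals[i] == vals[j])
        {
          first[i] = j;
          count[j]++;
          break;
        }

  /* Pick the representative lane with the most matches; ties keep the
     lowest index because the comparison is strict.  */
  int maxelement = 0;
  int maxv = 0;
  for (int i = 0; i < n_elts; i++)
    if (count[i] > maxv)
      {
        maxelement = i;
        maxv = count[i];
      }

  /* Broadcast the most common value, then patch the remaining lanes.  */
  dup_all (target, n_elts, vals[maxelement]);

  int inserts = 0;
  for (int i = 0; i < n_elts; i++)
    {
      if (first[i] == maxelement)
        continue;
      target[i] = vals[i];
      inserts++;
    }
  return inserts;
}
```

For the input lanes `{7, 3, 7, 7}`, the sketch broadcasts 7 and needs a single insert for lane 1, where inserting each lane directly would take four. When all lanes are distinct, as in `{1, 2, 3, 4}`, it falls back to one dup plus three inserts.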