Message ID | CAELXzTMdH_scC+o3xFy2mBTakFDohccDLF8ZgX1AcSS03s7k7Q@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | Loop unrolling and memory load streams | expand |
On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> wrote: > This patch adds separate params for rtl unroller so that they can be > tunned accordingly. Default values I have are based on some testing on > aarch64. I am happy to leave it as the current value and set them in > the back-end. PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL unroller. Why should we separate PARAM_MAX_UNROLL_TIMES? PARAM_MAX_UNROLLED_INSNS is only used by gimple passes that perform unrolling. Since GIMPLE is three-address it should match RTL reasonably well -- but I'd be ok in having a separate param for those. But I wouldn't name those 'partial'. That said, those are magic numbers and I expect we can find some that work well on RTL and GIMPLE. Richard. > > Thanks, > Kugan > > > gcc/ChangeLog: > > 2017-09-12 Kugan Vivekanandarajah <kuganv@linaro.org> > > * loop-unroll.c (decide_unroll_constant_iterations): Use new params. > (decide_unroll_runtime_iterations): Likewise. > (decide_unroll_stupid): Likewise. > * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Hi Richard, On 15 September 2017 at 19:31, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah > <kugan.vivekanandarajah@linaro.org> wrote: >> This patch adds separate params for rtl unroller so that they can be >> tunned accordingly. Default values I have are based on some testing on >> aarch64. I am happy to leave it as the current value and set them in >> the back-end. > > PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL > unroller. Why should we separate PARAM_MAX_UNROLL_TIMES? > > PARAM_MAX_UNROLLED_INSNS is only used by gimple passes > that perform unrolling. Since GIMPLE is three-address it should > match RTL reasonably well -- but I'd be ok in having a separate param > for those. But I wouldn't name those 'partial'. > > That said, those are magic numbers and I expect we can find some > that work well on RTL and GIMPLE. Thanks for the review. I am mostly interested in having separate params for RTL runtime unrolling as this is different to what GIMPLE unroller does. May be I should have separate params only for the runtime unrolling (or the partial unroller) and let RTL/GIMPLE share the other. Any preference here ? Any preference on the name ? I am suspecting that RTL unroller which does the same as GIMPLE is kind of obsolete now? Thanks, Kugan > Richard. > >> >> Thanks, >> Kugan >> >> >> gcc/ChangeLog: >> >> 2017-09-12 Kugan Vivekanandarajah <kuganv@linaro.org> >> >> * loop-unroll.c (decide_unroll_constant_iterations): Use new params. >> (decide_unroll_runtime_iterations): Likewise. >> (decide_unroll_stupid): Likewise. >> * params.def (DEFPARAM): Separate and add new params for rtl unroller.
On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> wrote: > Hi Richard, > > On 15 September 2017 at 19:31, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah >> <kugan.vivekanandarajah@linaro.org> wrote: >>> This patch adds separate params for rtl unroller so that they can be >>> tunned accordingly. Default values I have are based on some testing on >>> aarch64. I am happy to leave it as the current value and set them in >>> the back-end. >> >> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL >> unroller. Why should we separate PARAM_MAX_UNROLL_TIMES? >> >> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes >> that perform unrolling. Since GIMPLE is three-address it should >> match RTL reasonably well -- but I'd be ok in having a separate param >> for those. But I wouldn't name those 'partial'. >> >> That said, those are magic numbers and I expect we can find some >> that work well on RTL and GIMPLE. > > Thanks for the review. I am mostly interested in having separate > params for RTL runtime unrolling as this is different to what GIMPLE > unroller does. Why? Do you just want to have more magic knobs to machine-auto-tune? > May be I should have separate params only for the > runtime unrolling (or the partial unroller) and let RTL/GIMPLE share > the other. Any preference here ? Any preference on the name ? > > I am suspecting that RTL unroller which does the same as GIMPLE is > kind of obsolete now? We do not have a GIMPLE loop unroller pass. On GIMPLE we only do complete peeling as a separate pass and several passes perform unrolling as part of their transform. Richard. > Thanks, > Kugan > > > > > >> Richard. >> >>> >>> Thanks, >>> Kugan >>> >>> >>> gcc/ChangeLog: >>> >>> 2017-09-12 Kugan Vivekanandarajah <kuganv@linaro.org> >>> >>> * loop-unroll.c (decide_unroll_constant_iterations): Use new params. >>> (decide_unroll_runtime_iterations): Likewise. >>> (decide_unroll_stupid): Likewise. >>> * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Hi Richard, On 18 September 2017 at 17:50, Richard Biener <richard.guenther@gmail.com> wrote: > On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah > <kugan.vivekanandarajah@linaro.org> wrote: >> Hi Richard, >> >> On 15 September 2017 at 19:31, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah >>> <kugan.vivekanandarajah@linaro.org> wrote: >>>> This patch adds separate params for rtl unroller so that they can be >>>> tunned accordingly. Default values I have are based on some testing on >>>> aarch64. I am happy to leave it as the current value and set them in >>>> the back-end. >>> >>> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL >>> unroller. Why should we separate PARAM_MAX_UNROLL_TIMES? >>> >>> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes >>> that perform unrolling. Since GIMPLE is three-address it should >>> match RTL reasonably well -- but I'd be ok in having a separate param >>> for those. But I wouldn't name those 'partial'. >>> >>> That said, those are magic numbers and I expect we can find some >>> that work well on RTL and GIMPLE. >> >> Thanks for the review. I am mostly interested in having separate >> params for RTL runtime unrolling as this is different to what GIMPLE >> unroller does. > > Why? Do you just want to have more magic knobs to machine-auto-tune? > >> May be I should have separate params only for the >> runtime unrolling (or the partial unroller) and let RTL/GIMPLE share >> the other. Any preference here ? Any preference on the name ? >> >> I am suspecting that RTL unroller which does the same as GIMPLE is >> kind of obsolete now? > > We do not have a GIMPLE loop unroller pass. On GIMPLE we only do > complete peeling as a separate pass and several passes perform > unrolling as part of their transform. Sorry for my terminology. I was referring to complete unrolling of loops in tree-sea-loop-ivcanon.c. In any case, what I am interested in is partial unrolling of loops in loop-unroll.c. For Falkor at least, partially unrolling of small loops that satisfies certain condition seems to improve performance. Since this is not enabled by default and some of the params are also shared by other optimisations (as you also have pointed out), I thought that having separate params for loop-unroll.c will limit the interference of changing this params on other optimisations. My initial patch was wrong and this should only separate params that are shared. And also, is anyone working on a partial unroller for Gimple? I would like to give it a try if no one is working on this. Is there any preferences here. Thanks, Kugan > Richard. > >> Thanks, >> Kugan >> >> >> >> >> >>> Richard. >>> >>>> >>>> Thanks, >>>> Kugan >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> 2017-09-12 Kugan Vivekanandarajah <kuganv@linaro.org> >>>> >>>> * loop-unroll.c (decide_unroll_constant_iterations): Use new params. >>>> (decide_unroll_runtime_iterations): Likewise. >>>> (decide_unroll_stupid): Likewise. >>>> * params.def (DEFPARAM): Separate and add new params for rtl unroller.
On Tue, Sep 19, 2017 at 8:25 AM, Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> wrote: > Hi Richard, > > On 18 September 2017 at 17:50, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah >> <kugan.vivekanandarajah@linaro.org> wrote: >>> Hi Richard, >>> >>> On 15 September 2017 at 19:31, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah >>>> <kugan.vivekanandarajah@linaro.org> wrote: >>>>> This patch adds separate params for rtl unroller so that they can be >>>>> tunned accordingly. Default values I have are based on some testing on >>>>> aarch64. I am happy to leave it as the current value and set them in >>>>> the back-end. >>>> >>>> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL >>>> unroller. Why should we separate PARAM_MAX_UNROLL_TIMES? >>>> >>>> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes >>>> that perform unrolling. Since GIMPLE is three-address it should >>>> match RTL reasonably well -- but I'd be ok in having a separate param >>>> for those. But I wouldn't name those 'partial'. >>>> >>>> That said, those are magic numbers and I expect we can find some >>>> that work well on RTL and GIMPLE. >>> >>> Thanks for the review. I am mostly interested in having separate >>> params for RTL runtime unrolling as this is different to what GIMPLE >>> unroller does. >> >> Why? Do you just want to have more magic knobs to machine-auto-tune? >> >>> May be I should have separate params only for the >>> runtime unrolling (or the partial unroller) and let RTL/GIMPLE share >>> the other. Any preference here ? Any preference on the name ? >>> >>> I am suspecting that RTL unroller which does the same as GIMPLE is >>> kind of obsolete now? >> >> We do not have a GIMPLE loop unroller pass. On GIMPLE we only do >> complete peeling as a separate pass and several passes perform >> unrolling as part of their transform. > > Sorry for my terminology. I was referring to complete unrolling of > loops in tree-sea-loop-ivcanon.c. In any case, what I am interested in > is partial unrolling of loops in loop-unroll.c. For Falkor at least, > partially unrolling of small loops that satisfies certain condition > seems to improve performance. Since this is not enabled by default and > some of the params are also shared by other optimisations (as you also > have pointed out), I thought that having separate params for > loop-unroll.c will limit the interference of changing this params on > other optimisations. My initial patch was wrong and this should only > separate params that are shared. Well, as the complete peeling doesn't use those params but only passes that perform "partial" unrolling I see nothing wrong with sharing the parameters. > And also, is anyone working on a partial unroller for Gimple? I would > like to give it a try if no one is working on this. Is there any > preferences here. People have talked about it but I'm not aware of actual implementations. Though an implementation is quite trivial with the interesting piece being the costing (as usual). Richard. > Thanks, > Kugan > > > > >> Richard. >> >>> Thanks, >>> Kugan >>> >>> >>> >>> >>> >>>> Richard. >>>> >>>>> >>>>> Thanks, >>>>> Kugan >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2017-09-12 Kugan Vivekanandarajah <kuganv@linaro.org> >>>>> >>>>> * loop-unroll.c (decide_unroll_constant_iterations): Use new params. >>>>> (decide_unroll_runtime_iterations): Likewise. >>>>> (decide_unroll_stupid): Likewise. >>>>> * params.def (DEFPARAM): Separate and add new params for rtl unroller.
From a899caf9f82767de3db556225b28dc52a81d5967 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> Date: Mon, 14 Aug 2017 10:12:09 +1000 Subject: [PATCH 1/5] add parms for rtl unroller --- gcc/loop-unroll.c | 24 ++++++++++++------------ gcc/params.def | 17 +++++++++++++++++ 2 files changed, 29 insertions(+), 12 deletions(-) diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 84145bb..871558c 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -360,13 +360,13 @@ decide_unroll_constant_iterations (struct loop *loop, int flags) /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ - nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns; + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns; nunroll_by_av - = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns; + = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns; if (nunroll > nunroll_by_av) nunroll = nunroll_by_av; - if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES)) - nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES); + if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES)) + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES); if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -664,12 +664,12 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags) /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ - nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns; - nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns; + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns; + nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns; if (nunroll > nunroll_by_av) nunroll = nunroll_by_av; - if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES)) - nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES); + if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES)) + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES); if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); @@ -1158,13 +1158,13 @@ decide_unroll_stupid (struct loop *loop, int flags) /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ - nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns; + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns; nunroll_by_av - = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns; + = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns; if (nunroll > nunroll_by_av) nunroll = nunroll_by_av; - if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES)) - nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES); + if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES)) + nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES); if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); diff --git a/gcc/params.def b/gcc/params.def index 805302b..c8b0a2b 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -302,6 +302,23 @@ DEFPARAM(PARAM_MAX_PEELED_INSNS, "max-peeled-insns", "The maximum number of insns of a peeled loop.", 100, 0, 0) + +DEFPARAM(PARAM_MAX_UNROLLEDP_INSNS, + "max-partial-unrolled-insns", + "The maximum number of instructions to consider to unroll in a loop by rtl unroller.", + 100, 0, 0) +/* This parameter limits how many times the loop is unrolled depending + on number of insns really executed in each iteration. */ +DEFPARAM(PARAM_MAX_AVERAGE_UNROLLEDP_INSNS, + "max-partial-average-unrolled-insns", + "The maximum number of instructions to consider to unroll in a loop on average by rtl unroller.", + 40, 0, 0) +/* The maximum number of unrollings of a single loop. */ +DEFPARAM(PARAM_MAX_UNROLLP_TIMES, + "max-partial-unroll-times", + "The maximum number of unrollings of a single loop by rtl unroller.", + 4, 0, 0) + /* The maximum number of peelings of a single loop. */ DEFPARAM(PARAM_MAX_PEEL_TIMES, "max-peel-times", -- 2.7.4