Message ID | 54204387.5090105@linaro.org |
---|---|
State | New |
Headers | show |
On 22/09/14 16:43, Kugan wrote: > AArch64 has the same issue ARM had where the LR register was not used in > leaf functions. This was reported in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this > test-case need to be added with more live ranges for the need for the > LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for > the case AArch64 to see this. > > The same fix (from the thread > https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went > into ARM should apply to AArch64 as well. Regression tested on qemu for > aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? This still be a partial fix. LR should be a caller-saved register free to use in case it's saved properly to across function call. I had a very similar patch to this sitting in my local tree and under various benchmark analysis. -- Jiong > > Thanks, > Kugan > > > gcc/ChangeLog: > > 2014-09-23 Kugan Vivekanandarajah <kuganv@linaro.org> > > * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after > epilogue_completed is true. >
On 23/09/14 01:58, Jiong Wang wrote: > On 22/09/14 16:43, Kugan wrote: > >> AArch64 has the same issue ARM had where the LR register was not used in >> leaf functions. This was reported in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this >> test-case need to be added with more live ranges for the need for the >> LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for >> the case AArch64 to see this. >> >> The same fix (from the thread >> https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went >> into ARM should apply to AArch64 as well. Regression tested on qemu for >> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? > This still be a partial fix. LR should be a caller-saved register free > to use in case it's saved properly to across function call. Indeed. This should be improved from the generic code. Right now, if a hard register is used in EPILOGUE_USES, it conflicts with all the live ranges till a call site kills. I think we should have this patch till the generic code can be improved. Thanks, Kugan > > I had a very similar patch to this sitting in my local tree and under > various benchmark analysis. > > -- Jiong >> >> Thanks, >> Kugan >> >> >> gcc/ChangeLog: >> >> 2014-09-23 Kugan Vivekanandarajah <kuganv@linaro.org> >> >> * config/aarch64/aarch64.h (EPILOGUE_USES): Return true only after >> epilogue_completed is true. >> > >
On 01/10/14 01:00, Jiong Wang wrote: > On 27/09/14 22:20, Kugan wrote: >> >> On 23/09/14 01:58, Jiong Wang wrote: >>> On 22/09/14 16:43, Kugan wrote: >>> >>>> AArch64 has the same issue ARM had where the LR register was not >>>> used in >>>> leaf functions. This was reported in >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this >>>> test-case need to be added with more live ranges for the need for the >>>> LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for >>>> the case AArch64 to see this. >>>> >>>> The same fix (from the thread >>>> https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went >>>> into ARM should apply to AArch64 as well. Regression tested on qemu for >>>> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? >>> This still be a partial fix. LR should be a caller-saved register free >>> to use in case it's saved properly to across function call. >> Indeed. This should be improved from the generic code. Right now, if a >> hard register is used in EPILOGUE_USES, it conflicts with all the live >> ranges till a call site kills. I think we should have this patch till >> the generic code can be improved. > > below is my local patch. LR is treated as free register, and strictly > following AArch64 ABI, frame should always be created, FP maintained > properly if LR clobbered under -fno-omit-frame-pointer. Thanks Jiong. Sorry I missed your point. As for the additions in your patch: /* ... and any callee saved register that dataflow says is live. */ for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++) if (df_regs_ever_live_p (regno) - && !call_used_regs[regno]) + && (regno == R30_REGNUM + || !call_used_regs[regno])) cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED; AArch64 CALL_USED_REGISTERS defines R30_REGNUM to be zero. Therefore shouldn’t this be redundant? for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++) @@ -4313,6 +4314,16 @@ aarch64_can_eliminate (const int from, const int to) return false; } + else + { + /* If we decided that we didn't need a leaf frame pointer but then used + LR in the function, then we'll want a frame pointer after all, so + prevent this elimination to ensure a frame pointer is used. */ + if (to == STACK_POINTER_REGNUM + && flag_omit_leaf_frame_pointer + && df_regs_ever_live_p (LR_REGNUM)) + return false; + } aarch64_frame_pointer_required makes frame pointer needed when flag_omit_leaf_frame_pointer and df_regs_ever_live_p (LR_REGNUM). Is this addition still needed? Thanks, Kugan
On 01/10/14 09:00, Kugan wrote: > On 01/10/14 01:00, Jiong Wang wrote: >> On 27/09/14 22:20, Kugan wrote: >>> On 23/09/14 01:58, Jiong Wang wrote: >>>> On 22/09/14 16:43, Kugan wrote: >>>> >>>>> AArch64 has the same issue ARM had where the LR register was not >>>>> used in >>>>> leaf functions. This was reported in >>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this >>>>> test-case need to be added with more live ranges for the need for the >>>>> LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for >>>>> the case AArch64 to see this. >>>>> >>>>> The same fix (from the thread >>>>> https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went >>>>> into ARM should apply to AArch64 as well. Regression tested on qemu for >>>>> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? >>>> This still be a partial fix. LR should be a caller-saved register free >>>> to use in case it's saved properly to across function call. >>> Indeed. This should be improved from the generic code. Right now, if a >>> hard register is used in EPILOGUE_USES, it conflicts with all the live >>> ranges till a call site kills. I think we should have this patch till >>> the generic code can be improved. >> below is my local patch. LR is treated as free register, and strictly >> following AArch64 ABI, frame should always be created, FP maintained >> properly if LR clobbered under -fno-omit-frame-pointer. > Thanks Jiong. Sorry I missed your point. As for the additions in your patch: I am really sorry, just noticed the reply... > /* ... and any callee saved register that dataflow says is live. */ > for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++) > if (df_regs_ever_live_p (regno) > - && !call_used_regs[regno]) > + && (regno == R30_REGNUM > + || !call_used_regs[regno])) > cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED; > > AArch64 CALL_USED_REGISTERS defines R30_REGNUM to be zero. Therefore > shouldn’t this be redundant? that's because the patch also modified r30 to be caller saved. - 0, 0, 0, 0, 0, 1, 0, 1, /* R24 - R30, SP */ \ + 0, 0, 0, 0, 0, 1, 1, 1, /* R24 - R30, SP */ \ thus, we don't need ad-hoc code in pro/epi to save/restore LR. on the other hand, we want LR to be a callee-saved register that it could be used as free register is some scenario. you can't compile the testcase lr_free_2.c for details. > } > + else > + { > + /* If we decided that we didn't need a leaf frame pointer but > then used > + LR in the function, then we'll want a frame pointer after all, so > + prevent this elimination to ensure a frame pointer is used. */ > + if (to == STACK_POINTER_REGNUM > + && flag_omit_leaf_frame_pointer > + && df_regs_ever_live_p (LR_REGNUM)) > + return false; > + } > > aarch64_frame_pointer_required makes frame pointer needed when > flag_omit_leaf_frame_pointer and df_regs_ever_live_p (LR_REGNUM). > Is this addition still needed? it's needed. the problem is the df_reg_ever_live_p (LR_REGNUM) in "aarch64_frame_pointer_required" if (flag_omit_leaf_frame_pointer && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))) is invoked before register allocation that it can only catch those LR_REGNUM alive scenarios produced by inline assembly clobber, for example __asm__ ("":::"x30"), written by user explicitly in code, while it can't catch the scenarios that LR_REGNUM allocated by register allocator. so, we need to add another check in aarch64_can_eliminate which is invoked after register allocation to catch those scenarios where LR allocated by register allocator. thanks. > > Thanks, > Kugan >
On 30 September 2014 16:00, Jiong Wang <jiong.wang@arm.com> wrote: > gcc/ > * config/aarch64/aarch64.h (CALL_USED_REGISTERS): Mark LR as caller-save. > (EPILOGUE_USES): Guard the check by epilogue_completed. > * config/aarch64/aarch64.c (aarch64_layout_frame): Explictly check for LR. > (aarch64_can_eliminate): Check LR_REGNUM liveness. > > gcc/testsuite/ > * gcc.target/aarch64/lr_free_1.c: New testcase for -fomit-frame-pointer. > * gcc.target/aarch64/lr_free_2.c: New testcase for leaf > -fno-omit-frame-pointer. OK /Marcus
On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang <jiong.wang@arm.com> wrote: > On 27/09/14 22:20, Kugan wrote: >> >> >> On 23/09/14 01:58, Jiong Wang wrote: >>> >>> On 22/09/14 16:43, Kugan wrote: >>> >>>> AArch64 has the same issue ARM had where the LR register was not used in >>>> leaf functions. This was reported in >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this >>>> test-case need to be added with more live ranges for the need for the >>>> LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for >>>> the case AArch64 to see this. >>>> >>>> The same fix (from the thread >>>> https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went >>>> into ARM should apply to AArch64 as well. Regression tested on qemu for >>>> aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? >>> >>> This still be a partial fix. LR should be a caller-saved register free >>> to use in case it's saved properly to across function call. >> >> Indeed. This should be improved from the generic code. Right now, if a >> hard register is used in EPILOGUE_USES, it conflicts with all the live >> ranges till a call site kills. I think we should have this patch till >> the generic code can be improved. > > > below is my local patch. LR is treated as free register, and strictly > following AArch64 ABI, frame should always be created, FP maintained > properly if LR clobbered under -fno-omit-frame-pointer. > > > gcc/ > * config/aarch64/aarch64.h (CALL_USED_REGISTERS): Mark LR as caller-save. > (EPILOGUE_USES): Guard the check by epilogue_completed. > * config/aarch64/aarch64.c (aarch64_layout_frame): Explictly check for LR. > (aarch64_can_eliminate): Check LR_REGNUM liveness. > > gcc/testsuite/ > * gcc.target/aarch64/lr_free_1.c: New testcase for -fomit-frame-pointer. > * gcc.target/aarch64/lr_free_2.c: New testcase for leaf > -fno-omit-frame-pointer. + /* If we decided that we didn't need a leaf frame pointer but then used + LR in the function, then we'll want a frame pointer after all, so + prevent this elimination to ensure a frame pointer is used. */ + if (to == STACK_POINTER_REGNUM + && flag_omit_leaf_frame_pointer + && df_regs_ever_live_p (LR_REGNUM)) + return false; This breaks my build on aarch64-elf (with some local modifications) as aarch64_frame_pointer_required returns true but then we use LR but now aarch64_can_eliminate and aarch64_frame_pointer_required are inconsitant which is not a valid thing for LRA (and reload). This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html : " IRA calls hook frame_pointer_required and it returns false. After that LRA calls can_eliminate hook and it returns false which means that fp can not be used for allocation and we should spill all pseudos assigned to it." Can you revert your patch until you can figure out how to get LRA (and reload) to play nicely with what you want to do? Thanks, Andrew Pinski
2014-11-15 0:15 GMT+00:00 Andrew Pinski <pinskia@gmail.com>: > On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang <jiong.wang@arm.com> wrote: >> On 27/09/14 22:20, Kugan wrote: >>> >>> >>> On 23/09/14 01:58, Jiong Wang wrote: > > + /* If we decided that we didn't need a leaf frame pointer but then used > + LR in the function, then we'll want a frame pointer after all, so > + prevent this elimination to ensure a frame pointer is used. */ > + if (to == STACK_POINTER_REGNUM > + && flag_omit_leaf_frame_pointer > + && df_regs_ever_live_p (LR_REGNUM)) > + return false; > > This breaks my build on aarch64-elf (with some local modifications) Hi Andrew, then what's your local modification? I think the problem is we need to figure out why there is an ICE after your local modification? can you please send me your local modification and testcase if possible. > aarch64_frame_pointer_required returns true but then we use LR but now > aarch64_can_eliminate and aarch64_frame_pointer_required are > inconsitant which is not a valid thing for LRA (and reload). > > This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html : > " IRA calls hook frame_pointer_required and it returns false. After > that LRA calls can_eliminate hook and it returns false which means > that fp can not be used for allocation and we should spill all pseudos > assigned to it." > > Thanks, > Andrew Pinski
On Sat, Nov 15, 2014 at 6:08 AM, Jiong Wang <wong.kwongyuan.tools@gmail.com> wrote: > 2014-11-15 0:15 GMT+00:00 Andrew Pinski <pinskia@gmail.com>: >> On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang <jiong.wang@arm.com> wrote: >>> On 27/09/14 22:20, Kugan wrote: >>>> >>>> >>>> On 23/09/14 01:58, Jiong Wang wrote: >> >> + /* If we decided that we didn't need a leaf frame pointer but then used >> + LR in the function, then we'll want a frame pointer after all, so >> + prevent this elimination to ensure a frame pointer is used. */ >> + if (to == STACK_POINTER_REGNUM >> + && flag_omit_leaf_frame_pointer >> + && df_regs_ever_live_p (LR_REGNUM)) >> + return false; >> >> This breaks my build on aarch64-elf (with some local modifications) > > Hi Andrew, > > then what's your local modification? > I think the problem is we need to figure out why there is an ICE > after your local modification? > can you please send me your local modification and testcase if possible. My local modifications can be found in the gcc git at apinski/thunderx-cost. Note I reverted this patch so I can continue working. The testcase is compiling newlib. Let me try to get it again. I was configuring a combined build with: --disable-fixed-point --without-ppl --without-python --disable-werror --enable-plugins --enable-checking --disable-sim --with-newlib --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 --target=aarch64-thunderx-elf --enable-languages=c,c++ Thanks, Andrew Pinski > > >> aarch64_frame_pointer_required returns true but then we use LR but now >> aarch64_can_eliminate and aarch64_frame_pointer_required are >> inconsitant which is not a valid thing for LRA (and reload). >> >> This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html : >> " IRA calls hook frame_pointer_required and it returns false. After >> that LRA calls can_eliminate hook and it returns false which means >> that fp can not be used for allocation and we should spill all pseudos >> assigned to it." >> >> Thanks, >> Andrew Pinski
On Sat, Nov 15, 2014 at 7:21 AM, Andrew Pinski <pinskia@gmail.com> wrote: > On Sat, Nov 15, 2014 at 6:08 AM, Jiong Wang > <wong.kwongyuan.tools@gmail.com> wrote: >> 2014-11-15 0:15 GMT+00:00 Andrew Pinski <pinskia@gmail.com>: >>> On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang <jiong.wang@arm.com> wrote: >>>> On 27/09/14 22:20, Kugan wrote: >>>>> >>>>> >>>>> On 23/09/14 01:58, Jiong Wang wrote: >>> >>> + /* If we decided that we didn't need a leaf frame pointer but then used >>> + LR in the function, then we'll want a frame pointer after all, so >>> + prevent this elimination to ensure a frame pointer is used. */ >>> + if (to == STACK_POINTER_REGNUM >>> + && flag_omit_leaf_frame_pointer >>> + && df_regs_ever_live_p (LR_REGNUM)) >>> + return false; >>> >>> This breaks my build on aarch64-elf (with some local modifications) >> >> Hi Andrew, >> >> then what's your local modification? >> I think the problem is we need to figure out why there is an ICE >> after your local modification? >> can you please send me your local modification and testcase if possible. > > My local modifications can be found in the gcc git at > apinski/thunderx-cost. Note I reverted this patch so I can continue > working. The testcase is compiling newlib. Let me try to get it > again. > I was configuring a combined build with: > --disable-fixed-point --without-ppl --without-python --disable-werror > --enable-plugins --enable-checking --disable-sim --with-newlib > --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 > --target=aarch64-thunderx-elf --enable-languages=c,c++ Attached is the preprocessed source. cc1 strtol.i -mabi=ilp32 -O2 is enough to reproduce the ICE. Thanks, Andrew > > Thanks, > Andrew Pinski > >> >> >>> aarch64_frame_pointer_required returns true but then we use LR but now >>> aarch64_can_eliminate and aarch64_frame_pointer_required are >>> inconsitant which is not a valid thing for LRA (and reload). >>> >>> This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html : >>> " IRA calls hook frame_pointer_required and it returns false. After >>> that LRA calls can_eliminate hook and it returns false which means >>> that fp can not be used for allocation and we should spill all pseudos >>> assigned to it." >>> >>> Thanks, >>> Andrew Pinski
2014-11-15 15:49 GMT+00:00 Andrew Pinski <pinskia@gmail.com>: >> My local modifications can be found in the gcc git at >> apinski/thunderx-cost. Note I reverted this patch so I can continue >> working. The testcase is compiling newlib. Let me try to get it >> again. >> I was configuring a combined build with: >> --disable-fixed-point --without-ppl --without-python --disable-werror >> --enable-plugins --enable-checking --disable-sim --with-newlib >> --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 >> --target=aarch64-thunderx-elf --enable-languages=c,c++ > > Attached is the preprocessed source. > cc1 strtol.i -mabi=ilp32 -O2 > is enough to reproduce the ICE. thanks. I can reproduce this ICE under -mabi=ilp32 on your code base. and it's strange LR marked as alive while the final assembly don't have it. the reason of this is the following pattern define_insn "*tb<optab><mode>1" define_insn "*cb<optab><mode>1" always declare to clobber some register while they don't always use them in code generation, so sub-optimal code generated, some registers are wasted. you can see before my patch x19 is allocated for that clobber, thus there is an unnecessary save of x19 to stack, while after my patch, x30 is allocated for that clobber, so aarch64_can_eliminate invoked after this will think this function require frame pointer according to our ABI, so there are unncessary frame setup instruction. basically, it's not inconsistent between aarch64_require_frame_p and aarch64_can_eliminate. it's inconsistent between aarch64_can_eliminate invoked before assign_by_spills and after that. and my first impression is that the gcc_assert in lra-eliminate.c is to strong and need to be relaxed in some situation. > > Thanks, > Andrew >
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index db950da..b3e4585 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -309,7 +309,7 @@ extern unsigned long aarch64_tune_flags; considered live at the start of the called function. */ #define EPILOGUE_USES(REGNO) \ - ((REGNO) == LR_REGNUM) + (epilogue_completed && (REGNO) == LR_REGNUM) /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function, the stack pointer does not matter. The value is tested only in