Message ID | 1557972484-24599-1-git-send-email-kugan.vivekanandarajah@linaro.org |
---|---|
Headers | show |
Series | Fix redundant ptest instruction | expand |
kugan.vivekanandarajah@linaro.org writes: > From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> > > Inorder to fix this PR. > * We need to change the whilelo pattern in backend > * Change RTL CSE such that: > - Add support for VEC_DUPLICATE > - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the > parallel rtx at the end. > > For example, with patch1, we now have rtl insn as follows: > > (insn 19 18 20 3 (parallel [ > (set (reg:VNx4BI 93 [ next_mask_18 ]) > (unspec:VNx4BI [ > (const_int 0 [0]) > (reg:DI 95 [ _33 ]) > ] UNSPEC_WHILE_LO)) > (set (reg:CC 66 cc) > (compare:CC (unspec:SI [ > (vec_duplicate:VNx4BI (const_int 1 [0x1])) > (reg:VNx4BI 93 [ next_mask_18 ]) > ] UNSPEC_PTEST_PTRUE) > (const_int 0 [0]))) > ]) 4244 {while_ultdivnx4bi} > > When cse_insn process the first, it records the CSE set in reg 93. Then after > processing both the instruction in the parallel rtx, we invalidate all > expression with reg 93 which means expression in the second instruction is > invalidated for CSE. Attached patch relaxes this by invalidating before processing the > second. As far as patch 1 goes: the traditional reason for using clobbers to start with is that: - setting CC places a requirement on what CC must be after that instruction. We then have to rely on REG_UNUSED notes to tell whether that value actually matters or not. This was a bigger deal before df though. It might not matter as much now. - many passes find it harder to deal with multiple sets rather than single sets, so it's better to keep a single_set unless we know that both results are needed. It's currently combine's job to create a multiple set in cases where both results are useful. The pattern for that already exists (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something like patch 1, we should simply expand to that insn rather than adding a new one. Note that: (vec_duplicate:PRED_ALL (const_int 1)) shouldn't appear in the insn stream. It should always be a const_vector instead. From a quick check, I haven't yet found a case where setting CC up-front hurts though, so maybe the above is out-of-date best practice and we should set the register up-front after all, if only to reduce the number of patterns. However, if possible, I think we should fix the PR in a way that works for instructions that only optionally set the flags (which for AArch64 is the common case). So it would be good if we could fix the PR without needing patch 1. Thanks, Richard > > Bootstrap and regression testing for the current version is ongoing. > > Thanks, > Kugan > > Kugan Vivekanandarajah (2): > [PR88836][aarch64] Set CC_REGNUM instead of clobber > [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one > > gcc/config/aarch64/aarch64-sve.md | 9 +++- > gcc/cse.c | 67 ++++++++++++++++++++++++++---- > gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++ > 3 files changed, 80 insertions(+), 10 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c
Hi Richard, Thanks for your comments. On Thu, 16 May 2019 at 18:13, Richard Sandiford <richard.sandiford@arm.com> wrote: > > kugan.vivekanandarajah@linaro.org writes: > > From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> > > > > Inorder to fix this PR. > > * We need to change the whilelo pattern in backend > > * Change RTL CSE such that: > > - Add support for VEC_DUPLICATE > > - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the > > parallel rtx at the end. > > > > For example, with patch1, we now have rtl insn as follows: > > > > (insn 19 18 20 3 (parallel [ > > (set (reg:VNx4BI 93 [ next_mask_18 ]) > > (unspec:VNx4BI [ > > (const_int 0 [0]) > > (reg:DI 95 [ _33 ]) > > ] UNSPEC_WHILE_LO)) > > (set (reg:CC 66 cc) > > (compare:CC (unspec:SI [ > > (vec_duplicate:VNx4BI (const_int 1 [0x1])) > > (reg:VNx4BI 93 [ next_mask_18 ]) > > ] UNSPEC_PTEST_PTRUE) > > (const_int 0 [0]))) > > ]) 4244 {while_ultdivnx4bi} > > > > When cse_insn process the first, it records the CSE set in reg 93. Then after > > processing both the instruction in the parallel rtx, we invalidate all > > expression with reg 93 which means expression in the second instruction is > > invalidated for CSE. Attached patch relaxes this by invalidating before processing the > > second. > > As far as patch 1 goes: the traditional reason for using clobbers > to start with is that: > > - setting CC places a requirement on what CC must be after that instruction. > We then have to rely on REG_UNUSED notes to tell whether that value > actually matters or not. > > This was a bigger deal before df though. It might not matter as much now. > > - many passes find it harder to deal with multiple sets rather than > single sets, so it's better to keep a single_set unless we know > that both results are needed. > > It's currently combine's job to create a multiple set in cases > where both results are useful. The pattern for that already exists > (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something > like patch 1, we should simply expand to that insn rather than adding a > new one. Note that: > > (vec_duplicate:PRED_ALL (const_int 1)) > > shouldn't appear in the insn stream. It should always be a const_vector > instead. > > From a quick check, I haven't yet found a case where setting CC up-front > hurts though, so maybe the above is out-of-date best practice and we > should set the register up-front after all, if only to reduce the number > of patterns. > > However, if possible, I think we should fix the PR in a way that works > for instructions that only optionally set the flags (which for AArch64 > is the common case). So it would be good if we could fix the PR without > needing patch 1. Do you think that combine should be able to set this. Sorry, I don't understand how we can let other passes know that this instruction will set the flags needed. Thanks, Kugan > > Thanks, > Richard > > > > > Bootstrap and regression testing for the current version is ongoing. > > > > Thanks, > > Kugan > > > > Kugan Vivekanandarajah (2): > > [PR88836][aarch64] Set CC_REGNUM instead of clobber > > [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one > > > > gcc/config/aarch64/aarch64-sve.md | 9 +++- > > gcc/cse.c | 67 ++++++++++++++++++++++++++---- > > gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++ > > 3 files changed, 80 insertions(+), 10 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> Inorder to fix this PR. * We need to change the whilelo pattern in backend * Change RTL CSE such that: - Add support for VEC_DUPLICATE - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the parallel rtx at the end. For example, with patch1, we now have rtl insn as follows: (insn 19 18 20 3 (parallel [ (set (reg:VNx4BI 93 [ next_mask_18 ]) (unspec:VNx4BI [ (const_int 0 [0]) (reg:DI 95 [ _33 ]) ] UNSPEC_WHILE_LO)) (set (reg:CC 66 cc) (compare:CC (unspec:SI [ (vec_duplicate:VNx4BI (const_int 1 [0x1])) (reg:VNx4BI 93 [ next_mask_18 ]) ] UNSPEC_PTEST_PTRUE) (const_int 0 [0]))) ]) 4244 {while_ultdivnx4bi} When cse_insn process the first, it records the CSE set in reg 93. Then after processing both the instruction in the parallel rtx, we invalidate all expression with reg 93 which means expression in the second instruction is invalidated for CSE. Attached patch relaxes this by invalidating before processing the second. Bootstrap and regression testing for the current version is ongoing. Thanks, Kugan Kugan Vivekanandarajah (2): [PR88836][aarch64] Set CC_REGNUM instead of clobber [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one gcc/config/aarch64/aarch64-sve.md | 9 +++- gcc/cse.c | 67 ++++++++++++++++++++++++++---- gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++ 3 files changed, 80 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c -- 2.7.4