Message ID | CAELXzTPFmSnr0xhWBMVamOCGY2Vd9Kr7jhdX05tMmYDijacr3A@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [PR63185,RFC] Improve DSE with branches | expand |
On 05/13/2018 09:37 PM, Kugan Vivekanandarajah wrote: > Hi, > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > We could see the PHI and if there isn't any uses for PHI that is > interesting, we could ignore that ? > > Bootstrapped and regression tested on x86_64-linux-gnu. > Is this OK? > > Thanks, > Kugan > > > gcc/ChangeLog: > > 2018-05-14 Kugan Vivekanandarajah <kuganv@linaro.org> > > * tree-ssa-dse.c (phi_dosent_define_nor_use_p): New. > (dse_classify_store): Use phi_dosent_define_nor_use_p. > > gcc/testsuite/ChangeLog: > > 2018-05-14 Kugan Vivekanandarajah <kuganv@linaro.org> > > * gcc.dg/tree-ssa/ssa-dse-33.c: New test. > So in this case the PHI post-dominates the store. ;; basic block 2, loop depth 0 ;; pred: ENTRY # .MEM_3 = VDEF <.MEM_2(D)> p_4 = malloc (1024); # .MEM_5 = VDEF <.MEM_3> memset (p_4, 8, 1024); if (n_6(D) != 0) goto <bb 3>; [INV] else goto <bb 4>; [INV] ;; succ: 3 ;; 4 ;; basic block 3, loop depth 0 ;; pred: 2 # .MEM_7 = VDEF <.MEM_5> g (); ;; succ: 4 ;; basic block 4, loop depth 0 ;; pred: 2 ;; 3 # .MEM_1 = PHI <.MEM_5(2), .MEM_7(3)> # VUSE <.MEM_1> return; ;; succ: EXIT } So the overall processing in this case is something like this: We call into dse_classify_store for the memset. The first immediate use is the call to g(), which does not do an aliased read. So we set TEMP to g() and continue processing immediate uses. We then process the PHI as an immediate use. We fail to walk through the PHI because we've already walked through the call to g() (as indicated by g() being stored in TEMP). I think we can get the effect we want by realizing that the PHI is a sink for the statement in TEMP. That in turn allows the existing mechanisms to walk through the PHI to work. We would have to somehow verify that we hadn't cleared anything in LIVE_BYTES though. Jeff
On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > Hi, > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > We could see the PHI and if there isn't any uses for PHI that is > interesting, we could ignore that ? > > Bootstrapped and regression tested on x86_64-linux-gnu. > Is this OK? No, as Jeff said we can't do it this way. If we end up with multiple VDEFs in the walk of defvar immediate uses we know we are dealing with a CFG fork. We can't really ignore any of the paths but we have to a) find the merge point (and the associated VDEF) b) verify for each each chain of VDEFs with associated VUSEs up to that merge VDEF that we have no uses of the to classify store and collect (partial) kills c) intersect kill info and continue walking from the merge point in b) there's the optional possibility to find sinking opportunities in case we have kills on some paths but uses on others. This is why DSE should be really merged with (store) sinking. So if we want to enhance DSEs handling of branches then we need to refactor the simple dse_classify_store function. Let me take an attempt at this today. Richard. > Thanks, > Kugan > > > gcc/ChangeLog: > > 2018-05-14 Kugan Vivekanandarajah <kuganv@linaro.org> > > * tree-ssa-dse.c (phi_dosent_define_nor_use_p): New. > (dse_classify_store): Use phi_dosent_define_nor_use_p. > > gcc/testsuite/ChangeLog: > > 2018-05-14 Kugan Vivekanandarajah <kuganv@linaro.org> > > * gcc.dg/tree-ssa/ssa-dse-33.c: New test. > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Tue, 15 May 2018, Richard Biener wrote: > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > Hi, > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > We could see the PHI and if there isn't any uses for PHI that is > > interesting, we could ignore that ? > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > Is this OK? > > No, as Jeff said we can't do it this way. > > If we end up with multiple VDEFs in the walk of defvar immediate > uses we know we are dealing with a CFG fork. We can't really > ignore any of the paths but we have to > > a) find the merge point (and the associated VDEF) > b) verify for each each chain of VDEFs with associated VUSEs > up to that merge VDEF that we have no uses of the to classify > store and collect (partial) kills > c) intersect kill info and continue walking from the merge point > > in b) there's the optional possibility to find sinking opportunities > in case we have kills on some paths but uses on others. This is why > DSE should be really merged with (store) sinking. > > So if we want to enhance DSEs handling of branches then we need > to refactor the simple dse_classify_store function. Let me take > an attempt at this today. First (baby) step is the following - it arranges to collect the defs we need to continue walking from and implements trivial reduction by stopping at (full) kills. This allows us to handle the new testcase (which was already handled in the very late DSE pass with the help of sinking the store). I took the opportunity to kill the use_stmt parameter of dse_classify_store as the only user is only looking for whether the kills were all clobbers which I added a new parameter for. I didn't adjust the byte-tracking case fully (I'm not fully understanding the code in the case of a use and I'm not sure whether it's worth doing the def reduction with byte-tracking). Your testcase can be handled by reducing the PHI and the call def by seeing that the only use of a candidate def is another def we have already processed. Not sure if worth special-casing though, I'd rather have a go at "recursing". That will be the next patch. Bootstrap & regtest running on x86_64-unknown-linux-gnu. Richard. 2018-05-15 Richard Biener <rguenther@suse.de> * tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter, add by_clobber_p one. Change algorithm to collect all defs representing uses we need to walk and try reducing them to a single one before failing. (dse_dom_walker::dse_optimize_stmt): Adjust. * gcc.dg/tree-ssa/ssa-dse-31.c: New testcase. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c new file mode 100644 index 00000000000..9ea9268fe1d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void g(); +void f(int n, char *p) +{ + *p = 0; + if (n) + { + *p = 1; + g(); + } + *p = 2; +} + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 9220fea7f2e..126592a1d13 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -516,18 +516,21 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live) } /* A helper of dse_optimize_stmt. - Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate - statement *USE_STMT that may prove STMT to be dead. - Return TRUE if the above conditions are met, otherwise FALSE. */ + Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it + according to downstream uses and defs. Sets *BY_CLOBBER_P to true + if only clobber statements influenced the classification result. + Returns the classification. */ static dse_store_status -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, - bool byte_tracking_enabled, sbitmap live_bytes) +dse_classify_store (ao_ref *ref, gimple *stmt, + bool byte_tracking_enabled, sbitmap live_bytes, + bool *by_clobber_p = NULL) { gimple *temp; unsigned cnt = 0; - *use_stmt = NULL; + if (by_clobber_p) + *by_clobber_p = true; /* Find the first dominated statement that clobbers (part of) the memory stmt stores to with no intermediate statement that may use @@ -551,7 +554,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, else defvar = gimple_vdef (temp); defvar_def = temp; - temp = NULL; + auto_vec<gimple *, 10> defs; FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) { cnt++; @@ -569,9 +572,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, See gcc.c-torture/execute/20051110-*.c. */ else if (gimple_code (use_stmt) == GIMPLE_PHI) { - if (temp - /* Make sure we are not in a loop latch block. */ - || gimple_bb (stmt) == gimple_bb (use_stmt) + /* Make sure we are not in a loop latch block. */ + if (gimple_bb (stmt) == gimple_bb (use_stmt) || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), gimple_bb (use_stmt)) /* We can look through PHIs to regions post-dominating @@ -589,7 +591,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, && !dominated_by_p (CDI_DOMINATORS, gimple_bb (defvar_def), gimple_bb (use_stmt))) - temp = use_stmt; + defs.safe_push (use_stmt); } /* If the statement is a use the store is not dead. */ else if (ref_maybe_used_by_stmt_p (use_stmt, ref)) @@ -597,7 +599,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, /* Handle common cases where we can easily build an ao_ref structure for USE_STMT and in doing so we find that the references hit non-live bytes and thus can be ignored. */ - if (byte_tracking_enabled && (!gimple_vdef (use_stmt) || !temp)) + if (byte_tracking_enabled + && (!gimple_vdef (use_stmt) || defs.is_empty ())) { if (is_gimple_assign (use_stmt)) { @@ -613,7 +616,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, /* If this statement has a VDEF, then it is the first store we have seen, so walk through it. */ if (gimple_vdef (use_stmt)) - temp = use_stmt; + defs.safe_push (use_stmt); continue; } } @@ -622,17 +625,10 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, fail = true; BREAK_FROM_IMM_USE_STMT (ui); } - /* If this is a store, remember it or bail out if we have - multiple ones (the will be in different CFG parts then). */ + /* If this is a store, remember it as we possibly need to walk the + defs uses. */ else if (gimple_vdef (use_stmt)) - { - if (temp) - { - fail = true; - BREAK_FROM_IMM_USE_STMT (ui); - } - temp = use_stmt; - } + defs.safe_push (use_stmt); } if (fail) @@ -647,25 +643,54 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, /* If we didn't find any definition this means the store is dead if it isn't a store to global reachable memory. In this case just pretend the stmt makes itself dead. Otherwise fail. */ - if (!temp) + if (defs.is_empty ()) { if (ref_may_alias_global_p (ref)) return DSE_STORE_LIVE; - temp = stmt; - break; + if (by_clobber_p) + *by_clobber_p = false; + return DSE_STORE_DEAD; } - if (byte_tracking_enabled && temp) - clear_bytes_written_by (live_bytes, temp, ref); - } - /* Continue walking until we reach a full kill as a single statement - or there are no more live bytes. */ - while (!stmt_kills_ref_p (temp, ref) - && !(byte_tracking_enabled && bitmap_empty_p (live_bytes))); + /* Process defs and remove paths starting with a kill from further + processing. */ + for (unsigned i = 0; i < defs.length (); ++i) + if (stmt_kills_ref_p (defs[i], ref)) + { + if (by_clobber_p && !gimple_clobber_p (defs[i])) + *by_clobber_p = false; + defs.unordered_remove (i); + } + + /* If all defs kill the ref we are done. */ + if (defs.is_empty ()) + return DSE_STORE_DEAD; + /* If more than one def survives fail. */ + if (defs.length () > 1) + { + /* STMT might be partially dead and we may be able to reduce + how many memory locations it stores into. */ + if (byte_tracking_enabled && !gimple_clobber_p (stmt)) + return DSE_STORE_MAYBE_PARTIAL_DEAD; + return DSE_STORE_LIVE; + } + temp = defs[0]; - *use_stmt = temp; - return DSE_STORE_DEAD; + /* Track partial kills. */ + if (byte_tracking_enabled) + { + clear_bytes_written_by (live_bytes, temp, ref); + if (bitmap_empty_p (live_bytes)) + { + if (by_clobber_p && !gimple_clobber_p (temp)) + *by_clobber_p = false; + return DSE_STORE_DEAD; + } + } + } + /* Continue walking until there are no more live bytes. */ + while (1); } @@ -795,11 +820,10 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) return; } - gimple *use_stmt; enum dse_store_status store_status; m_byte_tracking_enabled = setup_live_bytes_from_ref (&ref, m_live_bytes); - store_status = dse_classify_store (&ref, stmt, &use_stmt, + store_status = dse_classify_store (&ref, stmt, m_byte_tracking_enabled, m_live_bytes); if (store_status == DSE_STORE_LIVE) @@ -823,20 +847,20 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) if (is_gimple_assign (stmt)) { - gimple *use_stmt; + bool by_clobber_p = false; /* Self-assignments are zombies. */ if (operand_equal_p (gimple_assign_rhs1 (stmt), gimple_assign_lhs (stmt), 0)) - use_stmt = stmt; + ; else { m_byte_tracking_enabled = setup_live_bytes_from_ref (&ref, m_live_bytes); enum dse_store_status store_status; - store_status = dse_classify_store (&ref, stmt, &use_stmt, + store_status = dse_classify_store (&ref, stmt, m_byte_tracking_enabled, - m_live_bytes); + m_live_bytes, &by_clobber_p); if (store_status == DSE_STORE_LIVE) return; @@ -852,7 +876,7 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) /* But only remove *this_2(D) ={v} {CLOBBER} if killed by another clobber stmt. */ if (gimple_clobber_p (stmt) - && !gimple_clobber_p (use_stmt)) + && !by_clobber_p) return; delete_dead_assignment (gsi);
On Tue, 15 May 2018, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > > > Hi, > > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > > We could see the PHI and if there isn't any uses for PHI that is > > > interesting, we could ignore that ? > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > > Is this OK? > > > > No, as Jeff said we can't do it this way. > > > > If we end up with multiple VDEFs in the walk of defvar immediate > > uses we know we are dealing with a CFG fork. We can't really > > ignore any of the paths but we have to > > > > a) find the merge point (and the associated VDEF) > > b) verify for each each chain of VDEFs with associated VUSEs > > up to that merge VDEF that we have no uses of the to classify > > store and collect (partial) kills > > c) intersect kill info and continue walking from the merge point > > > > in b) there's the optional possibility to find sinking opportunities > > in case we have kills on some paths but uses on others. This is why > > DSE should be really merged with (store) sinking. > > > > So if we want to enhance DSEs handling of branches then we need > > to refactor the simple dse_classify_store function. Let me take > > an attempt at this today. > > First (baby) step is the following - it arranges to collect the > defs we need to continue walking from and implements trivial > reduction by stopping at (full) kills. This allows us to handle > the new testcase (which was already handled in the very late DSE > pass with the help of sinking the store). > > I took the opportunity to kill the use_stmt parameter of > dse_classify_store as the only user is only looking for whether > the kills were all clobbers which I added a new parameter for. > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > the code in the case of a use and I'm not sure whether it's worth > doing the def reduction with byte-tracking). > > Your testcase can be handled by reducing the PHI and the call def > by seeing that the only use of a candidate def is another def > we have already processed. Not sure if worth special-casing though, > I'd rather have a go at "recursing". That will be the next > patch. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. Applied. Another intermediate one below, fixing the byte-tracking for stmt with uses. This also re-does the PHI handling by simply avoiding recursion by means of a visited bitmap and stopping at the DSE classify stmt when re-visiting it instead of failing. This covers Pratamesh loop case for which I added ssa-dse-33.c. For the do-while loop this still runs into the inability to handle two defs to walk from. Bootstrap & regtest running on x86_64-unknown-linux-gnu. Richard. 2018-05-15 Richard Biener <rguenther@suse.de> * tree-ssa-dse.c (dse_classify_store): Track cycles via a visited bitmap of PHI defs and simplify handling of in-loop and across loop dead stores. Handle byte-tracking with multiple defs. * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. commit 7129756be201db1bd90e4191ed32084cbb22130d Author: Richard Guenther <rguenther@suse.de> Date: Tue May 15 14:03:32 2018 +0200 dse#2 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c new file mode 100644 index 00000000000..eea0d8d5cf0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void f(int n) +{ + char *p = __builtin_malloc (1); + int i; + do + *p = 0; + while (++i < n); +} + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c new file mode 100644 index 00000000000..02e3e6a582c --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void f(char *p, int n) +{ + for (int i = 0; i < n; ++i) + *p = 0; /* Removed by DSE. */ + *p = 1; +} + +void g(char *p, int n) +{ + int i = 0; + do + *p = 0; /* DSE cannot yet handle this case. */ + while (++i < n); + *p = 1; +} + + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 126592a1d13..8836e84dd63 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -528,6 +528,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, { gimple *temp; unsigned cnt = 0; + auto_bitmap visited; if (by_clobber_p) *by_clobber_p = true; @@ -539,7 +540,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, temp = stmt; do { - gimple *use_stmt, *defvar_def; + gimple *use_stmt; imm_use_iterator ui; bool fail = false; tree defvar; @@ -550,47 +551,31 @@ dse_classify_store (ao_ref *ref, gimple *stmt, return DSE_STORE_LIVE; if (gimple_code (temp) == GIMPLE_PHI) - defvar = PHI_RESULT (temp); + { + defvar = PHI_RESULT (temp); + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); + } else defvar = gimple_vdef (temp); - defvar_def = temp; auto_vec<gimple *, 10> defs; FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) { cnt++; - /* If we ever reach our DSE candidate stmt again fail. We - cannot handle dead stores in loops. */ + /* We have visited ourselves already so ignore STMT for the + purpose of chaining. */ if (use_stmt == stmt) - { - fail = true; - BREAK_FROM_IMM_USE_STMT (ui); - } + ; /* In simple cases we can look through PHI nodes, but we have to be careful with loops and with memory references containing operands that are also operands of PHI nodes. See gcc.c-torture/execute/20051110-*.c. */ else if (gimple_code (use_stmt) == GIMPLE_PHI) { - /* Make sure we are not in a loop latch block. */ - if (gimple_bb (stmt) == gimple_bb (use_stmt) - || dominated_by_p (CDI_DOMINATORS, - gimple_bb (stmt), gimple_bb (use_stmt)) - /* We can look through PHIs to regions post-dominating - the DSE candidate stmt. */ - || !dominated_by_p (CDI_POST_DOMINATORS, - gimple_bb (stmt), gimple_bb (use_stmt))) - { - fail = true; - BREAK_FROM_IMM_USE_STMT (ui); - } - /* Do not consider the PHI as use if it dominates the - stmt defining the virtual operand we are processing, - we have processed it already in this case. */ - if (gimple_bb (defvar_def) != gimple_bb (use_stmt) - && !dominated_by_p (CDI_DOMINATORS, - gimple_bb (defvar_def), - gimple_bb (use_stmt))) + /* If we already visited this PHI ignore it for further + processing. */ + if (!bitmap_bit_p (visited, + SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) defs.safe_push (use_stmt); } /* If the statement is a use the store is not dead. */ @@ -600,25 +585,20 @@ dse_classify_store (ao_ref *ref, gimple *stmt, structure for USE_STMT and in doing so we find that the references hit non-live bytes and thus can be ignored. */ if (byte_tracking_enabled - && (!gimple_vdef (use_stmt) || defs.is_empty ())) + && is_gimple_assign (use_stmt)) { - if (is_gimple_assign (use_stmt)) + ao_ref use_ref; + ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); + if (valid_ao_ref_for_dse (&use_ref) + && use_ref.base == ref->base + && known_eq (use_ref.size, use_ref.max_size) + && !live_bytes_read (use_ref, ref, live_bytes)) { - /* Other cases were noted as non-aliasing by - the call to ref_maybe_used_by_stmt_p. */ - ao_ref use_ref; - ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); - if (valid_ao_ref_for_dse (&use_ref) - && use_ref.base == ref->base - && known_eq (use_ref.size, use_ref.max_size) - && !live_bytes_read (use_ref, ref, live_bytes)) - { - /* If this statement has a VDEF, then it is the - first store we have seen, so walk through it. */ - if (gimple_vdef (use_stmt)) - defs.safe_push (use_stmt); - continue; - } + /* If this is a store, remember it as we possibly + need to walk the defs uses. */ + if (gimple_vdef (use_stmt)) + defs.safe_push (use_stmt); + continue; } }
On Tue, 15 May 2018, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > > > > > Hi, > > > > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > > > We could see the PHI and if there isn't any uses for PHI that is > > > > interesting, we could ignore that ? > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > > > Is this OK? > > > > > > No, as Jeff said we can't do it this way. > > > > > > If we end up with multiple VDEFs in the walk of defvar immediate > > > uses we know we are dealing with a CFG fork. We can't really > > > ignore any of the paths but we have to > > > > > > a) find the merge point (and the associated VDEF) > > > b) verify for each each chain of VDEFs with associated VUSEs > > > up to that merge VDEF that we have no uses of the to classify > > > store and collect (partial) kills > > > c) intersect kill info and continue walking from the merge point > > > > > > in b) there's the optional possibility to find sinking opportunities > > > in case we have kills on some paths but uses on others. This is why > > > DSE should be really merged with (store) sinking. > > > > > > So if we want to enhance DSEs handling of branches then we need > > > to refactor the simple dse_classify_store function. Let me take > > > an attempt at this today. > > > > First (baby) step is the following - it arranges to collect the > > defs we need to continue walking from and implements trivial > > reduction by stopping at (full) kills. This allows us to handle > > the new testcase (which was already handled in the very late DSE > > pass with the help of sinking the store). > > > > I took the opportunity to kill the use_stmt parameter of > > dse_classify_store as the only user is only looking for whether > > the kills were all clobbers which I added a new parameter for. > > > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > > the code in the case of a use and I'm not sure whether it's worth > > doing the def reduction with byte-tracking). > > > > Your testcase can be handled by reducing the PHI and the call def > > by seeing that the only use of a candidate def is another def > > we have already processed. Not sure if worth special-casing though, > > I'd rather have a go at "recursing". That will be the next > > patch. > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > Applied. > > Another intermediate one below, fixing the byte-tracking for > stmt with uses. This also re-does the PHI handling by simply > avoiding recursion by means of a visited bitmap and stopping > at the DSE classify stmt when re-visiting it instead of failing. > This covers Pratamesh loop case for which I added ssa-dse-33.c. > For the do-while loop this still runs into the inability to > handle two defs to walk from. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. Ok, loop handling doesn't work in general since we run into the issue that SSA form across the backedge is not representing the same values. Consider <bb 3> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> # n_20 = PHI <0(2), n_7(4)> # .MEM_13 = VDEF <.MEM_22> bytes[n_20] = _4; if (n_20 > 7) goto <bb 5>; <bb 4> n_7 = n_20 + 1; # .MEM_15 = VDEF <.MEM_13> bytes[n_20] = _5; goto <bb 3>; then when classifying the store in bb4, visiting the PHI node gets us to the store in bb3 which appears to be killing. if (gimple_code (temp) == GIMPLE_PHI) - defvar = PHI_RESULT (temp); + { + /* If we visit this PHI by following a backedge then reset + any info in ref that may refer to SSA names which we'd need + to PHI translate. */ + if (gimple_bb (temp) == gimple_bb (stmt) + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), + gimple_bb (temp))) + /* ??? ref->ref may not refer to SSA names or it may only + refer to SSA names that are invariant with respect to the + loop represented by this PHI node. */ + ref->ref = NULL_TREE; + defvar = PHI_RESULT (temp); + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); + } should be a workable solution for that. I'm checking that, but eventually you can think of other things that might prevent us from handling backedges. Note the current code tries to allow looking across loops but not handle backedges of loops the original stmt belongs to. Richard. > Richard. > > 2018-05-15 Richard Biener <rguenther@suse.de> > > * tree-ssa-dse.c (dse_classify_store): Track cycles via a visited > bitmap of PHI defs and simplify handling of in-loop and across > loop dead stores. Handle byte-tracking with multiple defs. > > * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. > * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. > > commit 7129756be201db1bd90e4191ed32084cbb22130d > Author: Richard Guenther <rguenther@suse.de> > Date: Tue May 15 14:03:32 2018 +0200 > > dse#2 > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c > new file mode 100644 > index 00000000000..eea0d8d5cf0 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-dse1-details" } */ > + > +void f(int n) > +{ > + char *p = __builtin_malloc (1); > + int i; > + do > + *p = 0; > + while (++i < n); > +} > + > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c > new file mode 100644 > index 00000000000..02e3e6a582c > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-dse1-details" } */ > + > +void f(char *p, int n) > +{ > + for (int i = 0; i < n; ++i) > + *p = 0; /* Removed by DSE. */ > + *p = 1; > +} > + > +void g(char *p, int n) > +{ > + int i = 0; > + do > + *p = 0; /* DSE cannot yet handle this case. */ > + while (++i < n); > + *p = 1; > +} > + > + > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" { xfail *-*-* } } } */ > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ > diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c > index 126592a1d13..8836e84dd63 100644 > --- a/gcc/tree-ssa-dse.c > +++ b/gcc/tree-ssa-dse.c > @@ -528,6 +528,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > { > gimple *temp; > unsigned cnt = 0; > + auto_bitmap visited; > > if (by_clobber_p) > *by_clobber_p = true; > @@ -539,7 +540,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > temp = stmt; > do > { > - gimple *use_stmt, *defvar_def; > + gimple *use_stmt; > imm_use_iterator ui; > bool fail = false; > tree defvar; > @@ -550,47 +551,31 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > return DSE_STORE_LIVE; > > if (gimple_code (temp) == GIMPLE_PHI) > - defvar = PHI_RESULT (temp); > + { > + defvar = PHI_RESULT (temp); > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); > + } > else > defvar = gimple_vdef (temp); > - defvar_def = temp; > auto_vec<gimple *, 10> defs; > FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) > { > cnt++; > > - /* If we ever reach our DSE candidate stmt again fail. We > - cannot handle dead stores in loops. */ > + /* We have visited ourselves already so ignore STMT for the > + purpose of chaining. */ > if (use_stmt == stmt) > - { > - fail = true; > - BREAK_FROM_IMM_USE_STMT (ui); > - } > + ; > /* In simple cases we can look through PHI nodes, but we > have to be careful with loops and with memory references > containing operands that are also operands of PHI nodes. > See gcc.c-torture/execute/20051110-*.c. */ > else if (gimple_code (use_stmt) == GIMPLE_PHI) > { > - /* Make sure we are not in a loop latch block. */ > - if (gimple_bb (stmt) == gimple_bb (use_stmt) > - || dominated_by_p (CDI_DOMINATORS, > - gimple_bb (stmt), gimple_bb (use_stmt)) > - /* We can look through PHIs to regions post-dominating > - the DSE candidate stmt. */ > - || !dominated_by_p (CDI_POST_DOMINATORS, > - gimple_bb (stmt), gimple_bb (use_stmt))) > - { > - fail = true; > - BREAK_FROM_IMM_USE_STMT (ui); > - } > - /* Do not consider the PHI as use if it dominates the > - stmt defining the virtual operand we are processing, > - we have processed it already in this case. */ > - if (gimple_bb (defvar_def) != gimple_bb (use_stmt) > - && !dominated_by_p (CDI_DOMINATORS, > - gimple_bb (defvar_def), > - gimple_bb (use_stmt))) > + /* If we already visited this PHI ignore it for further > + processing. */ > + if (!bitmap_bit_p (visited, > + SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) > defs.safe_push (use_stmt); > } > /* If the statement is a use the store is not dead. */ > @@ -600,25 +585,20 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > structure for USE_STMT and in doing so we find that the > references hit non-live bytes and thus can be ignored. */ > if (byte_tracking_enabled > - && (!gimple_vdef (use_stmt) || defs.is_empty ())) > + && is_gimple_assign (use_stmt)) > { > - if (is_gimple_assign (use_stmt)) > + ao_ref use_ref; > + ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); > + if (valid_ao_ref_for_dse (&use_ref) > + && use_ref.base == ref->base > + && known_eq (use_ref.size, use_ref.max_size) > + && !live_bytes_read (use_ref, ref, live_bytes)) > { > - /* Other cases were noted as non-aliasing by > - the call to ref_maybe_used_by_stmt_p. */ > - ao_ref use_ref; > - ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); > - if (valid_ao_ref_for_dse (&use_ref) > - && use_ref.base == ref->base > - && known_eq (use_ref.size, use_ref.max_size) > - && !live_bytes_read (use_ref, ref, live_bytes)) > - { > - /* If this statement has a VDEF, then it is the > - first store we have seen, so walk through it. */ > - if (gimple_vdef (use_stmt)) > - defs.safe_push (use_stmt); > - continue; > - } > + /* If this is a store, remember it as we possibly > + need to walk the defs uses. */ > + if (gimple_vdef (use_stmt)) > + defs.safe_push (use_stmt); > + continue; > } > } > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Tue, 15 May 2018, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > > > > > > > Hi, > > > > > > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > > > > We could see the PHI and if there isn't any uses for PHI that is > > > > > interesting, we could ignore that ? > > > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > > > > Is this OK? > > > > > > > > No, as Jeff said we can't do it this way. > > > > > > > > If we end up with multiple VDEFs in the walk of defvar immediate > > > > uses we know we are dealing with a CFG fork. We can't really > > > > ignore any of the paths but we have to > > > > > > > > a) find the merge point (and the associated VDEF) > > > > b) verify for each each chain of VDEFs with associated VUSEs > > > > up to that merge VDEF that we have no uses of the to classify > > > > store and collect (partial) kills > > > > c) intersect kill info and continue walking from the merge point > > > > > > > > in b) there's the optional possibility to find sinking opportunities > > > > in case we have kills on some paths but uses on others. This is why > > > > DSE should be really merged with (store) sinking. > > > > > > > > So if we want to enhance DSEs handling of branches then we need > > > > to refactor the simple dse_classify_store function. Let me take > > > > an attempt at this today. > > > > > > First (baby) step is the following - it arranges to collect the > > > defs we need to continue walking from and implements trivial > > > reduction by stopping at (full) kills. This allows us to handle > > > the new testcase (which was already handled in the very late DSE > > > pass with the help of sinking the store). > > > > > > I took the opportunity to kill the use_stmt parameter of > > > dse_classify_store as the only user is only looking for whether > > > the kills were all clobbers which I added a new parameter for. > > > > > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > > > the code in the case of a use and I'm not sure whether it's worth > > > doing the def reduction with byte-tracking). > > > > > > Your testcase can be handled by reducing the PHI and the call def > > > by seeing that the only use of a candidate def is another def > > > we have already processed. Not sure if worth special-casing though, > > > I'd rather have a go at "recursing". That will be the next > > > patch. > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > Applied. > > > > Another intermediate one below, fixing the byte-tracking for > > stmt with uses. This also re-does the PHI handling by simply > > avoiding recursion by means of a visited bitmap and stopping > > at the DSE classify stmt when re-visiting it instead of failing. > > This covers Pratamesh loop case for which I added ssa-dse-33.c. > > For the do-while loop this still runs into the inability to > > handle two defs to walk from. > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > Ok, loop handling doesn't work in general since we run into the > issue that SSA form across the backedge is not representing the > same values. Consider > > <bb 3> > # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> > # n_20 = PHI <0(2), n_7(4)> > # .MEM_13 = VDEF <.MEM_22> > bytes[n_20] = _4; > if (n_20 > 7) > goto <bb 5>; > > <bb 4> > n_7 = n_20 + 1; > # .MEM_15 = VDEF <.MEM_13> > bytes[n_20] = _5; > goto <bb 3>; > > then when classifying the store in bb4, visiting the PHI node > gets us to the store in bb3 which appears to be killing. > > if (gimple_code (temp) == GIMPLE_PHI) > - defvar = PHI_RESULT (temp); > + { > + /* If we visit this PHI by following a backedge then reset > + any info in ref that may refer to SSA names which we'd need > + to PHI translate. */ > + if (gimple_bb (temp) == gimple_bb (stmt) > + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), > + gimple_bb (temp))) > + /* ??? ref->ref may not refer to SSA names or it may only > + refer to SSA names that are invariant with respect to the > + loop represented by this PHI node. */ > + ref->ref = NULL_TREE; > + defvar = PHI_RESULT (temp); > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); > + } > > should be a workable solution for that. I'm checking that, but > eventually you can think of other things that might prevent us from > handling backedges. Note the current code tries to allow > looking across loops but not handle backedges of loops the > original stmt belongs to. Just to mention before I leave for the day I think I've identified a latent issue where I just fail to produce a testcase right now which is that non-return exits from a function are not reliably visited given they do not all have virtual operands, like for example resx. Thus consider void foo (int *p, int x) { *p = 0; if (x) resx; *p = 1; return; } where we will never visit any stmts on the resx path and thus think the *p = 0 store is dead (all with current trunk of course). Will continue to dig tomorrow. Richard. > Richard. > > > > Richard. > > > > 2018-05-15 Richard Biener <rguenther@suse.de> > > > > * tree-ssa-dse.c (dse_classify_store): Track cycles via a visited > > bitmap of PHI defs and simplify handling of in-loop and across > > loop dead stores. Handle byte-tracking with multiple defs. > > > > * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. > > * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. > > > > commit 7129756be201db1bd90e4191ed32084cbb22130d > > Author: Richard Guenther <rguenther@suse.de> > > Date: Tue May 15 14:03:32 2018 +0200 > > > > dse#2 > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c > > new file mode 100644 > > index 00000000000..eea0d8d5cf0 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c > > @@ -0,0 +1,13 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -fdump-tree-dse1-details" } */ > > + > > +void f(int n) > > +{ > > + char *p = __builtin_malloc (1); > > + int i; > > + do > > + *p = 0; > > + while (++i < n); > > +} > > + > > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c > > new file mode 100644 > > index 00000000000..02e3e6a582c > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c > > @@ -0,0 +1,22 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -fdump-tree-dse1-details" } */ > > + > > +void f(char *p, int n) > > +{ > > + for (int i = 0; i < n; ++i) > > + *p = 0; /* Removed by DSE. */ > > + *p = 1; > > +} > > + > > +void g(char *p, int n) > > +{ > > + int i = 0; > > + do > > + *p = 0; /* DSE cannot yet handle this case. */ > > + while (++i < n); > > + *p = 1; > > +} > > + > > + > > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" { xfail *-*-* } } } */ > > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ > > diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c > > index 126592a1d13..8836e84dd63 100644 > > --- a/gcc/tree-ssa-dse.c > > +++ b/gcc/tree-ssa-dse.c > > @@ -528,6 +528,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > > { > > gimple *temp; > > unsigned cnt = 0; > > + auto_bitmap visited; > > > > if (by_clobber_p) > > *by_clobber_p = true; > > @@ -539,7 +540,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > > temp = stmt; > > do > > { > > - gimple *use_stmt, *defvar_def; > > + gimple *use_stmt; > > imm_use_iterator ui; > > bool fail = false; > > tree defvar; > > @@ -550,47 +551,31 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > > return DSE_STORE_LIVE; > > > > if (gimple_code (temp) == GIMPLE_PHI) > > - defvar = PHI_RESULT (temp); > > + { > > + defvar = PHI_RESULT (temp); > > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); > > + } > > else > > defvar = gimple_vdef (temp); > > - defvar_def = temp; > > auto_vec<gimple *, 10> defs; > > FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) > > { > > cnt++; > > > > - /* If we ever reach our DSE candidate stmt again fail. We > > - cannot handle dead stores in loops. */ > > + /* We have visited ourselves already so ignore STMT for the > > + purpose of chaining. */ > > if (use_stmt == stmt) > > - { > > - fail = true; > > - BREAK_FROM_IMM_USE_STMT (ui); > > - } > > + ; > > /* In simple cases we can look through PHI nodes, but we > > have to be careful with loops and with memory references > > containing operands that are also operands of PHI nodes. > > See gcc.c-torture/execute/20051110-*.c. */ > > else if (gimple_code (use_stmt) == GIMPLE_PHI) > > { > > - /* Make sure we are not in a loop latch block. */ > > - if (gimple_bb (stmt) == gimple_bb (use_stmt) > > - || dominated_by_p (CDI_DOMINATORS, > > - gimple_bb (stmt), gimple_bb (use_stmt)) > > - /* We can look through PHIs to regions post-dominating > > - the DSE candidate stmt. */ > > - || !dominated_by_p (CDI_POST_DOMINATORS, > > - gimple_bb (stmt), gimple_bb (use_stmt))) > > - { > > - fail = true; > > - BREAK_FROM_IMM_USE_STMT (ui); > > - } > > - /* Do not consider the PHI as use if it dominates the > > - stmt defining the virtual operand we are processing, > > - we have processed it already in this case. */ > > - if (gimple_bb (defvar_def) != gimple_bb (use_stmt) > > - && !dominated_by_p (CDI_DOMINATORS, > > - gimple_bb (defvar_def), > > - gimple_bb (use_stmt))) > > + /* If we already visited this PHI ignore it for further > > + processing. */ > > + if (!bitmap_bit_p (visited, > > + SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) > > defs.safe_push (use_stmt); > > } > > /* If the statement is a use the store is not dead. */ > > @@ -600,25 +585,20 @@ dse_classify_store (ao_ref *ref, gimple *stmt, > > structure for USE_STMT and in doing so we find that the > > references hit non-live bytes and thus can be ignored. */ > > if (byte_tracking_enabled > > - && (!gimple_vdef (use_stmt) || defs.is_empty ())) > > + && is_gimple_assign (use_stmt)) > > { > > - if (is_gimple_assign (use_stmt)) > > + ao_ref use_ref; > > + ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); > > + if (valid_ao_ref_for_dse (&use_ref) > > + && use_ref.base == ref->base > > + && known_eq (use_ref.size, use_ref.max_size) > > + && !live_bytes_read (use_ref, ref, live_bytes)) > > { > > - /* Other cases were noted as non-aliasing by > > - the call to ref_maybe_used_by_stmt_p. */ > > - ao_ref use_ref; > > - ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); > > - if (valid_ao_ref_for_dse (&use_ref) > > - && use_ref.base == ref->base > > - && known_eq (use_ref.size, use_ref.max_size) > > - && !live_bytes_read (use_ref, ref, live_bytes)) > > - { > > - /* If this statement has a VDEF, then it is the > > - first store we have seen, so walk through it. */ > > - if (gimple_vdef (use_stmt)) > > - defs.safe_push (use_stmt); > > - continue; > > - } > > + /* If this is a store, remember it as we possibly > > + need to walk the defs uses. */ > > + if (gimple_vdef (use_stmt)) > > + defs.safe_push (use_stmt); > > + continue; > > } > > } > > > > > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 05/15/2018 02:15 AM, Richard Biener wrote: > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > >> Hi, >> >> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >> We could see the PHI and if there isn't any uses for PHI that is >> interesting, we could ignore that ? >> >> Bootstrapped and regression tested on x86_64-linux-gnu. >> Is this OK? > > No, as Jeff said we can't do it this way. > > If we end up with multiple VDEFs in the walk of defvar immediate > uses we know we are dealing with a CFG fork. We can't really > ignore any of the paths but we have to > > a) find the merge point (and the associated VDEF) > b) verify for each each chain of VDEFs with associated VUSEs > up to that merge VDEF that we have no uses of the to classify > store and collect (partial) kills > c) intersect kill info and continue walking from the merge point Which makes me come back to "why is this structured as a forward walk?" DSE seems much more naturally structured as a walk of the post-dominator tree, walking instructions in reverse order. Cases like 63185 would naturally fall out of that structure as the clobber at the end of the function post-dominates the memset and there's no intervening aliased loads. In fact, I thought that's how I initially structured DSE. I'm not sure when/why it changed. Jeff
On May 15, 2018 5:04:53 PM GMT+02:00, Jeff Law <law@redhat.com> wrote: >On 05/15/2018 02:15 AM, Richard Biener wrote: >> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >> >>> Hi, >>> >>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>> We could see the PHI and if there isn't any uses for PHI that is >>> interesting, we could ignore that ? >>> >>> Bootstrapped and regression tested on x86_64-linux-gnu. >>> Is this OK? >> >> No, as Jeff said we can't do it this way. >> >> If we end up with multiple VDEFs in the walk of defvar immediate >> uses we know we are dealing with a CFG fork. We can't really >> ignore any of the paths but we have to >> >> a) find the merge point (and the associated VDEF) >> b) verify for each each chain of VDEFs with associated VUSEs >> up to that merge VDEF that we have no uses of the to classify >> store and collect (partial) kills >> c) intersect kill info and continue walking from the merge point >Which makes me come back to "why is this structured as a forward walk?" Probably historic reasons. We _do_ have a backward walk DSE embedded inside DCE though but I had troubles making it as powerful as DSE. >DSE seems much more naturally structured as a walk of the >post-dominator >tree, walking instructions in reverse order. DSE performs analysis for each possibly dead store and walks those in postdom order. But as the analysis decides whether a candidate is dead it needs to find later stores that makes it so. Thus a forward walk. The canonical way would possibly to find earlier stores that a store makes dead which would structure this more like a dataflow problem. (but the kill sets will be expensive to manage I'd guess) >Cases like 63185 would naturally fall out of that structure as the >clobber at the end of the function post-dominates the memset and >there's >no intervening aliased loads. > >In fact, I thought that's how I initially structured DSE. I'm not >sure when/why it changed. IIRC it was improved, became too expensive and then ended up the current way. Richard. >Jeff
On 05/15/2018 03:20 AM, Richard Biener wrote: > > First (baby) step is the following - it arranges to collect the > defs we need to continue walking from and implements trivial > reduction by stopping at (full) kills. This allows us to handle > the new testcase (which was already handled in the very late DSE > pass with the help of sinking the store). Seems like a notable improvement. They way we were handling defs to continue walking forward from was, umm, fugly. > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > the code in the case of a use and I'm not sure whether it's worth > doing the def reduction with byte-tracking). The byte tracking case is pretty simple. We're tracking the bytes from the original store that are still live. If we encounter a subsequent store that over-writes a subset of bytes, we consider those bytes from the original store as "dead". If we then encounter a load and the load only reads from "dead" bytes, then we can ignore the load for the purposes of DSE. It's something you suggested. I did some instrumentation when working on DSE for gcc6/gcc7 and while it triggers, it's not terribly often. > > Your testcase can be handled by reducing the PHI and the call def > by seeing that the only use of a candidate def is another def > we have already processed. Not sure if worth special-casing though, > I'd rather have a go at "recursing". That will be the next > patch. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > Richard. > > 2018-05-15 Richard Biener <rguenther@suse.de> > > * tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter, > add by_clobber_p one. Change algorithm to collect all defs > representing uses we need to walk and try reducing them to > a single one before failing. > (dse_dom_walker::dse_optimize_stmt): Adjust. > > * gcc.dg/tree-ssa/ssa-dse-31.c: New testcase. Seems quite reasonable to me as long as we're continuing with a forward algorithm. jeff
Hi Richard, On 15 May 2018 at 19:20, Richard Biener <rguenther@suse.de> wrote: > On Tue, 15 May 2018, Richard Biener wrote: > >> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >> >> > Hi, >> > >> > Attached patch handles PR63185 when we reach PHI with temp != NULLL. >> > We could see the PHI and if there isn't any uses for PHI that is >> > interesting, we could ignore that ? >> > >> > Bootstrapped and regression tested on x86_64-linux-gnu. >> > Is this OK? >> >> No, as Jeff said we can't do it this way. >> >> If we end up with multiple VDEFs in the walk of defvar immediate >> uses we know we are dealing with a CFG fork. We can't really >> ignore any of the paths but we have to >> >> a) find the merge point (and the associated VDEF) >> b) verify for each each chain of VDEFs with associated VUSEs >> up to that merge VDEF that we have no uses of the to classify >> store and collect (partial) kills >> c) intersect kill info and continue walking from the merge point >> >> in b) there's the optional possibility to find sinking opportunities >> in case we have kills on some paths but uses on others. This is why >> DSE should be really merged with (store) sinking. >> >> So if we want to enhance DSEs handling of branches then we need >> to refactor the simple dse_classify_store function. Let me take >> an attempt at this today. > > First (baby) step is the following - it arranges to collect the > defs we need to continue walking from and implements trivial > reduction by stopping at (full) kills. This allows us to handle > the new testcase (which was already handled in the very late DSE > pass with the help of sinking the store). Thanks for the explanation and thanks for working on this. > > I took the opportunity to kill the use_stmt parameter of > dse_classify_store as the only user is only looking for whether > the kills were all clobbers which I added a new parameter for. > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > the code in the case of a use and I'm not sure whether it's worth > doing the def reduction with byte-tracking). Since byte tracking is tracking the killed subset of bytes in the original def, in your patch can we calculate the bytes killed by each defs[i] and then find the intersection for the iteration. If we don't have complete kill, we can use this to set live bytes and iterate till all the live_bytes are cleared like it was done earlier. Thanks, Kugan > > Your testcase can be handled by reducing the PHI and the call def > by seeing that the only use of a candidate def is another def > we have already processed. Not sure if worth special-casing though, > I'd rather have a go at "recursing". That will be the next > patch. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > Richard. > > 2018-05-15 Richard Biener <rguenther@suse.de> > > * tree-ssa-dse.c (dse_classify_store): Remove use_stmt parameter, > add by_clobber_p one. Change algorithm to collect all defs > representing uses we need to walk and try reducing them to > a single one before failing. > (dse_dom_walker::dse_optimize_stmt): Adjust. > > * gcc.dg/tree-ssa/ssa-dse-31.c: New testcase. > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c > new file mode 100644 > index 00000000000..9ea9268fe1d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-dse1-details" } */ > + > +void g(); > +void f(int n, char *p) > +{ > + *p = 0; > + if (n) > + { > + *p = 1; > + g(); > + } > + *p = 2; > +} > + > +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ > diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c > index 9220fea7f2e..126592a1d13 100644 > --- a/gcc/tree-ssa-dse.c > +++ b/gcc/tree-ssa-dse.c > @@ -516,18 +516,21 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live) > } > > /* A helper of dse_optimize_stmt. > - Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate > - statement *USE_STMT that may prove STMT to be dead. > - Return TRUE if the above conditions are met, otherwise FALSE. */ > + Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it > + according to downstream uses and defs. Sets *BY_CLOBBER_P to true > + if only clobber statements influenced the classification result. > + Returns the classification. */ > > static dse_store_status > -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > - bool byte_tracking_enabled, sbitmap live_bytes) > +dse_classify_store (ao_ref *ref, gimple *stmt, > + bool byte_tracking_enabled, sbitmap live_bytes, > + bool *by_clobber_p = NULL) > { > gimple *temp; > unsigned cnt = 0; > > - *use_stmt = NULL; > + if (by_clobber_p) > + *by_clobber_p = true; > > /* Find the first dominated statement that clobbers (part of) the > memory stmt stores to with no intermediate statement that may use > @@ -551,7 +554,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > else > defvar = gimple_vdef (temp); > defvar_def = temp; > - temp = NULL; > + auto_vec<gimple *, 10> defs; > FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) > { > cnt++; > @@ -569,9 +572,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > See gcc.c-torture/execute/20051110-*.c. */ > else if (gimple_code (use_stmt) == GIMPLE_PHI) > { > - if (temp > - /* Make sure we are not in a loop latch block. */ > - || gimple_bb (stmt) == gimple_bb (use_stmt) > + /* Make sure we are not in a loop latch block. */ > + if (gimple_bb (stmt) == gimple_bb (use_stmt) > || dominated_by_p (CDI_DOMINATORS, > gimple_bb (stmt), gimple_bb (use_stmt)) > /* We can look through PHIs to regions post-dominating > @@ -589,7 +591,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > && !dominated_by_p (CDI_DOMINATORS, > gimple_bb (defvar_def), > gimple_bb (use_stmt))) > - temp = use_stmt; > + defs.safe_push (use_stmt); > } > /* If the statement is a use the store is not dead. */ > else if (ref_maybe_used_by_stmt_p (use_stmt, ref)) > @@ -597,7 +599,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > /* Handle common cases where we can easily build an ao_ref > structure for USE_STMT and in doing so we find that the > references hit non-live bytes and thus can be ignored. */ > - if (byte_tracking_enabled && (!gimple_vdef (use_stmt) || !temp)) > + if (byte_tracking_enabled > + && (!gimple_vdef (use_stmt) || defs.is_empty ())) > { > if (is_gimple_assign (use_stmt)) > { > @@ -613,7 +616,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > /* If this statement has a VDEF, then it is the > first store we have seen, so walk through it. */ > if (gimple_vdef (use_stmt)) > - temp = use_stmt; > + defs.safe_push (use_stmt); > continue; > } > } > @@ -622,17 +625,10 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > fail = true; > BREAK_FROM_IMM_USE_STMT (ui); > } > - /* If this is a store, remember it or bail out if we have > - multiple ones (the will be in different CFG parts then). */ > + /* If this is a store, remember it as we possibly need to walk the > + defs uses. */ > else if (gimple_vdef (use_stmt)) > - { > - if (temp) > - { > - fail = true; > - BREAK_FROM_IMM_USE_STMT (ui); > - } > - temp = use_stmt; > - } > + defs.safe_push (use_stmt); > } > > if (fail) > @@ -647,25 +643,54 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, > /* If we didn't find any definition this means the store is dead > if it isn't a store to global reachable memory. In this case > just pretend the stmt makes itself dead. Otherwise fail. */ > - if (!temp) > + if (defs.is_empty ()) > { > if (ref_may_alias_global_p (ref)) > return DSE_STORE_LIVE; > > - temp = stmt; > - break; > + if (by_clobber_p) > + *by_clobber_p = false; > + return DSE_STORE_DEAD; > } > > - if (byte_tracking_enabled && temp) > - clear_bytes_written_by (live_bytes, temp, ref); > - } > - /* Continue walking until we reach a full kill as a single statement > - or there are no more live bytes. */ > - while (!stmt_kills_ref_p (temp, ref) > - && !(byte_tracking_enabled && bitmap_empty_p (live_bytes))); > + /* Process defs and remove paths starting with a kill from further > + processing. */ > + for (unsigned i = 0; i < defs.length (); ++i) > + if (stmt_kills_ref_p (defs[i], ref)) > + { > + if (by_clobber_p && !gimple_clobber_p (defs[i])) > + *by_clobber_p = false; > + defs.unordered_remove (i); > + } > + > + /* If all defs kill the ref we are done. */ > + if (defs.is_empty ()) > + return DSE_STORE_DEAD; > + /* If more than one def survives fail. */ > + if (defs.length () > 1) > + { > + /* STMT might be partially dead and we may be able to reduce > + how many memory locations it stores into. */ > + if (byte_tracking_enabled && !gimple_clobber_p (stmt)) > + return DSE_STORE_MAYBE_PARTIAL_DEAD; > + return DSE_STORE_LIVE; > + } > + temp = defs[0]; > > - *use_stmt = temp; > - return DSE_STORE_DEAD; > + /* Track partial kills. */ > + if (byte_tracking_enabled) > + { > + clear_bytes_written_by (live_bytes, temp, ref); > + if (bitmap_empty_p (live_bytes)) > + { > + if (by_clobber_p && !gimple_clobber_p (temp)) > + *by_clobber_p = false; > + return DSE_STORE_DEAD; > + } > + } > + } > + /* Continue walking until there are no more live bytes. */ > + while (1); > } > > > @@ -795,11 +820,10 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) > return; > } > > - gimple *use_stmt; > enum dse_store_status store_status; > m_byte_tracking_enabled > = setup_live_bytes_from_ref (&ref, m_live_bytes); > - store_status = dse_classify_store (&ref, stmt, &use_stmt, > + store_status = dse_classify_store (&ref, stmt, > m_byte_tracking_enabled, > m_live_bytes); > if (store_status == DSE_STORE_LIVE) > @@ -823,20 +847,20 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) > > if (is_gimple_assign (stmt)) > { > - gimple *use_stmt; > + bool by_clobber_p = false; > > /* Self-assignments are zombies. */ > if (operand_equal_p (gimple_assign_rhs1 (stmt), > gimple_assign_lhs (stmt), 0)) > - use_stmt = stmt; > + ; > else > { > m_byte_tracking_enabled > = setup_live_bytes_from_ref (&ref, m_live_bytes); > enum dse_store_status store_status; > - store_status = dse_classify_store (&ref, stmt, &use_stmt, > + store_status = dse_classify_store (&ref, stmt, > m_byte_tracking_enabled, > - m_live_bytes); > + m_live_bytes, &by_clobber_p); > if (store_status == DSE_STORE_LIVE) > return; > > @@ -852,7 +876,7 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi) > /* But only remove *this_2(D) ={v} {CLOBBER} if killed by > another clobber stmt. */ > if (gimple_clobber_p (stmt) > - && !gimple_clobber_p (use_stmt)) > + && !by_clobber_p) > return; > > delete_dead_assignment (gsi);
On Tue, 15 May 2018, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > > > > > We could see the PHI and if there isn't any uses for PHI that is > > > > > > interesting, we could ignore that ? > > > > > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > > > > > Is this OK? > > > > > > > > > > No, as Jeff said we can't do it this way. > > > > > > > > > > If we end up with multiple VDEFs in the walk of defvar immediate > > > > > uses we know we are dealing with a CFG fork. We can't really > > > > > ignore any of the paths but we have to > > > > > > > > > > a) find the merge point (and the associated VDEF) > > > > > b) verify for each each chain of VDEFs with associated VUSEs > > > > > up to that merge VDEF that we have no uses of the to classify > > > > > store and collect (partial) kills > > > > > c) intersect kill info and continue walking from the merge point > > > > > > > > > > in b) there's the optional possibility to find sinking opportunities > > > > > in case we have kills on some paths but uses on others. This is why > > > > > DSE should be really merged with (store) sinking. > > > > > > > > > > So if we want to enhance DSEs handling of branches then we need > > > > > to refactor the simple dse_classify_store function. Let me take > > > > > an attempt at this today. > > > > > > > > First (baby) step is the following - it arranges to collect the > > > > defs we need to continue walking from and implements trivial > > > > reduction by stopping at (full) kills. This allows us to handle > > > > the new testcase (which was already handled in the very late DSE > > > > pass with the help of sinking the store). > > > > > > > > I took the opportunity to kill the use_stmt parameter of > > > > dse_classify_store as the only user is only looking for whether > > > > the kills were all clobbers which I added a new parameter for. > > > > > > > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > > > > the code in the case of a use and I'm not sure whether it's worth > > > > doing the def reduction with byte-tracking). > > > > > > > > Your testcase can be handled by reducing the PHI and the call def > > > > by seeing that the only use of a candidate def is another def > > > > we have already processed. Not sure if worth special-casing though, > > > > I'd rather have a go at "recursing". That will be the next > > > > patch. > > > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > > > Applied. > > > > > > Another intermediate one below, fixing the byte-tracking for > > > stmt with uses. This also re-does the PHI handling by simply > > > avoiding recursion by means of a visited bitmap and stopping > > > at the DSE classify stmt when re-visiting it instead of failing. > > > This covers Pratamesh loop case for which I added ssa-dse-33.c. > > > For the do-while loop this still runs into the inability to > > > handle two defs to walk from. > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > Ok, loop handling doesn't work in general since we run into the > > issue that SSA form across the backedge is not representing the > > same values. Consider > > > > <bb 3> > > # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> > > # n_20 = PHI <0(2), n_7(4)> > > # .MEM_13 = VDEF <.MEM_22> > > bytes[n_20] = _4; > > if (n_20 > 7) > > goto <bb 5>; > > > > <bb 4> > > n_7 = n_20 + 1; > > # .MEM_15 = VDEF <.MEM_13> > > bytes[n_20] = _5; > > goto <bb 3>; > > > > then when classifying the store in bb4, visiting the PHI node > > gets us to the store in bb3 which appears to be killing. > > > > if (gimple_code (temp) == GIMPLE_PHI) > > - defvar = PHI_RESULT (temp); > > + { > > + /* If we visit this PHI by following a backedge then reset > > + any info in ref that may refer to SSA names which we'd need > > + to PHI translate. */ > > + if (gimple_bb (temp) == gimple_bb (stmt) > > + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), > > + gimple_bb (temp))) > > + /* ??? ref->ref may not refer to SSA names or it may only > > + refer to SSA names that are invariant with respect to the > > + loop represented by this PHI node. */ > > + ref->ref = NULL_TREE; > > + defvar = PHI_RESULT (temp); > > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); > > + } > > > > should be a workable solution for that. I'm checking that, but > > eventually you can think of other things that might prevent us from > > handling backedges. Note the current code tries to allow > > looking across loops but not handle backedges of loops the > > original stmt belongs to. > > Just to mention before I leave for the day I think I've identified > a latent issue where I just fail to produce a testcase right now > which is that non-return exits from a function are not reliably > visited given they do not all have virtual operands, like > for example resx. Thus consider > > void foo (int *p, int x) > { > *p = 0; > if (x) > resx; > *p = 1; > return; > } > > where we will never visit any stmts on the resx path and thus think > the *p = 0 store is dead (all with current trunk of course). > > Will continue to dig tomorrow. PR85803. I am currently re-testing the following on x86_64-unknown-linux-gnu with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS. I'll refrain from doing the separate-walks-for-each-def thing but will try to fix PR85757 by pattern matching the def-has-single-virtual-use-that-is-another-def-in-the-vector case in which case we can remove it from further processing. Richard. 2018-05-16 Richard Biener <rguenther@suse.de> * params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param. * doc/invoke.texi (dse-max-alias-queries-per-store): Document. * tree-ssa-dse.c: Include tree-ssa-loop.h. (check_name): New callback. (dse_classify_store): Track cycles via a visited bitmap of PHI defs and simplify handling of in-loop and across loop dead stores and properly fail for loop-variant refs. Handle byte-tracking with multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for limiting the walk. * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. commit 5c287e6190448176bd1ff56083a1c922e890189a Author: Richard Guenther <rguenther@suse.de> Date: Tue May 15 14:03:32 2018 +0200 dse#2 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index ca3772bbebf..b78a2d67b0d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10439,6 +10439,11 @@ Average number of iterations of a loop. Maximum size (in bytes) of objects tracked bytewise by dead store elimination. Larger values may result in larger compilation times. +@item dse-max-alias-queries-per-store +Maximum number of queries into the alias oracle per store. +Larger values result in larger compilation times and may result in more +removed dead stores. + @item scev-max-expr-size Bound on size of expressions used in the scalar evolutions analyzer. Large expressions slow the analyzer. diff --git a/gcc/params.def b/gcc/params.def index dad47ec2b00..5c4e2c95072 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -547,6 +547,11 @@ DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE, "Maximum size (in bytes) of objects tracked bytewise by dead store elimination.", 256, 0, 0) +DEFPARAM(PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE, + "dse-max-alias-queries-per-store", + "Maximum number of queries into the alias oracle per store.", + 256, 0, 0) + DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE, "scev-max-expr-size", "Bound on size of expressions used in the scalar evolutions analyzer.", diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c new file mode 100644 index 00000000000..eea0d8d5cf0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void f(int n) +{ + char *p = __builtin_malloc (1); + int i; + do + *p = 0; + while (++i < n); +} + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "dse1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c new file mode 100644 index 00000000000..853cbb88190 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void f(char *p, int n) +{ + for (int i = 0; i < n; ++i) + *p = 0; /* Removed by DSE. */ + *p = 1; +} + +void g(char *p, int n) +{ + int i = 0; + do + *p = 0; /* Removed by DSE. */ + while (++i < n); + *p = 1; +} + + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" } } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 126592a1d13..32a25b9eb1e 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-cfgcleanup.h" #include "params.h" #include "alias.h" +#include "tree-ssa-loop.h" /* This file implements dead store elimination. @@ -515,6 +516,21 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live) return true; } +/* Callback for dse_classify_store calling for_each_index. Verify that + indices are invariant in the loop with backedge PHI in basic-block DATA. */ + +static bool +check_name (tree, tree *idx, void *data) +{ + basic_block phi_bb = (basic_block) data; + if (TREE_CODE (*idx) == SSA_NAME + && !SSA_NAME_IS_DEFAULT_DEF (*idx) + && dominated_by_p (CDI_DOMINATORS, gimple_bb (SSA_NAME_DEF_STMT (*idx)), + phi_bb)) + return false; + return true; +} + /* A helper of dse_optimize_stmt. Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it according to downstream uses and defs. Sets *BY_CLOBBER_P to true @@ -527,7 +543,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, bool *by_clobber_p = NULL) { gimple *temp; - unsigned cnt = 0; + int cnt = 0; + auto_bitmap visited; if (by_clobber_p) *by_clobber_p = true; @@ -539,58 +556,50 @@ dse_classify_store (ao_ref *ref, gimple *stmt, temp = stmt; do { - gimple *use_stmt, *defvar_def; + gimple *use_stmt; imm_use_iterator ui; bool fail = false; tree defvar; - /* Limit stmt walking to be linear in the number of possibly - dead stores. */ - if (++cnt > 256) - return DSE_STORE_LIVE; - if (gimple_code (temp) == GIMPLE_PHI) - defvar = PHI_RESULT (temp); + { + /* If we visit this PHI by following a backedge then we have to + make sure ref->ref only refers to SSA names that are invariant + with respect to the loop represented by this PHI node. */ + if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), + gimple_bb (temp)) + && !for_each_index (ref->ref ? &ref->ref : &ref->base, + check_name, gimple_bb (temp))) + return DSE_STORE_LIVE; + defvar = PHI_RESULT (temp); + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); + } else defvar = gimple_vdef (temp); - defvar_def = temp; auto_vec<gimple *, 10> defs; FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) { - cnt++; - - /* If we ever reach our DSE candidate stmt again fail. We - cannot handle dead stores in loops. */ - if (use_stmt == stmt) + /* Limit stmt walking. */ + if (++cnt > PARAM_VALUE (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE)) { fail = true; BREAK_FROM_IMM_USE_STMT (ui); } + + /* We have visited ourselves already so ignore STMT for the + purpose of chaining. */ + if (use_stmt == stmt) + ; /* In simple cases we can look through PHI nodes, but we have to be careful with loops and with memory references containing operands that are also operands of PHI nodes. See gcc.c-torture/execute/20051110-*.c. */ else if (gimple_code (use_stmt) == GIMPLE_PHI) { - /* Make sure we are not in a loop latch block. */ - if (gimple_bb (stmt) == gimple_bb (use_stmt) - || dominated_by_p (CDI_DOMINATORS, - gimple_bb (stmt), gimple_bb (use_stmt)) - /* We can look through PHIs to regions post-dominating - the DSE candidate stmt. */ - || !dominated_by_p (CDI_POST_DOMINATORS, - gimple_bb (stmt), gimple_bb (use_stmt))) - { - fail = true; - BREAK_FROM_IMM_USE_STMT (ui); - } - /* Do not consider the PHI as use if it dominates the - stmt defining the virtual operand we are processing, - we have processed it already in this case. */ - if (gimple_bb (defvar_def) != gimple_bb (use_stmt) - && !dominated_by_p (CDI_DOMINATORS, - gimple_bb (defvar_def), - gimple_bb (use_stmt))) + /* If we already visited this PHI ignore it for further + processing. */ + if (!bitmap_bit_p (visited, + SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) defs.safe_push (use_stmt); } /* If the statement is a use the store is not dead. */ @@ -600,25 +609,20 @@ dse_classify_store (ao_ref *ref, gimple *stmt, structure for USE_STMT and in doing so we find that the references hit non-live bytes and thus can be ignored. */ if (byte_tracking_enabled - && (!gimple_vdef (use_stmt) || defs.is_empty ())) + && is_gimple_assign (use_stmt)) { - if (is_gimple_assign (use_stmt)) + ao_ref use_ref; + ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); + if (valid_ao_ref_for_dse (&use_ref) + && use_ref.base == ref->base + && known_eq (use_ref.size, use_ref.max_size) + && !live_bytes_read (use_ref, ref, live_bytes)) { - /* Other cases were noted as non-aliasing by - the call to ref_maybe_used_by_stmt_p. */ - ao_ref use_ref; - ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt)); - if (valid_ao_ref_for_dse (&use_ref) - && use_ref.base == ref->base - && known_eq (use_ref.size, use_ref.max_size) - && !live_bytes_read (use_ref, ref, live_bytes)) - { - /* If this statement has a VDEF, then it is the - first store we have seen, so walk through it. */ - if (gimple_vdef (use_stmt)) - defs.safe_push (use_stmt); - continue; - } + /* If this is a store, remember it as we possibly + need to walk the defs uses. */ + if (gimple_vdef (use_stmt)) + defs.safe_push (use_stmt); + continue; } }
On Wed, 16 May 2018, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > > > On Tue, 15 May 2018, Richard Biener wrote: > > > > > > > > > > > On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > Attached patch handles PR63185 when we reach PHI with temp != NULLL. > > > > > > > We could see the PHI and if there isn't any uses for PHI that is > > > > > > > interesting, we could ignore that ? > > > > > > > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu. > > > > > > > Is this OK? > > > > > > > > > > > > No, as Jeff said we can't do it this way. > > > > > > > > > > > > If we end up with multiple VDEFs in the walk of defvar immediate > > > > > > uses we know we are dealing with a CFG fork. We can't really > > > > > > ignore any of the paths but we have to > > > > > > > > > > > > a) find the merge point (and the associated VDEF) > > > > > > b) verify for each each chain of VDEFs with associated VUSEs > > > > > > up to that merge VDEF that we have no uses of the to classify > > > > > > store and collect (partial) kills > > > > > > c) intersect kill info and continue walking from the merge point > > > > > > > > > > > > in b) there's the optional possibility to find sinking opportunities > > > > > > in case we have kills on some paths but uses on others. This is why > > > > > > DSE should be really merged with (store) sinking. > > > > > > > > > > > > So if we want to enhance DSEs handling of branches then we need > > > > > > to refactor the simple dse_classify_store function. Let me take > > > > > > an attempt at this today. > > > > > > > > > > First (baby) step is the following - it arranges to collect the > > > > > defs we need to continue walking from and implements trivial > > > > > reduction by stopping at (full) kills. This allows us to handle > > > > > the new testcase (which was already handled in the very late DSE > > > > > pass with the help of sinking the store). > > > > > > > > > > I took the opportunity to kill the use_stmt parameter of > > > > > dse_classify_store as the only user is only looking for whether > > > > > the kills were all clobbers which I added a new parameter for. > > > > > > > > > > I didn't adjust the byte-tracking case fully (I'm not fully understanding > > > > > the code in the case of a use and I'm not sure whether it's worth > > > > > doing the def reduction with byte-tracking). > > > > > > > > > > Your testcase can be handled by reducing the PHI and the call def > > > > > by seeing that the only use of a candidate def is another def > > > > > we have already processed. Not sure if worth special-casing though, > > > > > I'd rather have a go at "recursing". That will be the next > > > > > patch. > > > > > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > > > > > Applied. > > > > > > > > Another intermediate one below, fixing the byte-tracking for > > > > stmt with uses. This also re-does the PHI handling by simply > > > > avoiding recursion by means of a visited bitmap and stopping > > > > at the DSE classify stmt when re-visiting it instead of failing. > > > > This covers Pratamesh loop case for which I added ssa-dse-33.c. > > > > For the do-while loop this still runs into the inability to > > > > handle two defs to walk from. > > > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > > > Ok, loop handling doesn't work in general since we run into the > > > issue that SSA form across the backedge is not representing the > > > same values. Consider > > > > > > <bb 3> > > > # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> > > > # n_20 = PHI <0(2), n_7(4)> > > > # .MEM_13 = VDEF <.MEM_22> > > > bytes[n_20] = _4; > > > if (n_20 > 7) > > > goto <bb 5>; > > > > > > <bb 4> > > > n_7 = n_20 + 1; > > > # .MEM_15 = VDEF <.MEM_13> > > > bytes[n_20] = _5; > > > goto <bb 3>; > > > > > > then when classifying the store in bb4, visiting the PHI node > > > gets us to the store in bb3 which appears to be killing. > > > > > > if (gimple_code (temp) == GIMPLE_PHI) > > > - defvar = PHI_RESULT (temp); > > > + { > > > + /* If we visit this PHI by following a backedge then reset > > > + any info in ref that may refer to SSA names which we'd need > > > + to PHI translate. */ > > > + if (gimple_bb (temp) == gimple_bb (stmt) > > > + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), > > > + gimple_bb (temp))) > > > + /* ??? ref->ref may not refer to SSA names or it may only > > > + refer to SSA names that are invariant with respect to the > > > + loop represented by this PHI node. */ > > > + ref->ref = NULL_TREE; > > > + defvar = PHI_RESULT (temp); > > > + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); > > > + } > > > > > > should be a workable solution for that. I'm checking that, but > > > eventually you can think of other things that might prevent us from > > > handling backedges. Note the current code tries to allow > > > looking across loops but not handle backedges of loops the > > > original stmt belongs to. > > > > Just to mention before I leave for the day I think I've identified > > a latent issue where I just fail to produce a testcase right now > > which is that non-return exits from a function are not reliably > > visited given they do not all have virtual operands, like > > for example resx. Thus consider > > > > void foo (int *p, int x) > > { > > *p = 0; > > if (x) > > resx; > > *p = 1; > > return; > > } > > > > where we will never visit any stmts on the resx path and thus think > > the *p = 0 store is dead (all with current trunk of course). > > > > Will continue to dig tomorrow. > > PR85803. > > I am currently re-testing the following on x86_64-unknown-linux-gnu > with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS. > > I'll refrain from doing the separate-walks-for-each-def thing > but will try to fix PR85757 by pattern matching the > def-has-single-virtual-use-that-is-another-def-in-the-vector case > in which case we can remove it from further processing. After committing I am now testing that fix for PR85757 on x86_64-unknown-linux-gnu. Richard. 2018-05-16 Richard Biener <rguenther@suse.de> PR tree-optimization/85757 * tree-ssa-dse.c (dse_classify_store): Record a PHI def and remove defs that only feed that PHI from further processing. * gcc.dg/tree-ssa/ssa-dse-34.c: New testcase. From bf8054a966485ba32b1b25ab287312eaf4b763d8 Mon Sep 17 00:00:00 2001 From: Richard Guenther <rguenther@suse.de> Date: Wed, 16 May 2018 13:50:18 +0200 Subject: [PATCH 3/4] dse#3 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-34.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-34.c new file mode 100644 index 00000000000..3016dfecd38 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-34.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1-details" } */ + +void f(int n, char *p0, char *p1, char *p2, char *o) +{ + int t0, t1; + __builtin_memcpy(&t0, p0, 1); + __builtin_memcpy(&t1, p1, 1); + if (n==3) + __builtin_memcpy(o+2, p2, 1); + __builtin_memcpy(o+0, &t0, 1); + __builtin_memcpy(o+1, &t1, 1); +} + +/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" } } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 32a25b9eb1e..589cfef5df5 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -577,6 +577,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, else defvar = gimple_vdef (temp); auto_vec<gimple *, 10> defs; + gimple *phi_def = NULL; FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar) { /* Limit stmt walking. */ @@ -600,7 +601,10 @@ dse_classify_store (ao_ref *ref, gimple *stmt, processing. */ if (!bitmap_bit_p (visited, SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) - defs.safe_push (use_stmt); + { + defs.safe_push (use_stmt); + phi_def = use_stmt; + } } /* If the statement is a use the store is not dead. */ else if (ref_maybe_used_by_stmt_p (use_stmt, ref)) @@ -657,15 +661,31 @@ dse_classify_store (ao_ref *ref, gimple *stmt, return DSE_STORE_DEAD; } - /* Process defs and remove paths starting with a kill from further - processing. */ + /* Process defs and remove those we need not process further. */ for (unsigned i = 0; i < defs.length (); ++i) - if (stmt_kills_ref_p (defs[i], ref)) - { - if (by_clobber_p && !gimple_clobber_p (defs[i])) - *by_clobber_p = false; + { + gimple *def = defs[i]; + gimple *use_stmt; + use_operand_p use_p; + /* If the path to check starts with a kill we do not need to + process it further. + ??? With byte tracking we need only kill the bytes currently + live. */ + if (stmt_kills_ref_p (def, ref)) + { + if (by_clobber_p && !gimple_clobber_p (def)) + *by_clobber_p = false; + defs.unordered_remove (i); + } + /* In addition to kills we can remove defs whose only use + is another def in defs. That can only ever be PHIs of which + we track a single for simplicity reasons (we fail for multiple + PHIs anyways). */ + else if (gimple_code (def) != GIMPLE_PHI + && single_imm_use (gimple_vdef (def), &use_p, &use_stmt) + && use_stmt == phi_def) defs.unordered_remove (i); - } + } /* If all defs kill the ref we are done. */ if (defs.is_empty ()) -- 2.12.3
On 05/15/2018 08:42 AM, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > >> On Tue, 15 May 2018, Richard Biener wrote: >> >>> On Tue, 15 May 2018, Richard Biener wrote: >>> >>>> On Tue, 15 May 2018, Richard Biener wrote: >>>> >>>>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>>>>> We could see the PHI and if there isn't any uses for PHI that is >>>>>> interesting, we could ignore that ? >>>>>> >>>>>> Bootstrapped and regression tested on x86_64-linux-gnu. >>>>>> Is this OK? >>>>> >>>>> No, as Jeff said we can't do it this way. >>>>> >>>>> If we end up with multiple VDEFs in the walk of defvar immediate >>>>> uses we know we are dealing with a CFG fork. We can't really >>>>> ignore any of the paths but we have to >>>>> >>>>> a) find the merge point (and the associated VDEF) >>>>> b) verify for each each chain of VDEFs with associated VUSEs >>>>> up to that merge VDEF that we have no uses of the to classify >>>>> store and collect (partial) kills >>>>> c) intersect kill info and continue walking from the merge point >>>>> >>>>> in b) there's the optional possibility to find sinking opportunities >>>>> in case we have kills on some paths but uses on others. This is why >>>>> DSE should be really merged with (store) sinking. >>>>> >>>>> So if we want to enhance DSEs handling of branches then we need >>>>> to refactor the simple dse_classify_store function. Let me take >>>>> an attempt at this today. >>>> >>>> First (baby) step is the following - it arranges to collect the >>>> defs we need to continue walking from and implements trivial >>>> reduction by stopping at (full) kills. This allows us to handle >>>> the new testcase (which was already handled in the very late DSE >>>> pass with the help of sinking the store). >>>> >>>> I took the opportunity to kill the use_stmt parameter of >>>> dse_classify_store as the only user is only looking for whether >>>> the kills were all clobbers which I added a new parameter for. >>>> >>>> I didn't adjust the byte-tracking case fully (I'm not fully understanding >>>> the code in the case of a use and I'm not sure whether it's worth >>>> doing the def reduction with byte-tracking). >>>> >>>> Your testcase can be handled by reducing the PHI and the call def >>>> by seeing that the only use of a candidate def is another def >>>> we have already processed. Not sure if worth special-casing though, >>>> I'd rather have a go at "recursing". That will be the next >>>> patch. >>>> >>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>> >>> Applied. >>> >>> Another intermediate one below, fixing the byte-tracking for >>> stmt with uses. This also re-does the PHI handling by simply >>> avoiding recursion by means of a visited bitmap and stopping >>> at the DSE classify stmt when re-visiting it instead of failing. >>> This covers Pratamesh loop case for which I added ssa-dse-33.c. >>> For the do-while loop this still runs into the inability to >>> handle two defs to walk from. >>> >>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >> >> Ok, loop handling doesn't work in general since we run into the >> issue that SSA form across the backedge is not representing the >> same values. Consider >> >> <bb 3> >> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> >> # n_20 = PHI <0(2), n_7(4)> >> # .MEM_13 = VDEF <.MEM_22> >> bytes[n_20] = _4; >> if (n_20 > 7) >> goto <bb 5>; >> >> <bb 4> >> n_7 = n_20 + 1; >> # .MEM_15 = VDEF <.MEM_13> >> bytes[n_20] = _5; >> goto <bb 3>; >> >> then when classifying the store in bb4, visiting the PHI node >> gets us to the store in bb3 which appears to be killing. >> >> if (gimple_code (temp) == GIMPLE_PHI) >> - defvar = PHI_RESULT (temp); >> + { >> + /* If we visit this PHI by following a backedge then reset >> + any info in ref that may refer to SSA names which we'd need >> + to PHI translate. */ >> + if (gimple_bb (temp) == gimple_bb (stmt) >> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), >> + gimple_bb (temp))) >> + /* ??? ref->ref may not refer to SSA names or it may only >> + refer to SSA names that are invariant with respect to the >> + loop represented by this PHI node. */ >> + ref->ref = NULL_TREE; >> + defvar = PHI_RESULT (temp); >> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); >> + } >> >> should be a workable solution for that. I'm checking that, but >> eventually you can think of other things that might prevent us from >> handling backedges. Note the current code tries to allow >> looking across loops but not handle backedges of loops the >> original stmt belongs to. > > Just to mention before I leave for the day I think I've identified > a latent issue where I just fail to produce a testcase right now > which is that non-return exits from a function are not reliably > visited given they do not all have virtual operands, like > for example resx. Thus consider > > void foo (int *p, int x) > { > *p = 0; > if (x) > resx; > *p = 1; > return; > } > > where we will never visit any stmts on the resx path and thus think > the *p = 0 store is dead (all with current trunk of course). > > Will continue to dig tomorrow. Yes. I stumbled over this at some point as well. I think there's a BZ which touches on this issue, though I don't have the # and I don't recall if handling a non-return exit would be sufficient to address the BZ. Jeff
On 05/16/2018 04:12 AM, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > >> On Tue, 15 May 2018, Richard Biener wrote: >> >>> On Tue, 15 May 2018, Richard Biener wrote: >>> >>>> On Tue, 15 May 2018, Richard Biener wrote: >>>> >>>>> On Tue, 15 May 2018, Richard Biener wrote: >>>>> >>>>>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>>>>>> We could see the PHI and if there isn't any uses for PHI that is >>>>>>> interesting, we could ignore that ? >>>>>>> >>>>>>> Bootstrapped and regression tested on x86_64-linux-gnu. >>>>>>> Is this OK? >>>>>> >>>>>> No, as Jeff said we can't do it this way. >>>>>> >>>>>> If we end up with multiple VDEFs in the walk of defvar immediate >>>>>> uses we know we are dealing with a CFG fork. We can't really >>>>>> ignore any of the paths but we have to >>>>>> >>>>>> a) find the merge point (and the associated VDEF) >>>>>> b) verify for each each chain of VDEFs with associated VUSEs >>>>>> up to that merge VDEF that we have no uses of the to classify >>>>>> store and collect (partial) kills >>>>>> c) intersect kill info and continue walking from the merge point >>>>>> >>>>>> in b) there's the optional possibility to find sinking opportunities >>>>>> in case we have kills on some paths but uses on others. This is why >>>>>> DSE should be really merged with (store) sinking. >>>>>> >>>>>> So if we want to enhance DSEs handling of branches then we need >>>>>> to refactor the simple dse_classify_store function. Let me take >>>>>> an attempt at this today. >>>>> >>>>> First (baby) step is the following - it arranges to collect the >>>>> defs we need to continue walking from and implements trivial >>>>> reduction by stopping at (full) kills. This allows us to handle >>>>> the new testcase (which was already handled in the very late DSE >>>>> pass with the help of sinking the store). >>>>> >>>>> I took the opportunity to kill the use_stmt parameter of >>>>> dse_classify_store as the only user is only looking for whether >>>>> the kills were all clobbers which I added a new parameter for. >>>>> >>>>> I didn't adjust the byte-tracking case fully (I'm not fully understanding >>>>> the code in the case of a use and I'm not sure whether it's worth >>>>> doing the def reduction with byte-tracking). >>>>> >>>>> Your testcase can be handled by reducing the PHI and the call def >>>>> by seeing that the only use of a candidate def is another def >>>>> we have already processed. Not sure if worth special-casing though, >>>>> I'd rather have a go at "recursing". That will be the next >>>>> patch. >>>>> >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>>> >>>> Applied. >>>> >>>> Another intermediate one below, fixing the byte-tracking for >>>> stmt with uses. This also re-does the PHI handling by simply >>>> avoiding recursion by means of a visited bitmap and stopping >>>> at the DSE classify stmt when re-visiting it instead of failing. >>>> This covers Pratamesh loop case for which I added ssa-dse-33.c. >>>> For the do-while loop this still runs into the inability to >>>> handle two defs to walk from. >>>> >>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>> >>> Ok, loop handling doesn't work in general since we run into the >>> issue that SSA form across the backedge is not representing the >>> same values. Consider >>> >>> <bb 3> >>> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> >>> # n_20 = PHI <0(2), n_7(4)> >>> # .MEM_13 = VDEF <.MEM_22> >>> bytes[n_20] = _4; >>> if (n_20 > 7) >>> goto <bb 5>; >>> >>> <bb 4> >>> n_7 = n_20 + 1; >>> # .MEM_15 = VDEF <.MEM_13> >>> bytes[n_20] = _5; >>> goto <bb 3>; >>> >>> then when classifying the store in bb4, visiting the PHI node >>> gets us to the store in bb3 which appears to be killing. >>> >>> if (gimple_code (temp) == GIMPLE_PHI) >>> - defvar = PHI_RESULT (temp); >>> + { >>> + /* If we visit this PHI by following a backedge then reset >>> + any info in ref that may refer to SSA names which we'd need >>> + to PHI translate. */ >>> + if (gimple_bb (temp) == gimple_bb (stmt) >>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), >>> + gimple_bb (temp))) >>> + /* ??? ref->ref may not refer to SSA names or it may only >>> + refer to SSA names that are invariant with respect to the >>> + loop represented by this PHI node. */ >>> + ref->ref = NULL_TREE; >>> + defvar = PHI_RESULT (temp); >>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); >>> + } >>> >>> should be a workable solution for that. I'm checking that, but >>> eventually you can think of other things that might prevent us from >>> handling backedges. Note the current code tries to allow >>> looking across loops but not handle backedges of loops the >>> original stmt belongs to. >> >> Just to mention before I leave for the day I think I've identified >> a latent issue where I just fail to produce a testcase right now >> which is that non-return exits from a function are not reliably >> visited given they do not all have virtual operands, like >> for example resx. Thus consider >> >> void foo (int *p, int x) >> { >> *p = 0; >> if (x) >> resx; >> *p = 1; >> return; >> } >> >> where we will never visit any stmts on the resx path and thus think >> the *p = 0 store is dead (all with current trunk of course). >> >> Will continue to dig tomorrow. > > PR85803. > > I am currently re-testing the following on x86_64-unknown-linux-gnu > with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS. > > I'll refrain from doing the separate-walks-for-each-def thing > but will try to fix PR85757 by pattern matching the > def-has-single-virtual-use-that-is-another-def-in-the-vector case > in which case we can remove it from further processing. > > Richard. > > 2018-05-16 Richard Biener <rguenther@suse.de> > > * params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param. > * doc/invoke.texi (dse-max-alias-queries-per-store): Document. > * tree-ssa-dse.c: Include tree-ssa-loop.h. > (check_name): New callback. > (dse_classify_store): Track cycles via a visited bitmap of PHI > defs and simplify handling of in-loop and across loop dead stores > and properly fail for loop-variant refs. Handle byte-tracking with > multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for > limiting the walk. > > * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. > * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. Seems quite reasonable as well. jeff
On 05/15/2018 04:47 PM, Kugan Vivekanandarajah wrote: > Hi Richard, > > On 15 May 2018 at 19:20, Richard Biener <rguenther@suse.de> wrote: >> On Tue, 15 May 2018, Richard Biener wrote: >> >>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >>> >>>> Hi, >>>> >>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>>> We could see the PHI and if there isn't any uses for PHI that is >>>> interesting, we could ignore that ? >>>> >>>> Bootstrapped and regression tested on x86_64-linux-gnu. >>>> Is this OK? >>> >>> No, as Jeff said we can't do it this way. >>> >>> If we end up with multiple VDEFs in the walk of defvar immediate >>> uses we know we are dealing with a CFG fork. We can't really >>> ignore any of the paths but we have to >>> >>> a) find the merge point (and the associated VDEF) >>> b) verify for each each chain of VDEFs with associated VUSEs >>> up to that merge VDEF that we have no uses of the to classify >>> store and collect (partial) kills >>> c) intersect kill info and continue walking from the merge point >>> >>> in b) there's the optional possibility to find sinking opportunities >>> in case we have kills on some paths but uses on others. This is why >>> DSE should be really merged with (store) sinking. >>> >>> So if we want to enhance DSEs handling of branches then we need >>> to refactor the simple dse_classify_store function. Let me take >>> an attempt at this today. >> >> First (baby) step is the following - it arranges to collect the >> defs we need to continue walking from and implements trivial >> reduction by stopping at (full) kills. This allows us to handle >> the new testcase (which was already handled in the very late DSE >> pass with the help of sinking the store). > > Thanks for the explanation and thanks for working on this. > >> >> I took the opportunity to kill the use_stmt parameter of >> dse_classify_store as the only user is only looking for whether >> the kills were all clobbers which I added a new parameter for. >> >> I didn't adjust the byte-tracking case fully (I'm not fully understanding >> the code in the case of a use and I'm not sure whether it's worth >> doing the def reduction with byte-tracking). > > Since byte tracking is tracking the killed subset of bytes in the > original def, in your patch can we calculate the bytes killed by each > defs[i] and then find the intersection for the iteration. If we don't > have complete kill, we can use this to set live bytes and iterate till > all the live_bytes are cleared like it was done earlier. Correct. Jeff
On 05/16/2018 07:55 AM, Richard Biener wrote: > On Wed, 16 May 2018, Richard Biener wrote: > >> On Tue, 15 May 2018, Richard Biener wrote: >> >>> On Tue, 15 May 2018, Richard Biener wrote: >>> >>>> On Tue, 15 May 2018, Richard Biener wrote: >>>> >>>>> On Tue, 15 May 2018, Richard Biener wrote: >>>>> >>>>>> On Tue, 15 May 2018, Richard Biener wrote: >>>>>> >>>>>>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>>>>>>> We could see the PHI and if there isn't any uses for PHI that is >>>>>>>> interesting, we could ignore that ? >>>>>>>> >>>>>>>> Bootstrapped and regression tested on x86_64-linux-gnu. >>>>>>>> Is this OK? >>>>>>> >>>>>>> No, as Jeff said we can't do it this way. >>>>>>> >>>>>>> If we end up with multiple VDEFs in the walk of defvar immediate >>>>>>> uses we know we are dealing with a CFG fork. We can't really >>>>>>> ignore any of the paths but we have to >>>>>>> >>>>>>> a) find the merge point (and the associated VDEF) >>>>>>> b) verify for each each chain of VDEFs with associated VUSEs >>>>>>> up to that merge VDEF that we have no uses of the to classify >>>>>>> store and collect (partial) kills >>>>>>> c) intersect kill info and continue walking from the merge point >>>>>>> >>>>>>> in b) there's the optional possibility to find sinking opportunities >>>>>>> in case we have kills on some paths but uses on others. This is why >>>>>>> DSE should be really merged with (store) sinking. >>>>>>> >>>>>>> So if we want to enhance DSEs handling of branches then we need >>>>>>> to refactor the simple dse_classify_store function. Let me take >>>>>>> an attempt at this today. >>>>>> >>>>>> First (baby) step is the following - it arranges to collect the >>>>>> defs we need to continue walking from and implements trivial >>>>>> reduction by stopping at (full) kills. This allows us to handle >>>>>> the new testcase (which was already handled in the very late DSE >>>>>> pass with the help of sinking the store). >>>>>> >>>>>> I took the opportunity to kill the use_stmt parameter of >>>>>> dse_classify_store as the only user is only looking for whether >>>>>> the kills were all clobbers which I added a new parameter for. >>>>>> >>>>>> I didn't adjust the byte-tracking case fully (I'm not fully understanding >>>>>> the code in the case of a use and I'm not sure whether it's worth >>>>>> doing the def reduction with byte-tracking). >>>>>> >>>>>> Your testcase can be handled by reducing the PHI and the call def >>>>>> by seeing that the only use of a candidate def is another def >>>>>> we have already processed. Not sure if worth special-casing though, >>>>>> I'd rather have a go at "recursing". That will be the next >>>>>> patch. >>>>>> >>>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>>>> >>>>> Applied. >>>>> >>>>> Another intermediate one below, fixing the byte-tracking for >>>>> stmt with uses. This also re-does the PHI handling by simply >>>>> avoiding recursion by means of a visited bitmap and stopping >>>>> at the DSE classify stmt when re-visiting it instead of failing. >>>>> This covers Pratamesh loop case for which I added ssa-dse-33.c. >>>>> For the do-while loop this still runs into the inability to >>>>> handle two defs to walk from. >>>>> >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>>> >>>> Ok, loop handling doesn't work in general since we run into the >>>> issue that SSA form across the backedge is not representing the >>>> same values. Consider >>>> >>>> <bb 3> >>>> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> >>>> # n_20 = PHI <0(2), n_7(4)> >>>> # .MEM_13 = VDEF <.MEM_22> >>>> bytes[n_20] = _4; >>>> if (n_20 > 7) >>>> goto <bb 5>; >>>> >>>> <bb 4> >>>> n_7 = n_20 + 1; >>>> # .MEM_15 = VDEF <.MEM_13> >>>> bytes[n_20] = _5; >>>> goto <bb 3>; >>>> >>>> then when classifying the store in bb4, visiting the PHI node >>>> gets us to the store in bb3 which appears to be killing. >>>> >>>> if (gimple_code (temp) == GIMPLE_PHI) >>>> - defvar = PHI_RESULT (temp); >>>> + { >>>> + /* If we visit this PHI by following a backedge then reset >>>> + any info in ref that may refer to SSA names which we'd need >>>> + to PHI translate. */ >>>> + if (gimple_bb (temp) == gimple_bb (stmt) >>>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), >>>> + gimple_bb (temp))) >>>> + /* ??? ref->ref may not refer to SSA names or it may only >>>> + refer to SSA names that are invariant with respect to the >>>> + loop represented by this PHI node. */ >>>> + ref->ref = NULL_TREE; >>>> + defvar = PHI_RESULT (temp); >>>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); >>>> + } >>>> >>>> should be a workable solution for that. I'm checking that, but >>>> eventually you can think of other things that might prevent us from >>>> handling backedges. Note the current code tries to allow >>>> looking across loops but not handle backedges of loops the >>>> original stmt belongs to. >>> >>> Just to mention before I leave for the day I think I've identified >>> a latent issue where I just fail to produce a testcase right now >>> which is that non-return exits from a function are not reliably >>> visited given they do not all have virtual operands, like >>> for example resx. Thus consider >>> >>> void foo (int *p, int x) >>> { >>> *p = 0; >>> if (x) >>> resx; >>> *p = 1; >>> return; >>> } >>> >>> where we will never visit any stmts on the resx path and thus think >>> the *p = 0 store is dead (all with current trunk of course). >>> >>> Will continue to dig tomorrow. >> >> PR85803. >> >> I am currently re-testing the following on x86_64-unknown-linux-gnu >> with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS. >> >> I'll refrain from doing the separate-walks-for-each-def thing >> but will try to fix PR85757 by pattern matching the >> def-has-single-virtual-use-that-is-another-def-in-the-vector case >> in which case we can remove it from further processing. > > After committing I am now testing that fix for PR85757 > on x86_64-unknown-linux-gnu. > > Richard. > > > 2018-05-16 Richard Biener <rguenther@suse.de> > > PR tree-optimization/85757 > * tree-ssa-dse.c (dse_classify_store): Record a PHI def and > remove defs that only feed that PHI from further processing. > > * gcc.dg/tree-ssa/ssa-dse-34.c: New testcase. > Seems reasonable. Jeff
From a69caa24d9c1914b7617a937e84c3b612ffe6d9b Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> Date: Wed, 9 May 2018 16:26:16 +1000 Subject: [PATCH] PR63185 Change-Id: I9d307884add10d5b5ff07aa31dd86cb83b2388ec --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c | 13 +++++++++++++ gcc/tree-ssa-dse.c | 30 +++++++++++++++++++++++++++++- 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c new file mode 100644 index 0000000..46cb7d1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-33.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dse-details" } */ + +void g(); +void f(int n) +{ + char *p = malloc(1024); + memset (p, 8, 1024); + if (n) + g(); +} + +/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index 9220fea..e7a4637 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -515,6 +515,30 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live) return true; } +/* Return true if there isnt any VDEF or VUSE by following PHI. */ + +static bool +phi_dosent_define_nor_use_p (ao_ref *ref, gphi *phi) +{ + gimple *phi_use; + imm_use_iterator ui; + tree def = PHI_RESULT (phi); + bool ok = true; + + FOR_EACH_IMM_USE_STMT (phi_use, ui, def) + { + if (ref_maybe_used_by_stmt_p (phi_use, ref) + || gimple_vdef (phi_use) + || gimple_code (phi_use) == GIMPLE_PHI) + { + ok = false; + BREAK_FROM_IMM_USE_STMT (ui); + } + } + + return ok; +} + /* A helper of dse_optimize_stmt. Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate statement *USE_STMT that may prove STMT to be dead. @@ -570,6 +594,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, else if (gimple_code (use_stmt) == GIMPLE_PHI) { if (temp + && phi_dosent_define_nor_use_p (ref, as_a <gphi *> (use_stmt))) + ; + else if (temp /* Make sure we are not in a loop latch block. */ || gimple_bb (stmt) == gimple_bb (use_stmt) || dominated_by_p (CDI_DOMINATORS, @@ -585,7 +612,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt, /* Do not consider the PHI as use if it dominates the stmt defining the virtual operand we are processing, we have processed it already in this case. */ - if (gimple_bb (defvar_def) != gimple_bb (use_stmt) + if (!temp + && gimple_bb (defvar_def) != gimple_bb (use_stmt) && !dominated_by_p (CDI_DOMINATORS, gimple_bb (defvar_def), gimple_bb (use_stmt))) -- 2.7.4