Message ID | BANLkTimKTPwg3kM+rUsH7SDQOBX2iF-e5Q@mail.gmail.com |
---|---|
State | New |
Headers | show |
Hello, The following is a summary of discussion I had with Ayal regarding the patch: Some background: currently, SMS supports only targets where the doloop pattern is decoupled from the rest of the loop's instructions (for example PowerPC) (we'll call it 'case decoupled' for simplicity) In this case, the branch is not been scheduled with the rest of the instructions but rather placed in row ii-1 at the end of the scheduling procedure after all the rest of the instructions had been scheduled. The resulting kernel is optimal with respect to the Stage Count because min_cycle placed in row 0 after normalizing the cycles to start from cycle zero. This patch tries to extend SMS to support targets where the doloop pattern is not decoupled from the rest of the loop's instructions (name it 'case NEW' for simplicity). In this case the branch can not be placed wherever we want due to the fact it must honor dependencies and thus we schedule the branch instruction with the rest of the loop's instructions and rotate it to be in row ii-1 at the end of the scheduling procedure to make sure it's the last instruction in the iteration. The suggestion was to simplify the patch by always schedule the branch with the rest of the instructions. This should not effect performance but rather code size by increasing the SC by at most 1, which means adding instructions from at most one iteration to the prologue and epilogue; for case decoupled. (where we have the alternative of normalizing the cycles and achieve optimal SC). The following is my attempt to prove that the SC can increase by at most one: If the distance between min_cycle and max_cycle remains the same when considering the same loop with decoupled branch part, once scheduling the branch instruction with the rest of the loop's instructions and once ignoring it; it means that the SC is at most +1 for the first case. This is true in one direction as the branch instruction should not effect the scheduling window of any other instruction which is what we expect for case decoupled. The question is if there are cases where the branch can be scheduled outside the range of min_cycle and max_cycle. I think there is no such case because the branch will be scheduled in asap = 0 which means that it will fall in the range of min_cycle max_cycle. In practice there is edge between memory references and the branch instruction with latency zero which is inserted by haifa sched. Also, it might be that the branch will be scheduled outside the range of min_cycle and max_cycle due to resources constraints. For example, in PowerPC the issue rate in SMS in always 1 which forces the branch to be scheduled in a new cycle (and might also influence ii in artifact way). Example of resulting SMS kernel for the same loop: The SMS kernel for case NEW, resulting in SC of 3 and ii 5: cycle node ------------------------------------ -3 0 -2 -1 1 <- start of SC 2 0 1 2 3 3 4 5 the branch <- start of SC 3 5 4 The SMS kernel for case decoupled resulting in SC of 2 and ii 5: cycle node ------------------------------------ 0 0 1 2 1 3 3 4 2 <- start of SC 2 5 6 3 Thanks, Revital
=== modified file 'gcc/ddg.c' --- gcc/ddg.c 2011-03-27 07:11:08 +0000 +++ gcc/ddg.c 2011-04-11 12:04:54 +0000 @@ -1116,13 +1116,19 @@ find_nodes_on_paths (sbitmap result, ddg { ddg_edge_ptr e; ddg_node_ptr u_node = &g->nodes[u]; - + + /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their + influence on the scheduling order and rec_mii. */ + if (DEBUG_INSN_P (u_node->insn)) + continue; + for (e = u_node->out; e != (ddg_edge_ptr) 0; e = e->next_out) { ddg_node_ptr v_node = e->dest; int v = v_node->cuid; - if (!TEST_BIT (reachable_from, v)) + /* Ignore DEBUG_INSN when calculating the SCCs. */ + if (!TEST_BIT (reachable_from, v) && !DEBUG_INSN_P (v_node->insn)) { SET_BIT (reachable_from, v); SET_BIT (tmp, v); @@ -1146,12 +1152,18 @@ find_nodes_on_paths (sbitmap result, ddg ddg_edge_ptr e; ddg_node_ptr u_node = &g->nodes[u]; + /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their + influence on the scheduling order and rec_mii. */ + if (DEBUG_INSN_P (u_node->insn)) + continue; + for (e = u_node->in; e != (ddg_edge_ptr) 0; e = e->next_in) { ddg_node_ptr v_node = e->src; int v = v_node->cuid; - if (!TEST_BIT (reach_to, v)) + /* Ignore DEBUG_INSN when calculating the SCCs. */ + if (!TEST_BIT (reach_to, v) && !DEBUG_INSN_P (v_node->insn)) { SET_BIT (reach_to, v); SET_BIT (tmp, v);