diff mbox series

inline_small_functions speedup

Message ID 877eskygfu.fsf@linaro.org
State Accepted
Commit 01b9bf0615e178e6c31f8d010cb8df9382b3c2d4
Headers show
Series inline_small_functions speedup | expand

Commit Message

Richard Sandiford Jan. 14, 2018, 8:48 a.m. UTC
After inlining A into B, inline_small_functions updates the information
for (most) callees and callers of the new B:

	  update_callee_keys (&edge_heap, where, updated_nodes);
      [...]
      /* Our profitability metric can depend on local properties
	 such as number of inlinable calls and size of the function body.
	 After inlining these properties might change for the function we
	 inlined into (since it's body size changed) and for the functions
	 called by function we inlined (since number of it inlinable callers
	 might change).  */
      update_caller_keys (&edge_heap, where, updated_nodes, NULL);

These functions in turn call can_inline_edge_p for most of the associated
edges:

	    if (can_inline_edge_p (edge, false)
		&& want_inline_small_function_p (edge, false))
	      update_edge_key (heap, edge);

can_inline_edge_p indirectly calls estimate_calls_size_and_time
on the caller node, which seems to recursively process all callee
edges rooted at the node.  It looks from this like algorithm can be
at least quadratic in the worst case.

Maybe there's something we can do to make can_inline_edge_p cheaper, but
since neither of these two calls is responsible for reporting an inline
failure reason, it seems cheaper to test want_inline_small_function_p
first, so that we don't calculate an estimate for something that we
already know isn't a "small function".  I think the only change
needed to make that work is to check for CIF_FINAL_ERROR in
want_inline_small_function_p; at the moment we rely on can_inline_edge_p
to make that check.

This cuts the time to build optabs.ii by over 4% with an
--enable-checking=release compiler on x86_64-linux-gnu.  I've seen more
dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2
thing.  The patch doesn't affect the output code.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* ipa-inline.c (want_inline_small_function_p): Return false if
	inlining has already failed with CIF_FINAL_ERROR.
	(update_caller_keys): Call want_inline_small_function_p before
	can_inline_edge_p.
	(update_callee_keys): Likewise.

Comments

Jan Hubicka Jan. 14, 2018, 10:51 a.m. UTC | #1
> After inlining A into B, inline_small_functions updates the information

> for (most) callees and callers of the new B:

> 

> 	  update_callee_keys (&edge_heap, where, updated_nodes);

>       [...]

>       /* Our profitability metric can depend on local properties

> 	 such as number of inlinable calls and size of the function body.

> 	 After inlining these properties might change for the function we

> 	 inlined into (since it's body size changed) and for the functions

> 	 called by function we inlined (since number of it inlinable callers

> 	 might change).  */

>       update_caller_keys (&edge_heap, where, updated_nodes, NULL);

> 

> These functions in turn call can_inline_edge_p for most of the associated

> edges:

> 

> 	    if (can_inline_edge_p (edge, false)

> 		&& want_inline_small_function_p (edge, false))

> 	      update_edge_key (heap, edge);

> 

> can_inline_edge_p indirectly calls estimate_calls_size_and_time

> on the caller node, which seems to recursively process all callee

> edges rooted at the node.  It looks from this like algorithm can be

> at least quadratic in the worst case.


Yep, I have patch to add overall size into summary that makes it constant
time.  Will need to clean it up for mainline.
> 

> Maybe there's something we can do to make can_inline_edge_p cheaper, but

> since neither of these two calls is responsible for reporting an inline

> failure reason, it seems cheaper to test want_inline_small_function_p

> first, so that we don't calculate an estimate for something that we

> already know isn't a "small function".  I think the only change

> needed to make that work is to check for CIF_FINAL_ERROR in

> want_inline_small_function_p; at the moment we rely on can_inline_edge_p

> to make that check.

> 

> This cuts the time to build optabs.ii by over 4% with an

> --enable-checking=release compiler on x86_64-linux-gnu.  I've seen more

> dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2

> thing.  The patch doesn't affect the output code.

> 

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.

> OK to install?


Anyway this seems to make sense to do.
> 

> Richard

> 

> 

> 2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>

> 

> gcc/

> 	* ipa-inline.c (want_inline_small_function_p): Return false if

> 	inlining has already failed with CIF_FINAL_ERROR.

> 	(update_caller_keys): Call want_inline_small_function_p before

> 	can_inline_edge_p.

> 	(update_callee_keys): Likewise.


OK,
thanks!
Honza
diff mbox series

Patch

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	2018-01-09 14:29:35.151550415 +0000
+++ gcc/ipa-inline.c	2018-01-14 08:43:35.653122186 +0000
@@ -706,7 +706,11 @@  want_inline_small_function_p (struct cgr
   bool want_inline = true;
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
 
-  if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
+  /* Allow this function to be called before can_inline_edge_p,
+     since it's usually cheaper.  */
+  if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR)
+    want_inline = false;
+  else if (DECL_DISREGARD_INLINE_LIMITS (callee->decl))
     ;
   else if (!DECL_DECLARED_INLINE_P (callee->decl)
 	   && !opt_for_fn (e->caller->decl, flag_inline_small_functions))
@@ -1312,8 +1316,8 @@  update_caller_keys (edge_heap_t *heap, s
         if (!check_inlinablity_for
 	    || check_inlinablity_for == edge)
 	  {
-	    if (can_inline_edge_p (edge, false)
-		&& want_inline_small_function_p (edge, false))
+	    if (want_inline_small_function_p (edge, false)
+		&& can_inline_edge_p (edge, false))
 	      update_edge_key (heap, edge);
 	    else if (edge->aux)
 	      {
@@ -1356,8 +1360,8 @@  update_callee_keys (edge_heap_t *heap, s
 	    && avail >= AVAIL_AVAILABLE
 	    && !bitmap_bit_p (updated_nodes, callee->uid))
 	  {
-	    if (can_inline_edge_p (e, false)
-		&& want_inline_small_function_p (e, false))
+	    if (want_inline_small_function_p (e, false)
+		&& can_inline_edge_p (e, false))
 	      update_edge_key (heap, e);
 	    else if (e->aux)
 	      {