From patchwork Fri Mar 4 13:56:55 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 324 Return-Path: Delivered-To: unknown Received: from imap.gmail.com (74.125.159.109) by localhost6.localdomain6 with IMAP4-SSL; 08 Jun 2011 14:41:42 -0000 Delivered-To: patches@linaro.org Received: by 10.224.60.68 with SMTP id o4cs15657qah; Fri, 4 Mar 2011 05:56:59 -0800 (PST) Received: by 10.217.2.73 with SMTP id o51mr650017wes.66.1299247018615; Fri, 04 Mar 2011 05:56:58 -0800 (PST) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx.google.com with ESMTPS id z1si4158120weq.201.2011.03.04.05.56.58 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 04 Mar 2011 05:56:58 -0800 (PST) Received-SPF: neutral (google.com: 74.125.82.50 is neither permitted nor denied by best guess record for domain of richard.sandiford@linaro.org) client-ip=74.125.82.50; Authentication-Results: mx.google.com; spf=neutral (google.com: 74.125.82.50 is neither permitted nor denied by best guess record for domain of richard.sandiford@linaro.org) smtp.mail=richard.sandiford@linaro.org Received: by wwb31 with SMTP id 31so2834229wwb.31 for ; Fri, 04 Mar 2011 05:56:58 -0800 (PST) Received: by 10.216.30.80 with SMTP id j58mr580652wea.64.1299247018140; Fri, 04 Mar 2011 05:56:58 -0800 (PST) Received: from richards-thinkpad (gbibp9ph1--blueice2n1.emea.ibm.com [195.212.29.75]) by mx.google.com with ESMTPS id n1sm1179026weq.31.2011.03.04.05.56.56 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 04 Mar 2011 05:56:57 -0800 (PST) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, patches@linaro.org, richard.sandiford@linaro.org Cc: patches@linaro.org Subject: [2/2] Reducing the overhead of dwarf2 location tracking Date: Fri, 04 Mar 2011 13:56:55 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux) MIME-Version: 1.0 Suppose we have a function F that is inlined several times. Suppose too that F has a local variable X, and that no "real" (as opposed to debug) references to X remain after pre-inlining optimisations. In this case, we will add X to BLOCK_NONLOCALIZED_VARS rather than duplicate it each time F is inlined. Location notes for each inlining of F will then refer to the same non-localised X decl. This in turn means that each inlining of F has the same location list for X, with the list specifying the location of X for all inlinings of F. Jakub confirms that this indeed the intended behaviour, and I haven't seen any problem with the output. However, the way we generate the lists within dwarf2out.c is then quadratic in the number of times F is inlined. We generate the location list every time F is inlined, then (at the end) run an optimisation pass to collapse these duplicate lists back together. Compile times are usually OK, but on this ARM testcase: https://bugs.launchpad.net/gcc-linaro/+bug/714921/+attachment/1837418/+files/gcc-oom.i the duplicate lists take up a huge amount of memory. It isn't actually possible build the thing with -O2 -g on most 32-bit hosts, whereas with 4.4 it was. The patch below caches the list for each BLOCK_NONLOCALIZED_VAR so that the list can be shared immediately. The testcase then goes from needing over 2GB of virtual memory to needing less than 1GB. I didn't want to grow var_loc_list_def by putting the cache there, since most uses aren't like this. I therefore went for a separate hash table instead. (I wondered at first whether we could store the cache in the existing var_loc_list_def fields, e.g. by setting "first" to null and putting the cache pointer in a union with "last". This would be the only case in which "first" was null and the second (union) field was nonnull. However, it didn't look like this would be safe in cases where another location list references the decl, perhaps with a different want_address. In other words, I didn't think it would be safe in cases where dw_loc_list is not called indirectly from add_location_or_const_value_attribute. I don't think it's safe in general to cache the list in loc_list_from_tree, since some callers modify it.) The patch also makes sure that resolve_addr copes with cases where location lists are already shared. Bootstrapped & regression-tested on x86_64-linux-gnu. Also checked to the make sure that the "after" output for the testcase only differs from the "before" output (which had 1/2 applied) in the numbering of the location list labels. OK for 4.6? How about 4.5? Richard gcc/ * dwarf2out.c (dw_loc_list_node): Add resolved_addr and replaced. (cached_dw_loc_list_def): New structure. (cached_dw_loc_list): New typedef. (cached_dw_loc_list_table): New variable. (cached_dw_loc_list_table_hash): New function. (cached_dw_loc_list_table_eq): Likewise. (add_location_or_const_value_attribute): Take a bool cache_p. Cache the list when the parameter is true. (gen_formal_parameter_die): Update caller. (gen_variable_die): Likewise. (dwarf2out_finish): Likewise. (dwarf2out_function_decl): Clear cached_dw_loc_list_table. (dwarf2out_init): Initialize cached_dw_loc_list_table. (resolve_addr): Cache the result of resolving a chain of location lists. Index: gcc/dwarf2out.c =================================================================== --- gcc/dwarf2out.c 2011-03-04 13:15:28.000000000 +0000 +++ gcc/dwarf2out.c 2011-03-04 13:50:33.000000000 +0000 @@ -4464,6 +4464,11 @@ typedef struct GTY(()) dw_loc_list_struc const char *section; /* Section this loclist is relative to */ dw_loc_descr_ref expr; hashval_t hash; + /* True if all addresses in this and subsequent lists are known to be + resolved. */ + bool resolved_addr; + /* True if this list has been replaced by dw_loc_next. */ + bool replaced; bool emitted; } dw_loc_list_node; @@ -6119,6 +6124,19 @@ typedef struct var_loc_list_def var_loc_ /* Table of decl location linked lists. */ static GTY ((param_is (var_loc_list))) htab_t decl_loc_table; +/* A cached location list. */ +struct GTY (()) cached_dw_loc_list_def { + /* The DECL_UID of the decl that this entry describes. */ + unsigned int decl_id; + + /* The cached location list. */ + dw_loc_list_ref loc_list; +}; +typedef struct cached_dw_loc_list_def cached_dw_loc_list; + +/* Table of cached location lists. */ +static GTY ((param_is (cached_dw_loc_list))) htab_t cached_dw_loc_list_table; + /* A pointer to the base of a list of references to DIE's that are uniquely identified by their tag, presence/absence of children DIE's, and list of attribute/value pairs. */ @@ -6479,7 +6497,7 @@ static void insert_int (HOST_WIDE_INT, u static void insert_double (double_int, unsigned char *); static void insert_float (const_rtx, unsigned char *); static rtx rtl_for_decl_location (tree); -static bool add_location_or_const_value_attribute (dw_die_ref, tree, +static bool add_location_or_const_value_attribute (dw_die_ref, tree, bool, enum dwarf_attribute); static bool tree_add_const_value_attribute (dw_die_ref, tree); static bool tree_add_const_value_attribute_for_decl (dw_die_ref, tree); @@ -8201,6 +8219,24 @@ lookup_decl_loc (const_tree decl) htab_find_with_hash (decl_loc_table, decl, DECL_UID (decl)); } +/* Returns a hash value for X (which really is a cached_dw_loc_list_list). */ + +static hashval_t +cached_dw_loc_list_table_hash (const void *x) +{ + return (hashval_t) ((const cached_dw_loc_list *) x)->decl_id; +} + +/* Return nonzero if decl_id of cached_dw_loc_list X is the same as + UID of decl *Y. */ + +static int +cached_dw_loc_list_table_eq (const void *x, const void *y) +{ + return (((const cached_dw_loc_list *) x)->decl_id + == DECL_UID ((const_tree) y)); +} + /* Equate a DIE to a particular declaration. */ static void @@ -16978,15 +17014,22 @@ fortran_common (tree decl, HOST_WIDE_INT these things can crop up in other ways also.) Note that one type of constant value which can be passed into an inlined function is a constant pointer. This can happen for example if an actual argument in an inlined - function call evaluates to a compile-time constant address. */ + function call evaluates to a compile-time constant address. + + CACHE_P is true if it is worth caching the location list for DECL, + so that future calls can reuse it rather than regenerate it from scratch. + This is true for BLOCK_NONLOCALIZED_VARS in inlined subroutines, + since we will need to refer to them each time the function is inlined. */ static bool -add_location_or_const_value_attribute (dw_die_ref die, tree decl, +add_location_or_const_value_attribute (dw_die_ref die, tree decl, bool cache_p, enum dwarf_attribute attr) { rtx rtl; dw_loc_list_ref list; var_loc_list *loc_list; + cached_dw_loc_list *cache; + void **slot; if (TREE_CODE (decl) == ERROR_MARK) return false; @@ -17023,7 +17066,31 @@ add_location_or_const_value_attribute (d && add_const_value_attribute (die, rtl)) return true; } - list = loc_list_from_tree (decl, decl_by_reference_p (decl) ? 0 : 2); + list = NULL; + /* If this decl is from BLOCK_NONLOCALIZED_VARS, we might need its + list several times. See if we've already cached the contents. */ + if (cache_p && loc_list) + { + cache = (cached_dw_loc_list *) + htab_find_with_hash (cached_dw_loc_list_table, decl, DECL_UID (decl)); + if (cache) + list = cache->loc_list; + } + if (list == NULL) + { + list = loc_list_from_tree (decl, decl_by_reference_p (decl) ? 0 : 2); + /* It is usually worth caching this result if the decl is from + BLOCK_NONLOCALIZED_VARS and if the list has at least two elements. */ + if (cache_p && loc_list && list && list->dw_loc_next) + { + slot = htab_find_slot_with_hash (cached_dw_loc_list_table, decl, + DECL_UID (decl), INSERT); + cache = ggc_alloc_cleared_cached_dw_loc_list (); + cache->decl_id = DECL_UID (decl); + cache->loc_list = list; + *slot = cache; + } + } if (list) { add_AT_location_description (die, attr, list); @@ -18684,7 +18751,7 @@ gen_formal_parameter_die (tree node, tre equate_decl_number_to_die (node, parm_die); if (! DECL_ABSTRACT (node_or_origin)) add_location_or_const_value_attribute (parm_die, node_or_origin, - DW_AT_location); + node == NULL, DW_AT_location); break; @@ -19723,9 +19790,8 @@ gen_variable_die (tree decl, tree origin && !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl_or_origin))) defer_location (decl_or_origin, var_die); else - add_location_or_const_value_attribute (var_die, - decl_or_origin, - DW_AT_location); + add_location_or_const_value_attribute (var_die, decl_or_origin, + decl == NULL, DW_AT_location); add_pubname (decl_or_origin, var_die); } else @@ -21501,6 +21567,7 @@ dwarf2out_function_decl (tree decl) dwarf2out_decl (decl); htab_empty (decl_loc_table); + htab_empty (cached_dw_loc_list_table); } /* Output a marker (i.e. a label) for the beginning of the generated code for @@ -22210,6 +22277,11 @@ dwarf2out_init (const char *filename ATT decl_loc_table = htab_create_ggc (10, decl_loc_table_hash, decl_loc_table_eq, NULL); + /* Allocate the cached_dw_loc_list_table. */ + cached_dw_loc_list_table + = htab_create_ggc (10, cached_dw_loc_list_table_hash, + cached_dw_loc_list_table_eq, NULL); + /* Allocate the initial hunk of the decl_scope_table. */ decl_scope_table = VEC_alloc (tree, gc, 256); @@ -22852,30 +22924,53 @@ resolve_addr (dw_die_ref die) { dw_die_ref c; dw_attr_ref a; - dw_loc_list_ref *curr; + dw_loc_list_ref *curr, *start, loc; unsigned ix; FOR_EACH_VEC_ELT (dw_attr_node, die->die_attr, ix, a) switch (AT_class (a)) { case dw_val_class_loc_list: - curr = AT_loc_list_ptr (a); - while (*curr) + start = curr = AT_loc_list_ptr (a); + loc = *curr; + gcc_assert (loc); + /* The same list can be referenced more than once. See if we have + already recorded the result from a previous pass. */ + if (loc->replaced) + *curr = loc->dw_loc_next; + else if (!loc->resolved_addr) { - if (!resolve_addr_in_expr ((*curr)->expr)) + /* As things stand, we do not expect or allow one die to + reference a suffix of another die's location list chain. + References must be identical or completely separate. + There is therefore no need to cache the result of this + pass on any list other than the first; doing so + would lead to unnecessary writes. */ + while (*curr) { - dw_loc_list_ref next = (*curr)->dw_loc_next; - if (next && (*curr)->ll_symbol) + gcc_assert (!(*curr)->replaced && !(*curr)->resolved_addr); + if (!resolve_addr_in_expr ((*curr)->expr)) { - gcc_assert (!next->ll_symbol); - next->ll_symbol = (*curr)->ll_symbol; + dw_loc_list_ref next = (*curr)->dw_loc_next; + if (next && (*curr)->ll_symbol) + { + gcc_assert (!next->ll_symbol); + next->ll_symbol = (*curr)->ll_symbol; + } + *curr = next; } - *curr = next; + else + curr = &(*curr)->dw_loc_next; } + if (loc == *start) + loc->resolved_addr = 1; else - curr = &(*curr)->dw_loc_next; + { + loc->replaced = 1; + loc->dw_loc_next = *start; + } } - if (!AT_loc_list (a)) + if (!*start) { remove_AT (die, a->dw_attr); ix--; @@ -23304,6 +23399,7 @@ dwarf2out_finish (const char *filename) add_location_or_const_value_attribute ( VEC_index (deferred_locations, deferred_locations_list, i)->die, VEC_index (deferred_locations, deferred_locations_list, i)->variable, + false, DW_AT_location); }