Message ID | 1a50afaa-6b8e-ba98-6cde-aae51c05a2c4@suse.cz |
---|---|
State | New |
Headers | show |
On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška <mliska@suse.cz> wrote: > Hello. > > This is small follow-up of the patches I sent to string folding. > The patch transforms character array defined in an initializer to string > constant: > > +const char global[] = {'a', 'b', 'c', 'd', '\0'}; > > Apart from that, it also enables string folding of local symbols like: > + const char local[] = "abcd"; > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Hmm, does this handle const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'}; correctly? I think you need to check that 'key' is increasing and there are no gaps (ISTR constructor elements are sorted but gaps can still appear). + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) please instead check the element type of the array constructor. I'd also use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node to avoid changing non-strings like unsigned / signed char. Finally I'm a bit worried about doing this for all 'char' constructors. Maybe we should restrict this to those with ISPRINT () entries? Richard. > Martin
On 11/03/2016 01:12 PM, Martin Liška wrote: > + tree init = DECL_INITIAL (decl); > + if (init > + && TREE_READONLY (decl) > + && can_convert_ctor_to_string_cst (init)) > + DECL_INITIAL (decl) = build_string_cst_from_ctor (init); I'd merge these two new functions since they're only ever called together. We'd then have something like if (init && TREE_READONLY (decl)) init = convert_ctor_to_string_cst (init); if (init) DECL_INITIAL (decl) = init; I'll defer to Jan on whether finalize_decl seems like a good place to do this. > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > index 283bd1c..b2d1fd5 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > @@ -4,12 +4,15 @@ > char *buffer1; > char *buffer2; > > +const char global[] = {'a', 'b', 'c', 'd', '\0'}; > + > #define SIZE 1000 > > int > main (void) > { > const char* const foo1 = "hello world"; > + const char local[] = "abcd"; > > buffer1 = __builtin_malloc (SIZE); > __builtin_strcpy (buffer1, foo1); > @@ -45,6 +48,10 @@ main (void) > __builtin_abort (); > if (__builtin_memchr (foo1, null, 12) != foo1 + 11) > __builtin_abort (); > + if (__builtin_memchr (global, null, 5) == 0) > + __builtin_abort (); > + if (__builtin_memchr (local, null, 5) == 0) > + __builtin_abort (); How is that a meaningful test? This seems to work even with an unpatched gcc. I'd have expected something that shows a benefit for doing this conversion, and maybe also a test that shows it isn't done in cases where it's not allowed. > tree > -build_string_literal (int len, const char *str) > +build_string_literal (int len, const char *str, bool build_addr_expr) New arguments should be documented in the function comment. > +/* Return TRUE when CTOR can be converted to a string constant. */ "if", not "when". > + unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor); > + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value) > + { > + if (key == NULL_TREE > + || TREE_CODE (key) != INTEGER_CST > + || !tree_fits_uhwi_p (value) > + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) > + return false; Shouldn't all elements have the same type, or do you really have to call useless_type_conversion in a loop? > + /* Allow zero character just at the end of a string. */ > + if (integer_zerop (value)) > + return idx == elements - 1; Don't you also have to explicitly check it's there? Bernd
> On 11/03/2016 01:12 PM, Martin Liška wrote: > >+ tree init = DECL_INITIAL (decl); > >+ if (init > >+ && TREE_READONLY (decl) > >+ && can_convert_ctor_to_string_cst (init)) > >+ DECL_INITIAL (decl) = build_string_cst_from_ctor (init); > > I'd merge these two new functions since they're only ever called > together. We'd then have something like > > if (init && TREE_READONLY (decl)) > init = convert_ctor_to_string_cst (init); > if (init) > DECL_INITIAL (decl) = init; > > I'll defer to Jan on whether finalize_decl seems like a good place > to do this. I think finalize_decl may be bit too early because frontends may expects the ctors to be in a way they produced them. We only want to convert those arrays seen by middle-end so I would move the logic to varpool_node::analyze Otherwise the patch seems fine to me. Honza > > >diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > >index 283bd1c..b2d1fd5 100644 > >--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > >+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c > >@@ -4,12 +4,15 @@ > > char *buffer1; > > char *buffer2; > > > >+const char global[] = {'a', 'b', 'c', 'd', '\0'}; > >+ > > #define SIZE 1000 > > > > int > > main (void) > > { > > const char* const foo1 = "hello world"; > >+ const char local[] = "abcd"; > > > > buffer1 = __builtin_malloc (SIZE); > > __builtin_strcpy (buffer1, foo1); > >@@ -45,6 +48,10 @@ main (void) > > __builtin_abort (); > > if (__builtin_memchr (foo1, null, 12) != foo1 + 11) > > __builtin_abort (); > >+ if (__builtin_memchr (global, null, 5) == 0) > >+ __builtin_abort (); > >+ if (__builtin_memchr (local, null, 5) == 0) > >+ __builtin_abort (); > > How is that a meaningful test? This seems to work even with an > unpatched gcc. I'd have expected something that shows a benefit for > doing this conversion, and maybe also a test that shows it isn't > done in cases where it's not allowed. > > > tree > >-build_string_literal (int len, const char *str) > >+build_string_literal (int len, const char *str, bool build_addr_expr) > > New arguments should be documented in the function comment. > > >+/* Return TRUE when CTOR can be converted to a string constant. */ > > "if", not "when". > > >+ unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor); > >+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value) > >+ { > >+ if (key == NULL_TREE > >+ || TREE_CODE (key) != INTEGER_CST > >+ || !tree_fits_uhwi_p (value) > >+ || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) > >+ return false; > > Shouldn't all elements have the same type, or do you really have to > call useless_type_conversion in a loop? > > >+ /* Allow zero character just at the end of a string. */ > >+ if (integer_zerop (value)) > >+ return idx == elements - 1; > > Don't you also have to explicitly check it's there? > > > Bernd
On 11/03/2016 01:46 PM, Richard Biener wrote: > On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška <mliska@suse.cz> wrote: >> Hello. >> >> This is small follow-up of the patches I sent to string folding. >> The patch transforms character array defined in an initializer to string >> constant: >> >> +const char global[] = {'a', 'b', 'c', 'd', '\0'}; >> >> Apart from that, it also enables string folding of local symbols like: >> + const char local[] = "abcd"; >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Hmm, does this handle > > const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'}; > > correctly? It hasn't, fixed in the v2. As ctor indices are sorted in increasing order, I only check that there are no holes. > > I think you need to check that 'key' is increasing and there are no gaps > (ISTR constructor elements are sorted but gaps can still appear). > > + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) > > please instead check the element type of the array constructor. I'd also > use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node > to avoid changing non-strings like unsigned / signed char. Ok, v2 does: TYPE_MAIN_VARIANT (type) == char_type_node > > Finally I'm a bit worried about doing this for all 'char' > constructors. Maybe we > should restrict this to those with ISPRINT () entries? Also done for all characters except the trailing \0 character. I'll send patch in the next email. Martin > > Richard. > > >> Martin
From ecd2b614072f55e71896f7f5bf290072b3a4b04c Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Mon, 17 Oct 2016 15:24:46 +0200 Subject: [PATCH] Convert character arrays to string csts gcc/testsuite/ChangeLog: 2016-10-24 Martin Liska <mliska@suse.cz> * gcc.dg/tree-ssa/builtins-folding-gimple.c (main): Add new tests. gcc/ChangeLog: 2016-10-24 Martin Liska <mliska@suse.cz> * cgraphunit.c (varpool_node::finalize_decl): Convert ctors to string constants if possible. * gimplify.c (gimplify_decl_expr): Emit INIT_EXPR just if it cannot be converted to a string constant. (gimplify_init_constructor): Create string constant for local variables (if possible). * tree.c (build_string_cst_from_ctor): New function. (build_string_literal): Add new argument. (can_convert_ctor_to_string_cst): New function. * tree.h: Declare new functions. * varpool.c (ctor_for_folding): Support automatic variables. --- gcc/cgraphunit.c | 6 ++ gcc/gimplify.c | 14 ++++- .../gcc.dg/tree-ssa/builtins-folding-gimple.c | 7 +++ gcc/tree.c | 68 ++++++++++++++++++++-- gcc/tree.h | 6 +- gcc/varpool.c | 8 ++- 6 files changed, 100 insertions(+), 9 deletions(-) diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index e315a77..bc041d6 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -791,6 +791,12 @@ process_function_and_variable_attributes (cgraph_node *first, void varpool_node::finalize_decl (tree decl) { + tree init = DECL_INITIAL (decl); + if (init + && TREE_READONLY (decl) + && can_convert_ctor_to_string_cst (init)) + DECL_INITIAL (decl) = build_string_cst_from_ctor (init); + varpool_node *node = varpool_node::get_create (decl); gcc_assert (TREE_STATIC (decl) || DECL_EXTERNAL (decl)); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 1531582..4520843 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) { if (!TREE_STATIC (decl)) { - DECL_INITIAL (decl) = NULL_TREE; + if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST) + DECL_INITIAL (decl) = NULL_TREE; init = build2 (INIT_EXPR, void_type_node, decl, init); gimplify_and_add (init, seq_p); ggc_free (init); @@ -4438,6 +4439,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, break; } + /* Replace a ctor with a string constant with possible. */ + if (TREE_READONLY (object) + && VAR_P (object) + && can_convert_ctor_to_string_cst (ctor)) + { + ctor = build_string_cst_from_ctor (ctor); + TREE_OPERAND (*expr_p, 1) = ctor; + DECL_INITIAL (object) = ctor; + break; + } + /* Fetch information about the constructor to direct later processing. We might want to make static versions of it in various cases, and can only do so if it known to be a valid constant initializer. */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c index 283bd1c..b2d1fd5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c @@ -4,12 +4,15 @@ char *buffer1; char *buffer2; +const char global[] = {'a', 'b', 'c', 'd', '\0'}; + #define SIZE 1000 int main (void) { const char* const foo1 = "hello world"; + const char local[] = "abcd"; buffer1 = __builtin_malloc (SIZE); __builtin_strcpy (buffer1, foo1); @@ -45,6 +48,10 @@ main (void) __builtin_abort (); if (__builtin_memchr (foo1, null, 12) != foo1 + 11) __builtin_abort (); + if (__builtin_memchr (global, null, 5) == 0) + __builtin_abort (); + if (__builtin_memchr (local, null, 5) == 0) + __builtin_abort (); __builtin_memchr (foo1, x, 11); __builtin_memchr (buffer1, x, zero); diff --git a/gcc/tree.c b/gcc/tree.c index 56cc653..dcf767f 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -1784,6 +1784,26 @@ build_vector_from_val (tree vectype, tree sc) } } +/* Build string constant from a constructor CTOR. */ + +tree +build_string_cst_from_ctor (tree ctor) +{ + unsigned HOST_WIDE_INT idx; + tree value; + vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (ctor); + char *str = XNEWVEC (char, elts->length ()); + + FOR_EACH_CONSTRUCTOR_VALUE (elts, idx, value) + str[idx] = (char)tree_to_uhwi (value); + + tree init = build_string_literal (elts->length (), + ggc_alloc_string (str, elts->length ()), + false); + free (str); + return init; +} + /* Something has messed with the elements of CONSTRUCTOR C after it was built; calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */ @@ -11344,7 +11364,7 @@ maybe_build_call_expr_loc (location_t loc, combined_fn fn, tree type, /* Create a new constant string literal and return a char* pointer to it. The STRING_CST value is the LEN characters at STR. */ tree -build_string_literal (int len, const char *str) +build_string_literal (int len, const char *str, bool build_addr_expr) { tree t, elem, index, type; @@ -11357,10 +11377,14 @@ build_string_literal (int len, const char *str) TREE_READONLY (t) = 1; TREE_STATIC (t) = 1; - type = build_pointer_type (elem); - t = build1 (ADDR_EXPR, type, - build4 (ARRAY_REF, elem, - t, integer_zero_node, NULL_TREE, NULL_TREE)); + if (build_addr_expr) + { + type = build_pointer_type (elem); + t = build1 (ADDR_EXPR, type, + build4 (ARRAY_REF, elem, + t, integer_zero_node, NULL_TREE, NULL_TREE)); + } + return t; } @@ -14217,6 +14241,40 @@ combined_fn_name (combined_fn fn) return internal_fn_name (as_internal_fn (fn)); } +/* Return TRUE when CTOR can be converted to a string constant. */ + +bool +can_convert_ctor_to_string_cst (tree ctor) +{ + unsigned HOST_WIDE_INT idx; + tree value, key; + + if (TREE_CODE (ctor) != CONSTRUCTOR + || TREE_CODE (TREE_TYPE (ctor)) != ARRAY_TYPE) + return false; + + tree t = TREE_TYPE (TREE_TYPE (ctor)); + if (TREE_CODE (TYPE_SIZE (t)) != INTEGER_CST + || tree_to_uhwi (TYPE_SIZE (t)) != BITS_PER_UNIT) + return false; + + unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor); + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value) + { + if (key == NULL_TREE + || TREE_CODE (key) != INTEGER_CST + || !tree_fits_uhwi_p (value) + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) + return false; + + /* Allow zero character just at the end of a string. */ + if (integer_zerop (value)) + return idx == elements - 1; + } + + return false; +} + #if CHECKING_P namespace selftest { diff --git a/gcc/tree.h b/gcc/tree.h index 531bc5e..ea5d184 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -3975,6 +3975,8 @@ extern tree build_vector_stat (tree, tree * MEM_STAT_DECL); #define build_vector(t,v) build_vector_stat (t, v MEM_STAT_INFO) extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *); extern tree build_vector_from_val (tree, tree); +extern tree build_string_cst_from_ctor (tree); +extern tree build_vector_from_val (tree, tree); extern void recompute_constructor_flags (tree); extern void verify_constructor_flags (tree); extern tree build_constructor (tree, vec<constructor_elt, va_gc> *); @@ -4022,7 +4024,8 @@ extern tree build_call_expr_internal_loc_array (location_t, enum internal_fn, tree, int, const tree *); extern tree maybe_build_call_expr_loc (location_t, combined_fn, tree, int, ...); -extern tree build_string_literal (int, const char *); +extern tree build_string_literal (int len, const char *str, + bool build_addr_expr = true); /* Construct various nodes representing data types. */ @@ -4688,6 +4691,7 @@ extern void assign_assembler_name_if_neeeded (tree); extern void warn_deprecated_use (tree, tree); extern void cache_integer_cst (tree); extern const char *combined_fn_name (combined_fn); +extern bool can_convert_ctor_to_string_cst (tree ctor); /* Compare and hash for any structure which begins with a canonical pointer. Assumes all pointers are interchangeable, which is sort diff --git a/gcc/varpool.c b/gcc/varpool.c index 78969d2..6c9d067 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -412,11 +412,15 @@ ctor_for_folding (tree decl) return error_mark_node; /* Do not care about automatic variables. Those are never initialized - anyway, because gimplifier exapnds the code. */ + anyway, because gimplifier expands the code. */ if (!TREE_STATIC (decl) && !DECL_EXTERNAL (decl)) { gcc_assert (!TREE_PUBLIC (decl)); - return error_mark_node; + if (!TREE_READONLY (decl) || TREE_THIS_VOLATILE (decl)) + return error_mark_node; + + tree ctor = DECL_INITIAL (decl); + return ctor == NULL_TREE ? error_mark_node : ctor; } gcc_assert (VAR_P (decl)); -- 2.10.1