From patchwork Tue Nov 15 21:34:38 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bernd Edlinger X-Patchwork-Id: 82415 Delivered-To: patch@linaro.org Received: by 10.140.97.165 with SMTP id m34csp1748442qge; Tue, 15 Nov 2016 13:35:09 -0800 (PST) X-Received: by 10.98.63.1 with SMTP id m1mr50061253pfa.31.1479245709184; Tue, 15 Nov 2016 13:35:09 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id k126si3858140pgc.284.2016.11.15.13.35.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Nov 2016 13:35:09 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-441568-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-441568-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-441568-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:references:in-reply-to:content-type :mime-version; q=dns; s=default; b=H8cbBL7Ktowae/LV+fh7KFr/ynRiO 5nMXNh0TJJKt4BWKeU4t6ZernEs41QrCYzT4i9HTHUIHvYqNTApcolhzinbIIQY0 PPyboEa6ocQWKN5P8cRqCraQA1/bnI/u59ApQw72mUQcrNJbb6PmD+yOINTQWm3A gOdd7Bii+BNZU0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:references:in-reply-to:content-type :mime-version; s=default; bh=Bnv1x+kQwHE0c/wO3iQMXurfCWg=; b=dwd NHCL0PWE5TJIoKlsGtvd8sWdvOunfzrcRu2ODx773R9AcickrkfnmaE1oteiJRhE OJwr+LaMSlrO+o5t4SUrdxAHMM/NFNHUFbvYVyMX18IYB83Rq2afRyeuFzxbigoe dUUIFnVcpb4UeoNbGhprFN+7vcKutXSPudnVEkuQ= Received: (qmail 124584 invoked by alias); 15 Nov 2016 21:34:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 124567 invoked by uid 89); 15 Nov 2016 21:34:52 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=tossed, 6117, simplification, HX-MS-Has-Attach:yes X-HELO: SNT004-OMC4S20.hotmail.com Received: from snt004-omc4s20.hotmail.com (HELO SNT004-OMC4S20.hotmail.com) (65.55.90.223) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 15 Nov 2016 21:34:42 +0000 Received: from EUR03-DB5-obe.outbound.protection.outlook.com ([65.55.90.200]) by SNT004-OMC4S20.hotmail.com over TLS secured channel with Microsoft SMTPSVC(7.5.7601.23008); Tue, 15 Nov 2016 13:34:40 -0800 Received: from DB5EUR03FT010.eop-EUR03.prod.protection.outlook.com (10.152.20.58) by DB5EUR03HT108.eop-EUR03.prod.protection.outlook.com (10.152.21.204) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.721.5; Tue, 15 Nov 2016 21:34:38 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com (10.152.20.51) by DB5EUR03FT010.mail.protection.outlook.com (10.152.20.96) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.721.5 via Frontend Transport; Tue, 15 Nov 2016 21:34:38 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com ([10.167.132.147]) by AM4PR0701MB2162.eurprd07.prod.outlook.com ([10.167.132.147]) with mapi id 15.01.0734.007; Tue, 15 Nov 2016 21:34:38 +0000 From: Bernd Edlinger To: GCC Patches , Bernd Schmidt , "richard.sandiford@arm.com" Subject: Re: [PATCH] Significantly reduce memory usage of genattrtab Date: Tue, 15 Nov 2016 21:34:38 +0000 Message-ID: References: <87h9793pe5.fsf@e105548-lin.cambridge.arm.com> In-Reply-To: <87h9793pe5.fsf@e105548-lin.cambridge.arm.com> authentication-results: gcc.gnu.org; dkim=none (message not signed) header.d=none; gcc.gnu.org; dmarc=none action=none header.from=hotmail.de; x-incomingtopheadermarker: OriginalChecksum:; UpperCasedChecksum:; SizeAsReceived:7490; Count:36 x-ms-exchange-messagesentrepresentingtype: 1 x-incomingheadercount: 36 x-eopattributedmessage: 0 x-microsoft-exchange-diagnostics: 1; DB5EUR03HT108; 7:s2HHl1bjsM9OnCDnMhvSpQ1wJc3TPBbXYj+ajvuvgDeKnlP/Hou/OfOqrAVTxaLyEFzKvZsoffW3BDLMC7kV49MkR/gANc4juMVx0wVoKGiyonL2SKjdiSEcqzEnpSLkHGFCwttDb7imL/DlQi9573WP2iEm5eT324hzRHGumrstVs7RYKXwLrv62zGs5Z5dI762qZ+sMbAZtBkAKTBr7AfQZa5Q8szw2V8mQZfB0yGvtw3+HP01mHfWgM2w7/iD8+HffnCjLwSRtE1pBcvpaRAXLm0KafU4Ww8iy0MGOtuZxYVjRGlU3yUx77Gx1y+Mb06tlLGtyCcGtHEuqEJOMabMJ3PMkrXkqubDcwgiWR0= x-forefront-antispam-report: EFV:NLI; SFV:NSPM; SFS:(10019020)(98900003); DIR:OUT; SFP:1102; SCL:1; SRVR:DB5EUR03HT108; H:AM4PR0701MB2162.eurprd07.prod.outlook.com; FPR:; SPF:None; LANG:en; x-ms-office365-filtering-correlation-id: 62331604-b2f0-4b4c-612a-08d40d9f3300 x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(1601124038)(1603103113)(1601125047); SRVR:DB5EUR03HT108; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(432015012)(102415395)(82015046); SRVR:DB5EUR03HT108; BCL:0; PCL:0; RULEID:; SRVR:DB5EUR03HT108; x-forefront-prvs: 012792EC17 spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Nov 2016 21:34:38.4856 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Internet X-MS-Exchange-CrossTenant-id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB5EUR03HT108 On 11/15/16 13:21, Richard Sandiford wrote: > Bernd Edlinger writes: >> Hi! >> >> The genattrtab build-tool uses way too much memory in general. >> I think there is no other build step that uses more memory. >> >> On the currently trunk it takes around 700MB to build the >> ARM latency tab files. I debugged that yesterday >> and found that this can be reduced to 8MB (!). Yes, really. >> >> So the attached patch does try really hard to hash and re-use >> all ever created rtx objects. >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM. >> Is it OK for trunk? > > Just to check: does this produce the same output as before? > And did you notice any difference in the time genattrtab > takes to run? > The run time was in the range of 24-25s, with and without the patch. However the tables are a bit different, although that seems only to be w flaw with the ATTR_CURR_SIMPLIFIED_P which is now re-used when a matching rtx was found in the hash. As I said, the generated functions do really work, probably because just a few simplifications are missing. So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used binary ops. That I found out just by try-and-error. I can say that now the generated functions are exactly identical for i386, arm and mips. The memory and the run time did not change due to this re-hashing. > > ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists, > so that x != y when x or y is "permanent" implies that the attributes > must be different. This lets attr_equal_p avoid a recursive walk: > > static int > attr_equal_p (rtx x, rtx y) > { > return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y)) > && rtx_equal_p (x, y))); > } > > Does the patch still guarantee that? > Hmm, I see. I expected that ATTR_PERMANENT_P means more or less, that it is in the hash table. I believe that a long time ago, there was a kind of garbage collection of temporary rtx objects, that needed to be copied from the temporary space to the permanent space, after the simplification was done. And then all temporary objects were just tossed away. But that was long before my time. Today everything is permanent, that is why the memory usage is unbounded. But I can fix that, by only setting ATTR_PERMANENT_P on the hashed rtx when both sub-rtx are also ATTR_PERMANENT_P. How does that new version look, is it OK? Thanks Bernd. 2016-11-15 Bernd Edlinger * genattrtab.c (attr_rtx_1): Avoid allocating new rtx objects. Clear ATTR_CURR_SIMPLIFIED_P for re-used binary rtx objects. Use DEF_ATTR_STRING for string arguments. Use RTL_HASH for integer arguments. Only set ATTR_PERMANENT_P on newly hashed rtx when all sub-rtx are also permanent. (attr_eq): Simplify. (attr_copy_rtx): Remove. (make_canonical, get_attr_value): Use attr_equal_p. (copy_boolean): Rehash NOT. (simplify_test_exp_in_temp, optimize_attrs): Remove call to attr_copy_rtx. (attr_alt_intersection, attr_alt_union, attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT. (make_automaton_attrs): Use attr_eq. Index: gcc/genattrtab.c =================================================================== --- gcc/genattrtab.c (revision 242335) +++ gcc/genattrtab.c (working copy) @@ -386,6 +386,7 @@ attr_rtx_1 (enum rtx_code code, va_list p) unsigned int hashcode; struct attr_hash *h; struct obstack *old_obstack = rtl_obstack; + int permanent_p = 1; /* For each of several cases, search the hash table for an existing entry. Use that entry if one is found; otherwise create a new RTL and add it @@ -395,13 +396,8 @@ attr_rtx_1 (enum rtx_code code, va_list p) { rtx arg0 = va_arg (p, rtx); - /* A permanent object cannot point to impermanent ones. */ if (! ATTR_PERMANENT_P (arg0)) - { - rt_val = rtx_alloc (code); - XEXP (rt_val, 0) = arg0; - return rt_val; - } + permanent_p = 0; hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) @@ -425,14 +421,8 @@ attr_rtx_1 (enum rtx_code code, va_list p) rtx arg0 = va_arg (p, rtx); rtx arg1 = va_arg (p, rtx); - /* A permanent object cannot point to impermanent ones. */ if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1)) - { - rt_val = rtx_alloc (code); - XEXP (rt_val, 0) = arg0; - XEXP (rt_val, 1) = arg1; - return rt_val; - } + permanent_p = 0; hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) @@ -440,7 +430,10 @@ attr_rtx_1 (enum rtx_code code, va_list p) && GET_CODE (h->u.rtl) == code && XEXP (h->u.rtl, 0) == arg0 && XEXP (h->u.rtl, 1) == arg1) - return h->u.rtl; + { + ATTR_CURR_SIMPLIFIED_P (h->u.rtl) = 0; + return h->u.rtl; + } if (h == 0) { @@ -481,6 +474,9 @@ attr_rtx_1 (enum rtx_code code, va_list p) char *arg0 = va_arg (p, char *); char *arg1 = va_arg (p, char *); + arg0 = DEF_ATTR_STRING (arg0); + arg1 = DEF_ATTR_STRING (arg1); + hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) if (h->hashcode == hashcode @@ -497,6 +493,29 @@ attr_rtx_1 (enum rtx_code code, va_list p) XSTR (rt_val, 1) = arg1; } } + else if (GET_RTX_LENGTH (code) == 2 + && GET_RTX_FORMAT (code)[0] == 'i' + && GET_RTX_FORMAT (code)[1] == 'i') + { + int arg0 = va_arg (p, int); + int arg1 = va_arg (p, int); + + hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); + for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) + if (h->hashcode == hashcode + && GET_CODE (h->u.rtl) == code + && XINT (h->u.rtl, 0) == arg0 + && XINT (h->u.rtl, 1) == arg1) + return h->u.rtl; + + if (h == 0) + { + rtl_obstack = hash_obstack; + rt_val = rtx_alloc (code); + XINT (rt_val, 0) = arg0; + XINT (rt_val, 1) = arg1; + } + } else if (code == CONST_INT) { HOST_WIDE_INT arg0 = va_arg (p, HOST_WIDE_INT); @@ -552,7 +571,7 @@ attr_rtx_1 (enum rtx_code code, va_list p) rtl_obstack = old_obstack; attr_hash_add_rtx (hashcode, rt_val); - ATTR_PERMANENT_P (rt_val) = 1; + ATTR_PERMANENT_P (rt_val) = permanent_p; return rt_val; } @@ -592,7 +611,7 @@ attr_printf (unsigned int len, const char *fmt, .. static rtx attr_eq (const char *name, const char *value) { - return attr_rtx (EQ_ATTR, DEF_ATTR_STRING (name), DEF_ATTR_STRING (value)); + return attr_rtx (EQ_ATTR, name, value); } static const char * @@ -646,89 +665,6 @@ attr_equal_p (rtx x, rtx y) && rtx_equal_p (x, y))); } -/* Copy an attribute value expression, - descending to all depths, but not copying any - permanent hashed subexpressions. */ - -static rtx -attr_copy_rtx (rtx orig) -{ - rtx copy; - int i, j; - RTX_CODE code; - const char *format_ptr; - - /* No need to copy a permanent object. */ - if (ATTR_PERMANENT_P (orig)) - return orig; - - code = GET_CODE (orig); - - switch (code) - { - case REG: - CASE_CONST_ANY: - case SYMBOL_REF: - case MATCH_TEST: - case CODE_LABEL: - case PC: - case CC0: - return orig; - - default: - break; - } - - copy = rtx_alloc (code); - PUT_MODE (copy, GET_MODE (orig)); - ATTR_IND_SIMPLIFIED_P (copy) = ATTR_IND_SIMPLIFIED_P (orig); - ATTR_CURR_SIMPLIFIED_P (copy) = ATTR_CURR_SIMPLIFIED_P (orig); - ATTR_PERMANENT_P (copy) = ATTR_PERMANENT_P (orig); - - format_ptr = GET_RTX_FORMAT (GET_CODE (copy)); - - for (i = 0; i < GET_RTX_LENGTH (GET_CODE (copy)); i++) - { - switch (*format_ptr++) - { - case 'e': - XEXP (copy, i) = XEXP (orig, i); - if (XEXP (orig, i) != NULL) - XEXP (copy, i) = attr_copy_rtx (XEXP (orig, i)); - break; - - case 'E': - case 'V': - XVEC (copy, i) = XVEC (orig, i); - if (XVEC (orig, i) != NULL) - { - XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i)); - for (j = 0; j < XVECLEN (copy, i); j++) - XVECEXP (copy, i, j) = attr_copy_rtx (XVECEXP (orig, i, j)); - } - break; - - case 'n': - case 'i': - XINT (copy, i) = XINT (orig, i); - break; - - case 'w': - XWINT (copy, i) = XWINT (orig, i); - break; - - case 's': - case 'S': - XSTR (copy, i) = XSTR (orig, i); - break; - - default: - gcc_unreachable (); - } - } - return copy; -} - /* Given a test expression EXP for attribute ATTR, ensure it is validly formed. LOC is the location of the .md construct that contains EXP. @@ -1236,7 +1172,7 @@ make_canonical (file_location loc, struct attr_des XVECEXP (exp, 0, i) = copy_boolean (XVECEXP (exp, 0, i)); XVECEXP (exp, 0, i + 1) = make_canonical (loc, attr, XVECEXP (exp, 0, i + 1)); - if (! rtx_equal_p (XVECEXP (exp, 0, i + 1), defval)) + if (! attr_equal_p (XVECEXP (exp, 0, i + 1), defval)) allsame = 0; } if (allsame) @@ -1257,6 +1193,8 @@ copy_boolean (rtx exp) if (GET_CODE (exp) == AND || GET_CODE (exp) == IOR) return attr_rtx (GET_CODE (exp), copy_boolean (XEXP (exp, 0)), copy_boolean (XEXP (exp, 1))); + else if (GET_CODE (exp) == NOT) + return attr_rtx (NOT, copy_boolean (XEXP (exp, 0))); if (GET_CODE (exp) == MATCH_OPERAND) { XSTR (exp, 1) = DEF_ATTR_STRING (XSTR (exp, 1)); @@ -1298,7 +1236,7 @@ get_attr_value (file_location loc, rtx value, stru } for (av = attr->first_value; av; av = av->next) - if (rtx_equal_p (value, av->value) + if (attr_equal_p (value, av->value) && (num_alt == 0 || av->first_insn == NULL || insn_alternatives[av->first_insn->def->insn_code])) return av; @@ -2339,9 +2277,7 @@ simplify_test_exp_in_temp (rtx exp, int insn_code, rtl_obstack = temp_obstack; x = simplify_test_exp (exp, insn_code, insn_index); rtl_obstack = old; - if (x == exp || rtl_obstack == temp_obstack) - return x; - return attr_copy_rtx (x); + return x; } /* Returns true if S1 is a subset of S2. */ @@ -2397,28 +2333,27 @@ attr_alt_subset_of_compl_p (rtx s1, rtx s2) static rtx attr_alt_intersection (rtx s1, rtx s2) { - rtx result = rtx_alloc (EQ_ATTR_ALT); + int result; switch ((XINT (s1, 1) << 1) | XINT (s2, 1)) { case (0 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) & XINT (s2, 0); + result = XINT (s1, 0) & XINT (s2, 0); break; case (0 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) & ~XINT (s2, 0); + result = XINT (s1, 0) & ~XINT (s2, 0); break; case (1 << 1) | 0: - XINT (result, 0) = XINT (s2, 0) & ~XINT (s1, 0); + result = XINT (s2, 0) & ~XINT (s1, 0); break; case (1 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) | XINT (s2, 0); + result = XINT (s1, 0) | XINT (s2, 0); break; default: gcc_unreachable (); } - XINT (result, 1) = XINT (s1, 1) & XINT (s2, 1); - return result; + return attr_rtx (EQ_ATTR_ALT, result, XINT (s1, 1) & XINT (s2, 1)); } /* Return EQ_ATTR_ALT expression representing union of S1 and S2. */ @@ -2426,28 +2361,27 @@ attr_alt_intersection (rtx s1, rtx s2) static rtx attr_alt_union (rtx s1, rtx s2) { - rtx result = rtx_alloc (EQ_ATTR_ALT); + int result; switch ((XINT (s1, 1) << 1) | XINT (s2, 1)) { case (0 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) | XINT (s2, 0); + result = XINT (s1, 0) | XINT (s2, 0); break; case (0 << 1) | 1: - XINT (result, 0) = XINT (s2, 0) & ~XINT (s1, 0); + result = XINT (s2, 0) & ~XINT (s1, 0); break; case (1 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) & ~XINT (s2, 0); + result = XINT (s1, 0) & ~XINT (s2, 0); break; case (1 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) & XINT (s2, 0); + result = XINT (s1, 0) & XINT (s2, 0); break; default: gcc_unreachable (); } - XINT (result, 1) = XINT (s1, 1) | XINT (s2, 1); - return result; + return attr_rtx (EQ_ATTR_ALT, result, XINT (s1, 1) | XINT (s2, 1)); } /* Return EQ_ATTR_ALT expression representing complement of S. */ @@ -2455,12 +2389,7 @@ attr_alt_union (rtx s1, rtx s2) static rtx attr_alt_complement (rtx s) { - rtx result = rtx_alloc (EQ_ATTR_ALT); - - XINT (result, 0) = XINT (s, 0); - XINT (result, 1) = 1 - XINT (s, 1); - - return result; + return attr_rtx (EQ_ATTR_ALT, XINT (s, 0), 1 - XINT (s, 1)); } /* Return EQ_ATTR_ALT expression representing set containing elements set @@ -2469,12 +2398,7 @@ attr_alt_complement (rtx s) static rtx mk_attr_alt (uint64_t e) { - rtx result = rtx_alloc (EQ_ATTR_ALT); - - XINT (result, 0) = e; - XINT (result, 1) = 0; - - return result; + return attr_rtx (EQ_ATTR_ALT, (int)e, 0); } /* Given an expression, see if it can be simplified for a particular insn @@ -3045,7 +2969,6 @@ optimize_attrs (int num_insn_codes) && attr_rtx_cost (newexp) < 26 ) { - newexp = attr_copy_rtx (newexp); remove_insn_ent (av, ie); av = get_attr_value (ie->def->loc, newexp, attr, ie->def->insn_code); @@ -5004,7 +4927,7 @@ make_automaton_attrs (void) { int j; char *name; - rtx test = attr_rtx (EQ_ATTR, tune_attr->name, XSTR (val->value, 0)); + rtx test = attr_eq (tune_attr->name, XSTR (val->value, 0)); if (val == tune_attr->default_val) continue;