From patchwork Wed Jul 20 09:51:32 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Greenhalgh X-Patchwork-Id: 72410 Delivered-To: patch@linaro.org Received: by 10.140.29.52 with SMTP id a49csp553564qga; Wed, 20 Jul 2016 02:52:16 -0700 (PDT) X-Received: by 10.98.2.208 with SMTP id 199mr63609584pfc.80.1469008336779; Wed, 20 Jul 2016 02:52:16 -0700 (PDT) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id gc6si2597579pab.18.2016.07.20.02.52.16 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 20 Jul 2016 02:52:16 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-432035-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-432035-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-432035-patch=linaro.org@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type; q=dns; s=default; b=xJou5VwlPHLYWJmI 7GizTnW/0aM1cb3Nvkh8SOyHCBbq0UsjgiEGs69p9cuNQhhHiFVTiuKzBtBqsRJa yls06NuIOXOwnbd1R4zW0YQ2QIP1bwtC7icc8Cv+oco4MAULbQR9i9DvxPvoeWaM qWFJKeDNjPdcSuEPgRG47KGaPFQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type; s=default; bh=f7vk546Ewy//79OQ2OSLxM e2QDw=; b=QjbzD0OygXbLZpiPINdGu5Eqrn0itBgehw23xCdDOtPseM9fv8XDZb rXWhS5ehgicLaeXjNoZWstjf09rwPMlouxNmEq4WevzxmTaam6uGRmqB6RS6k5jr bs1YjoE//y3I7LaZMEfKpZZsFjab0UUY30r/0KcyxB+UIk1GwEXaI= Received: (qmail 62122 invoked by alias); 20 Jul 2016 09:52:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 61249 invoked by uid 89); 20 Jul 2016 09:52:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.2 required=5.0 tests=AWL, BAYES_50, SPF_PASS autolearn=ham version=3.3.2 spammy=Hook, H*RU:15.1.523.9, Hx-spam-relays-external:15.1.523.9, costing X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 20 Jul 2016 09:51:51 +0000 Received: from EUR01-VE1-obe.outbound.protection.outlook.com (mail-ve1eur01lp0245.outbound.protection.outlook.com [213.199.154.245]) (Using TLS) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-13-p6l2AP2WPC-ZbX2FIVKK0w-1; Wed, 20 Jul 2016 10:51:42 +0100 Received: from DB6PR0801CA0030.eurprd08.prod.outlook.com (10.165.173.168) by AM4PR0801MB1428.eurprd08.prod.outlook.com (10.168.5.20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.544.10; Wed, 20 Jul 2016 09:51:40 +0000 Received: from DB3FFO11FD007.protection.gbl (2a01:111:f400:7e04::172) by DB6PR0801CA0030.outlook.office365.com (2603:10a6:4:2::40) with Microsoft SMTP Server (TLS) id 15.1.539.14 via Frontend Transport; Wed, 20 Jul 2016 09:51:40 +0000 Received: from nebula.arm.com (217.140.96.140) by DB3FFO11FD007.mail.protection.outlook.com (10.47.216.96) with Microsoft SMTP Server (TLS) id 15.1.523.9 via Frontend Transport; Wed, 20 Jul 2016 09:51:40 +0000 Received: from e107456-lin.cambridge.arm.com (10.1.2.79) by mail.arm.com (10.1.105.66) with Microsoft SMTP Server id 14.3.294.0; Wed, 20 Jul 2016 10:51:38 +0100 From: James Greenhalgh To: CC: , , , , , , , Subject: [Re: RFC: Patch 1/2 v3] New target hook: max_noce_ifcvt_seq_cost Date: Wed, 20 Jul 2016 10:51:32 +0100 Message-ID: <1469008295-28884-1-git-send-email-james.greenhalgh@arm.com> In-Reply-To: <0f3c74c6-0c1b-f2df-77ce-a2ffc112583d@redhat.com> References: <0f3c74c6-0c1b-f2df-77ce-a2ffc112583d@redhat.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-Office365-Filtering-HT: Tenant X-Forefront-Antispam-Report: CIP:217.140.96.140; IPV:CAL; SCL:-1; CTRY:GB; EFV:NLI; SFV:NSPM; SFS:(10009020)(6009001)(7916002)(2980300002)(438002)(199003)(189002)(24454002)(377454003)(377424004)(2906002)(50986999)(84326002)(76176999)(87936001)(33646002)(586003)(106466001)(86362001)(5890100001)(36756003)(2351001)(229853001)(77096005)(568964002)(104016004)(356003)(4326007)(6806005)(2950100001)(11100500001)(246002)(189998001)(2476003)(26826002)(110136002)(8936002)(4610100001)(19580405001)(512874002)(92566002)(50226002)(305945005)(19580395003)(7846002)(5003600100003)(7696003)(8676002); DIR:OUT; SFP:1101; SCL:1; SRVR:AM4PR0801MB1428; H:nebula.arm.com; FPR:; SPF:Pass; PTR:fw-tnat.cambridge.arm.com; A:1; MX:1; LANG:en; X-Microsoft-Exchange-Diagnostics: 1; DB3FFO11FD007; 1:ASd3LBo/I1ZFwE63agAUbmu0sJFQQmalhHHg6Iorf/h9YYkw404RdkjNoA5iFSbZJ4wpwDXZSYr0WhYKRnDsUyPy9J7LzQUGV1dRz5VY0ZL1XnaDM11Z1aA/mB7QggnVO2KZPMnAfaofSdhuZl3L2ZXUOQRKLjaGxnoMYpCtw7HBNRW+RSoQGDxpJoMfLl2crSM3FpeT7bxYCYl2N3q5UN1vdYJkD5xAtMcupBo/Q9b4wBhJy1++xaWjA19WsK0XL2F/LFrlCuCaz9GzvSshxksavbhMRAVwWih7xHTromqOYToDdLwjgYdWOnHAAlnwELjYlztCuN8OFnGB7BjbXPhaDNiwOUIcFcnGbf0TMV2DTLrxzuwj1jjGjdlLH6r/ad5sNm/7n/u5uofnPExX0t2NRhR5MuA4oMiOoSUe6aVuCT5Ey247p6jz2VqeKDkD2s1mRh447zT6kir7mCyUWvscdfX8s6/LUzbWQGqkV63gVVNw2QCRjiCFIz6GaC/a9iap1FU5I9KzedtB67UjBysOB8pWvIDZnHacjAgGPQMP/fjP68N/nXodrfijiyt+Ab2YimmYLkj3RTYYMYytZQ== X-MS-Office365-Filtering-Correlation-Id: 90196d18-b79e-4113-ef76-08d3b0837314 X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 2:g6yV0FU+nzFgolT7JdgGcADLdN1JGQoGLEEehCCK6gKXdR1sahQULlwSsyZ/iloWofV+H9IlAn1aMWmgtJM/ZFKnE8+rIx6D51VqpkPDteRTD/czHripBuk0SjMp4fzfmvSC1Hs3BZKEjI51m0e4iYlE0Ww7gsP4YPIauN826UbB/YqzjeHZ0dKRmZdE2n9k; 3:J1mIxvQCaJ2Rb3DvABTNyX1fbTmw2yLxEbx70fIqXHTBIRGsUROjWv+moBWpquWVE6EXQGrTIF51Zg6eFzV5NXusVfuZJOmJ2nH5uaRaUC4pfXy2dfNssiBzOCP5wy3UxE8j9Q0Eg1Pq9sXDIQ2zRh4FbvTcUI6qvf9Y4ih+FpoUOgVCOEND3ZW3TRgBKVQCCKRiuGM1LPXW23ir9EdGxtiFHJC5oHqoHTQyNMWTqn1pbPme0ahGsYUxSEWq6TVqJZ+5ixFgF8qXBK62ndpF9Q== X-Microsoft-Antispam: UriScan:; BCL:0; PCL:0; RULEID:(8251501002); SRVR:AM4PR0801MB1428; X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 25:Gk93LaWusZeyhn7z8ju+n2Qr+bfPlJRn4pgn/mbaKNxvKWOsbWr7fb2mTRXBYapnyiFGMU38fq4oM7YRErFt2EwtwlV4aY3/lxq+X44WMoOkfum8bknvSpHWRelyO1I+HB26xPK1EdIvDLWEWlPwRrVNmLbmivh8ulO1ZsZxV22v/rrQvMjGQb8L8atpI1xj3BH4JoiyZZnBa9nduBt+Z971Fb4lGwB+ZJPgU5Za/+p7dFTlePaFuEtoETQiqX1CmFFs0MBjb2d5fOUSMBihk2po4FmSNwo9LwoIjUP9Abtsi+lsoAsi6BqJKVqAeI7kMk/dm/R3VaKOZpI17A2eJSH0KgGi9k2Lai//uDL8T8J4cjhHfscRDQaPpsQUxi3Qb59xS5YGF12eyRECACoVex2lK1MgbrP7k0VKn4D4KrL+GanOKY5KQZprh7I54Fcy99R2uDnZsBxqCNKLQ8wxZslMACrl2SgFAFriJOBpMCqhmU/bJw1Cuqt20JFSPlsdX7mr5uR+SID0dz64FKhGKjxS8GchCFE4FZMPEHJ/O3QgAnl+cuzoGnuwRoFsAgpe0FCE7IbvMeV2Wze48EHyK9NGdylfee37nojsqcN9/sg4RN4wtrGW46U0+9iL4h9PvLqtglGYgsDVidF9+2RTFds+zrSJ0AGTZkBukGNZeIul/1hvn1BS1u82lB3o9751Fy60o4PCEKTjCaC5Ue3c5CUaSTGBm87liQszRAuAScF/5xgucLPjAK88gfi7WkW42S5ZjOGA/Q2OeblqQhv94c15mdh6pcK32bF+uVD0wOecIsl8r7fq9mvRUjlVi5Hbz32Pz6RBbZUUd+e/Qmv0m4ojhw6ARWEWgbKCBiFOU5dWcz1DJerHMU66F4eElHiOHKiAycT0jJ5gLm6B1a+Fia/0diiepypUacEH8A0UuVOUjtz+BDU78PxLUz2OabRx X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 31:B52cPIaJYrGbYsI2VFtmFF5asn4YjK4nxSkOnKARnamXsHiYliba2ICiMz3gFoMVOnYg8lw6kY2UYin2XCaltJZfbJiKSovDcBW/ggo5niibILqbzapMwsqnOzZVM5N4mZTeO9QE0HlvdORNaHFRblGcy4JpA+ifvt08BKon0A2gOtm5g5WnSQRuQL30dasHk/y6dWgJMgiGB1DdpWiMrw==; 20:BEN/XsWzju9nkuYJ0tfaXOg0UabK1WixG8sse5l3tFelZvg+F6ey0uBEm3NaayIrwi1WIrQ6c4PXL5LwIu/Z5lVzaba1x6XAGxPFiDPB5mNU6yYdmFK9LbbioZXVx7eRTUvLk2A0apU9tC/3xMej/dMZqW74D039qjlGzXyehdr1tre1EVkUn+Br0LL70CE31ovM1OReVxEHQllOl/hjYzf4pRdA9MkCbVt6TBfp2MzdCgynxiNc6o309D0dGmJj NoDisclaimer: True X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(180628864354917); X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(102415321)(601004)(2401047)(13023025)(13024025)(13020025)(13013025)(5005006)(8121501046)(10201501046)(3002001)(6055026); SRVR:AM4PR0801MB1428; BCL:0; PCL:0; RULEID:; SRVR:AM4PR0801MB1428; X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 4:iZ6pnPLbhhKQULZv7GRw2/NFCxKyb7L9jqxSOp6IUwWzIb0Pio4kSvl2d1mV1Tt25IziLm+O5uS9JQYCmrGAQFEYQAg1J4hXR/UfXDKyq488/eNX6CDQP+Imwk1dT4OD4kZ/LWUQZCvqWa7u0yg2ejYYHeJhBo5UFYdFKU8ymj1C5dKV2FKlpzVAVCM+NGerO/Dt610cYChTYzqUMy8mG5hpL/664crB1E6+YRq602ILr3S3/UQOaasHKkwQ+V5O41KzQCTKP6tpc2nSao8teYWVcbF2itB2NcaM45LLvSodJ/OnxSI/AiEifDBhWmN95znglwYWYTwErOQV7uLhF2+NL6eRg45+sYSKTM+JzpBkOSytSk0ciEe/MBc5YBZf6MtNptJLZTInEg6z6WNJKAOa4S04ls8Ho6jSLfZCk8UGm7IEIczN7x3d4QcDUFwP9sIfu4JyK/7tMf0NLdku1kEn9XEoVRnbUfXlRR3T2toobWt0YZUrkc3AZtVnwnOPk1/oROROGCm1qdAGMdeVZw== X-Forefront-PRVS: 000947967F X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; AM4PR0801MB1428; 23:pvgeTy6Ka+F9NRtrNp21SgnTxT8xY/8xmca82sM?= =?us-ascii?Q?TLDFBudc9qFm0YGFj6qSmWfU/GtSfijwurbkScX09bf5gpkKHCgpQi4xMCOl?= =?us-ascii?Q?5ygQC1YUiW+Vm5WCg1DSVYz303ZylBo63rxF5Qqbf7KuOnFf+/MHSF8cL0SK?= =?us-ascii?Q?xdVvrZJCXzhMUIj9F1FZHrtVZySHGAmtSEZ1O1/QNiwAcawOldIRJze0qRo7?= =?us-ascii?Q?PDDrEvFB0bEaC5XhMApfiGgPkoE++3lP4PLyG8bHpJk5tTXL27r5RfN/KskS?= =?us-ascii?Q?GEZOz+ArsYnSGdSY/Pv3we8JS9SeAShMSNvvmkqWCIs50wHnzjPrB744zVDR?= =?us-ascii?Q?cd0eFfGmL/N05OOhyyw5gFZwufg/qHrPxWA/CYeBvQ/vEZ/OcEAvDOySKWPM?= =?us-ascii?Q?32lvK8yvkKj6iXtumEel7rJUX4uEZLGtfv+cdTdP5kb47kVeET1oGRBpXKJp?= =?us-ascii?Q?tjx65TFhUoybgPS5imCgOO2gZfS8YqVKSpLVNhh4qQud3dU0E8FRG5A7MVLT?= =?us-ascii?Q?WzwDr+uYbxMyxi7pu0zLP8s6zcBcEgkDUut9b/OqnzNfIp3Ojp5HSIbUy07R?= =?us-ascii?Q?q9abiSrt3A6T/0kF/it4wb7GsxalNuvC988A606wvwITgIxhzN9b5auCk13M?= =?us-ascii?Q?675gvT454tsxW8rFvAcuWTaYMomqiHafvld/bQCBgmqdLC2C0AyRGwqpcYhQ?= =?us-ascii?Q?p2CE/d6ZH60gqhPqJSn5JQwMvVL9afbgSpS4rl2WomUJu+Z/m/JU93Z/GuMs?= =?us-ascii?Q?CUWBDdO3x96Urm/9y0dPf06meZKB0/AhMXNnW2DZP9Ox6rUk9eKZDympoLTz?= =?us-ascii?Q?2PDeAwGiLBxJHhACD9so8vgHa/PLYz5+9UoF2E98izEnVOAi9iduhOwXcgL3?= =?us-ascii?Q?/ZXycpYLJX7aEY/qg41/cp1Lx7CdLNV0LZF0vpXYnelHKGSPQZ33aq/WLryC?= =?us-ascii?Q?leyFgxs4+syIaRkJDUh0KcVsRS6S+hxguaQxo+1mK8gNigy+J5JdRCzCW/v8?= =?us-ascii?Q?m9dbiZAqqjvhMtMCcvtmpxItxO3nIFOLzY+Qo3jlQveyCvNdk5U6gAt5RJFZ?= =?us-ascii?Q?Y+phExebUWdATKhDYC7PgkPiqLVv2IrRYIV1LHk+Nm1TFF8cU8K5kKU5Mxtr?= =?us-ascii?Q?xDKKTY75b3MMy5fgQPEetkmvtRnEdTx9lVAQoXrSURyby4QRa0BGnUDhIhH2?= =?us-ascii?Q?JDJIEUgfz8BK2zas=3D?= X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 6:95DW9CViuvTPl7YTvaQx7uomctheSxc+kh0dojHqphq8nH7dz7vic3z8Mt5PeruoFFqK4IH8uESxAdkKPPTg3EJR3sbXA3MCBjAWFuuJA1lZELrkSjpWMT1nrF2EbfKQ26YPqGdwkuxaNapKDznoTcJB/P6UUaX3Nh2IYYdJ2KOENk8Cmgx2M/jLxRz1ItgR1mWjN0pBj9G+B7YjYetPv5uBSXcGd+aMjiqvOvQES0wznYiAzneFbQDIu5YZD8ZN/kAzT4KIjnsmFBEyPYa4uBaFRdIgo2BCxOmykF026c1mi/pcVoZOKPIny3oKrM1RnmWOHtjvTy52swKRzl+reg==; 5:64xilpMs/Wi5R334Gr9+d32Nw0TfYBZ6M2dWuNW9gwzhv/nzv9O315CPpvPAWB3NGiz+hJY7AhEEsz//M5Kb2R1nBBmVr6KtzTCSEXSAKGHgMNaC7hjUvcK6UWzxIB0V3fncS3O9zyHlAWOH5ALWbg==; 24:1A1Bh80cgI9yNlEt+1CJpeIN2KGDoFTa9B9B3q4cdRPHmhv85RCcAwWEYye29QXHXjW5sl/s5/q3hmf7sTPtylGd4OAxLR9B8sMkFCJQ/Gc=; 7:sGDEM0yqU5jzaUhIf3LAVF7jxsDBJtr2BXyU6QfeA0LUxHHkwmMMoiZ4izDaUZ+7Iiac4wYK/vFzvI9S2ofm16VoMU9/0lQOFhRrQzl9eEbi7xz3KTi+tuwoWUXuFc4hF4YrB1v6wMJyCQkcAAG6afrLf5nCB61cmcvb9jKiY/VVzvbFGQmKT9KGULZYFr/BNIKaoKfL9cNFt7plX5BB0tbXBYrR+uzQ4J7J/amtnkIvqsvZsuPLjssru9BWCiQF SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; AM4PR0801MB1428; 20:lUE8vzGf6Q1n4TLldPyEHJ1xgEPS9M3VofpB9w788v+6o2P4se5SrdWXpt365sNjcXYFx/V7yh7U6OD1HSvuNaqJSdkoLdF7FHM0mqh+FBo/W9FdVY1hdiNMDe0zTObT9Itrdvjtc61l4A4F+XEo5ULTDTBI+23+0l4Y1s1xnqY= X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Jul 2016 09:51:40.4832 (UTC) X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[217.140.96.140]; Helo=[nebula.arm.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR0801MB1428 X-MC-Unique: p6l2AP2WPC-ZbX2FIVKK0w-1 X-IsSubscribed: yes Splicing replies to Bernd, Bernhard and Jeff. Jeff, thanks for reviewing the patch set, I appreciate the ack, though I've held off committing while I was working through Bernd's criticism of the size cost model that this patch introduced and trying to get that right. Sorry to cause extra reviewing work, but I have respun the patch set to try to improve the consistency of how we're costing things, and to better handle the size cases. I'm a bit happier with how it has turned out and I think the approach is now a little easier to justify. Hopefully it will still be acceptable for trunk. There are essentially two families of cost models in this file. The true before/after comparisons (noce_cmove_arith, noce_convert_multiple_sets), and the "magic numbers" comparisons (noce_try_store_flag_constants, noce_try_addcc, noce_try_store_flag_mask, noce_try_cmove). In the first revisions of this patch set, I refactored the magic numbers comparisons, but I didn't try to solve their "magic" as comparing two integers was a suitably fast routine, and the comparison seemed accurate enough. But the magic numbers are potentially inaccurate for a variable-length instruction architecture, and given the number of times we actually manage to spot these if-convert opportunities, the compile time overhead of moving every cost model to a before/after comparison is probably not all that high. Then we have everything going through one single function, making Additionally, if we can rework most of the costs to actually calculate the before/after costs, we can then drop the "size" case from this hook entirely - we can just look at the size of the sequences directly rather than asking the target to guess at an acceptable size growth. This is good as it will completely remove magic numbers from ifcvt and make everything dependent on a simple question to the target, when compiling for speed; "What is the maximum cost of extra execution that you'd like to see on the unconditional path?" Unfortunately, disentangling this makes it harder to layout the patch set quite as neatly as before. The changes follow the same structure, but I've had to squash all the cost changes in to patch 2/2. Fortunately these now look reasonably mechanical, and consequently the patch is not much more difficult to review. Patches 3/2 and 4/2 are not strictly needed as part of the cost model work, but they do help the cost model by performing some simplifications early. This reduces the chance of us rejecting if-conversion based on too many simple moves that a future pass would have cleared up anyway. The csibe numbers below rely on these two patches having been applied. Without them, we get a couple of decisions wrong and some files from csibe increase by < 3%. On Tue, Jun 21, 2016 at 11:30:17PM +0200, Bernhard Reutner-Fischer wrote: > > >For the default implementation, if the parameters are not set, I just > >multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and > >COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still > >short > >of ideas on how best to form the default implementation. > > How bad is it in e.g. CSiBE? I'm not completely sure I've set it up right, but these are the >0.5% size differences for an x86_64 compiler I built last Friday using -Os: Smaller: Relative size Test name 93.33 flex-2.5.31,tables_shared 94.37 teem-1.6.0-src,src/limn/qn 97.27 teem-1.6.0-src,src/nrrd/kernel 98.31 teem-1.6.0-src,src/ten/miscTen 98.60 teem-1.6.0-src,src/ell/genmat 98.69 teem-1.6.0-src,src/nrrd/measure 99.03 teem-1.6.0-src,src/ten/mod 99.04 libpng-1.2.5,pngwtran 99.08 jpeg-6b,jdcoefct 99.14 teem-1.6.0-src,src/dye/convertDye 99.15 teem-1.6.0-src,src/ten/glyph 99.16 teem-1.6.0-src,src/bane/gkmsPvg 99.20 teem-1.6.0-src,src/limn/splineEval 99.25 teem-1.6.0-src,src/nrrd/accessors 99.28 teem-1.6.0-src,src/hest/parseHest 99.33 teem-1.6.0-src,src/limn/transform 99.40 teem-1.6.0-src,src/alan/coreAlan 99.48 teem-1.6.0-src,src/air/miscAir Larger: Relative size Test name 101.43 teem-1.6.0-src,src/ten/tendEvec 101.57 teem-1.6.0-src,src/ten/tendEval However, the total size difference is indistinguishable from noise (< 0.08%). Running the same experiment with an AArch64 cross compiler, I get the following changes: Smaller: Relative size Test name 97.78 libpng-1.2.5,pngrio 98.02 libpng-1.2.5,pngwio 98.82 replaypc-0.4.0.preproc,ReplayPC 99.21 lwip-0.5.3.preproc,src/core/inet 99.48 jpeg-6b,wrppm Larger: Relative size Test name 100.52 jpeg-6b,wrbmp 100.82 libpng-1.2.5,pngwtran 100.91 zlib-1.1.4,infcodes And the overall size difference was tiny (< 0.01%). There were no >0.5% changes for the ARM port (expected as it doesn't use noce). I looked in to each of the regressions, and generally they occur where we relying on a future pass to clean up after us. This is especially true for the large x86_64 regressions, which as far as I can see are a consequence of x86_64's floating-point conditional move expanding out to bitwise operations. Taken individually, these look huge, but when you have multiple conditional moves feeding each other some of the bitwise expressions simplify and you get a size saving. We can't model that in our cost model, and in many ways we just got lucky previously. > s/precitable/predictable/ ? This, and all your other comments regarding spelling and grammar have been fixed. Thanks. On Thu, Jun 30, 2016 at 01:58:52PM +0200, Bernd Schmidt wrote: > On 06/21/2016 05:50 PM, James Greenhalgh wrote: > >For the default implementation, if the parameters are not set, I just > >multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and > >COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short > >of ideas on how best to form the default implementation. > > Yeah, this does seem kind of arbitrary. It looks especialy odd > considering that BRANCH_COST is likely to already vary between the > size/speed cases. What's wrong with just multiplying through by > CNI(1)? > > I'm not sure we want params for this; targets should just eventually > upgrade their cost models. We've used params in the past as a migration path, they are particularly handy in this case as they allow us to override target settings when in the testsuite. Removing these would be easy, but I've left them in for now, as I like the three tiered flexibility: * Target does nothing - hook uses BRANCH_COST, * Target only needs a simple model so sets params - hook uses params * Target thinks it can do something very smart - target implements hook > >The new default causes some changes in generated conditional move sequences > >for x86_64. Whether these changes are for the better or not I can't say. > > How about arm/aarch64? I think some benchmark results might be good to have. The ARM port isn't very interesting as it has conditional execution and therefore mostly uses other paths through this file. On AArch64 I get an increase in the number of CSEL and FCSEL instructions generated when compiling Spec2006. With the current definition of BRANCH_COST for AArch64 we lose some important if-cvt opportunities, but these can be restored by setting the BRANCH_COST for predictable branches higher. With this done I see some small improvements in Spec2006, but nothing meaningful and no regressions. This is probably exactly where I want to be with this patch set - no change is a good thing. > Bernhard already pointed out some issues with the patch; I'll omit these. As mentioned above, I've fixed Bernhard's issues in this patch revision. > >+(max_noce_ifcvt_seq_cost, > >+ "This hook should return a value in the same units as\n\ > >+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\ > >+a sequence generated by the RTL if-conversion pass when conditional\n\ > >+execution is not available. > > There's still the issue that we're also replacing instructions when > doing if-conversion. Let's say in this case, > > /* Convert "if (test) x = a; else x = b", for A and B constant. > Also allow A = y + c1, B = y + c2, with a common y between A > and B. */ > > we're removing two assignments for the purposes of optimizing for > size, and one assignment when considering optimization for speed. > This needs to factor into the cost calculations somehow if we want > to do it properly. I think we can leave the hook as-is, but maybe > add documentation to the effect of "The caller should increase the > limit by the cost of whatever instructions are removed in the > transformation." Yes, I see what you mean. Hopefully I've addressed this in this patchset revision. > > >+/* Default implementation of TARGET_RTX_BRANCH_COST. */ > > Wrong name for the hook. Thanks, fixed. > >+unsigned int > >+default_max_noce_ifcvt_seq_cost (bool speed_p, edge e) > >+{ > >+ bool predictable_p = predictable_edge_p (e); > >+ /* For size, some targets like to set a BRANCH_COST of zero to disable > >+ ifcvt, continue to allow that. Then multiply through by > >+ COSTS_N_INSNS (1) so we're in a comparable base. */ > >+ > >+ if (!speed_p) > >+ return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1); > > Blank line before the comment would be more readable. Fixed by virtue of removing this code. > >+ enum compiler_param param = predictable_p > >+ ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST > >+ : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST; > > When splitting expressions across multiple lines, wrap in parens so > that emacs formats them automatically. Fixed. This patch, and all others in this series bootstrapped and tested on x86_64 and aarch64 with no issues. OK? Thanks, James --- 2016-07-20 James Greenhalgh * target.def (max_noce_ifcvt_seq_cost): New. * doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it. * doc/tm.texi: Regenerate. * targhooks.h (default_max_noce_ifcvt_seq_cost): New. * targhooks.c (default_max_noce_ifcvt_seq_cost): New. * params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New. (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise. * doc/invoke.texi: Document new params. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9a4db38..94d2b48 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8865,6 +8865,17 @@ considered for if-conversion. The default is 10, though the compiler will also use other heuristics to decide whether if-conversion is likely to be profitable. +@item max-rtl-if-conversion-predictable-cost +@item max-rtl-if-conversion-unpredictable-cost +RTL if-conversion will try to remove conditional branches around a block +and replace them with conditionally executed instructions. These parameters +give the maximum permissible cost for the sequence that would be generated +by if-conversion depending on whether the branch is statically determined +to be predictable or not. The units for this parameter are the same as +those for the GCC internal seq_cost metric. The compiler will try to +provide a reasonable default for this parameter using the BRANCH_COST +target macro. + @item max-crossjump-edges The maximum number of incoming edges to consider for cross-jumping. The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b318615..28fba6b 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6526,6 +6526,26 @@ should probably only be given to addresses with different numbers of registers on machines with lots of registers. @end deftypefn +@deftypefn {Target Hook} {unsigned int} TARGET_MAX_NOCE_IFCVT_SEQ_COST (edge @var{e}) +This hook returns a value in the same units as @code{TARGET_RTX_COSTS}, +giving the maximum acceptable cost for a sequence generated by the RTL +if-conversion pass when conditional execution is not available. +The RTL if-conversion pass attempts to convert conditional operations +that would require a branch to a series of unconditional operations and +@code{mov@var{mode}cc} insns. This hook returns the maximum cost of the +unconditional instructions and the @code{mov@var{mode}cc} insns. +RTL if-conversion is cancelled if the cost of the converted sequence +is greater than the value returned by this hook. + +@code{e} is the edge between the basic block containing the conditional +branch to the basic block which would be executed if the condition +were true. + +The default implementation of this hook uses the +@code{max-rtl-if-conversion-[un]predictable} parameters if they are set, +and uses a multiple of @code{BRANCH_COST} otherwise. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void) This predicate controls the use of the eager delay slot filler to disallow speculatively executed instructions being placed in delay slots. Targets diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 1e8423c..d2b7f41 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4762,6 +4762,8 @@ Define this macro if a non-short-circuit operation produced by @hook TARGET_ADDRESS_COST +@hook TARGET_MAX_NOCE_IFCVT_SEQ_COST + @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P @node Scheduling diff --git a/gcc/params.def b/gcc/params.def index b86d592..166032e 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1222,6 +1222,20 @@ DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS, "if-conversion.", 10, 0, 99) +DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST, + "max-rtl-if-conversion-predictable-cost", + "Maximum permissible cost for the sequence that would be " + "generated by the RTL if-conversion pass for a branch that " + "is considered predictable.", + 20, 0, 200) + +DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST, + "max-rtl-if-conversion-unpredictable-cost", + "Maximum permissible cost for the sequence that would be " + "generated by the RTL if-conversion pass for a branch that " + "is considered unpredictable.", + 40, 0, 200) + DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES, "hsa-gen-debug-stores", "Level of hsa debug stores verbosity", diff --git a/gcc/target.def b/gcc/target.def index a4df363..b2139ce 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -3572,6 +3572,30 @@ registers on machines with lots of registers.", int, (rtx address, machine_mode mode, addr_space_t as, bool speed), default_address_cost) +/* Give a cost, in RTX Costs units, for an edge. Like BRANCH_COST, but with + well defined units. */ +DEFHOOK +(max_noce_ifcvt_seq_cost, + "This hook returns a value in the same units as @code{TARGET_RTX_COSTS},\n\ +giving the maximum acceptable cost for a sequence generated by the RTL\n\ +if-conversion pass when conditional execution is not available.\n\ +The RTL if-conversion pass attempts to convert conditional operations\n\ +that would require a branch to a series of unconditional operations and\n\ +@code{mov@var{mode}cc} insns. This hook returns the maximum cost of the\n\ +unconditional instructions and the @code{mov@var{mode}cc} insns.\n\ +RTL if-conversion is cancelled if the cost of the converted sequence\n\ +is greater than the value returned by this hook.\n\ +\n\ +@code{e} is the edge between the basic block containing the conditional\n\ +branch to the basic block which would be executed if the condition\n\ +were true.\n\ +\n\ +The default implementation of this hook uses the\n\ +@code{max-rtl-if-conversion-[un]predictable} parameters if they are set,\n\ +and uses a multiple of @code{BRANCH_COST} otherwise.", +unsigned int, (edge e), +default_max_noce_ifcvt_seq_cost) + /* Permit speculative instructions in delay slots during delayed-branch scheduling. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 3e089e7..08136eb 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -74,6 +74,8 @@ along with GCC; see the file COPYING3. If not see #include "intl.h" #include "opts.h" #include "gimplify.h" +#include "predict.h" +#include "params.h" bool @@ -1977,4 +1979,24 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type) return true; } +/* Default implementation of TARGET_MAX_NOCE_IFCVT_SEQ_COST. */ + +unsigned int +default_max_noce_ifcvt_seq_cost (edge e) +{ + bool predictable_p = predictable_edge_p (e); + + enum compiler_param param + = (predictable_p + ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST + : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST); + + /* If we have a parameter set, use that, otherwise take a guess using + BRANCH_COST. */ + if (global_options_set.x_param_values[param]) + return PARAM_VALUE (param); + else + return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (3); +} + #include "gt-targhooks.h" diff --git a/gcc/targhooks.h b/gcc/targhooks.h index d6581cf..b7b5ba3 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -255,4 +255,6 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE extern bool default_optab_supported_p (int, machine_mode, machine_mode, optimization_type); +extern unsigned int default_max_noce_ifcvt_seq_cost (edge); + #endif /* GCC_TARGHOOKS_H */