From patchwork Fri Nov 25 15:34:50 2016
X-Patchwork-Submitter: Georg-Johann Lay
X-Patchwork-Id: 84159
To: gcc-patches
Cc: Denis Chertykov, Senthil Kumar Selvaraj, Pitchumani Sivanupandi
From: Georg-Johann Lay
Subject: [patch,avr]: Improve code as of PR41076
Message-ID: <87bc1067-da8f-c54a-e06f-07dd8316a6a1@gjlay.de>
Date: Fri, 25 Nov 2016 16:34:50 +0100

The mentioned PR is about composing 16-bit values out of 8-bit values, which, due to the integer promotions, tends to be performed with 16-bit operations.  The patch adds 3 combiner patterns to catch such situations; they are then split after reload.

Some more splitting is performed after reload for: AND, IOR and XOR of reg-reg operations, HImode reg = 0, and 3- and 4-byte register moves (and 2-byte moves if !MOVW).
This gives better code for almost all test cases, yet some test cases could still be improved.  I also tried pre-reload splits, but the generated SUBREGs seem to disturb register allocation, hence I used post-reload splits.

There is also PR60145, which has a test case where a 4-byte value is composed out of 4 bytes, but the patterns to catch that are going to be insane... so for now, such 4-byte cases will still result in bloated code.

Patch tested against trunk without regressions.

Ok for trunk?

Johann

	PR 41076
	* config/avr/avr.md (SPLIT34): New mode iterator.
	(bitop): New code iterator.
	(*iorhi3.ashift8-*): New insn-and-split patterns.
	(*movhi): Post-reload split reg = 0.
	(*movhi) [!MOVW]: Post-reload split reg = reg.
	(*mov<mode>) [SI,SF,PSI,SQ,USQ,SA,USA]: Post-reload split
	reg = reg.
	(andhi3, andpsi3, andsi3): Post-reload split reg-reg operations.
	(iorhi3, iorpsi3, iorsi3): Same.
	(xorhi3, xorpsi3, xorsi3): Same.
	* config/avr/avr.c (avr_rtx_costs_1) [IOR && HImode]: Adjust rtx
	costs to *iorhi3.ashift8-* patterns.

Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 242823)
+++ config/avr/avr.c	(working copy)
@@ -10645,6 +10645,19 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
       /* FALLTHRU */
     case AND:
     case IOR:
+      if (IOR == code
+          && HImode == mode
+          && ASHIFT == GET_CODE (XEXP (x, 0)))
+        {
+          *total = COSTS_N_INSNS (2);
+          // Just a rough estimate.  If we see no sign- or zero-extend,
+          // then increase the cost a little bit.
+          if (REG_P (XEXP (XEXP (x, 0), 0)))
+            *total += COSTS_N_INSNS (1);
+          if (REG_P (XEXP (x, 1)))
+            *total += COSTS_N_INSNS (1);
+          return true;
+        }
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
       *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed);
       if (!CONST_INT_P (XEXP (x, 1)))
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 242823)
+++ config/avr/avr.md	(working copy)
@@ -260,6 +260,10 @@
 (define_mode_iterator ORDERED234 [HI SI
                                   HQ UHQ HA UHA
                                   SQ USQ SA USA])

+;; Post-reload split of 3, 4 bytes wide moves.
+(define_mode_iterator SPLIT34 [SI SF PSI
+                               SQ USQ SA USA])
+
 ;; Define code iterators
 ;; Define two incarnations so that we can build the cross product.
 (define_code_iterator any_extend  [sign_extend zero_extend])
@@ -267,6 +271,7 @@ (define_code_iterator any_extend2 [sign_
 (define_code_iterator any_extract [sign_extract zero_extract])
 (define_code_iterator any_shiftrt [lshiftrt ashiftrt])

+(define_code_iterator bitop [xor ior and])
 (define_code_iterator xior [xor ior])
 (define_code_iterator eqne [eq ne])
@@ -3328,6 +3333,66 @@ (define_insn "xorsi3"
    (set_attr "adjust_len" "*,out_bitop,out_bitop")
    (set_attr "cc" "set_n,clobber,clobber")])

+
+(define_split
+  [(set (match_operand:SPLIT34 0 "register_operand")
+        (match_operand:SPLIT34 1 "register_operand"))]
+  "optimize
+   && reload_completed"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+  {
+    machine_mode mode_hi = 4 == GET_MODE_SIZE (<MODE>mode) ? HImode : QImode;
+    bool lo_first = REGNO (operands[0]) < REGNO (operands[1]);
+    rtx dst_lo = simplify_gen_subreg (HImode, operands[0], <MODE>mode, 0);
+    rtx src_lo = simplify_gen_subreg (HImode, operands[1], <MODE>mode, 0);
+    rtx dst_hi = simplify_gen_subreg (mode_hi, operands[0], <MODE>mode, 2);
+    rtx src_hi = simplify_gen_subreg (mode_hi, operands[1], <MODE>mode, 2);
+
+    operands[2] = lo_first ? dst_lo : dst_hi;
+    operands[3] = lo_first ? src_lo : src_hi;
+    operands[4] = lo_first ? dst_hi : dst_lo;
+    operands[5] = lo_first ? src_hi : src_lo;
+  })
+
+(define_split
+  [(set (match_operand:HI 0 "register_operand")
+        (match_operand:HI 1 "reg_or_0_operand"))]
+  "optimize
+   && reload_completed
+   && (!AVR_HAVE_MOVW
+       || const0_rtx == operands[1])"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+  {
+    operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+    operands[3] = simplify_gen_subreg (QImode, operands[1], HImode, 1);
+    operands[4] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
+    operands[5] = simplify_gen_subreg (QImode, operands[1], HImode, 0);
+  })
+
+;; Split andhi3, andpsi3, andsi3.
+;; Split iorhi3, iorpsi3, iorsi3.
+;; Split xorhi3, xorpsi3, xorsi3.
+(define_split
+  [(parallel [(set (match_operand:HISI 0 "register_operand")
+                   (bitop:HISI (match_dup 0)
+                               (match_operand:HISI 1 "register_operand")))
+              (clobber (scratch:QI))])]
+  "optimize
+   && reload_completed"
+  [(const_int 1)]
+  {
+    for (int i = 0; i < GET_MODE_SIZE (<MODE>mode); i++)
+      {
+        rtx dst = simplify_gen_subreg (QImode, operands[0], <MODE>mode, i);
+        rtx src = simplify_gen_subreg (QImode, operands[1], <MODE>mode, i);
+        emit_insn (gen_<code>qi3 (dst, dst, src));
+      }
+    DONE;
+  })
+
+
 ;; swap swap swap swap swap swap swap swap swap swap swap swap swap swap swap
 ;; swap
@@ -6696,6 +6761,84 @@ (define_insn_and_split "*<
     operands[4] = simplify_gen_subreg (QImode, operands[0], <MODE>mode, byteno);
   })

+
+(define_insn_and_split "*iorhi3.ashift8-ext.zerox"
+  [(set (match_operand:HI 0 "register_operand"                         "=r")
+        (ior:HI (ashift:HI (any_extend:HI
+                            (match_operand:QI 1 "register_operand"      "r"))
+                           (const_int 8))
+                (zero_extend:HI (match_operand:QI 2 "register_operand"  "r"))))]
+  "optimize"
+  { gcc_unreachable(); }
+  "&& reload_completed"
+  [(set (match_dup 1) (xor:QI (match_dup 1) (match_dup 2)))
+   (set (match_dup 2) (xor:QI (match_dup 2) (match_dup 1)))
+   (set (match_dup 1) (xor:QI (match_dup 1) (match_dup 2)))]
+  {
+    rtx hi = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+    rtx lo = simplify_gen_subreg (QImode, operands[0], HImode, 0);
+
+    if (!reg_overlap_mentioned_p (hi, operands[2]))
+      {
+        emit_move_insn (hi, operands[1]);
+        emit_move_insn (lo, operands[2]);
+        DONE;
+      }
+    else if (!reg_overlap_mentioned_p (lo, operands[1]))
+      {
+        emit_move_insn (lo, operands[2]);
+        emit_move_insn (hi, operands[1]);
+        DONE;
+      }

+    gcc_assert (REGNO (operands[1]) == REGNO (operands[0]));
+    gcc_assert (REGNO (operands[2]) == 1 + REGNO (operands[0]));
+  })
+
+(define_insn_and_split "*iorhi3.ashift8-ext.reg"
+  [(set (match_operand:HI 0 "register_operand"                     "=r")
+        (ior:HI (ashift:HI (any_extend:HI
+                            (match_operand:QI 1 "register_operand"  "r"))
+                           (const_int 8))
+                (match_operand:HI 2 "register_operand"              "0")))]
+  "optimize"
+  { gcc_unreachable(); }
+  "&& reload_completed"
+  [(set (match_dup 3)
+        (ior:QI (match_dup 4)
+                (match_dup 1)))]
+  {
+    operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+    operands[4] = simplify_gen_subreg (QImode, operands[2], HImode, 1);
+  })
+
+(define_insn_and_split "*iorhi3.ashift8-reg.zerox"
+  [(set (match_operand:HI 0 "register_operand"                         "=r")
+        (ior:HI (ashift:HI (match_operand:HI 1 "register_operand"       "r")
+                           (const_int 8))
+                (zero_extend:HI (match_operand:QI 2 "register_operand"  "0"))))]
+  "optimize"
+  { gcc_unreachable(); }
+  "&& reload_completed"
+  [(set (match_dup 3)
+        (match_dup 4))]
+  {
+    operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+    operands[4] = simplify_gen_subreg (QImode, operands[1], HImode, 0);
+  })
+
+
+(define_peephole2
+  [(set (match_operand:QI 0 "register_operand")
+        (const_int 0))
+   (set (match_dup 0)
+        (ior:QI (match_dup 0)
+                (match_operand:QI 1 "register_operand")))]
+  ""
+  [(set (match_dup 0)
+        (match_dup 1))])
+
+
 (define_expand "extzv"
   [(set (match_operand:QI 0 "register_operand" "")
         (zero_extract:QI (match_operand:QI 1 "register_operand" "")
@@ -6778,7 +6921,6 @@ (define_insn_and_split "*extract.subreg.
     operands[4] = GEN_INT (bitno % 8);
   })

-
 ;; Fixed-point instructions
 (include "avr-fixed.md")