From patchwork Wed Nov 30 17:01:24 2016
X-Patchwork-Submitter: Bernd Edlinger
X-Patchwork-Id: 85899
Delivered-To: patch@linaro.org
List-Id: gcc-patches@gcc.gnu.org
From: Bernd Edlinger
To: Wilco Dijkstra, Ramana Radhakrishnan
Cc: GCC Patches, Kyrill Tkachov, Richard Earnshaw, nd
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Wed, 30 Nov 2016 17:01:24 +0000

On 11/30/16 13:01, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>> On 11/29/16 16:06, Wilco Dijkstra wrote:
>>> Bernd Edlinger wrote:
>>>
>>> -  "TARGET_32BIT && reload_completed
>>> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
>>>      && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>>>
>>> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since
>>> we're already excluding NEON.
>>
>> Aehm, no.  This would split the addi_neon insn before it is clear
>> whether the reload pass will assign a VFP register.
>
> Hmm, that's strange...  This instruction shouldn't also be used to split
> some random Neon pattern - for example, arm_subdi3 doesn't do the same.
> To understand and reason about any of these complex patterns, they
> should all work in the same way...

I was a bit surprised as well when I saw that happen.

But subdi3 is different:

  "TARGET_32BIT && !TARGET_NEON"
  "#"   ; "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2"
  "&& reload_completed"

so this never splits anything if TARGET_NEON.
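(Note the leading "&&": in a define_insn_and_split, a split condition that
starts with "&&" has the insn condition prepended, so subdi3's effective
split condition is

   TARGET_32BIT && !TARGET_NEON && reload_completed

which can never be true for NEON.  The *arm_adddi3 pattern below instead
spells out a full, stand-alone split condition, and the define_split half
of a define_insn_and_split matches any insn with the same RTL shape -
including insns that originally matched a different define_insn.)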
But adddi3 cannot match at all if TARGET_NEON, yet its pattern looks
exactly like adddi3_neon's:

(define_insn_and_split "*arm_adddi3"
  [(set (match_operand:DI          0 "s_register_operand" "=&r,&r,&r,&r,&r")
        (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r")
                 (match_operand:DI 2 "arm_adddi_operand"  "r,  0, r, Dd, Dd")))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_32BIT && !TARGET_NEON"
  "#"
  "TARGET_32BIT && reload_completed
   && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"

(define_insn "adddi3_neon"
  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
        (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
                 (match_operand:DI 2 "arm_adddi_operand"  "w,r,0,w,r,Dd,Dd")))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_NEON"
{
  switch (which_alternative)
    {
    case 0: /* fall through */
    case 3: return "vadd.i64\t%P0, %P1, %P2";
    case 1: return "#";
    case 2: return "#";
    case 4: return "#";
    case 5: return "#";
    case 6: return "#";
    default: gcc_unreachable ();
    }

Even the return "#" alternatives explicitly rely on the former pattern's
splitter.  So I think the author knew that and did it on purpose.

>> But when I make *arm_cmpdi_insn split early, it ICEs:
>
> (insn 4870 4869 1636 87 (set (scratch:SI)
>         (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
>                 (subreg:SI (reg:DI 473 [ X$14 ]) 4))
>             (ltu:SI (reg:CC_C 100 cc)
>                 (const_int 0 [0])))) "pr77308-2.c":140 -1
>      (nil))
>
> That's easy: we don't have an "sbcs <scratch>, r1, r2" pattern.  A quick
> workaround is to create a temporary for operand[2] (if before reload) so
> it will match the standard sbcs pattern, and then the split works fine.
>
>> So it is certainly possible, but not really simple, to improve the
>> stack size even further.  But I would prefer to do that in a
>> separate patch.
>
> Yes, separate patches would be fine.  However, there is a lot of scope to
> improve this further.  For example, after your patch, shifts and logical
> operations are expanded at expand time, add/sub are split in split1 after
> combine runs, and everything else is split after reload.  It doesn't make
> sense to split different operations at different times - it means you're
> still going to get the bad DImode subregs and miss lots of optimization
> opportunities due to the mix of partly split and partly not-yet-split
> operations.

Yes.  I did the add/sub differently because it was easier this way, and it
was simply sufficient to make the existing test cases happy.

Also, the biggest benefit came, IIRC, from the very early splitting of the
anddi/iordi/xordi patterns, because they have completely separate data
flow in the low and high parts.  That is not the case for the arithmetic
patterns, but they can nevertheless still be optimized - preferably once a
new test case is found that demonstrates an improvement.

I am not sure why the cmpdi pattern has an influence at all, because the
data flow needs all 64 bits of both sides.  Nevertheless, it is a fact:
with the modified test case I get a frame size of 264 bytes, where it was
1920 bytes before.

I have attached the completely untested follow-up patch now, but I would
like to post it again for review once my current patch, which is still
waiting for final review (please feel pinged!), has been applied.

This is really exciting...


Thanks
Bernd.
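P.S.: To illustrate the point about data flow - a minimal C sketch of my
own, not part of the patch - a 64-bit logical operation decomposes into
two fully independent halves, while a 64-bit add chains the halves
together through the carry:

/* 64-bit AND: the low and high words are independent, so after an early
   split each half can be optimized and allocated on its own.  */
unsigned long long and64 (unsigned long long x, unsigned long long y)
{
  unsigned int lo = (unsigned int) x & (unsigned int) y;
  unsigned int hi = (unsigned int) (x >> 32) & (unsigned int) (y >> 32);
  return ((unsigned long long) hi << 32) | lo;
}

/* 64-bit add: the high word consumes the carry out of the low word
   (adds/adc), so the two halves cannot be treated independently.  */
unsigned long long add64 (unsigned long long x, unsigned long long y)
{
  unsigned int lo = (unsigned int) x + (unsigned int) y;
  unsigned int carry = lo < (unsigned int) x;  /* carry out of low word */
  unsigned int hi = (unsigned int) (x >> 32) + (unsigned int) (y >> 32)
                    + carry;
  return ((unsigned long long) hi << 32) | lo;
}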
--- gcc/config/arm/arm.md.orig	2016-11-27 09:22:41.794790123 +0100
+++ gcc/config/arm/arm.md	2016-11-30 16:40:30.140532737 +0100
@@ -4738,7 +4738,7 @@
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_ARM"
   "#"   ; "rsbs\\t%Q0, %Q1, #0\;rsc\\t%R0, %R1, #0"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
@@ -7432,7 +7432,7 @@
    (clobber (match_scratch:SI 2 "=r"))]
   "TARGET_32BIT"
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 0) (match_dup 1)))
    (parallel [(set (reg:CC CC_REGNUM)
@@ -7456,7 +7456,10 @@
       operands[5] = gen_rtx_MINUS (SImode, operands[3], operands[4]);
     }
   operands[1] = gen_lowpart (SImode, operands[1]);
-  operands[2] = gen_lowpart (SImode, operands[2]);
+  if (can_create_pseudo_p ())
+    operands[2] = gen_reg_rtx (SImode);
+  else
+    operands[2] = gen_lowpart (SImode, operands[2]);
  }
   [(set_attr "conds" "set")
    (set_attr "length" "8")
@@ -7470,7 +7473,7 @@
   "TARGET_32BIT"
   "#"   ; "cmp\\t%R0, %R1\;it eq\;cmpeq\\t%Q0, %Q1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
        (compare:CC (match_dup 2) (match_dup 3)))
    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
--- gcc/config/arm/thumb2.md.orig	2016-11-30 16:57:44.760589624 +0100
+++ gcc/config/arm/thumb2.md	2016-11-30 16:58:05.310590754 +0100
@@ -132,7 +132,7 @@
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_THUMB2"
   "#"   ; negs\\t%Q0, %Q1\;sbc\\t%R0, %R1, %R1, lsl #1
-  "&& reload_completed"
+  "&& (!TARGET_NEON || reload_completed)"
   [(parallel [(set (reg:CC CC_REGNUM)
 		   (compare:CC (const_int 0) (match_dup 1)))
 	      (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
--- /dev/null	2016-11-30 15:23:46.779473644 +0100
+++ gcc/testsuite/gcc.target/arm/pr77308-2.c	2016-11-30 17:05:21.021614711 +0100
@@ -0,0 +1,169 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -Wstack-usage=2500" } */
+
+/* This is a modified algorithm with 64-bit cmp and neg at the Sigma blocks.
+   It improves the test coverage of the cmpdi and negdi2 patterns.
+   Unlike the original test case, these insns can reach the reload pass,
+   which may result in large stack usage.
+*/
+#define SHA_LONG64 unsigned long long
+#define U64(C) C##ULL
+
+#define SHA_LBLOCK 16
+#define SHA512_CBLOCK (SHA_LBLOCK*8)
+
+typedef struct SHA512state_st {
+    SHA_LONG64 h[8];
+    SHA_LONG64 Nl, Nh;
+    union {
+        SHA_LONG64 d[SHA_LBLOCK];
+        unsigned char p[SHA512_CBLOCK];
+    } u;
+    unsigned int num, md_len;
+} SHA512_CTX;
+
+static const SHA_LONG64 K512[80] = {
+    U64(0x428a2f98d728ae22), U64(0x7137449123ef65cd),
+    U64(0xb5c0fbcfec4d3b2f), U64(0xe9b5dba58189dbbc),
+    U64(0x3956c25bf348b538), U64(0x59f111f1b605d019),
+    U64(0x923f82a4af194f9b), U64(0xab1c5ed5da6d8118),
+    U64(0xd807aa98a3030242), U64(0x12835b0145706fbe),
+    U64(0x243185be4ee4b28c), U64(0x550c7dc3d5ffb4e2),
+    U64(0x72be5d74f27b896f), U64(0x80deb1fe3b1696b1),
+    U64(0x9bdc06a725c71235), U64(0xc19bf174cf692694),
+    U64(0xe49b69c19ef14ad2), U64(0xefbe4786384f25e3),
+    U64(0x0fc19dc68b8cd5b5), U64(0x240ca1cc77ac9c65),
+    U64(0x2de92c6f592b0275), U64(0x4a7484aa6ea6e483),
+    U64(0x5cb0a9dcbd41fbd4), U64(0x76f988da831153b5),
+    U64(0x983e5152ee66dfab), U64(0xa831c66d2db43210),
+    U64(0xb00327c898fb213f), U64(0xbf597fc7beef0ee4),
+    U64(0xc6e00bf33da88fc2), U64(0xd5a79147930aa725),
+    U64(0x06ca6351e003826f), U64(0x142929670a0e6e70),
+    U64(0x27b70a8546d22ffc), U64(0x2e1b21385c26c926),
+    U64(0x4d2c6dfc5ac42aed), U64(0x53380d139d95b3df),
+    U64(0x650a73548baf63de), U64(0x766a0abb3c77b2a8),
+    U64(0x81c2c92e47edaee6), U64(0x92722c851482353b),
+    U64(0xa2bfe8a14cf10364), U64(0xa81a664bbc423001),
+    U64(0xc24b8b70d0f89791), U64(0xc76c51a30654be30),
+    U64(0xd192e819d6ef5218), U64(0xd69906245565a910),
+    U64(0xf40e35855771202a), U64(0x106aa07032bbd1b8),
+    U64(0x19a4c116b8d2d0c8), U64(0x1e376c085141ab53),
+    U64(0x2748774cdf8eeb99), U64(0x34b0bcb5e19b48a8),
+    U64(0x391c0cb3c5c95a63), U64(0x4ed8aa4ae3418acb),
+    U64(0x5b9cca4f7763e373), U64(0x682e6ff3d6b2b8a3),
+    U64(0x748f82ee5defb2fc), U64(0x78a5636f43172f60),
+    U64(0x84c87814a1f0ab72), U64(0x8cc702081a6439ec),
+    U64(0x90befffa23631e28), U64(0xa4506cebde82bde9),
+    U64(0xbef9a3f7b2c67915), U64(0xc67178f2e372532b),
+    U64(0xca273eceea26619c), U64(0xd186b8c721c0c207),
+    U64(0xeada7dd6cde0eb1e), U64(0xf57d4f7fee6ed178),
+    U64(0x06f067aa72176fba), U64(0x0a637dc5a2c898a6),
+    U64(0x113f9804bef90dae), U64(0x1b710b35131c471b),
+    U64(0x28db77f523047d84), U64(0x32caab7b40c72493),
+    U64(0x3c9ebe0a15c9bebc), U64(0x431d67c49c100d4c),
+    U64(0x4cc5d4becb3e42b6), U64(0x597f299cfc657e2a),
+    U64(0x5fcb6fab3ad6faec), U64(0x6c44198c4a475817)
+};
+
+#define B(x,j)    (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
+#define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
+#define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ (ROTR((x),39) == (x)) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ((long long)ROTR((x),41) < (long long)(x)) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ (((x)>>7) > (x)) ? -(x) : (x))
+#define sigma1(x) (ROTR((x),19) ^ ROTR((x),61) ^ ((long long)((x)>>6) < (long long)(x)) ? -(x) : (x))
+#define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
+#define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
+
+#define ROUND_00_15(i,a,b,c,d,e,f,g,h) do { \
+    T1 += h + Sigma1(e) + Ch(e,f,g) + K512[i]; \
+    h = Sigma0(a) + Maj(a,b,c); \
+    d += T1; h += T1; } while (0)
+#define ROUND_16_80(i,j,a,b,c,d,e,f,g,h,X) do { \
+    s0 = X[(j+1)&0x0f]; s0 = sigma0(s0); \
+    s1 = X[(j+14)&0x0f]; s1 = sigma1(s1); \
+    T1 = X[(j)&0x0f] += s0 + s1 + X[(j+9)&0x0f]; \
+    ROUND_00_15(i+j,a,b,c,d,e,f,g,h); } while (0)
+void sha512_block_data_order(SHA512_CTX *ctx, const void *in,
+                             unsigned int num)
+{
+    const SHA_LONG64 *W = in;
+    SHA_LONG64 a, b, c, d, e, f, g, h, s0, s1, T1;
+    SHA_LONG64 X[16];
+    int i;
+
+    while (num--) {
+
+        a = ctx->h[0];
+        b = ctx->h[1];
+        c = ctx->h[2];
+        d = ctx->h[3];
+        e = ctx->h[4];
+        f = ctx->h[5];
+        g = ctx->h[6];
+        h = ctx->h[7];
+
+        T1 = X[0] = PULL64(W[0]);
+        ROUND_00_15(0, a, b, c, d, e, f, g, h);
+        T1 = X[1] = PULL64(W[1]);
+        ROUND_00_15(1, h, a, b, c, d, e, f, g);
+        T1 = X[2] = PULL64(W[2]);
+        ROUND_00_15(2, g, h, a, b, c, d, e, f);
+        T1 = X[3] = PULL64(W[3]);
+        ROUND_00_15(3, f, g, h, a, b, c, d, e);
+        T1 = X[4] = PULL64(W[4]);
+        ROUND_00_15(4, e, f, g, h, a, b, c, d);
+        T1 = X[5] = PULL64(W[5]);
+        ROUND_00_15(5, d, e, f, g, h, a, b, c);
+        T1 = X[6] = PULL64(W[6]);
+        ROUND_00_15(6, c, d, e, f, g, h, a, b);
+        T1 = X[7] = PULL64(W[7]);
+        ROUND_00_15(7, b, c, d, e, f, g, h, a);
+        T1 = X[8] = PULL64(W[8]);
+        ROUND_00_15(8, a, b, c, d, e, f, g, h);
+        T1 = X[9] = PULL64(W[9]);
+        ROUND_00_15(9, h, a, b, c, d, e, f, g);
+        T1 = X[10] = PULL64(W[10]);
+        ROUND_00_15(10, g, h, a, b, c, d, e, f);
+        T1 = X[11] = PULL64(W[11]);
+        ROUND_00_15(11, f, g, h, a, b, c, d, e);
+        T1 = X[12] = PULL64(W[12]);
+        ROUND_00_15(12, e, f, g, h, a, b, c, d);
+        T1 = X[13] = PULL64(W[13]);
+        ROUND_00_15(13, d, e, f, g, h, a, b, c);
+        T1 = X[14] = PULL64(W[14]);
+        ROUND_00_15(14, c, d, e, f, g, h, a, b);
+        T1 = X[15] = PULL64(W[15]);
+        ROUND_00_15(15, b, c, d, e, f, g, h, a);
+
+        for (i = 16; i < 80; i += 16) {
+            ROUND_16_80(i, 0, a, b, c, d, e, f, g, h, X);
+            ROUND_16_80(i, 1, h, a, b, c, d, e, f, g, X);
+            ROUND_16_80(i, 2, g, h, a, b, c, d, e, f, X);
+            ROUND_16_80(i, 3, f, g, h, a, b, c, d, e, X);
+            ROUND_16_80(i, 4, e, f, g, h, a, b, c, d, X);
+            ROUND_16_80(i, 5, d, e, f, g, h, a, b, c, X);
+            ROUND_16_80(i, 6, c, d, e, f, g, h, a, b, X);
+            ROUND_16_80(i, 7, b, c, d, e, f, g, h, a, X);
+            ROUND_16_80(i, 8, a, b, c, d, e, f, g, h, X);
+            ROUND_16_80(i, 9, h, a, b, c, d, e, f, g, X);
+            ROUND_16_80(i, 10, g, h, a, b, c, d, e, f, X);
+            ROUND_16_80(i, 11, f, g, h, a, b, c, d, e, X);
+            ROUND_16_80(i, 12, e, f, g, h, a, b, c, d, X);
+            ROUND_16_80(i, 13, d, e, f, g, h, a, b, c, X);
+            ROUND_16_80(i, 14, c, d, e, f, g, h, a, b, X);
+            ROUND_16_80(i, 15, b, c, d, e, f, g, h, a, X);
+        }
+
+        ctx->h[0] += a;
+        ctx->h[1] += b;
+        ctx->h[2] += c;
+        ctx->h[3] += d;
+        ctx->h[4] += e;
+        ctx->h[5] += f;
+        ctx->h[6] += g;
+        ctx->h[7] += h;
+
+        W += SHA_LBLOCK;
+    }
+}
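(An illustrative aside, not part of the patch: the modified Sigma macros
exist purely to feed 64-bit negations and comparisons into the negdi2 and
cmpdi patterns changed above.  In isolation, the operations they force
look like these hypothetical helpers:

unsigned long long neg64 (unsigned long long x)
{
  return -x;          /* negdi2 */
}

int lt64 (long long a, long long b)
{
  return a < b;       /* cmpdi feeding a condition */
}

At -Os these DImode operations can survive until reload, which is where
the large frame sizes used to appear.)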