From patchwork Thu Nov 17 20:15:47 2016
X-Patchwork-Submitter: Thomas Preudhomme
X-Patchwork-Id: 82793
Subject: Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr
From: Thomas Preudhomme
To: Kyrill Tkachov, Ramana Radhakrishnan, Richard Earnshaw, "gcc-patches@gcc.gnu.org"
Date: Thu, 17 Nov 2016 20:15:47 +0000
Message-ID: <3eb11571-1149-5ee8-cd66-18fd9ff717d1@foss.arm.com>
In-Reply-To: <582D809F.5010902@foss.arm.com>
References: <34d1835c-9eea-1f86-0c93-30aeace74762@foss.arm.com>
 <9075b1c8-2258-2abe-8487-b3116eea8483@foss.arm.com>
 <582D809F.5010902@foss.arm.com>
Hi Kyrill,

I've committed the following updated patch, where the test is restricted to
Thumb execution mode and skipped if that is not possible, since
-mtpcs-leaf-frame is only available in Thumb mode. I've considered the change
obvious.

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme

        PR target/77933
        * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
        being live in the function and lr needing to be saved.  Distinguish
        between already saved pushable registers and registers to push.
        Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme

        PR target/77933
        * gcc.target/arm/pr77933-1.c: New test.
        * gcc.target/arm/pr77933-2.c: Likewise.

Best regards,

Thomas

On 17/11/16 10:04, Kyrill Tkachov wrote:
>
> On 09/11/16 16:41, Thomas Preudhomme wrote:
>> I've reworked the patch following comments from Wilco [1] (sorry, I could
>> not find it in my MUA for some reason).
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html
>>
>>
>> == Context ==
>>
>> When saving registers, the function thumb1_expand_prologue () aims at
>> minimizing the number of push instructions.  One of the optimizations it
>> does is to push LR alongside high register(s) (after having moved them to
>> low register(s)) when there is no low register to save.  The way this is
>> implemented is to add LR to the pushable_regs mask, if it is live, just
>> before pushing the registers in that mask.  The mask of live pushable
>> registers, which is used to detect whether LR needs to be saved, is then
>> cleared to ensure LR is only saved once.
>>
>>
>> == Problem ==
>>
>> However, beyond deciding which registers to push, pushable_regs is used to
>> track which pushable register can be used to move a high register before it
>> is pushed, hence the name.  That mask is cleared when all high registers
>> have been assigned a low register, but the clearing assumes the high
>> registers were assigned to the highest-numbered registers in that mask.
>> This is not the case because LR is not considered when looking for a
>> register in that mask.  Furthermore, LR might have been saved in the
>> TARGET_BACKTRACE path above, yet the mask of live pushable registers is not
>> cleared in that case.
>>
>>
>> == Solution ==
>>
>> This patch changes the loop to iterate over registers LR down to r0 so as
>> to both fix the stack corruption reported in PR77933 and reuse LR to push
>> some high register when possible.  This patch also introduces a new
>> variable, lr_needs_saving, to record whether LR (still) needs to be saved
>> at a given point in the code, and sets the variable accordingly throughout
>> the code, thus fixing the second issue.  Finally, this patch creates a new
>> push_mask variable to distinguish between the mask of registers to push and
>> the mask of live pushable registers.
>>
>>
>> == Note ==
>>
>> Other bits could have been improved but have been left out to allow the
>> patch to be backported to the stable branch:
>>
>> (1) using argument registers that are not holding an argument
>> (2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE),
>>     mask (low register push) and push_mask
>> (3) improving the !l_mask case in TARGET_BACKTRACE since offset == 0
>> (4) renaming l_mask to a more appropriate name (live_pushable_regs_mask?)
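[As a minimal standalone illustration of the clearing step described in the
Problem section above: the snippet below is not the GCC code, and the mask
contents and the register the scan stops at are invented for the example.  It
shows how a stale LR bit survives the pre-patch clearing because the scan
never considers registers above r7.]

#include <stdio.h>

#define LR_REGNUM 14

int
main (void)
{
  /* Hypothetical mask: low register r4 plus LR (bit 14) are marked pushable,
     mirroring the situation described above.  */
  unsigned long pushable_regs = (1ul << 4) | (1ul << LR_REGNUM);

  /* The pre-patch scan runs from r7 down to r0, so assume it stopped at r4;
     it never looks at LR.  */
  int regno = 4;

  /* Pre-patch clearing step: keep only bits at or above regno.  The intent
     was to retain just the staging registers actually used, but LR's bit is
     also at or above regno and therefore survives.  */
  pushable_regs &= ~((1ul << regno) - 1);

  printf ("mask after clearing: %#lx\n", pushable_regs);
  printf ("LR still set: %s\n",
          (pushable_regs & (1ul << LR_REGNUM))
          ? "yes, an extra LR push would follow" : "no");
  return 0;
}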
>>
>> ChangeLog entries are as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-11-08  Thomas Preud'homme
>>
>>         PR target/77933
>>         * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
>>         being live in the function and lr needing to be saved.  Distinguish
>>         between already saved pushable registers and registers to push.
>>         Check for LR being an available pushable register.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2016-11-08  Thomas Preud'homme
>>
>>         PR target/77933
>>         * gcc.target/arm/pr77933-1.c: New test.
>>         * gcc.target/arm/pr77933-2.c: Likewise.
>>
>>
>> Testing: no regression on arm-none-eabi GCC cross-compiler targeting
>> Cortex-M0
>>
>> Is this ok for trunk?
>>
>
> Ok.
> Thanks,
> Kyrill
>
>> Best regards,
>>
>> Thomas
>>
>> On 02/11/16 17:08, Thomas Preudhomme wrote:
>>> Hi,
>>>
>>> When saving registers, the function thumb1_expand_prologue () aims at
>>> minimizing the number of push instructions.  One of the optimizations it
>>> does is to push lr alongside high register(s) (after having moved them to
>>> low register(s)) when there is no low register to save.  The way this is
>>> implemented is to add lr to the list of registers that can be pushed just
>>> before the push happens.  This pushes lr and allows it to be used for
>>> further pushes if there are not enough registers for all the high
>>> registers that need to be pushed.
>>>
>>> However, the logic that decides which register to move high registers to
>>> before they are pushed only looks at low registers (see the for loop
>>> initialization).  This means not only that lr is not used for pushing high
>>> registers but also that lr is not removed from the list of registers to be
>>> pushed when it is not used.  This extra lr push is not popped in the
>>> epilogue, leading to stack corruption.
>>>
>>> This patch changes the loop to iterate over registers r0 to lr so as to
>>> both fix the stack corruption and reuse lr to push some high register when
>>> possible.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2016-11-01  Thomas Preud'homme
>>>
>>>         PR target/77933
>>>         * config/arm/arm.c (thumb1_expand_prologue): Also check for lr
>>>         being a pushable register.
>>>
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2016-11-01  Thomas Preud'homme
>>>
>>>         PR target/77933
>>>         * gcc.target/arm/pr77933.c: New test.
>>>
>>>
>>> Testing: no regression on arm-none-eabi GCC cross-compiler targeting
>>> Cortex-M0
>>>
>>> Is this ok for trunk?
>>>
>>> Best regards,
>>>
>>> Thomas
>

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..ddbda3e46dbcabb6c5775f847bc338c37705e122 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24806,6 +24806,7 @@ thumb1_expand_prologue (void)
   unsigned long live_regs_mask;
   unsigned long l_mask;
   unsigned high_regs_pushed = 0;
+  bool lr_needs_saving;
 
   func_type = arm_current_func_type ();
 
@@ -24828,6 +24829,7 @@
   offsets = arm_get_frame_offsets ();
   live_regs_mask = offsets->saved_regs_mask;
+  lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
   /* Extract a mask of the ones we can give to the Thumb's push
      instruction. */
   l_mask = live_regs_mask & 0x40ff;
@@ -24894,6 +24896,7 @@ thumb1_expand_prologue (void)
 	{
 	  insn = thumb1_emit_multi_reg_push (l_mask, l_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
+	  lr_needs_saving = false;
 
 	  offset = bit_count (l_mask) * UNITS_PER_WORD;
 	}
 
@@ -24958,12 +24961,13 @@
      be a push of LR and we can combine it with the push of the first high
      register.  */
   else if ((l_mask & 0xff) != 0
-	   || (high_regs_pushed == 0 && l_mask))
+	   || (high_regs_pushed == 0 && lr_needs_saving))
     {
       unsigned long mask = l_mask;
       mask |= (1 << thumb1_extra_regs_pushed (offsets, true)) - 1;
       insn = thumb1_emit_multi_reg_push (mask, mask);
       RTX_FRAME_RELATED_P (insn) = 1;
+      lr_needs_saving = false;
     }
 
   if (high_regs_pushed)
@@ -24981,7 +24985,9 @@
       /* Here we need to mask out registers used for passing arguments even
	  if they can be pushed.  This is to avoid using them to stash the high
	  registers.  Such kind of stash may clobber the use of arguments.  */
-      pushable_regs = l_mask & (~arg_regs_mask) & 0xff;
+      pushable_regs = l_mask & (~arg_regs_mask);
+      if (lr_needs_saving)
+	pushable_regs &= ~(1 << LR_REGNUM);
 
       if (pushable_regs == 0)
 	pushable_regs = 1 << thumb_find_work_register (live_regs_mask);
@@ -24989,8 +24995,9 @@
       while (high_regs_pushed > 0)
 	{
 	  unsigned long real_regs_mask = 0;
+	  unsigned long push_mask = 0;
 
-	  for (regno = LAST_LO_REGNUM; regno >= 0; regno --)
+	  for (regno = LR_REGNUM; regno >= 0; regno --)
 	    {
 	      if (pushable_regs & (1 << regno))
 		{
@@ -24999,6 +25006,7 @@
 
 		  high_regs_pushed --;
 		  real_regs_mask |= (1 << next_hi_reg);
+		  push_mask |= (1 << regno);
 
 		  if (high_regs_pushed)
 		    {
@@ -25008,23 +25016,20 @@
 		      break;
 		    }
 		  else
-		    {
-		      pushable_regs &= ~((1 << regno) - 1);
-		      break;
-		    }
+		    break;
 		}
 	    }
 
 	  /* If we had to find a work register and we have not yet
 	     saved the LR then add it to the list of regs to push.  */
-	  if (l_mask == (1 << LR_REGNUM))
+	  if (lr_needs_saving)
 	    {
-	      pushable_regs |= l_mask;
-	      real_regs_mask |= l_mask;
-	      l_mask = 0;
+	      push_mask |= 1 << LR_REGNUM;
+	      real_regs_mask |= 1 << LR_REGNUM;
+	      lr_needs_saving = false;
 	    }
 
-	  insn = thumb1_emit_multi_reg_push (pushable_regs, real_regs_mask);
+	  insn = thumb1_emit_multi_reg_push (push_mask, real_regs_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-1.c b/gcc/testsuite/gcc.target/arm/pr77933-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..95cf68ea7531bcc453371f493a05bd40caa5541b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+                  "mov\tr5, #0xf5\n\t"
+                  "mov\tr6, #0xf6\n\t"
+                  "mov\tr7, #0xf7\n\t"
+                  "mov\tr0, #0xf8\n\t"
+                  "mov\tr8, r0\n\t"
+                  "mov\tr0, #0xfa\n\t"
+                  "mov\tr10, r0"
+                  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr5, #0xf5\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr6, #0xf6\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr7, #0xf7\n\t"
+                  "bne\tfail\n\t"
+                  "mov\tr0, r8\n\t"
+                  "cmp\tr0, #0xf8\n\t"
+                  "bne\tfail\n\t"
+                  "mov\tr0, r10\n\t"
+                  "cmp\tr0, #0xfa\n\t"
+                  "bne\tfail\n\t"
+                  "mov\t%0, #1\n"
+                  "fail:\n\t"
+                  "sub\tr0, #1"
+                  : "=r" (ret) : :);
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-2.c b/gcc/testsuite/gcc.target/arm/pr77933-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9028c4fcab4229591fa057f15c641d2b5597cd1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
+/* { dg-options "-mthumb -O2 -mtpcs-leaf-frame" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+                  "mov\tr5, #0xf5\n\t"
+                  "mov\tr6, #0xf6\n\t"
+                  "mov\tr7, #0xf7\n\t"
+                  "mov\tr0, #0xf8\n\t"
+                  "mov\tr8, r0\n\t"
+                  "mov\tr0, #0xfa\n\t"
+                  "mov\tr10, r0"
+                  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr5, #0xf5\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr6, #0xf6\n\t"
+                  "bne\tfail\n\t"
+                  "cmp\tr7, #0xf7\n\t"
+                  "bne\tfail\n\t"
+                  "mov\tr0, r8\n\t"
+                  "cmp\tr0, #0xf8\n\t"
+                  "bne\tfail\n\t"
+                  "mov\tr0, r10\n\t"
+                  "cmp\tr0, #0xfa\n\t"
+                  "bne\tfail\n\t"
+                  "mov\t%0, #1\n"
+                  "fail:\n\t"
+                  "sub\tr0, #1"
+                  : "=r" (ret) : :);
+  return ret;
+}
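
[Not part of the committed patch: the sketch below is a rough standalone model
of the selection logic after the fix; the scenario, masks, and helper logic
are invented, and it is not the code from thumb1_expand_prologue.  It shows
how scanning from LR down to r0, keeping LR out of the staging set while
lr_needs_saving is true, and collecting the registers actually pushed in a
separate push_mask avoids the duplicate LR push described in the thread.]

#include <stdio.h>
#include <stdbool.h>

#define LR_REGNUM 14

int
main (void)
{
  /* Invented scenario: high registers r8-r10 must be saved, no low register
     is live, and LR is live (so it must be saved exactly once).  */
  unsigned long high_regs = (1ul << 8) | (1ul << 9) | (1ul << 10);
  bool lr_needs_saving = true;

  /* LR is kept out of the staging set while it still holds the return
     address; assume a single low work register, r3, is available.  */
  unsigned long pushable_regs = 1ul << 3;

  while (high_regs)
    {
      unsigned long push_mask = 0, real_regs_mask = 0;

      /* Scan from LR down to r0, as the fixed loop does.  */
      for (int regno = LR_REGNUM; regno >= 0 && high_regs; regno--)
        if (pushable_regs & (1ul << regno))
          {
            /* "Move" the highest remaining high register into regno.  */
            int hi = 10;
            while (!(high_regs & (1ul << hi)))
              hi--;
            high_regs &= ~(1ul << hi);
            real_regs_mask |= 1ul << hi;
            push_mask |= 1ul << regno;
          }

      /* Fold LR into the first push if it has not been saved yet; once it is
         saved it becomes available as a staging register too.  */
      if (lr_needs_saving)
        {
          push_mask |= 1ul << LR_REGNUM;
          real_regs_mask |= 1ul << LR_REGNUM;
          lr_needs_saving = false;
          pushable_regs |= 1ul << LR_REGNUM;
        }

      printf ("push %#lx (standing in for %#lx)\n", push_mask, real_regs_mask);
    }

  return 0;
}

[In this model LR is pushed exactly once, in the first round, and then reused
as a staging register in the second round.  Under the pre-patch logic the
later round could push LR again without the epilogue popping it, which is the
stack corruption reported in PR77933.]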