From patchwork Wed Apr 22 15:07:47 2015
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 47416
Message-ID: <5537B943.2060001@linaro.org>
Date: Wed, 22 Apr 2015 12:07:47 -0300
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: Re: [PATCH] [BZ 18034] [AArch64] Lazy TLSDESC relocation data race fix
In-Reply-To: <553793A3.7030206@arm.com>

Hi

On 22-04-2015 09:27, Szabolcs Nagy wrote:
> Lazy TLSDESC initialization needs to be synchronized with concurrent TLS
> accesses.
>
> The original ARM TLSDESC design did not discuss this race, which arises
> on systems with weak memory ordering guarantees:
>
> http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt
>
> "Storing the final value of the TLS descriptor also needs care: the
> resolver field must be set to its final value after the argument gets
> its final value, such that if any thread attempts to use the
> descriptor before it gets its final value, it still goes to the hold
> function."
>
> The current glibc code (i386, x86_64, arm, aarch64) is
>
>   td->arg = ...;
>   td->entry = ...;
>
> The ARM memory model allows store-store reordering, so a barrier is
> needed between these two stores.  (The code is broken on x86 as well in
> principle: the x86 memory model does not help at the C code level, as
> the compiler can reorder non-aliasing stores.)
>
> What is missing from the TLSDESC design spec is a description of the
> ordering requirements on the read side: the TLS access sequence
> (generated by the compiler) loads the descriptor function pointer
> (td->entry), and then the argument (td->arg) is loaded in that function.
> These two loads must observe the changes made on the write side in a
> sequentially consistent way.
> The code in asm:
>
>   ldr x1, [x0]      // load td->entry
>   blr x1            // call it
>
> entryfunc:
>   ldr x0, [x0,#8]   // load td->arg
>
> The branch (blr) does not provide a load-load ordering barrier, so the
> second load may observe an old arg even though the first load observed
> the new entry.  This happens with existing aarch64 hardware, causing
> invalid memory accesses due to the incorrect TLS offset.
>
> Various alternatives were considered to force the load ordering in the
> descriptor function:
>
> (1) barrier
>
> entryfunc:
>   dmb ishld
>   ldr x0, [x0,#8]
>
> (2) address dependency (if the address of the second load depends on the
> result of the first one, the ordering is guaranteed):
>
> entryfunc:
>   ldr x1, [x0]
>   and x1, x1, #8
>   orr x1, x1, #8
>   ldr x0, [x0,x1]
>
> (3) load-acquire (ARMv8 instruction that is ordered before subsequent
> loads and stores):
>
> entryfunc:
>   ldar xzr, [x0]
>   ldr x0, [x0,#8]
>
> Option (1) is the simplest but slowest (note: this runs at every TLS
> access).  Options (2) and (3) do one extra load from [x0] (same-address
> loads are ordered, so it happens-after the load at the call site).
> Option (2) clobbers x1, so I think (3) is the best solution (a load
> into the zero register has the same effect as with a normal register,
> but the result is discarded, so nothing is clobbered).  Note that the
> TLSDESC ABI allows the descriptor function to clobber x1, but current
> gcc does not implement this correctly; gcc will be fixed independently.
> The dynamic and undefweak descriptors currently save/restore x1, so
> only static TLS would be affected by this clobbering issue.

I see option (3) as the preferred one as well, since it fits better with
the C11 memory semantics.

> On the write side I used a full barrier (__sync_synchronize) although
>
>   dmb ishst
>
> would be enough, but the write side is not performance critical.

My understanding is that you do not need a sequentially consistent store,
but rather a store-release on 'td->arg', since the code only requires
that 'td->entry' not be reordered with 'td->arg'.  So, to adjust to C11
semantics, I would change:

> I introduced a new _dl_tlsdesc_return_lazy descriptor function for
> lazily relocated static TLS, so the non-lazy use case is not affected.
> The dynamic and undefweak descriptors do enough work that the additional
> ldar should not have a performance impact.
>
> Other thoughts:
>
> - Lazy binding for static TLS may be unnecessary complexity: it seems
> that gcc generates at most two static TLSDESC relocation entries for a
> translation unit (zero- and non-zero-initialized TLS), so there have to
> be a lot of object files with static TLS linked into a shared object to
> make the load-time relocation overhead significant.  Unless there is
> some evidence that lazy static TLS relocation makes sense, I would
> suggest removing that logic (from all archs).  (The dynamic case is
> probably a similar micro-optimization, but there the lazy vs non-lazy
> semantics are different in the case of undefined symbols, and some
> applications may depend on that.)

Recently I have been seeing that lazy relocation in general yields
smaller gains than it used to and adds a lot of complexity.  I would like
to see a use case where lazy TLS relocation, or even normal lazy
relocation, shows a real gain.
> - 32bit arm with -mtls-dialect=gnu2 is affected too; it will have to go
> with option (1) or (2), with an additional twist: some existing ARMv7
> cpus don't implement the same-address load ordering reliably, see:
> http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf
>
> - I don't understand the role of the hold function in the general
> TLSDESC design: why is it not enough to let other threads wait on the
> global lock in the initial resolver function?  Is the global dl lock
> implemented as a spin lock?  Is it for some liveness/fairness property?
>
> - There are some incorrect uses of "cfi_adjust_cfa_offset" in
> dl-tlsdesc.S; that is a separate patch.
>
> Changelog:
>
> 2015-04-22  Szabolcs Nagy
>
> 	[BZ #18034]
> 	* sysdeps/aarch64/dl-tlsdesc.h (_dl_tlsdesc_return_lazy): Declare.
> 	* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return_lazy): Define.
> 	(_dl_tlsdesc_undefweak): Guarantee load-load ordering.
> 	(_dl_tlsdesc_dynamic): Likewise.
> 	* sysdeps/aarch64/tlsdesc.c (_dl_tlsdesc_resolve_rela_fixup): Add
> 	barrier between stores.
>
> diff --git a/sysdeps/aarch64/tlsdesc.c b/sysdeps/aarch64/tlsdesc.c
> index 4821f8c..f738cc6 100644
> --- a/sysdeps/aarch64/tlsdesc.c
> +++ b/sysdeps/aarch64/tlsdesc.c
> @@ -87,6 +87,8 @@ _dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
>    if (!sym)
>      {
>        td->arg = (void*) reloc->r_addend;
> +      /* Barrier so readers see the write above before the one below.  */
> +      __sync_synchronize ();

To 'atomic_store_release (td->arg, (void*) reloc->r_addend)'.
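
For illustration, here is a minimal standalone sketch of the
release/acquire pairing being discussed, written with C11 <stdatomic.h>
names rather than the internal glibc atomic macros.  The 'entry'/'arg'
fields mirror the descriptor layout, but the type and function names are
hypothetical and this is not the actual glibc code or the compiler-
generated access sequence:

/* Illustrative sketch only: models the required ordering.  The writer
   stores the argument first and publishes the entry point with a release
   store; the reader's acquire load of the entry point (the ldar in
   option (3) on the asm side) guarantees it then sees the final arg.  */
#include <stdatomic.h>

struct tlsdesc_model;
typedef void *(*tlsdesc_fn) (struct tlsdesc_model *);

struct tlsdesc_model
{
  _Atomic tlsdesc_fn entry;   /* descriptor function pointer (td->entry) */
  void *arg;                  /* descriptor argument (td->arg) */
};

/* Write side (lazy resolver): set arg, then publish entry with release.  */
void
tlsdesc_publish (struct tlsdesc_model *td, tlsdesc_fn fn, void *arg)
{
  td->arg = arg;
  atomic_store_explicit (&td->entry, fn, memory_order_release);
}

/* Read side (TLS access sequence): acquire-load entry, then call it;
   the callee can then safely read td->arg.  */
void *
tlsdesc_access (struct tlsdesc_model *td)
{
  tlsdesc_fn fn = atomic_load_explicit (&td->entry, memory_order_acquire);
  return fn (td);
}

Inside glibc I would expect the equivalent to be expressed with the
atomic_store_release / atomic_load_acquire macros from include/atomic.h,
with the acquire side provided by the ldar of option (3) in the asm
descriptor functions.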