From patchwork Wed Jan 27 18:22:04 2016
X-Patchwork-Submitter: Will Deacon
X-Patchwork-Id: 60638
From: Will Deacon <will.deacon@arm.com>
To: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Will Deacon, Boqun Feng,
    "Paul E. McKenney", Peter Zijlstra
Subject: [PATCH v3] barriers: introduce smp_mb__release_acquire and update documentation
Date: Wed, 27 Jan 2016 18:22:04 +0000
Message-Id: <1453918924-27606-1-git-send-email-will.deacon@arm.com>

As much as we'd like to live in a world where RELEASE -> ACQUIRE is
always cheaply ordered and can be used to construct UNLOCK -> LOCK
definitions with similar guarantees, the grim reality is that this isn't
even possible on x86 (thanks to Paul for bringing us crashing down to
Earth).

This patch handles the issue by introducing a new barrier macro,
smp_mb__after_release_acquire, that can be placed after an ACQUIRE that
either reads from a RELEASE or is in program-order after a RELEASE. The
barrier upgrades the RELEASE-ACQUIRE pair to a full memory barrier,
implying global transitivity. At the moment, it doesn't have any users,
so its existence serves mainly as a documentation aid and a potential
stepping stone to the reintroduction of smp_mb__after_unlock_lock() used
by RCU.

Documentation/memory-barriers.txt is updated to describe more clearly
the ACQUIRE and RELEASE ordering in this area and to show some examples
of the new barrier in action.

Cc: Boqun Feng
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
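As an illustration only (this sketch is not part of the patch; data, flag,
r1 and r2 are made-up variables, while smp_store_release(),
smp_load_acquire(), READ_ONCE(), WRITE_ONCE() and the new
smp_mb__after_release_acquire() are the interfaces discussed above), the
intended usage pattern is roughly:

	/* CPU 0 */
	WRITE_ONCE(data, 1);
	smp_store_release(&flag, 1);		/* RELEASE */

	/* CPU 1 */
	r1 = smp_load_acquire(&flag);		/* ACQUIRE; may read from the RELEASE */
	smp_mb__after_release_acquire();	/* upgrade the pair to a full barrier */
	r2 = READ_ONCE(data);

CPU 1 already sees data == 1 whenever r1 == 1 thanks to the local
RELEASE -> ACQUIRE ordering; what the extra barrier buys is that CPUs
outside the release-acquire chain must also agree on that ordering
(global transitivity).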
Based on Paul's patch to describe local vs global transitivity:

  http://lkml.kernel.org/r/20160115173912.GU3818@linux.vnet.ibm.com

 Documentation/memory-barriers.txt   | 63 +++++++++++++++++++++++++++++++++----
 arch/ia64/include/asm/barrier.h     |  2 ++
 arch/powerpc/include/asm/barrier.h  |  3 +-
 arch/s390/include/asm/barrier.h     |  1 +
 arch/sparc/include/asm/barrier_64.h |  5 +--
 arch/x86/include/asm/barrier.h      |  2 ++
 include/asm-generic/barrier.h       | 13 ++++++++
 7 files changed, 80 insertions(+), 9 deletions(-)

-- 
2.1.4

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 2dc4ba7c8d4d..62ce096b88fa 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -454,16 +454,22 @@ And a couple of implicit varieties:
      The use of ACQUIRE and RELEASE operations generally precludes the need
      for other sorts of memory barrier (but note the exceptions mentioned in
      the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
-     pair is -not- guaranteed to act as a full memory barrier.  However, after
+     pair is -not- guaranteed to act as a full memory barrier without an
+     explicit smp_mb__after_release_acquire() barrier.  However, after
      an ACQUIRE on a given variable, all memory accesses preceding any prior
      RELEASE on that same variable are guaranteed to be visible.  In other
-     words, within a given variable's critical section, all accesses of all
-     previous critical sections for that variable are guaranteed to have
-     completed.
+     words, for a CPU executing within a given variable's critical section,
+     all accesses of all previous critical sections for that variable are
+     guaranteed to be visible to that CPU.
 
      This means that ACQUIRE acts as a minimal "acquire" operation and
      RELEASE acts as a minimal "release" operation.
 
+A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
+and RELEASE variants in addition to fully-ordered and relaxed (no barrier
+semantics) definitions.  For compound atomics performing both a load and
+a store, ACQUIRE semantics apply only to the load and RELEASE semantics
+apply only to the store portion of the operation.
+
 Memory barriers are only required where there's a possibility of interaction
 between two CPUs or between a CPU and a device.  If it can be guaranteed that
@@ -1357,7 +1363,7 @@ However, the transitivity of release-acquire is local to the participating
 CPUs and does not apply to cpu3().  Therefore, the following outcome
 is possible:
 
-	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1
 
 Although cpu0(), cpu1(), and cpu2() will see their respective reads and
 writes in order, CPUs not involved in the release-acquire chain might
@@ -1369,10 +1375,27 @@ store to u as happening -after- cpu1()'s load from v, even though
 both cpu0() and cpu1() agree that these two operations occurred in the
 intended order.
 
+This can be forbidden by upgrading the release-acquire relationship
+between cpu0() and cpu1() to a full barrier using
+smp_mb__after_release_acquire() to enforce global transitivity:
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		smp_mb__after_release_acquire();
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+With this addition, the previous result is forbidden and, as long as
+r1 == 1, all CPUs must agree that cpu0()'s store to u happened before
+cpu1()'s read from v.
+
 However, please keep in mind that smp_load_acquire() is not magic.
 In particular, it simply reads from its argument with ordering.  It does
 -not- ensure that any particular value will be read.  Therefore, the
-following outcome is possible:
+following outcome is possible irrespective of any additional barriers:
 
 	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
 
@@ -1971,6 +1994,34 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
 a sleep-unlock race, but the locking primitive needs to resolve
 such races properly in any case.
 
+Where the RELEASE and ACQUIRE operations are performed by the same CPU
+to different addresses, ordering can be enforced by an
+smp_mb__after_release_acquire() barrier:
+
+	*A = a;
+	RELEASE M
+	ACQUIRE N
+	smp_mb__after_release_acquire();
+	*B = b;
+
+in which case, the only permitted orderings are:
+
+	STORE *A, RELEASE M, ACQUIRE N, STORE *B
+	STORE *A, ACQUIRE N, RELEASE M, STORE *B
+
+Similarly, smp_mb__after_release_acquire() enforces full,
+globally-transitive ordering in the case that an ACQUIRE operation on
+one CPU reads from a RELEASE operation on another:
+
+	CPU 1                   CPU 2                             CPU 3
+	======================= =======================           ===============
+	        { X = 0, Y = 0, M = 0 }
+	STORE X=1               ACQUIRE M==1                      STORE Y=1
+	RELEASE M=1             smp_mb__after_release_acquire()   smp_mb();
+	                        LOAD Y==0                         LOAD X=0
+
+This outcome is forbidden (i.e. CPU 3 must read X == 1).
+
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
 anything at all - especially with respect to I/O accesses - unless combined
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 588f1614cafc..ed803c84bd19 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -67,6 +67,8 @@ do {								\
 	___p1;							\
 })
 
+#define __smp_mb__after_release_acquire()	__smp_mb()
+
 /*
  * The group barrier in front of the rsm & ssm are necessary to ensure
  * that none of the previous instructions in the same group are
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index c0deafc212b8..b12860b8c7e4 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -74,7 +74,8 @@ do {								\
 	___p1;							\
 })
 
-#define smp_mb__before_spinlock()	smp_mb()
+#define smp_mb__after_release_acquire()	__smp_mb()
+#define smp_mb__before_spinlock()	smp_mb()
 
 #include <asm-generic/barrier.h>
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 5c8db3ce61c8..ee31da604b11 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -45,6 +45,7 @@ do {								\
 	___p1;							\
 })
 
+#define __smp_mb__after_release_acquire()	__smp_mb()
 #define __smp_mb__before_atomic()	barrier()
 #define __smp_mb__after_atomic()	barrier()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index c9f6ee64f41d..68c9e931a933 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -52,8 +52,9 @@ do {								\
 	___p1;							\
 })
 
-#define __smp_mb__before_atomic()	barrier()
-#define __smp_mb__after_atomic()	barrier()
+#define __smp_mb__after_release_acquire()	__smp_mb()
+#define __smp_mb__before_atomic()		barrier()
+#define __smp_mb__after_atomic()		barrier()
 
 #include <asm-generic/barrier.h>
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a584e1c50918..b24dcb7e806a 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -77,6 +77,8 @@ do {								\
 
 #endif
 
+#define __smp_mb__after_release_acquire()	__smp_mb()
+
 /* Atomic operations are already serializing on x86 */
 #define __smp_mb__before_atomic()	barrier()
 #define __smp_mb__after_atomic()	barrier()
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1cceca146905..895f5993d341 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -139,6 +139,10 @@ do {								\
 })
 #endif
 
+#ifndef __smp_mb__after_release_acquire
+#define __smp_mb__after_release_acquire()	do { } while (0)
+#endif
+
 #ifdef CONFIG_SMP
 
 #ifndef smp_store_mb
@@ -161,6 +165,10 @@ do {								\
 #define smp_load_acquire(p) __smp_load_acquire(p)
 #endif
 
+#ifndef smp_mb__after_release_acquire
+#define smp_mb__after_release_acquire()	__smp_mb__after_release_acquire()
+#endif
+
 #else	/* !CONFIG_SMP */
 
 #ifndef smp_store_mb
@@ -194,6 +202,10 @@ do {								\
 })
 #endif
 
+#ifndef smp_mb__after_release_acquire
+#define smp_mb__after_release_acquire()	do { } while (0)
+#endif
+
 #endif
 
 /* Barriers for virtual machine guests when talking to an SMP host */
@@ -206,6 +218,7 @@ do {								\
 #define virt_mb__after_atomic() __smp_mb__after_atomic()
 #define virt_store_release(p, v) __smp_store_release(p, v)
 #define virt_load_acquire(p) __smp_load_acquire(p)
+#define virt_mb__after_release_acquire() __smp_mb__after_release_acquire()
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
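
The same-CPU example added to memory-barriers.txt above is written with
the documentation's pseudo-ops.  A rough kernel-C sketch of the same
pattern (A, B, M, N, a, b and r are illustrative names, not taken from
the patch) might look like:

	/* All of this runs on one CPU. */
	WRITE_ONCE(*A, a);
	smp_store_release(&M, 1);		/* RELEASE M */
	r = smp_load_acquire(&N);		/* ACQUIRE N (a different variable) */
	smp_mb__after_release_acquire();
	WRITE_ONCE(*B, b);			/* ordered after the store to *A */

Without the barrier, the RELEASE and the ACQUIRE operate on different
variables and may reorder with one another, so other CPUs are not
guaranteed to observe the store to *A before the store to *B.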
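
Similarly, the three-CPU table in that hunk can be read as the following
sketch (the functions are named after the table's columns and the
variables are illustrative; the final comment restates the
documentation's claim):

	int X, Y, M;
	int r1, r2, r3;

	void cpu1(void)
	{
		WRITE_ONCE(X, 1);
		smp_store_release(&M, 1);	/* RELEASE M=1 */
	}

	void cpu2(void)
	{
		r1 = smp_load_acquire(&M);	/* ACQUIRE M */
		smp_mb__after_release_acquire();
		r2 = READ_ONCE(Y);
	}

	void cpu3(void)
	{
		WRITE_ONCE(Y, 1);
		smp_mb();
		r3 = READ_ONCE(X);
	}

	/*
	 * Forbidden: r1 == 1 && r2 == 0 && r3 == 0, i.e. once CPU 2
	 * observes the release and still reads Y == 0, CPU 3 must
	 * read X == 1.
	 */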