From patchwork Fri Dec  2 09:56:32 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Torvald Riegel <triegel@redhat.com>
X-Patchwork-Id: 86257
Delivered-To: patch@linaro.org
Received: by 10.182.112.6 with SMTP id im6csp222434obb;
 Fri, 2 Dec 2016 01:56:52 -0800 (PST)
X-Received: by 10.84.217.18 with SMTP id o18mr95588325pli.45.1480672612196; 
 Fri, 02 Dec 2016 01:56:52 -0800 (PST)
Return-Path: <libc-alpha-return-75413-patch=linaro.org@sourceware.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id 3si4434405pln.317.2016.12.02.01.56.51
 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 02 Dec 2016 01:56:52 -0800 (PST)
Received-SPF: pass (google.com: domain of
 libc-alpha-return-75413-patch=linaro.org@sourceware.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; dkim=pass header.i=@sourceware.org;
 spf=pass (google.com: domain of
 libc-alpha-return-75413-patch=linaro.org@sourceware.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=libc-alpha-return-75413-patch=linaro.org@sourceware.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
 :list-unsubscribe:list-subscribe:list-archive:list-post
 :list-help:sender:message-id:subject:from:to:cc:date
 :content-type:mime-version; q=dns; s=default; b=dbYU3pdpP9UljWIK
 HJAiOnTKexjL4VImYHFruRxXiqggXAurcS8XgjF8tbZPz2x3E+9AUm3TrP5oZRiv
 oDeTnXe/ccjZRFoxFYIB1AVdfTG6n2j8ZInRHc/eVK3TMVrViqBlbNNa5iZXyoc6
 ir4WWv/vM7+AZlFBy2Txy73ou2Y=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
 :list-unsubscribe:list-subscribe:list-archive:list-post
 :list-help:sender:message-id:subject:from:to:cc:date
 :content-type:mime-version; s=default; bh=r2fRRGmU/PA9JIHE86B054
 YgTD0=; b=iRxQCCdNcoB5lvqaKAG3oPD0z8ZbPEoJahasBN1YB8uN7RfY+dkkbr
 wmZq4lmVk7eSVctpv1TQ+qe/jhf9CC7keefsaCn/XWaduoov52gd6RuAdmPSIkxp
 yu3rp8lJtynUhene778QgU32T/wtcNCd88zVDYGFWtwuMpXpUxLs4=
Received: (qmail 37010 invoked by alias); 2 Dec 2016 09:56:40 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-patch=linaro.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
 <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 36996 invoked by uid 89); 2 Dec 2016 09:56:39 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.8 required=5.0 tests=BAYES_00,
 KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD,
 SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=transactions,
 5506, Hx-languages-length:9765, 3535
X-HELO: mx1.redhat.com
Message-ID: <1480672592.14990.43.camel@redhat.com>
Subject: [PATCH] Use C11-like atomics instead of plain memory accesses in
 x86 lock elision.
From: Torvald Riegel <triegel@redhat.com>
To: GLIBC Devel <libc-alpha@sourceware.org>
Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>, Stefan Liebler
 <stli@linux.vnet.ibm.com>, Steven Munroe <munroesj@linux.vnet.ibm.comcom>
Date: Fri, 02 Dec 2016 10:56:32 +0100
Mime-Version: 1.0

This uses atomic operations to access lock elision metadata that is
accessed concurrently (ie, adapt_count fields).  The size of the data is
less than a word but accessed only with atomic loads and stores;
therefore, we add support for shorter-size atomic load and stores too.

Once committed, I will add a note to the Concurrency page on the wiki.
The reason for just enabling shorter-size atomic loads and stores is
that so far, we have no need for shorter-size atomic read-modify-write
operations, and it would be harder to enable these on certain archs than
just loads and stores.

Other architectures that use lock elision should apply similar changes.

Tested on x86_64-linux.

commit d4944c57736847586aeee6b0e1ca911bd4799f8b
Author: Torvald Riegel <triegel@redhat.com>
Date:   Wed Nov 30 17:53:11 2016 +0100

    Use C11-like atomics instead of plain memory accesses in x86 lock elision.
    
    This uses atomic operations to access lock elision metadata that is accessed
    concurrently (ie, adapt_count fields).  The size of the data is less than a
    word but accessed only with atomic loads and stores; therefore, we add
    support for shorter-size atomic load and stores too.
    
    	* include/atomic.h (__atomic_check_size_ls): New.
    	(atomic_load_relaxed, atomic_load_acquire, atomic_store_relaxed,
    	atomic_store_release): Use it.
    	* sysdeps/x86/elide.h (ACCESS_ONCE): Remove.
    	(elision_adapt, ELIDE_LOCK): Use atomics.
    	* sysdeps/unix/sysv/linux/x86/elision-lock.c (__lll_lock_elision): Use
    	atomics and improve code comments.
    	* sysdeps/unix/sysv/linux/x86/elision-trylock.c
    	(__lll_trylock_elision): Likewise.

diff --git a/include/atomic.h b/include/atomic.h
index c8b4664..d14cbc5 100644
--- a/include/atomic.h
+++ b/include/atomic.h
@@ -550,6 +550,20 @@ void __atomic_link_error (void);
    if (sizeof (*mem) != 4)						      \
      __atomic_link_error ();
 # endif
+/* We additionally provide 8b and 16b atomic loads and stores; we do not yet
+   need other atomic operations of such sizes, and restricting the support to
+   loads and stores makes this easier for archs that do not have native
+   support for atomic operations to less-than-word-sized data.  */
+# if __HAVE_64B_ATOMICS == 1
+#  define __atomic_check_size_ls(mem) \
+   if ((sizeof (*mem) != 1) && (sizeof (*mem) != 2) && (sizeof (*mem) != 4)   \
+       && (sizeof (*mem) != 8))						      \
+     __atomic_link_error ();
+# else
+#  define __atomic_check_size_ls(mem) \
+   if ((sizeof (*mem) != 1) && (sizeof (*mem) != 2) && sizeof (*mem) != 4)    \
+     __atomic_link_error ();
+# endif
 
 # define atomic_thread_fence_acquire() \
   __atomic_thread_fence (__ATOMIC_ACQUIRE)
@@ -559,18 +573,20 @@ void __atomic_link_error (void);
   __atomic_thread_fence (__ATOMIC_SEQ_CST)
 
 # define atomic_load_relaxed(mem) \
-  ({ __atomic_check_size((mem)); __atomic_load_n ((mem), __ATOMIC_RELAXED); })
+  ({ __atomic_check_size_ls((mem));					      \
+     __atomic_load_n ((mem), __ATOMIC_RELAXED); })
 # define atomic_load_acquire(mem) \
-  ({ __atomic_check_size((mem)); __atomic_load_n ((mem), __ATOMIC_ACQUIRE); })
+  ({ __atomic_check_size_ls((mem));					      \
+     __atomic_load_n ((mem), __ATOMIC_ACQUIRE); })
 
 # define atomic_store_relaxed(mem, val) \
   do {									      \
-    __atomic_check_size((mem));						      \
+    __atomic_check_size_ls((mem));					      \
     __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);			      \
   } while (0)
 # define atomic_store_release(mem, val) \
   do {									      \
-    __atomic_check_size((mem));						      \
+    __atomic_check_size_ls((mem));					      \
     __atomic_store_n ((mem), (val), __ATOMIC_RELEASE);			      \
   } while (0)
 
diff --git a/sysdeps/unix/sysv/linux/x86/elision-lock.c b/sysdeps/unix/sysv/linux/x86/elision-lock.c
index 5e66960..384c48e 100644
--- a/sysdeps/unix/sysv/linux/x86/elision-lock.c
+++ b/sysdeps/unix/sysv/linux/x86/elision-lock.c
@@ -44,7 +44,11 @@
 int
 __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
 {
-  if (*adapt_count <= 0)
+  /* adapt_count is accessed inside and outside of transactions concurrently,
+     so we need to use atomic accesses to avoid data races.  However, the
+     value of adapt_count is just a hint, so relaxed MO accesses are
+     sufficient.  */
+  if (atomic_load_relaxed (adapt_count) <= 0)
     {
       unsigned status;
       int try_xbegin;
@@ -70,15 +74,23 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
 			&& _XABORT_CODE (status) == _ABORT_LOCK_BUSY)
 	        {
 		  /* Right now we skip here.  Better would be to wait a bit
-		     and retry.  This likely needs some spinning.  */
-		  if (*adapt_count != aconf.skip_lock_busy)
-		    *adapt_count = aconf.skip_lock_busy;
+		     and retry.  This likely needs some spinning.
+		     While the transaction already ensures atomicity, we use
+		     atomic accesses here too just for consistency, and to
+		     make a potential future transition to C11 atomic data
+		     types easier.  */
+		  if (atomic_load_relaxed (adapt_count)
+		      != aconf.skip_lock_busy)
+		    atomic_store_relaxed (adapt_count, aconf.skip_lock_busy);
 		}
 	      /* Internal abort.  There is no chance for retry.
 		 Use the normal locking and next time use lock.
-		 Be careful to avoid writing to the lock.  */
-	      else if (*adapt_count != aconf.skip_lock_internal_abort)
-		*adapt_count = aconf.skip_lock_internal_abort;
+		 Be careful to avoid writing to the lock.  See above for why
+		 relaxed MO is sufficient.  */
+	      else if (atomic_load_relaxed (adapt_count)
+		  != aconf.skip_lock_internal_abort)
+		atomic_store_relaxed (adapt_count,
+		    aconf.skip_lock_internal_abort);
 	      break;
 	    }
 	}
@@ -87,7 +99,8 @@ __lll_lock_elision (int *futex, short *adapt_count, EXTRAARG int private)
     {
       /* Use a normal lock until the threshold counter runs out.
 	 Lost updates possible.  */
-      (*adapt_count)--;
+      atomic_store_relaxed (adapt_count,
+	  atomic_load_relaxed (adapt_count) - 1);
     }
 
   /* Use a normal lock here.  */
diff --git a/sysdeps/unix/sysv/linux/x86/elision-trylock.c b/sysdeps/unix/sysv/linux/x86/elision-trylock.c
index 65d9c18..677919c 100644
--- a/sysdeps/unix/sysv/linux/x86/elision-trylock.c
+++ b/sysdeps/unix/sysv/linux/x86/elision-trylock.c
@@ -36,8 +36,9 @@ __lll_trylock_elision (int *futex, short *adapt_count)
      return an error.  */
   _xabort (_ABORT_NESTED_TRYLOCK);
 
-  /* Only try a transaction if it's worth it.  */
-  if (*adapt_count <= 0)
+  /* Only try a transaction if it's worth it.  Relaxed MO is sufficient
+     because this is just a hint.  */
+  if (atomic_load_relaxed (adapt_count) <= 0)
     {
       unsigned status;
 
@@ -55,16 +56,18 @@ __lll_trylock_elision (int *futex, short *adapt_count)
       if (!(status & _XABORT_RETRY))
         {
           /* Internal abort.  No chance for retry.  For future
-             locks don't try speculation for some time.  */
-          if (*adapt_count != aconf.skip_trylock_internal_abort)
-            *adapt_count = aconf.skip_trylock_internal_abort;
+             locks don't try speculation for some time.  See above for MO.  */
+          if (atomic_load_relaxed (adapt_count)
+              != aconf.skip_lock_internal_abort)
+            atomic_store_relaxed (adapt_count, aconf.skip_lock_internal_abort);
         }
       /* Could do some retries here.  */
     }
   else
     {
-      /* Lost updates are possible, but harmless.  */
-      (*adapt_count)--;
+      /* Lost updates are possible but harmless (see above).  */
+      atomic_store_relaxed (adapt_count,
+	  atomic_load_relaxed (adapt_count) - 1);
     }
 
   return lll_trylock (*futex);
diff --git a/sysdeps/x86/elide.h b/sysdeps/x86/elide.h
index 8691e66..f7d5220 100644
--- a/sysdeps/x86/elide.h
+++ b/sysdeps/x86/elide.h
@@ -20,8 +20,8 @@
 
 #include <hle.h>
 #include <elision-conf.h>
+#include <atomic.h>
 
-#define ACCESS_ONCE(x) (* (volatile typeof(x) *) &(x))
 
 /* Adapt elision with ADAPT_COUNT and STATUS and decide retries.  */
 
@@ -35,28 +35,35 @@ elision_adapt(signed char *adapt_count, unsigned int status)
     {
       /* Right now we skip here.  Better would be to wait a bit
 	 and retry.  This likely needs some spinning. Be careful
-	 to avoid writing the lock.  */
-      if (*adapt_count != __elision_aconf.skip_lock_busy)
-	ACCESS_ONCE (*adapt_count) = __elision_aconf.skip_lock_busy;
+	 to avoid writing the lock.
+	 Using relaxed MO and separate atomic accesses is sufficient because
+	 adapt_count is just a hint.  */
+      if (atomic_load_relaxed (adapt_count) != __elision_aconf.skip_lock_busy)
+	atomic_store_relaxed (adapt_count, __elision_aconf.skip_lock_busy);
     }
   /* Internal abort.  There is no chance for retry.
      Use the normal locking and next time use lock.
-     Be careful to avoid writing to the lock.  */
-  else if (*adapt_count != __elision_aconf.skip_lock_internal_abort)
-    ACCESS_ONCE (*adapt_count) = __elision_aconf.skip_lock_internal_abort;
+     Be careful to avoid writing to the lock.  See above for MO.  */
+  else if (atomic_load_relaxed (adapt_count)
+      != __elision_aconf.skip_lock_internal_abort)
+    atomic_store_relaxed (adapt_count,
+	__elision_aconf.skip_lock_internal_abort);
   return true;
 }
 
 /* is_lock_free must be executed inside the transaction */
 
 /* Returns true if lock defined by IS_LOCK_FREE was elided.
-   ADAPT_COUNT is a pointer to per-lock state variable. */
+   ADAPT_COUNT is a per-lock state variable; it must be accessed atomically
+   to avoid data races but is just a hint, so using relaxed MO and separate
+   atomic loads and stores instead of atomic read-modify-write operations is
+   sufficient.  */
 
 #define ELIDE_LOCK(adapt_count, is_lock_free)			\
   ({								\
     int ret = 0;						\
 								\
-    if ((adapt_count) <= 0) 					\
+    if (atomic_load_relaxed (&(adapt_count)) <= 0)		\
       {								\
         for (int i = __elision_aconf.retry_try_xbegin; i > 0; i--) \
           {							\
@@ -75,12 +82,13 @@ elision_adapt(signed char *adapt_count, unsigned int status)
           }							\
       }								\
     else 							\
-      (adapt_count)--; /* missing updates ok */			\
+      atomic_store_relaxed (&(adapt_count),			\
+	  atomic_load_relaxed (&(adapt_count)) - 1);		\
     ret;							\
   })
 
 /* Returns true if lock defined by IS_LOCK_FREE was try-elided.
-   ADAPT_COUNT is a pointer to per-lock state variable.  */
+   ADAPT_COUNT is a per-lock state variable.  */
 
 #define ELIDE_TRYLOCK(adapt_count, is_lock_free, write) ({	\
   int ret = 0;						\