From patchwork Wed Oct 15 21:46:59 2014
X-Patchwork-Submitter: Ola Liljedahl
X-Patchwork-Id: 38792
From: Ola Liljedahl <ola.liljedahl@linaro.org>
To: lng-odp@lists.linaro.org
Date: Wed, 15 Oct 2014 23:46:59 +0200
Message-Id: <1413409619-13489-1-git-send-email-ola.liljedahl@linaro.org>
X-Mailer: git-send-email 1.9.1
Subject: [lng-odp] [ODP/PATCH v1] Look ma, no barriers! C11 memory model

Signed-off-by: Ola Liljedahl <ola.liljedahl@linaro.org>
---
Implementation of a C11-based memory model for atomic operations.
Attempt to remove all explicit memory barriers (odp_sync_stores) from
code that implements multithreaded synchronization primitives (e.g.
locks, barriers). Rewrote such primitives to use the new atomic
operations.

Optimized support for ARMv6/v7, ARMv8 (aarch64), x86_64 and
MIPS64/OCTEON. Other architectures fall back to the GCC __sync
builtins, which often include unnecessarily heavy barrier/sync
operations (always sequentially consistent).

Fixed race conditions in odp_barrier_sync() (non-atomic wrap of the
counter) and in odp_ring enqueue/dequeue (a release barrier was needed
but only a compiler barrier was present).
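For reviewers new to the acquire/release style used throughout this
patch, here is the idiom in isolation, distilled from the
odp_ticketlock.c rewrite further down (illustration only, not part of
the diff; the function names my_lock/my_unlock are hypothetical).
Handing out a ticket needs no ordering, so a relaxed fetch-and-add
suffices; waiting for one's ticket uses a load-acquire so that accesses
inside the critical section cannot move above the lock; unlocking uses
a release-ordered add so that accesses inside the critical section
cannot move below the unlock:

	void my_lock(odp_ticketlock_t *tkl)
	{
		/* Relaxed: ticket numbers only need atomicity */
		uint32_t ticket =
			odp_atomic32_fetch_add_rlx(&tkl->next_ticket, 1);
		/* Acquire: critical section stays after the lock */
		while (ticket != odp_atomic32_load_acq(&tkl->cur_ticket))
			odp_spin();
	}

	void my_unlock(odp_ticketlock_t *tkl)
	{
		/* Release: critical section stays before the unlock */
		odp_atomic32_add_rls(&tkl->cur_ticket, 1);
	}

The same pairing is used in odph_ring: producers publish entries with a
store-release of prod.tail and consumers observe them with a
load-acquire, which is what replaces the old compiler-only barrier.
Similarly for the barrier wrap fix: with num_threads = 3 the counter
runs 0,1,2 on the first use and 3,4,5 on the second; the thread that
sees count == 2*3-1 = 5 adds -6 to rewrap the counter atomically.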
 example/generator/odp_generator.c                  |  43 +-
 example/ipsec/odp_ipsec.c                          |   2 +-
 example/odp_example/odp_example.c                  |   2 +-
 example/timer/odp_timer_test.c                     |   2 +-
 helper/include/odph_ring.h                         |   8 +-
 platform/linux-generic/include/api/odp_atomic.h    | 820 ++++++++++++---------
 platform/linux-generic/include/api/odp_barrier.h   |  10 +-
 platform/linux-generic/include/api/odp_rwlock.h    |  20 +-
 .../linux-generic/include/api/odp_ticketlock.h     |   4 +-
 .../linux-generic/include/odp_buffer_internal.h    |   2 +-
 platform/linux-generic/odp_barrier.c               |  43 +-
 platform/linux-generic/odp_buffer.c                |   3 +-
 platform/linux-generic/odp_crypto.c                |   4 +-
 platform/linux-generic/odp_queue.c                 |   7 +-
 platform/linux-generic/odp_ring.c                  |  86 ++-
 platform/linux-generic/odp_rwlock.c                |  46 +-
 platform/linux-generic/odp_thread.c                |   6 +-
 platform/linux-generic/odp_ticketlock.c            |  27 +-
 platform/linux-generic/odp_timer.c                 |  17 +-
 test/api_test/odp_atomic_test.c                    | 126 +---
 test/api_test/odp_atomic_test.h                    |   9 -
 21 files changed, 651 insertions(+), 636 deletions(-)

diff --git a/example/generator/odp_generator.c b/example/generator/odp_generator.c
index eb8b340..cf2d77b 100644
--- a/example/generator/odp_generator.c
+++ b/example/generator/odp_generator.c
@@ -62,10 +62,10 @@ typedef struct {
  * counters
  */
 static struct {
-	odp_atomic_u64_t seq;	/**< ip seq to be send */
-	odp_atomic_u64_t ip;	/**< ip packets */
-	odp_atomic_u64_t udp;	/**< udp packets */
-	odp_atomic_u64_t icmp;	/**< icmp packets */
+	odp_atomic64_t seq;	/**< ip seq to be sent */
+	odp_atomic64_t ip;	/**< ip packets */
+	odp_atomic64_t udp;	/**< udp packets */
+	odp_atomic64_t icmp;	/**< icmp packets */
 } counters;

 /**
  * Thread specific arguments
@@ -201,7 +201,7 @@ static void pack_udp_pkt(odp_buffer_t obuf)
 	ip->tot_len = odp_cpu_to_be_16(args->appl.payload + ODPH_UDPHDR_LEN +
 				       ODPH_IPV4HDR_LEN);
 	ip->proto = ODPH_IPPROTO_UDP;
-	seq = odp_atomic_fetch_add_u64(&counters.seq, 1) % 0xFFFF;
+	seq = odp_atomic64_fetch_add_rlx(&counters.seq, 1) % 0xFFFF;
 	ip->id = odp_cpu_to_be_16(seq);
 	ip->chksum = 0;
 	odph_ipv4_csum_update(pkt);
@@ -258,7 +258,7 @@ static void pack_icmp_pkt(odp_buffer_t obuf)
 	ip->tot_len = odp_cpu_to_be_16(args->appl.payload + ODPH_ICMPHDR_LEN +
 				       ODPH_IPV4HDR_LEN);
 	ip->proto = ODPH_IPPROTO_ICMP;
-	seq = odp_atomic_fetch_add_u64(&counters.seq, 1) % 0xffff;
+	seq = odp_atomic64_fetch_add_rlx(&counters.seq, 1) % 0xffff;
 	ip->id = odp_cpu_to_be_16(seq);
 	ip->chksum = 0;
 	odph_ipv4_csum_update(pkt);
@@ -334,13 +334,15 @@ static void *gen_send_thread(void *arg)
 		}

 		if (args->appl.interval != 0) {
+			uint64_t seq = odp_atomic64_load_rlx(&counters.seq);
 			printf("  [%02i] send pkt no:%ju seq %ju\n",
-			       thr, counters.seq, counters.seq%0xffff);
+			       thr, seq, seq%0xffff);
 			/* TODO use odp timer */
 			usleep(args->appl.interval * 1000);
 		}
-		if (args->appl.number != -1 && counters.seq
-		    >= (unsigned int)args->appl.number) {
+		if (args->appl.number != -1 &&
+		    odp_atomic64_load_rlx(&counters.seq) >=
+		    (unsigned int)args->appl.number) {
 			break;
 		}
 	}
@@ -348,7 +350,8 @@ static void *gen_send_thread(void *arg)
 	/* receive number of reply pks until timeout */
 	if (args->appl.mode == APPL_MODE_PING && args->appl.number > 0) {
 		while (args->appl.timeout >= 0) {
-			if (counters.icmp >= (unsigned int)args->appl.number)
+			if (odp_atomic64_load_rlx(&counters.icmp) >=
+			    (unsigned int)args->appl.number)
 				break;
 			/* TODO use odp timer */
 			sleep(1);
@@ -358,10 +361,12 @@ static void *gen_send_thread(void *arg)

 	/* print info */
 	if (args->appl.mode == APPL_MODE_UDP) {
-		printf("  [%02i] total send: %ju\n", thr, counters.seq);
+		printf("  [%02i] total send: %ju\n", thr,
+		       odp_atomic64_load_rlx(&counters.seq));
 	} else if (args->appl.mode == APPL_MODE_PING) {
 		printf("  [%02i] total send: %ju total receive: %ju\n",
-		       thr, counters.seq, counters.icmp);
+		       thr, odp_atomic64_load_rlx(&counters.seq),
+		       odp_atomic64_load_rlx(&counters.icmp));
 	}
 	return arg;
 }
@@ -395,7 +400,7 @@ static void print_pkts(int thr, odp_packet_t pkt_tbl[], unsigned len)
 		if (!odp_packet_inflag_ipv4(pkt))
 			continue;

-		odp_atomic_inc_u64(&counters.ip);
+		odp_atomic64_add_rlx(&counters.ip, 1);
 		rlen += sprintf(msg, "receive Packet proto:IP ");
 		buf = odp_buffer_addr(odp_buffer_from_packet(pkt));
 		ip = (odph_ipv4hdr_t *)(buf + odp_packet_l3_offset(pkt));
@@ -405,7 +410,7 @@ static void print_pkts(int thr, odp_packet_t pkt_tbl[], unsigned len)

 		/* udp */
 		if (ip->proto == ODPH_IPPROTO_UDP) {
-			odp_atomic_inc_u64(&counters.udp);
+			odp_atomic64_add_rlx(&counters.udp, 1);
 			udp = (odph_udphdr_t *)(buf + offset);
 			rlen += sprintf(msg + rlen, "UDP payload %d ",
 					odp_be_to_cpu_16(udp->length) -
@@ -417,7 +422,7 @@ static void print_pkts(int thr, odp_packet_t pkt_tbl[], unsigned len)
 			icmp = (odph_icmphdr_t *)(buf + offset);
 			/* echo reply */
 			if (icmp->type == ICMP_ECHOREPLY) {
-				odp_atomic_inc_u64(&counters.icmp);
+				odp_atomic64_add_rlx(&counters.icmp, 1);
 				memcpy(&tvsend, buf + offset + ODPH_ICMPHDR_LEN,
 				       sizeof(struct timeval));
 				/* TODO This should be changed to use an
@@ -530,10 +535,10 @@ int main(int argc, char *argv[])
 	}

 	/* init counters */
-	odp_atomic_init_u64(&counters.seq);
-	odp_atomic_init_u64(&counters.ip);
-	odp_atomic_init_u64(&counters.udp);
-	odp_atomic_init_u64(&counters.icmp);
+	odp_atomic64_store_rlx(&counters.seq, 0);
+	odp_atomic64_store_rlx(&counters.ip, 0);
+	odp_atomic64_store_rlx(&counters.udp, 0);
+	odp_atomic64_store_rlx(&counters.icmp, 0);

 	/* Reserve memory for args from shared mem */
 	shm = odp_shm_reserve("shm_args", sizeof(args_t),
diff --git a/example/ipsec/odp_ipsec.c b/example/ipsec/odp_ipsec.c
index 2f2dc19..76c27d0 100644
--- a/example/ipsec/odp_ipsec.c
+++ b/example/ipsec/odp_ipsec.c
@@ -1223,7 +1223,7 @@ main(int argc, char *argv[])
 	printf("Num worker threads: %i\n", num_workers);

 	/* Create a barrier to synchronize thread startup */
-	odp_barrier_init_count(&sync_barrier, num_workers);
+	odp_barrier_init(&sync_barrier, num_workers);

 	/*
 	 * By default core #0 runs Linux kernel background tasks.
diff --git a/example/odp_example/odp_example.c b/example/odp_example/odp_example.c
index 0e9aa3d..c473395 100644
--- a/example/odp_example/odp_example.c
+++ b/example/odp_example/odp_example.c
@@ -1120,7 +1120,7 @@ int main(int argc, char *argv[])
 	odp_shm_print_all();

 	/* Barrier to sync test case execution */
-	odp_barrier_init_count(&globals->barrier, num_workers);
+	odp_barrier_init(&globals->barrier, num_workers);

 	if (args.proc_mode) {
 		int ret;
diff --git a/example/timer/odp_timer_test.c b/example/timer/odp_timer_test.c
index 78b2ae2..dfbeae9 100644
--- a/example/timer/odp_timer_test.c
+++ b/example/timer/odp_timer_test.c
@@ -372,7 +372,7 @@ int main(int argc, char *argv[])
 	printf("\n");

 	/* Barrier to sync test case execution */
-	odp_barrier_init_count(&test_barrier, num_workers);
+	odp_barrier_init(&test_barrier, num_workers);

 	/* Create and launch worker threads */
 	odph_linux_pthread_create(thread_tbl, num_workers, first_core,
diff --git a/helper/include/odph_ring.h b/helper/include/odph_ring.h
index 76c1db8..5e78b34 100644
--- a/helper/include/odph_ring.h
+++ b/helper/include/odph_ring.h
@@ -138,8 +138,8 @@ typedef struct odph_ring {
 		uint32_t sp_enqueue;	/* True, if single producer. */
 		uint32_t size;		/* Size of ring. */
 		uint32_t mask;		/* Mask (size-1) of ring. */
-		uint32_t head;		/* Producer head. */
-		uint32_t tail;		/* Producer tail. */
+		odp_atomic32_t head;	/* Producer head. */
+		odp_atomic32_t tail;	/* Producer tail. */
 	} prod ODP_ALIGNED_CACHE;

 	/** @private Consumer */
@@ -147,8 +147,8 @@ typedef struct odph_ring {
 		uint32_t sc_dequeue;	/* True, if single consumer. */
 		uint32_t size;		/* Size of the ring. */
 		uint32_t mask;		/* Mask (size-1) of ring. */
-		uint32_t head;		/* Consumer head. */
-		uint32_t tail;		/* Consumer tail. */
+		odp_atomic32_t head;	/* Consumer head. */
+		odp_atomic32_t tail;	/* Consumer tail. */
 	} cons ODP_ALIGNED_CACHE;

 	/** @private Memory space of ring starts here. */
diff --git a/platform/linux-generic/include/api/odp_atomic.h b/platform/linux-generic/include/api/odp_atomic.h
index 0cc4cf4..89f183c 100644
--- a/platform/linux-generic/include/api/odp_atomic.h
+++ b/platform/linux-generic/include/api/odp_atomic.h
@@ -4,463 +4,559 @@
  * SPDX-License-Identifier:     BSD-3-Clause
  */

-
 /**
  * @file
  *
- * ODP atomic operations
+ * ODP atomic types and operations, semantically a subset of C11 atomics.
+ * The scalar variable is wrapped in a struct so that it cannot be accessed
+ * directly, bypassing the required access functions.
+ * Atomic functions must be used to operate on atomic variables!
  */

 #ifndef ODP_ATOMIC_H_
 #define ODP_ATOMIC_H_

+#include <stdint.h>
+#include <odp_align.h>
+#include <odp_hints.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif

-
-#include <odp_std_types.h>
-
-
-/**
- * Atomic integer
- */
-typedef volatile int32_t odp_atomic_int_t;
-
 /**
- * Atomic unsigned integer 64 bits
+ * 32-bit (unsigned) atomic type
  */
-typedef volatile uint64_t odp_atomic_u64_t;
+typedef struct {
+	uint32_t v; /**< Actual storage for the atomic variable */
+} odp_atomic32_t
+ODP_ALIGNED(sizeof(uint32_t)); /* Enforce alignment! */

 /**
- * Atomic unsigned integer 32 bits
+ * 64-bit (unsigned) atomic type
  */
-typedef volatile uint32_t odp_atomic_u32_t;
-
+typedef struct {
+	uint64_t v; /**< Actual storage for the atomic variable */
+} odp_atomic64_t
+ODP_ALIGNED(sizeof(uint64_t)); /* Enforce alignment! */

-/**
- * Initialize atomic integer
- *
- * @param ptr    An integer atomic variable
- *
- * @note The operation is not synchronized with other threads
- */
-static inline void odp_atomic_init_int(odp_atomic_int_t *ptr)
-{
-	*ptr = 0;
-}
-
-/**
- * Load value of atomic integer
- *
- * @param ptr    An atomic variable
- *
- * @return atomic integer value
- *
- * @note The operation is not synchronized with other threads
- */
-static inline int odp_atomic_load_int(odp_atomic_int_t *ptr)
-{
-	return *ptr;
-}
+/*****************************************************************************
+ * Just a few helpers
+ *****************************************************************************/

-/**
- * Store value to atomic integer
- *
- * @param ptr        An atomic variable
- * @param new_value  Store new_value to a variable
- *
- * @note The operation is not synchronized with other threads
- */
-static inline void odp_atomic_store_int(odp_atomic_int_t *ptr, int new_value)
-{
-	*ptr = new_value;
-}
-
-/**
- * Fetch and add atomic integer
- *
- * @param ptr    An atomic variable
- * @param value  A value to be added to the variable
- *
- * @return Value of the variable before the operation
- */
-static inline int odp_atomic_fetch_add_int(odp_atomic_int_t *ptr, int value)
-{
-	return __sync_fetch_and_add(ptr, value);
-}
-
-/**
- * Fetch and subtract atomic integer
- *
- * @param ptr    An atomic integer variable
- * @param value  A value to be subtracted from the variable
- *
- * @return Value of the variable before the operation
- */
-static inline int odp_atomic_fetch_sub_int(odp_atomic_int_t *ptr, int value)
-{
-	return __sync_fetch_and_sub(ptr, value);
-}
-
-/**
- * Fetch and increment atomic integer by 1
- *
- * @param ptr    An atomic variable
- *
- * @return Value of the variable before the operation
- */
-static inline int odp_atomic_fetch_inc_int(odp_atomic_int_t *ptr)
-{
-	return odp_atomic_fetch_add_int(ptr, 1);
-}
-
-/**
- * Increment atomic integer by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_inc_int(odp_atomic_int_t *ptr)
-{
-	odp_atomic_fetch_add_int(ptr, 1);
-}
-
-/**
- * Fetch and decrement atomic integer by 1
- *
- * @param ptr    An atomic int variable
- *
- * @return Value of the variable before the operation
- */
-static inline int odp_atomic_fetch_dec_int(odp_atomic_int_t *ptr)
-{
-	return odp_atomic_fetch_sub_int(ptr, 1);
-}
-
-/**
- * Decrement atomic integer by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_dec_int(odp_atomic_int_t *ptr)
-{
-	odp_atomic_fetch_sub_int(ptr, 1);
-}
+#ifdef __OCTEON__
+/* OCTEON Write Memory Barrier */
+#define COMPILER_HW_BARRIER() __asm __volatile( \
+	/* Double syncw to work around errata */ \
+	".set push\n\t.set arch=octeon\n\tsyncw\n\tsyncw\n\t.set pop" \
+	: : : "memory")
+/* syncw is also used to flush the write buffer which makes stores visible
+ * quicker which should be beneficial to release operations */
+#define OCTEON_FLUSH() __asm __volatile( \
+	".set push\n\t.set arch=octeon\n\tsyncw\n\t.set pop" \
+	: : : "memory")
+#else
+/* __sync_synchronize() generates the right insn for ARMv6t2 and ARMv7-a */
+/** Compiler and hardware full memory barrier */
+#define COMPILER_HW_BARRIER() __sync_synchronize()
+/** Flush write buffer on OCTEON */
+#define OCTEON_FLUSH() (void)0
+#endif

-/**
- * Initialize atomic uint32
- *
- * @param ptr    An atomic variable
- *
- * @note The operation is not synchronized with other threads
- */
-static inline void odp_atomic_init_u32(odp_atomic_u32_t *ptr)
-{
-	*ptr = 0;
-}
+/** Compiler memory barrier */
+#define COMPILER_BARRIER() __asm __volatile("" : : : "memory")

-/**
- * Load value of atomic uint32
- *
- * @param ptr    An atomic variable
- *
- * @return atomic uint32 value
- *
- * @note The operation is not synchronized with other threads
- */
-static inline uint32_t odp_atomic_load_u32(odp_atomic_u32_t *ptr)
-{
-	return *ptr;
-}
+/*****************************************************************************
+ * Operations on 32-bit atomics
+ * odp_atomic32_load_rlx
+ * odp_atomic32_store_rlx
+ * odp_atomic32_load_acq
+ * odp_atomic32_store_rls
+ * odp_atomic32_cmp_and_swap_rlx - return old value
+ * odp_atomic32_fetch_add_rlx - return old value
+ * odp_atomic32_fetch_add_rls - return old value
+ * odp_atomic32_add_rlx - no return value
+ * odp_atomic32_add_rls - no return value
+ *****************************************************************************/

 /**
- * Store value to atomic uint32
+ * Relaxed atomic load of 32-bit atomic variable
+ * @note Relaxed memory model, no barriers.
  *
- * @param ptr        An atomic variable
- * @param new_value  Store new_value to a variable
+ * @param ptr   Pointer to a 32-bit atomic variable
  *
- * @note The operation is not synchronized with other threads
+ * @return Value of the variable
  */
-static inline void odp_atomic_store_u32(odp_atomic_u32_t *ptr,
-					uint32_t new_value)
+static inline uint32_t odp_atomic32_load_rlx(const odp_atomic32_t *ptr)
 {
-	*ptr = new_value;
+	uint32_t val;
+	COMPILER_BARRIER();
+	/* Read of aligned word is atomic */
+	val = ptr->v;
+	COMPILER_BARRIER();
+	return val;
 }

 /**
- * Fetch and add atomic uint32
- *
- * @param ptr    An atomic variable
- * @param value  A value to be added to the variable
+ * Relaxed atomic store of 32-bit atomic variable
+ * @note Relaxed memory model, no barriers.
  *
- * @return Value of the variable before the operation
+ * @param ptr   Pointer to a 32-bit atomic variable
+ * @param val   Value to write to the variable
  */
-static inline uint32_t odp_atomic_fetch_add_u32(odp_atomic_u32_t *ptr,
-						uint32_t value)
+static inline void odp_atomic32_store_rlx(odp_atomic32_t *ptr, uint32_t val)
 {
-	return __sync_fetch_and_add(ptr, value);
+	COMPILER_BARRIER();
+	/* Write of aligned word is atomic */
+	ptr->v = val;
+	COMPILER_BARRIER();
 }

 /**
- * Fetch and subtract uint32
+ * Atomic load-acquire of 32-bit atomic variable
+ * @note SC-load-acquire barrier, later accesses cannot move before
+ * the load-acquire access.
  *
- * @param ptr    An atomic variable
- * @param value  A value to be sub to the variable
+ * @param ptr   Pointer to a 32-bit atomic variable
  *
- * @return Value of the variable before the operation
+ * @return Value of the variable
  */
-static inline uint32_t odp_atomic_fetch_sub_u32(odp_atomic_u32_t *ptr,
-						uint32_t value)
+static inline uint32_t odp_atomic32_load_acq(const odp_atomic32_t *ptr)
 {
-	return __sync_fetch_and_sub(ptr, value);
+#if defined __aarch64__
+	uint32_t val;
+	__asm __volatile("ldar %w0, [%1]"
+			 : "=&r"(val)
+			 : "r"(&ptr->v)
+			 : "memory");
+	return val;
+#elif defined __arm__ || defined __mips64__ || defined __x86_64__
+	/* Read of aligned word is atomic */
+	uint32_t val = ptr->v;
+	/* To prevent later accesses from moving up */
+	/* FIXME: Herb Sutter claims HW barrier not needed on x86? */
+	COMPILER_HW_BARRIER();
+	return val;
+#else
+#warning odp_atomic32_load_acq() may not be efficiently implemented
+	/* Assume read of aligned word is atomic */
+	uint32_t val = ptr->v;
+	/* To prevent later accesses from moving up */
+	COMPILER_HW_BARRIER();
+	return val;
+#endif
 }

 /**
- * Fetch and increment atomic uint32 by 1
- *
- * @param ptr    An atomic variable
- *
- * @return Value of the variable before the operation
- */
-#if defined __OCTEON__
-
-static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *ptr)
-{
-	uint32_t ret;
-
-	__asm__ __volatile__ ("syncws");
-	__asm__ __volatile__ ("lai %0,(%2)" : "=r" (ret), "+m" (ptr) :
-			      "r" (ptr));
-
-	return ret;
-}
-
+ * Atomic store-release of 32-bit atomic variable
+ * @note SC-store-release barrier, earlier accesses cannot move after
+ * the store-release access.
+ *
+ * @param ptr  Pointer to a 32-bit atomic variable
+ * @param val  Value to write to the atomic variable
+ */
+static inline void odp_atomic32_store_rls(odp_atomic32_t *ptr, uint32_t val)
+{
+#if defined __arm__ /* A32/T32 ISA */
+	/* Compiler and HW barrier to prevent earlier accesses from moving
+	 * down */
+	COMPILER_HW_BARRIER();
+	/* Write of aligned word is atomic */
+	ptr->v = val;
+	/* Compiler and HW barrier to prevent this store from moving down after
+	 * a later load-acquire and thus create overlapping critical sections.
+	 * Herb Sutter thinks this is needed */
+	COMPILER_HW_BARRIER();
+#elif defined __aarch64__
+	__asm __volatile("stlr %w0, [%1]"
+			 :
+			 : "r"(val), "r"(&ptr->v)
+			 : "memory");
+#elif defined __mips64__
+	/* Compiler and HW barrier to prevent earlier accesses from moving
+	 * down */
+	COMPILER_HW_BARRIER();
+	/* Write of aligned word is atomic */
+	ptr->v = val;
+	/* Compiler and HW barrier to prevent this store from moving down after
+	 * a later load-acquire and thus create overlapping critical sections.
+	 * Herb Sutter thinks this is needed */
+	COMPILER_HW_BARRIER();
+#elif defined __x86_64__
+	/* This is actually an atomic exchange operation */
+	/* Generates good code on x86_64 */
+	(void)__sync_lock_test_and_set(&ptr->v, val);
 #else
-
-static inline uint32_t odp_atomic_fetch_inc_u32(odp_atomic_u32_t *ptr)
-{
-	return odp_atomic_fetch_add_u32(ptr, 1);
-}
-
+#warning odp_atomic32_store_rls() may not be efficiently implemented
+	/* This is actually an atomic exchange operation */
+	(void)__sync_lock_test_and_set(&ptr->v, val);
 #endif
-
-/**
- * Increment atomic uint32 by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_inc_u32(odp_atomic_u32_t *ptr)
-{
-	odp_atomic_fetch_add_u32(ptr, 1);
 }

-/**
- * Fetch and decrement uint32 by 1
- *
- * @param ptr    An atomic variable
- *
- * @return Value of the variable before the operation
- */
-static inline uint32_t odp_atomic_fetch_dec_u32(odp_atomic_u32_t *ptr)
-{
-	return odp_atomic_fetch_sub_u32(ptr, 1);
-}

 /**
- * Decrement atomic uint32 by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_dec_u32(odp_atomic_u32_t *ptr)
-{
-	odp_atomic_fetch_sub_u32(ptr, 1);
+ * Atomic compare and swap of 32-bit atomic variable
+ * @note Relaxed memory model, no barriers.
+ * @note Not compare-and-set! Caller should compare the return value with the
+ * expected parameter to check if the swap operation succeeded.
+ *
+ * @param ptr  Pointer to a 32-bit atomic variable
+ * @param exp  Expected old value
+ * @param val  New value
+ * @return Actual old value, if different from 'exp' then swap failed
+ */
+static inline uint32_t
+odp_atomic32_cmp_and_swap_rlx(odp_atomic32_t *ptr,
+			      uint32_t exp,
+			      uint32_t val)
+{
+#if defined __arm__ /* A32/T32 ISA */
+	uint32_t old;
+	int status;
+	do {
+		__asm __volatile("ldrex %0, [%1]"
+				 : "=&r"(old)
+				 : "r"(&ptr->v)
+				 : "memory");
+		if (odp_unlikely(old != exp)) {
+			/* Value has changed, can't proceed */
+			/* Clear exclusive access monitor */
+			__asm __volatile("clrex");
+			break;
+		}
+		/* Current value is as expected, attempt to write new value */
+		__asm __volatile("strex %0, %1, [%2]"
+				 : "=&r"(status)
+				 : "r"(val), "r"(&ptr->v)
+				 : "memory");
+		/* Restart the loop so we can re-read the previous value */
+	} while (odp_unlikely(status != 0));
+	return old;
+#elif defined __aarch64__
+	uint32_t old;
+	int status;
+	do {
+		__asm __volatile("ldxr %w0, [%1]"
+				 : "=&r"(old)
+				 : "r"(&ptr->v)
+				 : "memory");
+		if (odp_unlikely(old != exp)) {
+			/* Value has changed, can't proceed */
+			/* Clear exclusive access monitor */
+			__asm __volatile("clrex");
+			break;
+		}
+		/* Current value is as expected, attempt to write new value */
+		__asm __volatile("stxr %w0, %w1, [%2]"
+				 : "=&r"(status)
+				 : "r"(val), "r"(&ptr->v)
+				 : "memory");
+		/* Restart the loop so we can re-read the previous value */
+	} while (odp_unlikely(status != 0));
+	return old;
+#elif defined __mips64__
+	uint32_t old, new_val;
+	do {
+		__asm __volatile("llw %0, [%1]"
+				 : "=&r"(old)
+				 : "r"(&ptr->v)
+				 : "memory");
+		if (odp_unlikely(old != exp)) {
+			/* Value has changed, can't proceed */
+			break;
+		}
+		/* Current value is as expected, attempt to write new value */
+		new_val = val;
+		__asm __volatile("scw %0, [%1]"
+				 : "+&r"(new_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(new_val == 0));
+	return old;
+#elif defined __x86_64__
+	/* Generates good code on x86_64 */
+	return __sync_val_compare_and_swap(&ptr->v, exp, val);
+#else
+#warning odp_atomic32_cmp_and_swap_rlx() may not be efficiently implemented
+	return __sync_val_compare_and_swap(&ptr->v, exp, val);
+#endif
 }

 /**
- * Atomic compare and set for 32bit
- *
- * @param dst destination location into which the value will be written.
- * @param exp expected value.
- * @param src new value.
- * @return Non-zero on success; 0 on failure.
- */
-static inline int
-odp_atomic_cmpset_u32(odp_atomic_u32_t *dst, uint32_t exp, uint32_t src)
-{
-	return __sync_bool_compare_and_swap(dst, exp, src);
+ * Atomic fetch and add to 32-bit atomic variable
+ * @note Relaxed memory model, no barriers.
+ * @note A - B <=> A + (-B)
+ *
+ * @param ptr   Pointer to a 32-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
+ *
+ * @return Value of the atomic variable before the addition
+ */
+static inline uint32_t odp_atomic32_fetch_add_rlx(odp_atomic32_t *ptr,
+						  uint32_t incr)
+{
+#if defined __arm__ /* A32/T32 ISA */
+	uint32_t old_val, new_val;
+	int status;
+	do {
+		__asm __volatile("ldrex %0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("strex %0, %1, [%2]"
+				 : "=&r"(status)
+				 : "r"(new_val), "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(status != 0));
+	return old_val;
+#elif defined __aarch64__
+	uint32_t old_val, new_val;
+	int status;
+	do {
+		__asm __volatile("ldxr %w0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("stxr %w0, %w1, [%2]"
+				 : "=&r"(status)
+				 : "r"(new_val), "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(status != 0));
+	return old_val;
+#elif defined __mips64__
+	uint32_t old_val, new_val;
+	do {
+		__asm __volatile("llw %0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("scw %0, [%1]"
+				 : "+&r"(new_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(new_val == 0));
+	return old_val;
+#elif defined __x86_64__
+	/* Generates good code on x86_64 */
+	return __sync_fetch_and_add(&ptr->v, incr);
+#else
+#warning odp_atomic32_fetch_add_rlx() may not be efficiently implemented
+	return __sync_fetch_and_add(&ptr->v, incr);
+#endif
 }

 /**
- * Initialize atomic uint64
+ * Atomic fetch and add to 32-bit atomic variable
+ * @note Sequentially consistent memory model, barriers before and after the
+ * operation.
+ * @note A - B <=> A + (-B)
  *
- * @param ptr    An atomic variable
+ * @param ptr   Pointer to a 32-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
  *
- * @note The operation is not synchronized with other threads
+ * @return Value of the atomic variable before the addition
  */
-static inline void odp_atomic_init_u64(odp_atomic_u64_t *ptr)
+static inline uint32_t odp_atomic32_fetch_add_rls(odp_atomic32_t *ptr,
+						  uint32_t incr)
 {
-	*ptr = 0;
+#if defined __arm__ /* A32/T32 ISA */
+	COMPILER_HW_BARRIER();
+	return odp_atomic32_fetch_add_rlx(ptr, incr);
+#elif defined __aarch64__
+	/* We basically get acquire/release semantics */
+	return __sync_fetch_and_add(&ptr->v, incr);
+#elif defined __mips64__
+	uint32_t old;
+	COMPILER_HW_BARRIER();
+	old = odp_atomic32_fetch_add_rlx(ptr, incr);
+	OCTEON_FLUSH();
+	return old;
+#elif defined __x86_64__
+	/* Generates good code on x86_64 */
+	return __sync_fetch_and_add(&ptr->v, incr);
+#else
+#warning odp_atomic32_fetch_add_rls() may not be efficiently implemented
+	return __sync_fetch_and_add(&ptr->v, incr);
+#endif
 }

 /**
- * Load value of atomic uint64
- *
- * @param ptr    An atomic variable
+ * Atomic add to 32-bit atomic variable
+ * @note Relaxed memory model, no barriers.
  *
- * @return atomic uint64 value
- *
- * @note The operation is not synchronized with other threads
+ * @param ptr   Pointer to a 32-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
  */
-static inline uint64_t odp_atomic_load_u64(odp_atomic_u64_t *ptr)
+static inline void odp_atomic32_add_rlx(odp_atomic32_t *ptr,
+					uint32_t incr)
 {
-	return *ptr;
+	/* Use odp_atomic32_fetch_add_rlx() for now */
+	(void)odp_atomic32_fetch_add_rlx(ptr, incr);
 }

 /**
- * Store value to atomic uint64
- *
- * @param ptr        An atomic variable
- * @param new_value  Store new_value to a variable
+ * Atomic add to 32-bit atomic variable
+ * @note Sequentially consistent memory model, barriers before and after the
+ * operation.
  *
- * @note The operation is not synchronized with other threads
+ * @param ptr   Pointer to a 32-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
  */
-static inline void odp_atomic_store_u64(odp_atomic_u64_t *ptr,
-					uint64_t new_value)
+static inline void odp_atomic32_add_rls(odp_atomic32_t *ptr, uint32_t incr)
 {
-	*ptr = new_value;
+	/* Use odp_atomic32_fetch_add_rls() for now */
+	(void)odp_atomic32_fetch_add_rls(ptr, incr);
 }

-/**
- * Add atomic uint64
- *
- * @param ptr    An atomic variable
- * @param value  A value to be added to the variable
- *
- */
-static inline void odp_atomic_add_u64(odp_atomic_u64_t *ptr, uint64_t value)
-{
-	__sync_fetch_and_add(ptr, value);
-}
+/*****************************************************************************
+ * Operations on 64-bit atomics
+ * odp_atomic64_load_rlx
+ * odp_atomic64_store_rlx
+ * odp_atomic64_fetch_add_rlx
+ * odp_atomic64_add_rlx
+ *****************************************************************************/

 /**
- * Fetch and add atomic uint64
+ * Relaxed atomic load of 64-bit atomic variable
+ * @note Relaxed memory model, no barriers.
  *
- * @param ptr    An atomic variable
- * @param value  A value to be added to the variable
+ * @param ptr   Pointer to a 64-bit atomic variable
  *
- * @return Value of the variable before the operation
+ * @return Value of the atomic variable
  */
-
-#if defined __powerpc__ && !defined __powerpc64__
-static inline uint64_t odp_atomic_fetch_add_u64(odp_atomic_u64_t *ptr,
-						uint64_t value)
+static inline uint64_t odp_atomic64_load_rlx(odp_atomic64_t *ptr)
 {
-	return __sync_fetch_and_add((odp_atomic_u32_t *)ptr,
-				    (uint32_t)value);
-}
+#if defined __arm__ /* A32/T32 ISA */
+	uint64_t val;
+	__asm __volatile("ldrexd %0, %H0, [%1]\n\t"
+			 "clrex" /* Clear exclusive access monitor */
+			 : "=&r"(val)
+			 : "r"(&ptr->v)
+			 : );
+	return val;
+#elif defined __x86_64__ || defined __aarch64__ || defined __mips64__
+	/* Read of aligned quad/double word is atomic */
+	return ptr->v;
 #else
-static inline uint64_t odp_atomic_fetch_add_u64(odp_atomic_u64_t *ptr,
-						uint64_t value)
-{
-	return __sync_fetch_and_add(ptr, value);
-}
+#warning odp_atomic64_load_rlx() may not be efficiently implemented
+	return __sync_fetch_and_or(&ptr->v, 0);
 #endif
-/**
- * Subtract atomic uint64
- *
- * @param ptr    An atomic variable
- * @param value  A value to be subtracted from the variable
- *
- */
-static inline void odp_atomic_sub_u64(odp_atomic_u64_t *ptr, uint64_t value)
-{
-	__sync_fetch_and_sub(ptr, value);
 }

 /**
- * Fetch and subtract atomic uint64
- *
- * @param ptr    An atomic variable
- * @param value  A value to be subtracted from the variable
- *
- * @return Value of the variable before the operation
- */
-#if defined __powerpc__ && !defined __powerpc64__
-static inline uint64_t odp_atomic_fetch_sub_u64(odp_atomic_u64_t *ptr,
-						uint64_t value)
-{
-	return __sync_fetch_and_sub((odp_atomic_u32_t *)ptr,
-				    (uint32_t)value);
-}
+ * Relaxed atomic store of 64-bit atomic variable
+ * @note Relaxed memory model, no barriers.
+ *
+ * @param ptr   Pointer to a 64-bit atomic variable
+ * @param val   Value to write to the atomic variable
+ */
+static inline void odp_atomic64_store_rlx(odp_atomic64_t *ptr,
+					  uint64_t val)
+{
+#if defined __arm__ /* A32/T32 ISA */
+	uint64_t old_val;
+	int status;
+	do {
+		/* Read atomic variable exclusively so we can write to it
+		 * later */
+		__asm __volatile("ldrexd %0, %H0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		(void)old_val; /* Ignore old value */
+		/* Attempt to write the new value */
+		__asm __volatile("strexd %0, %1, %H1, [%2]"
+				 : "=&r"(status)
+				 : "r"(val), "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(status != 0)); /* Retry until write succeeds */
+#elif defined __x86_64__ || defined __aarch64__ || defined __mips64__
+	/* Write of aligned quad/double word is atomic */
+	ptr->v = val;
 #else
-static inline uint64_t odp_atomic_fetch_sub_u64(odp_atomic_u64_t *ptr,
-						uint64_t value)
-{
-	return __sync_fetch_and_sub(ptr, value);
-}
+#warning odp_atomic64_store_rlx() may not be efficiently implemented
+	/* This is actually an atomic exchange operation */
+	(void)__sync_lock_test_and_set(&ptr->v, val);
 #endif
-/**
- * Fetch and increment atomic uint64 by 1
- *
- * @param ptr    An atomic variable
- *
- * @return Value of the variable before the operation
- */
-static inline uint64_t odp_atomic_fetch_inc_u64(odp_atomic_u64_t *ptr)
-{
-	return odp_atomic_fetch_add_u64(ptr, 1);
-}
-
-/**
- * Increment atomic uint64 by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_inc_u64(odp_atomic_u64_t *ptr)
-{
-	odp_atomic_fetch_add_u64(ptr, 1);
-}
-
-/**
- * Fetch and decrement atomic uint64 by 1
- *
- * @param ptr    An atomic variable
- *
- * @return Value of the variable before the operation
- */
-static inline uint64_t odp_atomic_fetch_dec_u64(odp_atomic_u64_t *ptr)
-{
-	return odp_atomic_fetch_sub_u64(ptr, 1);
 }

 /**
- * Decrement atomic uint64 by 1
- *
- * @param ptr    An atomic variable
- *
- */
-static inline void odp_atomic_dec_u64(odp_atomic_u64_t *ptr)
-{
-	odp_atomic_fetch_sub_u64(ptr, 1);
+ * Atomic fetch and add to 64-bit atomic variable
+ * @note Relaxed memory model, no barriers.
+ *
+ * @param ptr   Pointer to a 64-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
+ *
+ * @return Value of the atomic variable before the addition
+ */
+static inline uint64_t odp_atomic64_fetch_add_rlx(odp_atomic64_t *ptr,
+						  uint64_t incr)
+{
+#if defined __arm__ /* A32/T32 ISA */
+	uint64_t old_val, new_val;
+	int status;
+	do {
+		__asm __volatile("ldrexd %0, %H0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("strexd %0, %1, %H1, [%2]"
+				 : "=&r"(status)
+				 : "r"(new_val), "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(status != 0)); /* Retry until write succeeds */
+	return old_val;
+#elif defined __aarch64__
+	uint64_t old_val, new_val;
+	int status;
+	do {
+		__asm __volatile("ldxr %x0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("stxr %w0, %x1, [%2]"
+				 : "=&r"(status)
+				 : "r"(new_val), "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(status != 0)); /* Retry until write succeeds */
+	return old_val;
+#elif defined __mips64__
+	uint64_t old_val, new_val;
+	do {
+		__asm __volatile("ll %0, [%1]"
+				 : "=&r"(old_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+		new_val = old_val + incr;
+		__asm __volatile("sc %0, [%1]"
+				 : "+&r"(new_val)
+				 : "r"(&ptr->v)
+				 : "memory");
+	} while (odp_unlikely(new_val == 0));
+	return old_val;
+#elif defined __x86_64__
+	/* Generates good code on x86_64 */
+	return __sync_fetch_and_add(&ptr->v, incr);
+#else
+#warning odp_atomic64_fetch_add_rlx() may not be efficiently implemented
+	return __sync_fetch_and_add(&ptr->v, incr);
+#endif
 }

 /**
- * Atomic compare and set for 64bit
+ * Atomic add to 64-bit atomic variable
+ * @note Relaxed memory model, no barriers.
  *
- * @param dst destination location into which the value will be written.
- * @param exp expected value.
- * @param src new value.
- * @return Non-zero on success; 0 on failure.
+ * @param ptr   Pointer to a 64-bit atomic variable
+ * @param incr  The value to be added to the atomic variable
  */
-static inline int
-odp_atomic_cmpset_u64(odp_atomic_u64_t *dst, uint64_t exp, uint64_t src)
+static inline void odp_atomic64_add_rlx(odp_atomic64_t *ptr, uint64_t incr)
 {
-	return __sync_bool_compare_and_swap(dst, exp, src);
+	(void)odp_atomic64_fetch_add_rlx(ptr, incr);
 }

 #ifdef __cplusplus
diff --git a/platform/linux-generic/include/api/odp_barrier.h b/platform/linux-generic/include/api/odp_barrier.h
index a7b3215..f8eae9a 100644
--- a/platform/linux-generic/include/api/odp_barrier.h
+++ b/platform/linux-generic/include/api/odp_barrier.h
@@ -27,18 +27,18 @@ extern "C" {
  * ODP execution barrier
  */
 typedef struct odp_barrier_t {
-	int count;		   /**< @private Thread count */
-	odp_atomic_int_t bar;	   /**< @private Barrier counter */
+	uint32_t num_threads;	   /**< @private Thread count (constant) */
+	odp_atomic32_t in_barrier; /**< @private Threads in barrier */
 } odp_barrier_t;

 /**
  * Init barrier with thread count
  *
- * @param barrier    Barrier
- * @param count      Thread count
+ * @param barrier     Barrier
+ * @param num_threads Number of threads which share the barrier
  */
-void odp_barrier_init_count(odp_barrier_t *barrier, int count);
+void odp_barrier_init(odp_barrier_t *barrier, uint32_t num_threads);

 /**
diff --git a/platform/linux-generic/include/api/odp_rwlock.h b/platform/linux-generic/include/api/odp_rwlock.h
index 252ebb2..ff8a9a2 100644
--- a/platform/linux-generic/include/api/odp_rwlock.h
+++ b/platform/linux-generic/include/api/odp_rwlock.h
@@ -10,26 +10,30 @@
 /**
  * @file
  *
- * ODP RW Locks
+ * ODP read/write lock
+ * An RW lock supports multiple concurrent readers but only one (exclusive)
+ * writer.
  */

+#include <odp_atomic.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif

 /**
  * The odp_rwlock_t type.
- * write lock count is -1,
- * read lock count > 0
+ * write lock is ~0U
+ * read lock count >0 && <~0U
  */
 typedef struct {
-	volatile int32_t cnt; /**< -1 Write lock,
-				> 0 for Read lock. */
+	odp_atomic32_t cnt; /**< == 0: unlocked,
+			       == ~0: locked for write,
+			       > 0 number of concurrent read locks */
 } odp_rwlock_t;

 /**
- * Initialize the rwlock to an unlocked state.
+ * Initialize the rwlock to the unlocked state.
  *
  * @param rwlock pointer to the RW Lock.
  */
@@ -50,14 +54,14 @@ void odp_rwlock_read_lock(odp_rwlock_t *rwlock);
 void odp_rwlock_read_unlock(odp_rwlock_t *rwlock);

 /**
- * Aquire a write lock.
+ * Acquire the write lock.
  *
  * @param rwlock pointer to a RW Lock.
  */
 void odp_rwlock_write_lock(odp_rwlock_t *rwlock);

 /**
- * Release a write lock.
+ * Release the write lock.
  *
  * @param rwlock pointer to a RW Lock.
  */
diff --git a/platform/linux-generic/include/api/odp_ticketlock.h b/platform/linux-generic/include/api/odp_ticketlock.h
index 6277a18..c4b5e34 100644
--- a/platform/linux-generic/include/api/odp_ticketlock.h
+++ b/platform/linux-generic/include/api/odp_ticketlock.h
@@ -27,8 +27,8 @@ extern "C" {
  * ODP ticketlock
  */
 typedef struct odp_ticketlock_t {
-	odp_atomic_u32_t next_ticket; /**< @private Next ticket */
-	volatile uint32_t cur_ticket; /**< @private Current ticket */
+	odp_atomic32_t next_ticket; /**< @private Next ticket */
+	odp_atomic32_t cur_ticket;  /**< @private Current ticket */
 } odp_ticketlock_t;

diff --git a/platform/linux-generic/include/odp_buffer_internal.h b/platform/linux-generic/include/odp_buffer_internal.h
index 2002b51..530ab96 100644
--- a/platform/linux-generic/include/odp_buffer_internal.h
+++ b/platform/linux-generic/include/odp_buffer_internal.h
@@ -88,7 +88,7 @@ typedef struct odp_buffer_hdr_t {
 	uint32_t index;		      /* buf index in the pool */
 	size_t size;		      /* max data size */
 	size_t cur_offset;	      /* current offset */
-	odp_atomic_int_t ref_count;   /* reference count */
+	odp_atomic32_t ref_count;     /* reference count */
 	odp_buffer_scatter_t scatter; /* Scatter/gather list */
 	int type;		      /* type of next header */
 	odp_buffer_pool_t pool_hdl;   /* buffer pool handle */
diff --git a/platform/linux-generic/odp_barrier.c b/platform/linux-generic/odp_barrier.c
index a82b294..6c3b884 100644
--- a/platform/linux-generic/odp_barrier.c
+++ b/platform/linux-generic/odp_barrier.c
@@ -8,41 +8,48 @@
 #include
 #include

-void odp_barrier_init_count(odp_barrier_t *barrier, int count)
+void odp_barrier_init(odp_barrier_t *barrier, uint32_t num_threads)
 {
-	barrier->count = count;
-	barrier->bar = 0;
-	odp_sync_stores();
+	barrier->num_threads = num_threads; /* Constant after initialisation */
+	odp_atomic32_store_rls(&barrier->in_barrier, 0);
 }

 /*
  * Efficient barrier_sync -
  *
  *   Barriers are initialized with a count of the number of callers
- *   that must sync on the barrier before any may proceed.
+ *   that must sync on (enter) the barrier before any may proceed (exit).
  *
  *   To avoid race conditions and to permit the barrier to be fully
  *   reusable, the barrier value cycles between 0..2*count-1. When
- *   synchronizing the wasless variable simply tracks which half of
+ *   synchronizing the waslow variable simply tracks which half of
  *   the cycle the barrier was in upon entry.  Exit is when the
  *   barrier crosses to the other half of the cycle.
  */

 void odp_barrier_sync(odp_barrier_t *barrier)
 {
-	int count;
-	int wasless;
+	uint32_t count;
+	bool waslow;

-	odp_sync_stores();
-	wasless = barrier->bar < barrier->count;
-	count = odp_atomic_fetch_inc_int(&barrier->bar);
+	/* FIXME do we need acquire barrier as well? */
+	/* Increase threads in_barrier count, this will automatically release
+	 * the other threads when lower/upper range is switched */
+	count = odp_atomic32_fetch_add_rls(&barrier->in_barrier, 1);
+	/* Compute lower or higher range indicator */
+	waslow = count < barrier->num_threads;

-	if (count == 2*barrier->count-1) {
-		barrier->bar = 0;
-	} else {
-		while ((barrier->bar < barrier->count) == wasless)
-			odp_spin();
+	/* Check if in_barrier count has "wrapped" */
+	if (count == 2 * barrier->num_threads - 1) {
+		/* Manually wrap the counter */
+		odp_atomic32_add_rls(&barrier->in_barrier,
+				     (uint32_t)(-2*(int)barrier->num_threads));
+		/* We don't need to wait below, return immediately */
+		return;
+	}
+	/* Wait for counter to change half */
+	while ((odp_atomic32_load_rlx(&barrier->in_barrier) <
+		barrier->num_threads) == waslow) {
+		odp_spin();
 	}
-
-	odp_mem_barrier();
 }
diff --git a/platform/linux-generic/odp_buffer.c b/platform/linux-generic/odp_buffer.c
index e54e0e7..a5939f3 100644
--- a/platform/linux-generic/odp_buffer.c
+++ b/platform/linux-generic/odp_buffer.c
@@ -73,7 +73,8 @@ int odp_buffer_snprint(char *str, size_t n, odp_buffer_t buf)
 	len += snprintf(&str[len], n-len,
 			"  cur_offset   %zu\n",       hdr->cur_offset);
 	len += snprintf(&str[len], n-len,
-			"  ref_count    %i\n",        hdr->ref_count);
+			"  ref_count    %u\n",
+			odp_atomic32_load_rlx(&hdr->ref_count));
 	len += snprintf(&str[len], n-len,
 			"  type         %i\n",        hdr->type);
 	len += snprintf(&str[len], n-len,
diff --git a/platform/linux-generic/odp_crypto.c b/platform/linux-generic/odp_crypto.c
index b37ad6b..d9fff10 100644
--- a/platform/linux-generic/odp_crypto.c
+++ b/platform/linux-generic/odp_crypto.c
@@ -26,7 +26,7 @@
 #define MAX_SESSIONS 32

 typedef struct {
-	odp_atomic_u32_t next;
+	odp_atomic32_t next;
 	uint32_t max;
 	odp_crypto_generic_session_t sessions[0];
 } odp_crypto_global_t;
@@ -58,7 +58,7 @@ odp_crypto_generic_session_t *alloc_session(void)
 	uint32_t idx;
 	odp_crypto_generic_session_t *session = NULL;

-	idx = odp_atomic_fetch_inc_u32(&global->next);
+	idx = odp_atomic32_fetch_add_rlx(&global->next, 1);
 	if (idx < global->max) {
 		session = &global->sessions[idx];
 		session->index = idx;
diff --git a/platform/linux-generic/odp_queue.c b/platform/linux-generic/odp_queue.c
index 1318bcd..08c0d29 100644
--- a/platform/linux-generic/odp_queue.c
+++ b/platform/linux-generic/odp_queue.c
@@ -214,8 +214,13 @@ int odp_queue_set_context(odp_queue_t handle, void *context)
 {
 	queue_entry_t *queue;
 	queue = queue_to_qentry(handle);
+	/* Setting a new queue context can be viewed as a release operation,
+	 * all writes to the context must be observable before the context
+	 * is made observable */
 	odp_sync_stores();
-	queue->s.param.context = context;
+	queue->s.param.context = context; /* Store-release */
+	/* Ensure queue modification is globally visible before we return
+	 * and the application might cause the queue to be scheduled */
 	odp_sync_stores();
 	return 0;
 }
diff --git a/platform/linux-generic/odp_ring.c b/platform/linux-generic/odp_ring.c
index 632aa66..d1ec825 100644
--- a/platform/linux-generic/odp_ring.c
+++ b/platform/linux-generic/odp_ring.c
@@ -187,10 +187,10 @@ odph_ring_create(const char *name, unsigned count, unsigned flags)
 		r->cons.size = count;
 		r->prod.mask = count-1;
 		r->cons.mask = count-1;
-		r->prod.head = 0;
-		r->cons.head = 0;
-		r->prod.tail = 0;
-		r->cons.tail = 0;
+		odp_atomic32_store_rlx(&r->prod.head, 0);
+		odp_atomic32_store_rlx(&r->cons.head, 0);
+		odp_atomic32_store_rlx(&r->prod.tail, 0);
+		odp_atomic32_store_rlx(&r->cons.tail, 0);
TAILQ_INSERT_TAIL(&odp_ring_list, r, next); } else { @@ -227,7 +227,7 @@ int __odph_ring_mp_do_enqueue(odph_ring_t *r, void * const *obj_table, uint32_t prod_head, prod_next; uint32_t cons_tail, free_entries; const unsigned max = n; - int success; + bool ok; unsigned i; uint32_t mask = r->prod.mask; int ret; @@ -237,8 +237,8 @@ int __odph_ring_mp_do_enqueue(odph_ring_t *r, void * const *obj_table, /* Reset n to the initial burst count */ n = max; - prod_head = r->prod.head; - cons_tail = r->cons.tail; + prod_head = odp_atomic32_load_rlx(&r->prod.head); + cons_tail = odp_atomic32_load_acq(&r->cons.tail); /* The subtraction is done between two unsigned 32bits value * (the result is always modulo 32 bits even if we have * prod_head > cons_tail). So 'free_entries' is always between 0 @@ -259,13 +259,13 @@ int __odph_ring_mp_do_enqueue(odph_ring_t *r, void * const *obj_table, } prod_next = prod_head + n; - success = odp_atomic_cmpset_u32(&r->prod.head, prod_head, - prod_next); - } while (odp_unlikely(success == 0)); + ok = odp_atomic32_cmp_and_swap_rlx(&r->prod.head, + prod_head, + prod_next) == prod_head; + } while (odp_unlikely(!ok)); /* write entries in ring */ ENQUEUE_PTRS(); - odp_mem_barrier(); /* if we exceed the watermark */ if (odp_unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) { @@ -279,10 +279,10 @@ int __odph_ring_mp_do_enqueue(odph_ring_t *r, void * const *obj_table, * If there are other enqueues in progress that preceeded us, * we need to wait for them to complete */ - while (odp_unlikely(r->prod.tail != prod_head)) + while (odp_unlikely(odp_atomic32_load_rlx(&r->prod.tail) != prod_head)) odp_spin(); - r->prod.tail = prod_next; + odp_atomic32_store_rls(&r->prod.tail, prod_next); return ret; } @@ -298,8 +298,8 @@ int __odph_ring_sp_do_enqueue(odph_ring_t *r, void * const *obj_table, uint32_t mask = r->prod.mask; int ret; - prod_head = r->prod.head; - cons_tail = r->cons.tail; + prod_head = odp_atomic32_load_rlx(&r->prod.head); + cons_tail = odp_atomic32_load_acq(&r->cons.tail); /* The subtraction is done between two unsigned 32bits value * (the result is always modulo 32 bits even if we have * prod_head > cons_tail). So 'free_entries' is always between 0 @@ -320,11 +320,10 @@ int __odph_ring_sp_do_enqueue(odph_ring_t *r, void * const *obj_table, } prod_next = prod_head + n; - r->prod.head = prod_next; + odp_atomic32_store_rlx(&r->prod.head, prod_next); /* write entries in ring */ ENQUEUE_PTRS(); - odp_mem_barrier(); /* if we exceed the watermark */ if (odp_unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) { @@ -334,7 +333,7 @@ int __odph_ring_sp_do_enqueue(odph_ring_t *r, void * const *obj_table, ret = (behavior == ODPH_RING_QUEUE_FIXED) ? 0 : n; } - r->prod.tail = prod_next; + odp_atomic32_store_rls(&r->prod.tail, prod_next); return ret; } @@ -348,7 +347,7 @@ int __odph_ring_mc_do_dequeue(odph_ring_t *r, void **obj_table, uint32_t cons_head, prod_tail; uint32_t cons_next, entries; const unsigned max = n; - int success; + bool ok; unsigned i; uint32_t mask = r->prod.mask; @@ -357,8 +356,8 @@ int __odph_ring_mc_do_dequeue(odph_ring_t *r, void **obj_table, /* Restore n as it may change every loop */ n = max; - cons_head = r->cons.head; - prod_tail = r->prod.tail; + cons_head = odp_atomic32_load_rlx(&r->cons.head); + prod_tail = odp_atomic32_load_acq(&r->prod.tail); /* The subtraction is done between two unsigned 32bits value * (the result is always modulo 32 bits even if we have * cons_head > prod_tail). 
So 'entries' is always between 0 @@ -378,22 +377,22 @@ int __odph_ring_mc_do_dequeue(odph_ring_t *r, void **obj_table, } cons_next = cons_head + n; - success = odp_atomic_cmpset_u32(&r->cons.head, cons_head, - cons_next); - } while (odp_unlikely(success == 0)); + ok = odp_atomic32_cmp_and_swap_rlx(&r->cons.head, + cons_head, + cons_next) == cons_head; + } while (odp_unlikely(!ok)); /* copy in table */ DEQUEUE_PTRS(); - odp_mem_barrier(); /* * If there are other dequeues in progress that preceded us, * we need to wait for them to complete */ - while (odp_unlikely(r->cons.tail != cons_head)) + while (odp_unlikely(odp_atomic32_load_rlx(&r->cons.tail) != cons_head)) odp_spin(); - r->cons.tail = cons_next; + odp_atomic32_store_rls(&r->cons.tail, cons_next); return behavior == ODPH_RING_QUEUE_FIXED ? 0 : n; } @@ -409,8 +408,8 @@ int __odph_ring_sc_do_dequeue(odph_ring_t *r, void **obj_table, unsigned i; uint32_t mask = r->prod.mask; - cons_head = r->cons.head; - prod_tail = r->prod.tail; + cons_head = odp_atomic32_load_rlx(&r->cons.head); + prod_tail = odp_atomic32_load_acq(&r->prod.tail); /* The subtraction is done between two unsigned 32bits value * (the result is always modulo 32 bits even if we have * cons_head > prod_tail). So 'entries' is always between 0 @@ -429,13 +428,12 @@ int __odph_ring_sc_do_dequeue(odph_ring_t *r, void **obj_table, } cons_next = cons_head + n; - r->cons.head = cons_next; + odp_atomic32_store_rlx(&r->cons.head, cons_next); /* copy in table */ DEQUEUE_PTRS(); - odp_mem_barrier(); - r->cons.tail = cons_next; + odp_atomic32_store_rls(&r->cons.tail, cons_next); return behavior == ODPH_RING_QUEUE_FIXED ? 0 : n; } @@ -482,8 +480,8 @@ int odph_ring_sc_dequeue_bulk(odph_ring_t *r, void **obj_table, unsigned n) */ int odph_ring_full(const odph_ring_t *r) { - uint32_t prod_tail = r->prod.tail; - uint32_t cons_tail = r->cons.tail; + uint32_t prod_tail = odp_atomic32_load_rlx(&r->prod.tail); + uint32_t cons_tail = odp_atomic32_load_rlx(&r->cons.tail); return (((cons_tail - prod_tail - 1) & r->prod.mask) == 0); } @@ -492,8 +490,8 @@ int odph_ring_full(const odph_ring_t *r) */ int odph_ring_empty(const odph_ring_t *r) { - uint32_t prod_tail = r->prod.tail; - uint32_t cons_tail = r->cons.tail; + uint32_t prod_tail = odp_atomic32_load_rlx(&r->prod.tail); + uint32_t cons_tail = odp_atomic32_load_rlx(&r->cons.tail); return !!(cons_tail == prod_tail); } @@ -502,8 +500,8 @@ int odph_ring_empty(const odph_ring_t *r) */ unsigned odph_ring_count(const odph_ring_t *r) { - uint32_t prod_tail = r->prod.tail; - uint32_t cons_tail = r->cons.tail; + uint32_t prod_tail = odp_atomic32_load_rlx(&r->prod.tail); + uint32_t cons_tail = odp_atomic32_load_rlx(&r->cons.tail); return (prod_tail - cons_tail) & r->prod.mask; } @@ -512,8 +510,8 @@ unsigned odph_ring_count(const odph_ring_t *r) */ unsigned odph_ring_free_count(const odph_ring_t *r) { - uint32_t prod_tail = r->prod.tail; - uint32_t cons_tail = r->cons.tail; + uint32_t prod_tail = odp_atomic32_load_rlx(&r->prod.tail); + uint32_t cons_tail = odp_atomic32_load_rlx(&r->cons.tail); return (cons_tail - prod_tail - 1) & r->prod.mask; } @@ -523,10 +521,10 @@ void odph_ring_dump(const odph_ring_t *r) ODP_DBG("ring <%s>@%p\n", r->name, r); ODP_DBG(" flags=%x\n", r->flags); ODP_DBG(" size=%"PRIu32"\n", r->prod.size); - ODP_DBG(" ct=%"PRIu32"\n", r->cons.tail); - ODP_DBG(" ch=%"PRIu32"\n", r->cons.head); - ODP_DBG(" pt=%"PRIu32"\n", r->prod.tail); - ODP_DBG(" ph=%"PRIu32"\n", r->prod.head); + ODP_DBG(" ct=%"PRIu32"\n", 
odp_atomic32_load_rlx(&r->cons.tail)); + ODP_DBG(" ch=%"PRIu32"\n", odp_atomic32_load_rlx(&r->cons.head)); + ODP_DBG(" pt=%"PRIu32"\n", odp_atomic32_load_rlx(&r->prod.tail)); + ODP_DBG(" ph=%"PRIu32"\n", odp_atomic32_load_rlx(&r->prod.head)); ODP_DBG(" used=%u\n", odph_ring_count(r)); ODP_DBG(" avail=%u\n", odph_ring_free_count(r)); if (r->prod.watermark == r->prod.size) diff --git a/platform/linux-generic/odp_rwlock.c b/platform/linux-generic/odp_rwlock.c index 11c8dd7..ba0a7ca 100644 --- a/platform/linux-generic/odp_rwlock.c +++ b/platform/linux-generic/odp_rwlock.c @@ -4,58 +4,56 @@ * SPDX-License-Identifier: BSD-3-Clause */ +#include #include #include - #include void odp_rwlock_init(odp_rwlock_t *rwlock) { - rwlock->cnt = 0; + odp_atomic32_store_rlx(&rwlock->cnt, 0); } void odp_rwlock_read_lock(odp_rwlock_t *rwlock) { - int32_t cnt; - int is_locked = 0; - - while (is_locked == 0) { - cnt = rwlock->cnt; + bool gotit; + do { + uint32_t cnt = odp_atomic32_load_acq(&rwlock->cnt); /* waiting for read lock */ - if (cnt < 0) { + if ((int32_t)cnt < 0) { odp_spin(); continue; } - is_locked = odp_atomic_cmpset_u32( - (volatile uint32_t *)&rwlock->cnt, - cnt, cnt + 1); - } + /* Attempt to take another read lock */ + gotit = odp_atomic32_cmp_and_swap_rlx(&rwlock->cnt, + cnt, cnt + 1) == cnt; + } while (!gotit); } void odp_rwlock_read_unlock(odp_rwlock_t *rwlock) { - odp_atomic_dec_u32((odp_atomic_u32_t *)(intptr_t)&rwlock->cnt); + /* Release one read lock by subtracting 1 */ + odp_atomic32_add_rls(&rwlock->cnt, (uint32_t)-1); } void odp_rwlock_write_lock(odp_rwlock_t *rwlock) { - int32_t cnt; - int is_locked = 0; - - while (is_locked == 0) { - cnt = rwlock->cnt; - /* lock aquired, wait */ + bool gotit; + do { + uint32_t cnt = odp_atomic32_load_acq(&rwlock->cnt); if (cnt != 0) { + /* Lock is busy */ odp_spin(); continue; } - is_locked = odp_atomic_cmpset_u32( - (volatile uint32_t *)&rwlock->cnt, - 0, -1); - } + /* Attempt to take write lock */ + gotit = odp_atomic32_cmp_and_swap_rlx(&rwlock->cnt, 0, + (uint32_t)-1) == 0; + } while (!gotit); } void odp_rwlock_write_unlock(odp_rwlock_t *rwlock) { - odp_atomic_inc_u32((odp_atomic_u32_t *)(intptr_t)&rwlock->cnt); + /* Release the write lock by adding 1 */ + odp_atomic32_add_rls(&rwlock->cnt, 1); } diff --git a/platform/linux-generic/odp_thread.c b/platform/linux-generic/odp_thread.c index b869b27..569b235 100644 --- a/platform/linux-generic/odp_thread.c +++ b/platform/linux-generic/odp_thread.c @@ -31,7 +31,7 @@ typedef struct { typedef struct { thread_state_t thr[ODP_CONFIG_MAX_THREADS]; - odp_atomic_int_t num; + odp_atomic32_t num; } thread_globals_t; @@ -67,7 +67,7 @@ static int thread_id(void) int id; int cpu; - id = odp_atomic_fetch_add_int(&thread_globals->num, 1); + id = (int)odp_atomic32_fetch_add_rlx(&thread_globals->num, 1); if (id >= ODP_CONFIG_MAX_THREADS) { ODP_ERR("Too many threads\n"); @@ -77,7 +77,7 @@ static int thread_id(void) cpu = sched_getcpu(); if (cpu < 0) { - ODP_ERR("getcpu failed\n"); + ODP_ERR("sched_getcpu failed\n"); return -1; } diff --git a/platform/linux-generic/odp_ticketlock.c b/platform/linux-generic/odp_ticketlock.c index be5b885..cadc0e0 100644 --- a/platform/linux-generic/odp_ticketlock.c +++ b/platform/linux-generic/odp_ticketlock.c @@ -12,9 +12,8 @@ void odp_ticketlock_init(odp_ticketlock_t *ticketlock) { - ticketlock->next_ticket = 0; - ticketlock->cur_ticket = 0; - odp_sync_stores(); + odp_atomic32_store_rlx(&ticketlock->next_ticket, 0); + odp_atomic32_store_rlx(&ticketlock->cur_ticket, 0); } @@ -22,30 +21,14 
diff --git a/platform/linux-generic/odp_thread.c b/platform/linux-generic/odp_thread.c
index b869b27..569b235 100644
--- a/platform/linux-generic/odp_thread.c
+++ b/platform/linux-generic/odp_thread.c
@@ -31,7 +31,7 @@ typedef struct {

 typedef struct {
 	thread_state_t thr[ODP_CONFIG_MAX_THREADS];
-	odp_atomic_int_t num;
+	odp_atomic32_t num;
 } thread_globals_t;


@@ -67,7 +67,7 @@ static int thread_id(void)
 	int id;
 	int cpu;

-	id = odp_atomic_fetch_add_int(&thread_globals->num, 1);
+	id = (int)odp_atomic32_fetch_add_rlx(&thread_globals->num, 1);

 	if (id >= ODP_CONFIG_MAX_THREADS) {
 		ODP_ERR("Too many threads\n");
@@ -77,7 +77,7 @@ static int thread_id(void)
 	cpu = sched_getcpu();

 	if (cpu < 0) {
-		ODP_ERR("getcpu failed\n");
+		ODP_ERR("sched_getcpu failed\n");
 		return -1;
 	}

diff --git a/platform/linux-generic/odp_ticketlock.c b/platform/linux-generic/odp_ticketlock.c
index be5b885..cadc0e0 100644
--- a/platform/linux-generic/odp_ticketlock.c
+++ b/platform/linux-generic/odp_ticketlock.c
@@ -12,9 +12,8 @@
 void odp_ticketlock_init(odp_ticketlock_t *ticketlock)
 {
-	ticketlock->next_ticket = 0;
-	ticketlock->cur_ticket = 0;
-	odp_sync_stores();
+	odp_atomic32_store_rlx(&ticketlock->next_ticket, 0);
+	odp_atomic32_store_rlx(&ticketlock->cur_ticket, 0);
 }


@@ -22,30 +21,14 @@ void odp_ticketlock_lock(odp_ticketlock_t *ticketlock)
 {
 	uint32_t ticket;

-	ticket = odp_atomic_fetch_inc_u32(&ticketlock->next_ticket);
+	ticket = odp_atomic32_fetch_add_rlx(&ticketlock->next_ticket, 1);

-	while (ticket != ticketlock->cur_ticket)
+	while (ticket != odp_atomic32_load_acq(&ticketlock->cur_ticket))
 		odp_spin();
-
-	odp_mem_barrier();
 }


 void odp_ticketlock_unlock(odp_ticketlock_t *ticketlock)
 {
-	odp_sync_stores();
-
-	ticketlock->cur_ticket++;
-
-#if defined __OCTEON__
-	odp_sync_stores();
-#else
-	odp_mem_barrier();
-#endif
-}
-
-
-int odp_ticketlock_is_locked(odp_ticketlock_t *ticketlock)
-{
-	return ticketlock->cur_ticket != ticketlock->next_ticket;
+	odp_atomic32_add_rls(&ticketlock->cur_ticket, 1);
 }
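The ticketlock now needs exactly one ordered operation on each side: an
acquire-load while waiting for cur_ticket and a release-increment when passing
the lock on; taking a ticket can stay fully relaxed. The same algorithm in
stand-alone C11 (a sketch, invented names, not the ODP code):

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct {
        _Atomic uint32_t next_ticket;
        _Atomic uint32_t cur_ticket;
    } ticketlock_sketch_t;

    void lock_sketch(ticketlock_sketch_t *t)
    {
        /* Taking a ticket needs no ordering, only atomicity. */
        uint32_t ticket = atomic_fetch_add_explicit(&t->next_ticket, 1,
                                                    memory_order_relaxed);
        /* Acquire pairs with the release in unlock_sketch(): the previous
         * owner's critical section happens-before ours. */
        while (atomic_load_explicit(&t->cur_ticket,
                                    memory_order_acquire) != ticket)
            ; /* spin */
    }

    void unlock_sketch(ticketlock_sketch_t *t)
    {
        /* Release publishes our critical section to the next owner. */
        atomic_fetch_add_explicit(&t->cur_ticket, 1, memory_order_release);
    }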
diff --git a/platform/linux-generic/odp_timer.c b/platform/linux-generic/odp_timer.c
index 313c713..938429f 100644
--- a/platform/linux-generic/odp_timer.c
+++ b/platform/linux-generic/odp_timer.c
@@ -32,8 +32,8 @@ typedef struct {

 typedef struct {
 	int allocated;
-	volatile int active;
-	volatile uint64_t cur_tick;
+	odp_atomic32_t active;
+	odp_atomic64_t cur_tick;
 	timer_t timerid;
 	odp_timer_t timer_hdl;
 	odp_buffer_pool_t pool;
@@ -150,16 +150,14 @@ static void notify_function(union sigval sigval)

 	timer = sigval.sival_ptr;

-	if (timer->active == 0) {
+	if (odp_atomic32_load_rlx(&timer->active) == 0) {
 		ODP_DBG("Timer (%u) not active\n", timer->timer_hdl);
 		return;
 	}

 	/* ODP_DBG("Tick\n"); */

-	cur_tick = timer->cur_tick++;
-
-	odp_sync_stores();
+	cur_tick = odp_atomic64_fetch_add_rlx(&timer->cur_tick, 1);

 	tick = &timer->tick[cur_tick % MAX_TICKS];

@@ -318,8 +316,7 @@ odp_timer_t odp_timer_create(const char *name, odp_buffer_pool_t pool,
 		timer->tick[i].list = NULL;
 	}

-	timer->active = 1;
-	odp_sync_stores();
+	odp_atomic32_store_rls(&timer->active, 1);

 	timer_start(timer);

@@ -340,7 +337,7 @@ odp_timer_tmo_t odp_timer_absolute_tmo(odp_timer_t timer_hdl, uint64_t tmo_tick,
 	id = (int)timer_hdl - 1;
 	timer = &odp_timer.timer[id];

-	cur_tick = timer->cur_tick;
+	cur_tick = odp_atomic64_load_rlx(&timer->cur_tick);
 	if (tmo_tick <= cur_tick) {
 		ODP_DBG("timeout too close\n");
 		return ODP_TIMER_TMO_INVALID;
@@ -416,7 +413,7 @@ uint64_t odp_timer_current_tick(odp_timer_t timer_hdl)
 	uint32_t id;

 	id = timer_hdl - 1;
-	return odp_timer.timer[id].cur_tick;
+	return odp_atomic64_load_rlx(&odp_timer.timer[id].cur_tick);
 }

 odp_timeout_t odp_timeout_from_buffer(odp_buffer_t buf)
diff --git a/test/api_test/odp_atomic_test.c b/test/api_test/odp_atomic_test.c
index 9019d4f..4d27b32 100644
--- a/test/api_test/odp_atomic_test.c
+++ b/test/api_test/odp_atomic_test.c
@@ -10,17 +10,14 @@
 #include <odp_common.h>
 #include <odp_atomic_test.h>

-static odp_atomic_int_t a32;
-static odp_atomic_u32_t a32u;
-static odp_atomic_u64_t a64u;
+static odp_atomic32_t a32u;
+static odp_atomic64_t a64u;

-static odp_atomic_int_t numthrds;
+static odp_barrier_t barrier;

 static const char * const test_name[] = {
 	"dummy",
 	"test atomic basic ops add/sub/inc/dec",
-	"test atomic inc/dec of signed word",
-	"test atomic add/sub of signed word",
 	"test atomic inc/dec of unsigned word",
 	"test atomic add/sub of unsigned word",
 	"test atomic inc/dec of unsigned double word",
@@ -31,39 +28,29 @@ static struct timeval tv0[MAX_WORKERS], tv1[MAX_WORKERS];

 static void usage(void)
 {
-	printf("\n./odp_atomic -t <testcase> -n <num of pthreads>,\n\n"
+	printf("\n./odp_atomic -t <testcase> -n <num of threads>\n\n"
 	       "\t<testcase> is\n"
 	       "\t\t1 - Test mix(does inc,dec,add,sub on 32/64 bit)\n"
-	       "\t\t2 - Test inc dec of signed word\n"
-	       "\t\t3 - Test add sub of signed word\n"
-	       "\t\t4 - Test inc dec of unsigned word\n"
-	       "\t\t5 - Test add sub of unsigned word\n"
-	       "\t\t6 - Test inc dec of double word\n"
-	       "\t\t7 - Test add sub of double word\n"
-	       "\t<num of pthreads> is optional\n"
-	       "\t\t<1 - 31> - no of pthreads to start\n"
+	       "\t\t2 - Test inc dec of unsigned word\n"
+	       "\t\t3 - Test add sub of unsigned word\n"
+	       "\t\t4 - Test inc dec of double word\n"
+	       "\t\t5 - Test add sub of double word\n"
+	       "\t<num of threads> is optional\n"
+	       "\t\t<1 - 31> - no of threads to start\n"
 	       "\t\tif user doesn't specify this option, then\n"
-	       "\t\tno of pthreads created is equivalent to no of cores\n"
+	       "\t\tno of threads created is equivalent to no of cores\n"
 	       "\t\tavailable in the system\n"
 	       "\tExample usage:\n"
 	       "\t\t./odp_atomic -t 2\n"
 	       "\t\t./odp_atomic -t 3 -n 12\n");
 }

-void test_atomic_inc_32(void)
-{
-	int i;
-
-	for (i = 0; i < CNT; i++)
-		odp_atomic_inc_int(&a32);
-}
-
 void test_atomic_inc_u32(void)
 {
 	int i;

 	for (i = 0; i < CNT; i++)
-		odp_atomic_inc_u32(&a32u);
+		odp_atomic32_add_rlx(&a32u, 1);
 }

 void test_atomic_inc_64(void)
@@ -71,15 +58,7 @@ void test_atomic_inc_64(void)
 	int i;

 	for (i = 0; i < CNT; i++)
-		odp_atomic_inc_u64(&a64u);
-}
-
-void test_atomic_dec_32(void)
-{
-	int i;
-
-	for (i = 0; i < CNT; i++)
-		odp_atomic_dec_int(&a32);
+		odp_atomic64_add_rlx(&a64u, 1);
 }

 void test_atomic_dec_u32(void)
@@ -87,7 +66,7 @@ void test_atomic_dec_u32(void)
 	int i;

 	for (i = 0; i < CNT; i++)
-		odp_atomic_dec_u32(&a32u);
+		odp_atomic32_add_rlx(&a32u, (uint32_t)-1);
 }

 void test_atomic_dec_64(void)
@@ -95,15 +74,7 @@ void test_atomic_dec_64(void)
 	int i;

 	for (i = 0; i < CNT; i++)
-		odp_atomic_dec_u64(&a64u);
-}
-
-void test_atomic_add_32(void)
-{
-	int i;
-
-	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_add_int(&a32, ADD_SUB_CNT);
+		odp_atomic64_add_rlx(&a64u, (uint64_t)-1);
 }

 void test_atomic_add_u32(void)
@@ -111,7 +82,7 @@ void test_atomic_add_u32(void)
 	int i;

 	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_add_u32(&a32u, ADD_SUB_CNT);
+		odp_atomic32_fetch_add_rlx(&a32u, ADD_SUB_CNT);
 }

 void test_atomic_add_64(void)
@@ -119,15 +90,7 @@ void test_atomic_add_64(void)
 	int i;

 	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_add_u64(&a64u, ADD_SUB_CNT);
-}
-
-void test_atomic_sub_32(void)
-{
-	int i;
-
-	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_sub_int(&a32, ADD_SUB_CNT);
+		odp_atomic64_fetch_add_rlx(&a64u, ADD_SUB_CNT);
 }

 void test_atomic_sub_u32(void)
@@ -135,7 +98,7 @@ void test_atomic_sub_u32(void)
 	int i;

 	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_sub_u32(&a32u, ADD_SUB_CNT);
+		odp_atomic32_fetch_add_rlx(&a32u, -ADD_SUB_CNT);
 }

 void test_atomic_sub_64(void)
@@ -143,19 +106,7 @@ void test_atomic_sub_64(void)
 	int i;

 	for (i = 0; i < (CNT / ADD_SUB_CNT); i++)
-		odp_atomic_fetch_sub_u64(&a64u, ADD_SUB_CNT);
-}
-
-void test_atomic_inc_dec_32(void)
-{
-	test_atomic_inc_32();
-	test_atomic_dec_32();
-}
-
-void test_atomic_add_sub_32(void)
-{
-	test_atomic_add_32();
-	test_atomic_sub_32();
+		odp_atomic64_fetch_add_rlx(&a64u, -ADD_SUB_CNT);
 }

 void test_atomic_inc_dec_u32(void)
@@ -188,11 +139,6 @@ void test_atomic_add_sub_64(void)
  */
 void test_atomic_basic(void)
 {
-	test_atomic_inc_32();
-	test_atomic_dec_32();
-	test_atomic_add_32();
-	test_atomic_sub_32();
-
 	test_atomic_inc_u32();
 	test_atomic_dec_u32();
 	test_atomic_add_u32();
@@ -206,31 +152,24 @@ void test_atomic_basic(void)

 void test_atomic_init(void)
 {
-	odp_atomic_init_int(&a32);
-	odp_atomic_init_u32(&a32u);
-	odp_atomic_init_u64(&a64u);
+	odp_atomic32_store_rlx(&a32u, 0);
+	odp_atomic64_store_rlx(&a64u, 0);
 }

 void test_atomic_store(void)
 {
-	odp_atomic_store_int(&a32, S32_INIT_VAL);
-	odp_atomic_store_u32(&a32u, U32_INIT_VAL);
-	odp_atomic_store_u64(&a64u, U64_INIT_VAL);
+	odp_atomic32_store_rlx(&a32u, U32_INIT_VAL);
+	odp_atomic64_store_rlx(&a64u, U64_INIT_VAL);
 }

 int test_atomic_validate(void)
 {
-	if (odp_atomic_load_int(&a32) != S32_INIT_VAL) {
-		ODP_ERR("Atomic signed 32 usual functions failed\n");
-		return -1;
-	}
-
-	if (odp_atomic_load_u32(&a32u) != U32_INIT_VAL) {
+	if (odp_atomic32_load_rlx(&a32u) != U32_INIT_VAL) {
 		ODP_ERR("Atomic u32 usual functions failed\n");
 		return -1;
 	}

-	if (odp_atomic_load_u64(&a64u) != U64_INIT_VAL) {
+	if (odp_atomic64_load_rlx(&a64u) != U64_INIT_VAL) {
 		ODP_ERR("Atomic u64 usual functions failed\n");
 		return -1;
 	}
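As in the rwlock change earlier, decrements in the updated tests are written as
additions of a negated value: unsigned arithmetic wraps modulo 2^32 (or 2^64),
so adding (uint32_t)-1 is exactly a subtraction of 1 and the new API can get by
without separate sub calls. A tiny self-contained illustration:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t v = 5;
        v += (uint32_t)-1; /* (uint32_t)-1 == UINT32_MAX, wraps to v - 1 */
        printf("%" PRIu32 "\n", v); /* prints 4 */
        return 0;
    }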
@@ -247,11 +186,8 @@ static void *run_thread(void *arg)

 	ODP_DBG("Thread %i starts\n", thr);

-	odp_atomic_inc_int(&numthrds);
-
-	/* Wait here until all pthreads are created */
-	while (*(volatile int *)&numthrds < parg->numthrds)
-		;
+	/* Wait here until all threads have arrived */
+	odp_barrier_sync(&barrier);

 	gettimeofday(&tv0[thr], NULL);

@@ -259,12 +195,6 @@
 	case TEST_MIX:
 		test_atomic_basic();
 		break;
-	case TEST_INC_DEC_S32:
-		test_atomic_inc_dec_32();
-		break;
-	case TEST_ADD_SUB_S32:
-		test_atomic_add_sub_32();
-		break;
 	case TEST_INC_DEC_U32:
 		test_atomic_inc_dec_u32();
 		break;
@@ -327,7 +257,6 @@ int main(int argc, char *argv[])
 	if (pthrdnum == 0)
 		pthrdnum = odp_sys_core_count();

-	odp_atomic_init_int(&numthrds);
 	test_atomic_init();
 	test_atomic_store();

@@ -342,6 +271,7 @@ int main(int argc, char *argv[])
 		usage();
 		goto err_exit;
 	}
+	odp_barrier_init(&barrier, pthrdnum);

 	odp_test_thread_create(run_thread, &thrdarg);

 	odp_test_thread_exit(&thrdarg);
diff --git a/test/api_test/odp_atomic_test.h b/test/api_test/odp_atomic_test.h
index 7814da5..aaa9d34 100644
--- a/test/api_test/odp_atomic_test.h
+++ b/test/api_test/odp_atomic_test.h
@@ -18,14 +18,11 @@
 #define ADD_SUB_CNT	5

 #define CNT		500000
-#define S32_INIT_VAL	(1UL << 10)
 #define U32_INIT_VAL	(1UL << 10)
 #define U64_INIT_VAL	(1ULL << 33)

 typedef enum {
 	TEST_MIX = 1, /* Must be first test case num */
-	TEST_INC_DEC_S32,
-	TEST_ADD_SUB_S32,
 	TEST_INC_DEC_U32,
 	TEST_ADD_SUB_U32,
 	TEST_INC_DEC_64,
@@ -34,16 +31,10 @@ typedef enum {
 } odp_test_atomic_t;


-void test_atomic_inc_dec_32(void);
-void test_atomic_add_sub_32(void);
 void test_atomic_inc_dec_u32(void);
 void test_atomic_add_sub_u32(void);
 void test_atomic_inc_dec_64(void);
 void test_atomic_add_sub_64(void);
-void test_atomic_inc_32(void);
-void test_atomic_dec_32(void);
-void test_atomic_add_32(void);
-void test_atomic_sub_32(void);
 void test_atomic_inc_u32(void);
 void test_atomic_dec_u32(void);
 void test_atomic_add_u32(void);
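Finally, the test start-up no longer spins on a volatile counter: every worker
blocks on a barrier sized to the thread count, which also provides the memory
ordering the hand-rolled wait lacked. A single-use barrier with the same shape
can be sketched in C11 as follows (illustrative only, invented names; the real
odp_barrier is reusable and must also handle counter wrap-around):

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct {
        uint32_t         nthreads;
        _Atomic uint32_t arrived;
    } barrier_sketch_t;

    void barrier_init_sketch(barrier_sketch_t *b, uint32_t nthreads)
    {
        b->nthreads = nthreads;
        atomic_store_explicit(&b->arrived, 0, memory_order_relaxed);
    }

    /* Single-use: the release on arrival pairs with the acquire in the
     * wait loop, so each thread's pre-barrier writes are visible to all
     * the others once the barrier opens. */
    void barrier_sync_sketch(barrier_sketch_t *b)
    {
        atomic_fetch_add_explicit(&b->arrived, 1, memory_order_release);
        while (atomic_load_explicit(&b->arrived,
                                    memory_order_acquire) < b->nthreads)
            ; /* spin */
    }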