From patchwork Sun Jun 26 11:20:57 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: He Kuang X-Patchwork-Id: 70882 Delivered-To: patch@linaro.org Received: by 10.140.28.4 with SMTP id 4csp616892qgy; Sun, 26 Jun 2016 04:27:37 -0700 (PDT) X-Received: by 10.66.164.34 with SMTP id yn2mr23487364pab.97.1466940457363; Sun, 26 Jun 2016 04:27:37 -0700 (PDT) Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id wj3si18995120pab.81.2016.06.26.04.27.37; Sun, 26 Jun 2016 04:27:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752780AbcFZL1f (ORCPT + 30 others); Sun, 26 Jun 2016 07:27:35 -0400 Received: from szxga04-in.huawei.com ([58.251.152.52]:20257 "EHLO szxga04-in.huawei.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1752323AbcFZLV6 (ORCPT ); Sun, 26 Jun 2016 07:21:58 -0400 Received: from 172.24.1.46 (EHLO lggeml424-hub.china.huawei.com) ([172.24.1.46]) by szxrg04-dlp.huawei.com (MOS 4.4.6-GA FastPath queued) with ESMTP id BSN98820; Sun, 26 Jun 2016 19:10:51 +0800 (CST) Received: from euler.hulk-profiling (10.107.193.250) by lggeml424-hub.china.huawei.com (10.72.61.34) with Microsoft SMTP Server id 14.3.235.1; Sun, 26 Jun 2016 19:21:29 +0800 From: He Kuang To: , , , , , , , , CC: Subject: [RFC PATCH v2 05/26] tools include: Sync math64.h and div64.h Date: Sun, 26 Jun 2016 11:20:57 +0000 Message-ID: <1466940078-65581-6-git-send-email-hekuang@huawei.com> X-Mailer: git-send-email 1.8.3.4 In-Reply-To: <1466940078-65581-1-git-send-email-hekuang@huawei.com> References: <1466940078-65581-1-git-send-email-hekuang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.107.193.250] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090206.576FBAC7.0068, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2014-11-16 11:51:01, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 5869bcfcfe0e59c75a430d82d3937122 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Wang Nan This patch copies "include/linux/math64.h" into "tools/include/linux/math64.h" and copies "include/asm-generic/div64.h" into "tools/include/asm-generic/div64.h", to enable other libraries use arithmetic operation defined in them. tools/perf/MANIFEST is also updated for 'make perf-*-src-pkg'. Signed-off-by: Wang Nan Signed-off-by: He Kuang --- tools/include/asm-generic/div64.h | 234 ++++++++++++++++++++++++++++++++++++ tools/include/linux/math64.h | 247 ++++++++++++++++++++++++++++++++++++++ tools/perf/MANIFEST | 2 + 3 files changed, 483 insertions(+) create mode 100644 tools/include/asm-generic/div64.h create mode 100644 tools/include/linux/math64.h -- 1.8.5.2 diff --git a/tools/include/asm-generic/div64.h b/tools/include/asm-generic/div64.h new file mode 100644 index 0000000..be724b2 --- /dev/null +++ b/tools/include/asm-generic/div64.h @@ -0,0 +1,234 @@ +#ifndef _TOOLS_LINUX_ASM_GENERIC_DIV64_H +#define _TOOLS_LINUX_ASM_GENERIC_DIV64_H +/* + * Copyright (C) 2003 Bernardo Innocenti + * Based on former asm-ppc/div64.h and asm-m68knommu/div64.h + * + * Optimization for constant divisors on 32-bit machines: + * Copyright (C) 2006-2015 Nicolas Pitre + * + * The semantics of do_div() are: + * + * uint32_t do_div(uint64_t *n, uint32_t base) + * { + * uint32_t remainder = *n % base; + * *n = *n / base; + * return remainder; + * } + * + * NOTE: macro parameter n is evaluated multiple times, + * beware of side effects! + */ + +#include +#include + +#if BITS_PER_LONG == 64 + +# define do_div(n,base) ({ \ + uint32_t __base = (base); \ + uint32_t __rem; \ + __rem = ((uint64_t)(n)) % __base; \ + (n) = ((uint64_t)(n)) / __base; \ + __rem; \ + }) + +#elif BITS_PER_LONG == 32 + +#include + +/* + * If the divisor happens to be constant, we determine the appropriate + * inverse at compile time to turn the division into a few inline + * multiplications which ought to be much faster. And yet only if compiling + * with a sufficiently recent gcc version to perform proper 64-bit constant + * propagation. + * + * (It is unfortunate that gcc doesn't perform all this internally.) + */ + +#ifndef __div64_const32_is_OK +#define __div64_const32_is_OK (__GNUC__ >= 4) +#endif + +#define __div64_const32(n, ___b) \ +({ \ + /* \ + * Multiplication by reciprocal of b: n / b = n * (p / b) / p \ + * \ + * We rely on the fact that most of this code gets optimized \ + * away at compile time due to constant propagation and only \ + * a few multiplication instructions should remain. \ + * Hence this monstrous macro (static inline doesn't always \ + * do the trick here). \ + */ \ + uint64_t ___res, ___x, ___t, ___m, ___n = (n); \ + uint32_t ___p, ___bias; \ + \ + /* determine MSB of b */ \ + ___p = 1 << ilog2(___b); \ + \ + /* compute m = ((p << 64) + b - 1) / b */ \ + ___m = (~0ULL / ___b) * ___p; \ + ___m += (((~0ULL % ___b + 1) * ___p) + ___b - 1) / ___b; \ + \ + /* one less than the dividend with highest result */ \ + ___x = ~0ULL / ___b * ___b - 1; \ + \ + /* test our ___m with res = m * x / (p << 64) */ \ + ___res = ((___m & 0xffffffff) * (___x & 0xffffffff)) >> 32; \ + ___t = ___res += (___m & 0xffffffff) * (___x >> 32); \ + ___res += (___x & 0xffffffff) * (___m >> 32); \ + ___t = (___res < ___t) ? (1ULL << 32) : 0; \ + ___res = (___res >> 32) + ___t; \ + ___res += (___m >> 32) * (___x >> 32); \ + ___res /= ___p; \ + \ + /* Now sanitize and optimize what we've got. */ \ + if (~0ULL % (___b / (___b & -___b)) == 0) { \ + /* special case, can be simplified to ... */ \ + ___n /= (___b & -___b); \ + ___m = ~0ULL / (___b / (___b & -___b)); \ + ___p = 1; \ + ___bias = 1; \ + } else if (___res != ___x / ___b) { \ + /* \ + * We can't get away without a bias to compensate \ + * for bit truncation errors. To avoid it we'd need an \ + * additional bit to represent m which would overflow \ + * a 64-bit variable. \ + * \ + * Instead we do m = p / b and n / b = (n * m + m) / p. \ + */ \ + ___bias = 1; \ + /* Compute m = (p << 64) / b */ \ + ___m = (~0ULL / ___b) * ___p; \ + ___m += ((~0ULL % ___b + 1) * ___p) / ___b; \ + } else { \ + /* \ + * Reduce m / p, and try to clear bit 31 of m when \ + * possible, otherwise that'll need extra overflow \ + * handling later. \ + */ \ + uint32_t ___bits = -(___m & -___m); \ + ___bits |= ___m >> 32; \ + ___bits = (~___bits) << 1; \ + /* \ + * If ___bits == 0 then setting bit 31 is unavoidable. \ + * Simply apply the maximum possible reduction in that \ + * case. Otherwise the MSB of ___bits indicates the \ + * best reduction we should apply. \ + */ \ + if (!___bits) { \ + ___p /= (___m & -___m); \ + ___m /= (___m & -___m); \ + } else { \ + ___p >>= ilog2(___bits); \ + ___m >>= ilog2(___bits); \ + } \ + /* No bias needed. */ \ + ___bias = 0; \ + } \ + \ + /* \ + * Now we have a combination of 2 conditions: \ + * \ + * 1) whether or not we need to apply a bias, and \ + * \ + * 2) whether or not there might be an overflow in the cross \ + * product determined by (___m & ((1 << 63) | (1 << 31))). \ + * \ + * Select the best way to do (m_bias + m * n) / (1 << 64). \ + * From now on there will be actual runtime code generated. \ + */ \ + ___res = __arch_xprod_64(___m, ___n, ___bias); \ + \ + ___res /= ___p; \ +}) + +#ifndef __arch_xprod_64 +/* + * Default C implementation for __arch_xprod_64() + * + * Prototype: uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias) + * Semantic: retval = ((bias ? m : 0) + m * n) >> 64 + * + * The product is a 128-bit value, scaled down to 64 bits. + * Assuming constant propagation to optimize away unused conditional code. + * Architectures may provide their own optimized assembly implementation. + */ +static inline uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias) +{ + uint32_t m_lo = m; + uint32_t m_hi = m >> 32; + uint32_t n_lo = n; + uint32_t n_hi = n >> 32; + uint64_t res, tmp; + + if (!bias) { + res = ((uint64_t)m_lo * n_lo) >> 32; + } else if (!(m & ((1ULL << 63) | (1ULL << 31)))) { + /* there can't be any overflow here */ + res = (m + (uint64_t)m_lo * n_lo) >> 32; + } else { + res = m + (uint64_t)m_lo * n_lo; + tmp = (res < m) ? (1ULL << 32) : 0; + res = (res >> 32) + tmp; + } + + if (!(m & ((1ULL << 63) | (1ULL << 31)))) { + /* there can't be any overflow here */ + res += (uint64_t)m_lo * n_hi; + res += (uint64_t)m_hi * n_lo; + res >>= 32; + } else { + tmp = res += (uint64_t)m_lo * n_hi; + res += (uint64_t)m_hi * n_lo; + tmp = (res < tmp) ? (1ULL << 32) : 0; + res = (res >> 32) + tmp; + } + + res += (uint64_t)m_hi * n_hi; + + return res; +} +#endif + +#ifndef __div64_32 +extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor); +#endif + +/* The unnecessary pointer compare is there + * to check for type safety (n must be 64bit) + */ +# define do_div(n,base) ({ \ + uint32_t __base = (base); \ + uint32_t __rem; \ + (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \ + if (__builtin_constant_p(__base) && \ + is_power_of_2(__base)) { \ + __rem = (n) & (__base - 1); \ + (n) >>= ilog2(__base); \ + } else if (__div64_const32_is_OK && \ + __builtin_constant_p(__base) && \ + __base != 0) { \ + uint32_t __res_lo, __n_lo = (n); \ + (n) = __div64_const32(n, __base); \ + /* the remainder can be computed with 32-bit regs */ \ + __res_lo = (n); \ + __rem = __n_lo - __res_lo * __base; \ + } else if (likely(((n) >> 32) == 0)) { \ + __rem = (uint32_t)(n) % __base; \ + (n) = (uint32_t)(n) / __base; \ + } else \ + __rem = __div64_32(&(n), __base); \ + __rem; \ + }) + +#else /* BITS_PER_LONG == ?? */ + +# error do_div() does not yet support the C64 + +#endif /* BITS_PER_LONG */ + +#endif /* _TOOLS_LINUX_ASM_GENERIC_DIV64_H */ diff --git a/tools/include/linux/math64.h b/tools/include/linux/math64.h new file mode 100644 index 0000000..e823227 --- /dev/null +++ b/tools/include/linux/math64.h @@ -0,0 +1,247 @@ +#ifndef _TOOLS_LINUX_MATH64_H +#define _TOOLS_LINUX_MATH64_H + +#include +#include // for BITS_PER_LONG +#include + +#if BITS_PER_LONG == 64 + +#define div64_long(x, y) div64_s64((x), (y)) +#define div64_ul(x, y) div64_u64((x), (y)) + +/** + * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder + * + * This is commonly provided by 32bit archs to provide an optimized 64bit + * divide. + */ +static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder) +{ + *remainder = dividend % divisor; + return dividend / divisor; +} + +/** + * div_s64_rem - signed 64bit divide with 32bit divisor with remainder + */ +static inline s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder) +{ + *remainder = dividend % divisor; + return dividend / divisor; +} + +/** + * div64_u64_rem - unsigned 64bit divide with 64bit divisor and remainder + */ +static inline u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder) +{ + *remainder = dividend % divisor; + return dividend / divisor; +} + +/** + * div64_u64 - unsigned 64bit divide with 64bit divisor + */ +static inline u64 div64_u64(u64 dividend, u64 divisor) +{ + return dividend / divisor; +} + +/** + * div64_s64 - signed 64bit divide with 64bit divisor + */ +static inline s64 div64_s64(s64 dividend, s64 divisor) +{ + return dividend / divisor; +} + +#elif BITS_PER_LONG == 32 + +#define div64_long(x, y) div_s64((x), (y)) +#define div64_ul(x, y) div_u64((x), (y)) + +#ifndef div_u64_rem +static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder) +{ + *remainder = do_div(dividend, divisor); + return dividend; +} +#endif + +#ifndef div_s64_rem +extern s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder); +#endif + +#ifndef div64_u64_rem +extern u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder); +#endif + +#ifndef div64_u64 +extern u64 div64_u64(u64 dividend, u64 divisor); +#endif + +#ifndef div64_s64 +extern s64 div64_s64(s64 dividend, s64 divisor); +#endif + +#endif /* BITS_PER_LONG */ + +/** + * div_u64 - unsigned 64bit divide with 32bit divisor + * + * This is the most common 64bit divide and should be used if possible, + * as many 32bit archs can optimize this variant better than a full 64bit + * divide. + */ +#ifndef div_u64 +static inline u64 div_u64(u64 dividend, u32 divisor) +{ + u32 remainder; + return div_u64_rem(dividend, divisor, &remainder); +} +#endif + +/** + * div_s64 - signed 64bit divide with 32bit divisor + */ +#ifndef div_s64 +static inline s64 div_s64(s64 dividend, s32 divisor) +{ + s32 remainder; + return div_s64_rem(dividend, divisor, &remainder); +} +#endif + +u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder); + +static __always_inline u32 +__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) +{ + u32 ret = 0; + + while (dividend >= divisor) { + /* The following asm() prevents the compiler from + optimising this loop into a modulo operation. */ + asm("" : "+rm"(dividend)); + + dividend -= divisor; + ret++; + } + + *remainder = dividend; + + return ret; +} + +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) + +#ifndef mul_u64_u32_shr +static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift) +{ + return (u64)(((unsigned __int128)a * mul) >> shift); +} +#endif /* mul_u64_u32_shr */ + +#ifndef mul_u64_u64_shr +static inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift) +{ + return (u64)(((unsigned __int128)a * mul) >> shift); +} +#endif /* mul_u64_u64_shr */ + +#else + +#ifndef mul_u64_u32_shr +static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift) +{ + u32 ah, al; + u64 ret; + + al = a; + ah = a >> 32; + + ret = ((u64)al * mul) >> shift; + if (ah) + ret += ((u64)ah * mul) << (32 - shift); + + return ret; +} +#endif /* mul_u64_u32_shr */ + +#ifndef mul_u64_u64_shr +static inline u64 mul_u64_u64_shr(u64 a, u64 b, unsigned int shift) +{ + union { + u64 ll; + struct { +#ifdef __BIG_ENDIAN + u32 high, low; +#else + u32 low, high; +#endif + } l; + } rl, rm, rn, rh, a0, b0; + u64 c; + + a0.ll = a; + b0.ll = b; + + rl.ll = (u64)a0.l.low * b0.l.low; + rm.ll = (u64)a0.l.low * b0.l.high; + rn.ll = (u64)a0.l.high * b0.l.low; + rh.ll = (u64)a0.l.high * b0.l.high; + + /* + * Each of these lines computes a 64-bit intermediate result into "c", + * starting at bits 32-95. The low 32-bits go into the result of the + * multiplication, the high 32-bits are carried into the next step. + */ + rl.l.high = c = (u64)rl.l.high + rm.l.low + rn.l.low; + rh.l.low = c = (c >> 32) + rm.l.high + rn.l.high + rh.l.low; + rh.l.high = (c >> 32) + rh.l.high; + + /* + * The 128-bit result of the multiplication is in rl.ll and rh.ll, + * shift it right and throw away the high part of the result. + */ + if (shift == 0) + return rl.ll; + if (shift < 64) + return (rl.ll >> shift) | (rh.ll << (64 - shift)); + return rh.ll >> (shift & 63); +} +#endif /* mul_u64_u64_shr */ + +#endif + +#ifndef mul_u64_u32_div +static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor) +{ + union { + u64 ll; + struct { +#ifdef __BIG_ENDIAN + u32 high, low; +#else + u32 low, high; +#endif + } l; + } u, rl, rh; + + u.ll = a; + rl.ll = (u64)u.l.low * mul; + rh.ll = (u64)u.l.high * mul + rl.l.high; + + /* Bits 32-63 of the result will be in rh.l.low. */ + rl.l.high = do_div(rh.ll, divisor); + + /* Bits 0-31 of the result will be in rl.l.low. */ + do_div(rl.ll, divisor); + + rl.l.high = rh.l.low; + return rl.ll; +} +#endif /* mul_u64_u32_div */ + +#endif /* _TOOLS_LINUX_MATH64_H */ diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST index 80ac3d4..5386e9d 100644 --- a/tools/perf/MANIFEST +++ b/tools/perf/MANIFEST @@ -44,6 +44,7 @@ tools/include/asm-generic/bitops/fls64.h tools/include/asm-generic/bitops/fls.h tools/include/asm-generic/bitops/hweight.h tools/include/asm-generic/bitops.h +tools/include/asm-generic/div64.h tools/include/linux/atomic.h tools/include/linux/bitops.h tools/include/linux/byteorder/generic.h @@ -53,6 +54,7 @@ tools/include/linux/hash.h tools/include/linux/kernel.h tools/include/linux/list.h tools/include/linux/log2.h +tools/include/linux/math64.h tools/include/linux/poison.h tools/include/linux/rbtree.h tools/include/linux/rbtree_augmented.h