From patchwork Wed Jun 25 09:03:50 2014
X-Patchwork-Submitter: Edward Nevill
X-Patchwork-Id: 32458
Message-ID: <1403687030.3355.19.camel@localhost.localdomain>
Subject: Adding support for hardware crc to ARM aarch64
From: Edward Nevill
Reply-To: edward.nevill@linaro.org
To: common-dev@hadoop.apache.org
Cc: patches@linaro.org
Date: Wed, 25 Jun 2014 10:03:50 +0100
Organization: Linaro

Hi,

I would like to add support for hardware crc for ARM's new 64-bit
architecture, aarch64. I would be grateful if some committer could help me
through the process of getting this change pushed into the trunk.

I have prepared an initial patch below. The patch is completely
conditionalized on __aarch64__.

For the moment I have only done the non-pipelined version, as the hardware
I have has only one crc execute unit. Some initial benchmarks on terasort
give

sw crc: 107 sec
hw crc: 103 sec

The performance improvement is quite small, but it is limited by the fact
that I am using early-stage hardware which is not performant.

I have also built the patch on x86, and I think the change is fairly safe
for other architectures because, after conditionalization, the source is
identical on them.

Thanks for your help,
Ed.
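For anyone who wants to try the instructions outside of Hadoop, here is a
minimal standalone sketch of the same approach (getauxval(AT_HWCAP) feature
detection plus the crc32c instructions). It is for illustration only and is
not part of the patch: the file name is made up, it only does the simple
byte-at-a-time loop, and 0xe3069283 is just the standard CRC-32C check
value for the string "123456789".

/* crc32c_check.c - standalone sketch, not part of the patch.
   Build on aarch64 Linux with: gcc -O2 crc32c_check.c -o crc32c_check */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */

#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1<<7)
#endif

static uint32_t crc32c_hw(uint32_t crc, const uint8_t *p, size_t len) {
  asm(".cpu generic+crc");  /* allow crc instructions in asm, as in the patch */
  while (len--) {
    /* byte-at-a-time only; the patch steps 8/4/2/1 bytes for speed */
    asm("crc32cb %w[c], %w[c], %w[v]" : [c]"+r"(crc) : [v]"r"(*p++));
  }
  return crc;
}

int main(void) {
  if (!(getauxval(AT_HWCAP) & HWCAP_CRC32)) {
    printf("no hw crc32 on this cpu\n");
    return 1;
  }
  /* standard CRC-32C test vector: crc32c("123456789") == 0xe3069283,
     with the usual inverted-in/inverted-out convention */
  uint32_t crc = crc32c_hw(0xffffffff, (const uint8_t *)"123456789", 9);
  crc ^= 0xffffffff;
  printf("crc32c(\"123456789\") = 0x%08x %s\n", crc,
         crc == 0xe3069283 ? "(ok)" : "(MISMATCH)");
  return 0;
}

The asm(".cpu generic+crc") directive is the same trick the patch uses: it
lets the assembler accept the crc instructions without building the whole
object with -march=armv8-a+crc, so the library still loads, and takes the
software path, on cores without the extension.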
--- CUT HERE ---
Index: hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
===================================================================
--- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c	(revision 1605031)
+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c	(working copy)
@@ -38,7 +38,7 @@
 #include "bulk_crc32.h"
 #include "gcc_optimizations.h"
 
-#if (!defined(__FreeBSD__) && !defined(WINDOWS))
+#if (!defined(__FreeBSD__) && !defined(WINDOWS)) && !defined(__aarch64__)
 #define USE_PIPELINED
 #endif
 
@@ -672,8 +672,61 @@
 
 # endif // 64-bit vs 32-bit
 
-#else // end x86 architecture
+#elif defined(__aarch64__) // end x86 architecture
 
+#include <sys/auxv.h>
+#include <asm/hwcap.h>
+
+#ifndef HWCAP_CRC32
+#define HWCAP_CRC32 (1<<7)
+#endif
+
+/**
+ * On library load, determine what sort of crc we are going to do
+ * and set cached_cpu_supports_crc32 appropriately.
+ */
+void __attribute__ ((constructor)) init_cpu_support_flag(void) {
+  unsigned long auxv = getauxval(AT_HWCAP);
+  cached_cpu_supports_crc32 = auxv & HWCAP_CRC32;
+}
+
+#define CRC32X(crc,value) asm("crc32cx %w[c], %w[c], %x[v]" : [c]"+r"(crc) : [v]"r"(value))
+#define CRC32W(crc,value) asm("crc32cw %w[c], %w[c], %w[v]" : [c]"+r"(crc) : [v]"r"(value))
+#define CRC32H(crc,value) asm("crc32ch %w[c], %w[c], %w[v]" : [c]"+r"(crc) : [v]"r"(value))
+#define CRC32B(crc,value) asm("crc32cb %w[c], %w[c], %w[v]" : [c]"+r"(crc) : [v]"r"(value))
+
+/**
+ * Hardware-accelerated CRC32C calculation using the 64-bit instructions.
+ */
+static uint32_t crc32c_hardware(uint32_t crc, const uint8_t* p_buf, size_t length) {
+  int64_t len = length;
+  asm(".cpu generic+crc"); // Allow crc instructions in asm
+  if ((len -= sizeof(uint64_t)) >= 0) {
+    do {
+      CRC32X(crc, *(uint64_t*)p_buf);
+      p_buf += sizeof(uint64_t);
+    } while ((len -= sizeof(uint64_t)) >= 0);
+  }
+
+  // The following is more efficient than the straight loop
+  if (len & sizeof(uint32_t)) {
+    CRC32W(crc, *(uint32_t*)p_buf);
+    p_buf += sizeof(uint32_t);
+  }
+  if (len & sizeof(uint16_t)) {
+    CRC32H(crc, *(uint16_t*)p_buf);
+    p_buf += sizeof(uint16_t);
+  }
+  if (len & sizeof(uint8_t)) {
+    CRC32B(crc, *p_buf);
+    p_buf++;
+  }
+
+  return crc;
+}
+
+#else
+
 static uint32_t crc32c_hardware(uint32_t crc, const uint8_t* data, size_t length) {
   // never called!
   assert(0 && "hardware crc called on an unsupported platform");
--- CUT HERE ---
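One subtlety in the tail handling above that may be worth calling out for
reviewers: after the main do/while loop, len has gone negative (it
overshoots by exactly sizeof(uint64_t)), and the len & sizeof(...) tests
still work because subtracting 8 only disturbs bits 3 and above, so bits
0..2 still hold the number of remaining bytes. A throwaway check
(standalone, not part of the patch, assuming two's-complement arithmetic):

#include <assert.h>
#include <stdint.h>

int main(void) {
  /* After the main loop in crc32c_hardware, len == remaining - 8
     with 0 <= remaining <= 7. Subtracting 8 only changes bits >= 3,
     so the 4/2/1 tail tests still see the remaining byte count. */
  for (int64_t remaining = 0; remaining <= 7; remaining++) {
    int64_t len = remaining - (int64_t)sizeof(uint64_t);
    assert((len & 4) == (remaining & 4));
    assert((len & 2) == (remaining & 2));
    assert((len & 1) == (remaining & 1));
  }
  return 0;
}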