From patchwork Mon Apr 13 15:59:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Raphael Moreira Zinsly X-Patchwork-Id: 197878 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3E4DC2BA2B for ; Mon, 13 Apr 2020 15:59:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8454520732 for ; Mon, 13 Apr 2020 15:59:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731168AbgDMP7x (ORCPT ); Mon, 13 Apr 2020 11:59:53 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:49944 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731185AbgDMP7w (ORCPT ); Mon, 13 Apr 2020 11:59:52 -0400 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 03DFXt2A066240; Mon, 13 Apr 2020 11:59:39 -0400 Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0b-001b2d01.pphosted.com with ESMTP id 30b6tuvu73-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 11:59:38 -0400 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 03DFuV4n027349; Mon, 13 Apr 2020 15:59:38 GMT Received: from b01cxnp22033.gho.pok.ibm.com (b01cxnp22033.gho.pok.ibm.com [9.57.198.23]) by ppma04wdc.us.ibm.com with ESMTP id 30b5h69ehm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 15:59:38 +0000 Received: from b01ledav006.gho.pok.ibm.com (b01ledav006.gho.pok.ibm.com [9.57.199.111]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 03DFxc5u52560170 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 Apr 2020 15:59:38 GMT Received: from b01ledav006.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E96E2AC05B; Mon, 13 Apr 2020 15:59:37 +0000 (GMT) Received: from b01ledav006.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8984AAC059; Mon, 13 Apr 2020 15:59:37 +0000 (GMT) Received: from localhost (unknown [9.85.151.130]) by b01ledav006.gho.pok.ibm.com (Postfix) with ESMTP; Mon, 13 Apr 2020 15:59:37 +0000 (GMT) From: Raphael Moreira Zinsly To: linuxppc-dev@lists.ozlabs.org, linux-crypto@vger.kernel.org, dja@axtens.net Cc: rzinsly@linux.ibm.com, herbert@gondor.apana.org.au, mpe@ellerman.id.au, haren@linux.ibm.com, abali@us.ibm.com Subject: [PATCH V3 1/5] selftests/powerpc: Add header files for GZIP engine test Date: Mon, 13 Apr 2020 12:59:12 -0300 Message-Id: <20200413155916.16900-2-rzinsly@linux.ibm.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200413155916.16900-1-rzinsly@linux.ibm.com> References: <20200413155916.16900-1-rzinsly@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-13_07:2020-04-13,2020-04-13 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 phishscore=0 priorityscore=1501 malwarescore=0 adultscore=0 mlxscore=0 bulkscore=0 lowpriorityscore=0 impostorscore=0 clxscore=1015 mlxlogscore=999 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004130116 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add files to access the powerpc NX-GZIP engine in user space. Signed-off-by: Bulent Abali Signed-off-by: Raphael Moreira Zinsly --- .../selftests/powerpc/nx-gzip/inc/crb.h | 159 ++++++++++++++++++ .../selftests/powerpc/nx-gzip/inc/nx-gzip.h | 27 +++ .../powerpc/nx-gzip/inc/nx-helpers.h | 54 ++++++ .../selftests/powerpc/nx-gzip/inc/nx.h | 38 +++++ 4 files changed, 278 insertions(+) create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/crb.h create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h create mode 100644 tools/testing/selftests/powerpc/nx-gzip/inc/nx.h diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h new file mode 100644 index 000000000000..9056e3dc1831 --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/crb.h @@ -0,0 +1,159 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef __CRB_H +#define __CRB_H +#include +#include "nx.h" + +typedef unsigned char u8; +typedef unsigned int u32; +typedef unsigned long long u64; + +/* CCW 842 CI/FC masks + * NX P8 workbook, section 4.3.1, figure 4-6 + * "CI/FC Boundary by NX CT type" + */ +#define CCW_CI_842 (0x00003ff8) +#define CCW_FC_842 (0x00000007) + +/* Chapter 6.5.8 Coprocessor-Completion Block (CCB) */ + +#define CCB_VALUE (0x3fffffffffffffff) +#define CCB_ADDRESS (0xfffffffffffffff8) +#define CCB_CM (0x0000000000000007) +#define CCB_CM0 (0x0000000000000004) +#define CCB_CM12 (0x0000000000000003) + +#define CCB_CM0_ALL_COMPLETIONS (0x0) +#define CCB_CM0_LAST_IN_CHAIN (0x4) +#define CCB_CM12_STORE (0x0) +#define CCB_CM12_INTERRUPT (0x1) + +#define CCB_SIZE (0x10) +#define CCB_ALIGN CCB_SIZE + +struct coprocessor_completion_block { + __be64 value; + __be64 address; +} __aligned(CCB_ALIGN); + + +/* Chapter 6.5.7 Coprocessor-Status Block (CSB) */ + +#define CSB_V (0x80) +#define CSB_F (0x04) +#define CSB_CH (0x03) +#define CSB_CE_INCOMPLETE (0x80) +#define CSB_CE_TERMINATION (0x40) +#define CSB_CE_TPBC (0x20) + +#define CSB_CC_SUCCESS (0) +#define CSB_CC_INVALID_ALIGN (1) +#define CSB_CC_OPERAND_OVERLAP (2) +#define CSB_CC_DATA_LENGTH (3) +#define CSB_CC_TRANSLATION (5) +#define CSB_CC_PROTECTION (6) +#define CSB_CC_RD_EXTERNAL (7) +#define CSB_CC_INVALID_OPERAND (8) +#define CSB_CC_PRIVILEGE (9) +#define CSB_CC_INTERNAL (10) +#define CSB_CC_WR_EXTERNAL (12) +#define CSB_CC_NOSPC (13) +#define CSB_CC_EXCESSIVE_DDE (14) +#define CSB_CC_WR_TRANSLATION (15) +#define CSB_CC_WR_PROTECTION (16) +#define CSB_CC_UNKNOWN_CODE (17) +#define CSB_CC_ABORT (18) +#define CSB_CC_TRANSPORT (20) +#define CSB_CC_SEGMENTED_DDL (31) +#define CSB_CC_PROGRESS_POINT (32) +#define CSB_CC_DDE_OVERFLOW (33) +#define CSB_CC_SESSION (34) +#define CSB_CC_PROVISION (36) +#define CSB_CC_CHAIN (37) +#define CSB_CC_SEQUENCE (38) +#define CSB_CC_HW (39) + +#define CSB_SIZE (0x10) +#define CSB_ALIGN CSB_SIZE + +struct coprocessor_status_block { + u8 flags; + u8 cs; + u8 cc; + u8 ce; + __be32 count; + __be64 address; +} __aligned(CSB_ALIGN); + + +/* Chapter 6.5.10 Data-Descriptor List (DDL) + * each list contains one or more Data-Descriptor Entries (DDE) + */ + +#define DDE_P (0x8000) + +#define DDE_SIZE (0x10) +#define DDE_ALIGN DDE_SIZE + +struct data_descriptor_entry { + __be16 flags; + u8 count; + u8 index; + __be32 length; + __be64 address; +} __aligned(DDE_ALIGN); + + +/* Chapter 6.5.2 Coprocessor-Request Block (CRB) */ + +#define CRB_SIZE (0x80) +#define CRB_ALIGN (0x100) /* Errata: requires 256 alignment */ + + +/* Coprocessor Status Block field + * ADDRESS address of CSB + * C CCB is valid + * AT 0 = addrs are virtual, 1 = addrs are phys + * M enable perf monitor + */ +#define CRB_CSB_ADDRESS (0xfffffffffffffff0) +#define CRB_CSB_C (0x0000000000000008) +#define CRB_CSB_AT (0x0000000000000002) +#define CRB_CSB_M (0x0000000000000001) + +struct coprocessor_request_block { + __be32 ccw; + __be32 flags; + __be64 csb_addr; + + struct data_descriptor_entry source; + struct data_descriptor_entry target; + + struct coprocessor_completion_block ccb; + + u8 reserved[48]; + + struct coprocessor_status_block csb; +} __aligned(CRB_ALIGN); + +#define crb_csb_addr(c) __be64_to_cpu(c->csb_addr) +#define crb_nx_fault_addr(c) __be64_to_cpu(c->stamp.nx.fault_storage_addr) +#define crb_nx_flags(c) c->stamp.nx.flags +#define crb_nx_fault_status(c) c->stamp.nx.fault_status +#define crb_nx_pswid(c) c->stamp.nx.pswid + + +/* RFC02167 Initiate Coprocessor Instructions document + * Chapter 8.2.1.1.1 RS + * Chapter 8.2.3 Coprocessor Directive + * Chapter 8.2.4 Execution + * + * The CCW must be converted to BE before passing to icswx() + */ + +#define CCW_PS (0xff000000) +#define CCW_CT (0x00ff0000) +#define CCW_CD (0x0000ffff) +#define CCW_CL (0x0000c000) + +#endif diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h b/tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h new file mode 100644 index 000000000000..75482c45574d --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/nx-gzip.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright 2020 IBM Corp. + * + */ + +#ifndef _UAPI_MISC_VAS_H +#define _UAPI_MISC_VAS_H + +#include + +#define VAS_FLAGS_PIN_WINDOW 0x1 +#define VAS_FLAGS_HIGH_PRI 0x2 + +#define VAS_FTW_SETUP _IOW('v', 1, struct vas_gzip_setup_attr) +#define VAS_842_TX_WIN_OPEN _IOW('v', 2, struct vas_gzip_setup_attr) +#define VAS_GZIP_TX_WIN_OPEN _IOW('v', 0x20, struct vas_gzip_setup_attr) + +struct vas_gzip_setup_attr { + int32_t version; + int16_t vas_id; + int16_t reserved1; + int64_t flags; + int64_t reserved2[6]; +}; + +#endif /* _UAPI_MISC_VAS_H */ diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h b/tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h new file mode 100644 index 000000000000..e0d68914c941 --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/nx-helpers.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#include +#include +#include +#include +#include "crb.h" + +#define cpu_to_be32 __cpu_to_be32 +#define cpu_to_be64 __cpu_to_be64 +#define be32_to_cpu __be32_to_cpu +#define be64_to_cpu __be64_to_cpu + +/* + * Several helpers/macros below were copied from the tree + * (kernel.h, nx-842.h, nx-ftw.h, asm-compat.h etc) + */ + +/* from kernel.h */ +#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) +#define __round_mask(x, y) ((__typeof__(x))((y)-1)) +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1) +#define round_down(x, y) ((x) & ~__round_mask(x, y)) + +#define min_t(t, x, y) ((x) < (y) ? (x) : (y)) +/* + * Get/Set bit fields. (from nx-842.h) + */ +#define GET_FIELD(m, v) (((v) & (m)) >> MASK_LSH(m)) +#define MASK_LSH(m) (__builtin_ffsl(m) - 1) +#define SET_FIELD(m, v, val) \ + (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_LSH(m)) & (m))) + +/* From asm-compat.h */ +#define __stringify_in_c(...) #__VA_ARGS__ +#define stringify_in_c(...) __stringify_in_c(__VA_ARGS__) " " + +#define pr_debug +#define pr_debug_ratelimited printf +#define pr_err printf +#define pr_err_ratelimited printf + +#define WARN_ON_ONCE(x) do {if (x) \ + printf("WARNING: %s:%d\n", __func__, __LINE__)\ + } while (0) + +extern void dump_buffer(char *msg, char *buf, int len); +extern void *alloc_aligned_mem(int len, int align, char *msg); +extern void get_payload(char *buf, int len); +extern void time_add(struct timeval *in, int seconds, struct timeval *out); + +extern bool time_after(struct timeval *a, struct timeval *b); +extern long time_delta(struct timeval *a, struct timeval *b); +extern void dump_dde(struct data_descriptor_entry *dde, char *msg); +extern void copy_paste_crb_data(struct coprocessor_request_block *crb); diff --git a/tools/testing/selftests/powerpc/nx-gzip/inc/nx.h b/tools/testing/selftests/powerpc/nx-gzip/inc/nx.h new file mode 100644 index 000000000000..1ae8348b59d6 --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/inc/nx.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright 2020 IBM Corp. + * + */ +#ifndef _NX_H +#define _NX_H + +#include + +#define NX_FUNC_COMP_842 1 +#define NX_FUNC_COMP_GZIP 2 + +#ifndef __aligned +#define __aligned(x) __attribute__((aligned(x))) +#endif + +struct nx842_func_args { + bool use_crc; + bool decompress; /* true decompress; false compress */ + bool move_data; + int timeout; /* seconds */ +}; + +struct nxbuf_t { + int len; + char *buf; +}; + +/* @function should be EFT (aka 842), GZIP etc */ +extern void *nx_function_begin(int function, int pri); + +extern int nx_function(void *handle, struct nxbuf_t *in, struct nxbuf_t *out, + void *arg); + +extern int nx_function_end(void *handle); + +#endif /* _NX_H */ From patchwork Mon Apr 13 15:59:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Raphael Moreira Zinsly X-Patchwork-Id: 197877 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50B66C2BBFD for ; Mon, 13 Apr 2020 16:00:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 305442072D for ; Mon, 13 Apr 2020 16:00:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731185AbgDMQAA (ORCPT ); Mon, 13 Apr 2020 12:00:00 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:33884 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731187AbgDMP77 (ORCPT ); Mon, 13 Apr 2020 11:59:59 -0400 Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 03DFqSYT112712; Mon, 13 Apr 2020 11:59:45 -0400 Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0b-001b2d01.pphosted.com with ESMTP id 30cu3x85mk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 11:59:45 -0400 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 03DFuVIb018547; Mon, 13 Apr 2020 15:59:44 GMT Received: from b03cxnp07028.gho.boulder.ibm.com (b03cxnp07028.gho.boulder.ibm.com [9.17.130.15]) by ppma04dal.us.ibm.com with ESMTP id 30b5h6nsfg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 15:59:44 +0000 Received: from b03ledav005.gho.boulder.ibm.com (b03ledav005.gho.boulder.ibm.com [9.17.130.236]) by b03cxnp07028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 03DFxgi123790038 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 Apr 2020 15:59:42 GMT Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CA8F2BE051; Mon, 13 Apr 2020 15:59:42 +0000 (GMT) Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 38575BE04F; Mon, 13 Apr 2020 15:59:42 +0000 (GMT) Received: from localhost (unknown [9.85.151.130]) by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 13 Apr 2020 15:59:41 +0000 (GMT) From: Raphael Moreira Zinsly To: linuxppc-dev@lists.ozlabs.org, linux-crypto@vger.kernel.org, dja@axtens.net Cc: rzinsly@linux.ibm.com, herbert@gondor.apana.org.au, mpe@ellerman.id.au, haren@linux.ibm.com, abali@us.ibm.com Subject: [PATCH V3 3/5] selftests/powerpc: Add NX-GZIP engine compress testcase Date: Mon, 13 Apr 2020 12:59:14 -0300 Message-Id: <20200413155916.16900-4-rzinsly@linux.ibm.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200413155916.16900-1-rzinsly@linux.ibm.com> References: <20200413155916.16900-1-rzinsly@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-13_07:2020-04-13,2020-04-13 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 impostorscore=0 phishscore=0 adultscore=0 bulkscore=0 mlxlogscore=999 priorityscore=1501 malwarescore=0 suspectscore=0 clxscore=1015 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004130116 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add a compression testcase for the powerpc NX-GZIP engine. Signed-off-by: Bulent Abali Signed-off-by: Raphael Moreira Zinsly --- .../selftests/powerpc/nx-gzip/Makefile | 21 + .../selftests/powerpc/nx-gzip/gzfht_test.c | 432 ++++++++++++++++++ .../selftests/powerpc/nx-gzip/gzip_vas.c | 316 +++++++++++++ 3 files changed, 769 insertions(+) create mode 100644 tools/testing/selftests/powerpc/nx-gzip/Makefile create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile b/tools/testing/selftests/powerpc/nx-gzip/Makefile new file mode 100644 index 000000000000..ab903f63bbbd --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile @@ -0,0 +1,21 @@ +CC = gcc +CFLAGS = -O3 +INC = ./inc +SRC = gzfht_test.c +OBJ = $(SRC:.c=.o) +TESTS = gzfht_test +EXTRA_SOURCES = gzip_vas.c + +all: $(TESTS) + +$(OBJ): %.o: %.c + $(CC) $(CFLAGS) -I$(INC) -c $< + +$(TESTS): $(OBJ) + $(CC) $(CFLAGS) -I$(INC) -o $@ $@.o $(EXTRA_SOURCES) + +run_tests: $(TESTS) + ./gzfht_test gzip_vas.c + +clean: + rm -f $(TESTS) *.o *~ *.gz diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c new file mode 100644 index 000000000000..e60f743e2c6b --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c @@ -0,0 +1,432 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* P9 gzip sample code for demonstrating the P9 NX hardware interface. + * Not intended for productive uses or for performance or compression + * ratio measurements. For simplicity of demonstration, this sample + * code compresses in to fixed Huffman blocks only (Deflate btype=1) + * and has very simple memory management. Dynamic Huffman blocks + * (Deflate btype=2) are more involved as detailed in the user guide. + * Note also that /dev/crypto/gzip, VAS and skiboot support are + * required. + * + * Copyright 2020 IBM Corp. + * + * https://github.com/libnxz/power-gzip for zlib api and other utils + * + * Author: Bulent Abali + * + * Definitions of acronyms used here. See + * P9 NX Gzip Accelerator User's Manual for details: + * https://github.com/libnxz/power-gzip/blob/develop/doc/power_nx_gzip_um.pdf + * + * adler/crc: 32 bit checksums appended to stream tail + * ce: completion extension + * cpb: coprocessor parameter block (metadata) + * crb: coprocessor request block (command) + * csb: coprocessor status block (status) + * dht: dynamic huffman table + * dde: data descriptor element (address, length) + * ddl: list of ddes + * dh/fh: dynamic and fixed huffman types + * fc: coprocessor function code + * histlen: history/dictionary length + * history: sliding window of up to 32KB of data + * lzcount: Deflate LZ symbol counts + * rembytecnt: remaining byte count + * sfbt: source final block type; last block's type during decomp + * spbc: source processed byte count + * subc: source unprocessed bit count + * tebc: target ending bit count; valid bits in the last byte + * tpbc: target processed byte count + * vas: virtual accelerator switch; the user mode interface + */ + + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "nxu.h" +#include "nx.h" + +int nx_dbg; +FILE *nx_gzip_log; + +#define NX_MIN(X, Y) (((X) < (Y)) ? (X) : (Y)) +#define FNAME_MAX 1024 +#define FEXT ".nx.gz" + +/* + * LZ counts returned in the user supplied nx_gzip_crb_cpb_t structure. + */ +static int compress_fht_sample(char *src, uint32_t srclen, char *dst, + uint32_t dstlen, int with_count, + struct nx_gzip_crb_cpb_t *cmdp, void *handle) +{ + int cc; + uint32_t fc; + + assert(!!cmdp); + + put32(cmdp->crb, gzip_fc, 0); /* clear */ + fc = (with_count) ? GZIP_FC_COMPRESS_RESUME_FHT_COUNT : + GZIP_FC_COMPRESS_RESUME_FHT; + putnn(cmdp->crb, gzip_fc, fc); + putnn(cmdp->cpb, in_histlen, 0); /* resuming with no history */ + memset((void *) &cmdp->crb.csb, 0, sizeof(cmdp->crb.csb)); + + /* Section 6.6 programming notes; spbc may be in two different + * places depending on FC. + */ + if (!with_count) + put32(cmdp->cpb, out_spbc_comp, 0); + else + put32(cmdp->cpb, out_spbc_comp_with_count, 0); + + /* Figure 6-3 6-4; CSB location */ + put64(cmdp->crb, csb_address, 0); + put64(cmdp->crb, csb_address, + (uint64_t) &cmdp->crb.csb & csb_address_mask); + + /* Source direct dde (scatter-gather list) */ + clear_dde(cmdp->crb.source_dde); + putnn(cmdp->crb.source_dde, dde_count, 0); + put32(cmdp->crb.source_dde, ddebc, srclen); + put64(cmdp->crb.source_dde, ddead, (uint64_t) src); + + /* Target direct dde (scatter-gather list) */ + clear_dde(cmdp->crb.target_dde); + putnn(cmdp->crb.target_dde, dde_count, 0); + put32(cmdp->crb.target_dde, ddebc, dstlen); + put64(cmdp->crb.target_dde, ddead, (uint64_t) dst); + + /* Submit the crb, the job descriptor, to the accelerator */ + return nxu_submit_job(cmdp, handle); +} + +/* + * Prepares a blank no filename no timestamp gzip header and returns + * the number of bytes written to buf. + * Gzip specification at https://tools.ietf.org/html/rfc1952 + */ +int gzip_header_blank(char *buf) +{ + int i = 0; + + buf[i++] = 0x1f; /* ID1 */ + buf[i++] = 0x8b; /* ID2 */ + buf[i++] = 0x08; /* CM */ + buf[i++] = 0x00; /* FLG */ + buf[i++] = 0x00; /* MTIME */ + buf[i++] = 0x00; /* MTIME */ + buf[i++] = 0x00; /* MTIME */ + buf[i++] = 0x00; /* MTIME */ + buf[i++] = 0x04; /* XFL 4=fastest */ + buf[i++] = 0x03; /* OS UNIX */ + + return i; +} + +/* Caller must free the allocated buffer return nonzero on error. */ +int read_alloc_input_file(char *fname, char **buf, size_t *bufsize) +{ + struct stat statbuf; + FILE *fp; + char *p; + size_t num_bytes; + + if (stat(fname, &statbuf)) { + perror(fname); + return(-1); + } + fp = fopen(fname, "r"); + if (fp == NULL) { + perror(fname); + return(-1); + } + assert(NULL != (p = (char *) malloc(statbuf.st_size))); + num_bytes = fread(p, 1, statbuf.st_size, fp); + if (ferror(fp) || (num_bytes != statbuf.st_size)) { + perror(fname); + return(-1); + } + *buf = p; + *bufsize = num_bytes; + return 0; +} + +/* Returns nonzero on error */ +int write_output_file(char *fname, char *buf, size_t bufsize) +{ + FILE *fp; + size_t num_bytes; + + fp = fopen(fname, "w"); + if (fp == NULL) { + perror(fname); + return(-1); + } + num_bytes = fwrite(buf, 1, bufsize, fp); + if (ferror(fp) || (num_bytes != bufsize)) { + perror(fname); + return(-1); + } + fclose(fp); + return 0; +} + +/* + * Z_SYNC_FLUSH as described in zlib.h. + * Returns number of appended bytes + */ +int append_sync_flush(char *buf, int tebc, int final) +{ + uint64_t flush; + int shift = (tebc & 0x7); + + if (tebc > 0) { + /* Last byte is partially full */ + buf = buf - 1; + *buf = *buf & (unsigned char) ((1< 0) { + *buf++ = (unsigned char) (flush & 0xffULL); + flush = flush >> 8; + shift = shift - 8; + } + return(((tebc > 5) || (tebc == 0)) ? 5 : 4); +} + +/* + * Final deflate block bit. This call assumes the block + * beginning is byte aligned. + */ +static void set_bfinal(void *buf, int bfinal) +{ + char *b = buf; + + if (bfinal) + *b = *b | (unsigned char) 0x01; + else + *b = *b & (unsigned char) 0xfe; +} + +int compress_file(int argc, char **argv, void *handle) +{ + char *inbuf, *outbuf, *srcbuf, *dstbuf; + char outname[FNAME_MAX]; + uint32_t srclen, dstlen; + uint32_t flushlen, chunk; + size_t inlen, outlen, dsttotlen, srctotlen; + uint32_t crc, spbc, tpbc, tebc; + int lzcounts = 0; + int cc; + int num_hdr_bytes; + struct nx_gzip_crb_cpb_t *cmdp; + uint32_t pagelen = 65536; + int fault_tries = NX_MAX_FAULTS; + + cmdp = (void *)(uintptr_t) + aligned_alloc(sizeof(struct nx_gzip_crb_cpb_t), + sizeof(struct nx_gzip_crb_cpb_t)); + + if (argc != 2) { + fprintf(stderr, "usage: %s \n", argv[0]); + exit(-1); + } + if (read_alloc_input_file(argv[1], &inbuf, &inlen)) + exit(-1); + fprintf(stderr, "file %s read, %ld bytes\n", argv[1], inlen); + + /* Generous output buffer for header/trailer */ + outlen = 2 * inlen + 1024; + + assert(NULL != (outbuf = (char *)malloc(outlen))); + nxu_touch_pages(outbuf, outlen, pagelen, 1); + + /* Compress piecemeal in smallish chunks */ + chunk = 1<<22; + + /* Write the gzip header to the stream */ + num_hdr_bytes = gzip_header_blank(outbuf); + dstbuf = outbuf + num_hdr_bytes; + outlen = outlen - num_hdr_bytes; + dsttotlen = num_hdr_bytes; + + srcbuf = inbuf; + srctotlen = 0; + + /* Init the CRB, the coprocessor request block */ + memset(&cmdp->crb, 0, sizeof(cmdp->crb)); + + /* Initial gzip crc32 */ + put32(cmdp->cpb, in_crc, 0); + + while (inlen > 0) { + + /* Submit chunk size source data per job */ + srclen = NX_MIN(chunk, inlen); + /* Supply large target in case data expands */ + dstlen = NX_MIN(2*srclen, outlen); + + /* Page faults are handled by the user code */ + + /* Fault-in pages; an improved code wouldn't touch so + * many pages but would try to estimate the + * compression ratio and adjust both the src and dst + * touch amounts. + */ + nxu_touch_pages(cmdp, sizeof(struct nx_gzip_crb_cpb_t), pagelen, + 1); + nxu_touch_pages(srcbuf, srclen, pagelen, 0); + nxu_touch_pages(dstbuf, dstlen, pagelen, 1); + + cc = compress_fht_sample( + srcbuf, srclen, + dstbuf, dstlen, + lzcounts, cmdp, handle); + + if (cc != ERR_NX_OK && cc != ERR_NX_TPBC_GT_SPBC && + cc != ERR_NX_TRANSLATION) { + fprintf(stderr, "nx error: cc= %d\n", cc); + exit(-1); + } + + /* Page faults are handled by the user code */ + if (cc == ERR_NX_TRANSLATION) { + NXPRT(fprintf(stderr, "page fault: cc= %d, ", cc)); + NXPRT(fprintf(stderr, "try= %d, fsa= %08llx\n", + fault_tries, + (unsigned long long) cmdp->crb.csb.fsaddr)); + fault_tries--; + if (fault_tries > 0) { + continue; + } else { + fprintf(stderr, "error: cannot progress; "); + fprintf(stderr, "too many faults\n"); + exit(-1); + }; + } + + fault_tries = NX_MAX_FAULTS; /* Reset for the next chunk */ + + inlen = inlen - srclen; + srcbuf = srcbuf + srclen; + srctotlen = srctotlen + srclen; + + /* Two possible locations for spbc depending on the function + * code. + */ + spbc = (!lzcounts) ? get32(cmdp->cpb, out_spbc_comp) : + get32(cmdp->cpb, out_spbc_comp_with_count); + assert(spbc == srclen); + + /* Target byte count */ + tpbc = get32(cmdp->crb.csb, tpbc); + /* Target ending bit count */ + tebc = getnn(cmdp->cpb, out_tebc); + NXPRT(fprintf(stderr, "compressed chunk %d ", spbc)); + NXPRT(fprintf(stderr, "to %d bytes, tebc= %d\n", tpbc, tebc)); + + if (inlen > 0) { /* More chunks to go */ + set_bfinal(dstbuf, 0); + dstbuf = dstbuf + tpbc; + dsttotlen = dsttotlen + tpbc; + outlen = outlen - tpbc; + /* Round up to the next byte with a flush + * block; do not set the BFINAqL bit. + */ + flushlen = append_sync_flush(dstbuf, tebc, 0); + dsttotlen = dsttotlen + flushlen; + outlen = outlen - flushlen; + dstbuf = dstbuf + flushlen; + NXPRT(fprintf(stderr, "added sync_flush %d bytes\n", + flushlen)); + } else { /* Done */ + /* Set the BFINAL bit of the last block per Deflate + * specification. + */ + set_bfinal(dstbuf, 1); + dstbuf = dstbuf + tpbc; + dsttotlen = dsttotlen + tpbc; + outlen = outlen - tpbc; + } + + /* Resuming crc32 for the next chunk */ + crc = get32(cmdp->cpb, out_crc); + put32(cmdp->cpb, in_crc, crc); + crc = be32toh(crc); + } + + /* Append crc32 and ISIZE to the end */ + memcpy(dstbuf, &crc, 4); + memcpy(dstbuf+4, &srctotlen, 4); + dsttotlen = dsttotlen + 8; + outlen = outlen - 8; + + assert(FNAME_MAX > (strlen(argv[1]) + strlen(FEXT))); + strcpy(outname, argv[1]); + strcat(outname, FEXT); + if (write_output_file(outname, outbuf, dsttotlen)) { + fprintf(stderr, "write error: %s\n", outname); + exit(-1); + } + + fprintf(stderr, "compressed %ld to %ld bytes total, ", srctotlen, + dsttotlen); + fprintf(stderr, "crc32 checksum = %08x\n", crc); + + if (inbuf != NULL) + free(inbuf); + + if (outbuf != NULL) + free(outbuf); + + return 0; +} + +int main(int argc, char **argv) +{ + int rc; + struct sigaction act; + void *handle; + + nx_dbg = 0; + nx_gzip_log = NULL; + act.sa_handler = 0; + act.sa_sigaction = nxu_sigsegv_handler; + act.sa_flags = SA_SIGINFO; + act.sa_restorer = 0; + sigemptyset(&act.sa_mask); + sigaction(SIGSEGV, &act, NULL); + + handle = nx_function_begin(NX_FUNC_COMP_GZIP, 0); + if (!handle) { + fprintf(stderr, "Unable to init NX, errno %d\n", errno); + exit(-1); + } + + rc = compress_file(argc, argv, handle); + + nx_function_end(handle); + + return rc; +} diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c b/tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c new file mode 100644 index 000000000000..e96b073ca2e5 --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c @@ -0,0 +1,316 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* Copyright 2020 IBM Corp. + * + * Author: Bulent Abali + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "nx-gzip.h" +#include "nx.h" +#include "copy-paste.h" +#include "nxu.h" +#include "nx_dbg.h" +#include + +#define barrier() +#define hwsync() ({ asm volatile("sync" ::: "memory"); }) + +#ifndef NX_NO_CPU_PRI +#define cpu_pri_default() ({ asm volatile ("or 2, 2, 2"); }) +#define cpu_pri_low() ({ asm volatile ("or 31, 31, 31"); }) +#else +#define cpu_pri_default() +#define cpu_pri_low() +#endif + +void *nx_fault_storage_address; + +struct nx_handle { + int fd; + int function; + void *paste_addr; +}; + +static int open_device_nodes(char *devname, int pri, struct nx_handle *handle) +{ + int rc, fd; + void *addr; + struct vas_gzip_setup_attr txattr; + + fd = open(devname, O_RDWR); + if (fd < 0) { + fprintf(stderr, " open device name %s\n", devname); + return -errno; + } + + memset(&txattr, 0, sizeof(txattr)); + txattr.version = 1; + txattr.vas_id = pri; + rc = ioctl(fd, VAS_GZIP_TX_WIN_OPEN, (unsigned long)&txattr); + if (rc < 0) { + fprintf(stderr, "ioctl() n %d, error %d\n", rc, errno); + rc = -errno; + goto out; + } + + addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0ULL); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed, errno %d\n", errno); + rc = -errno; + goto out; + } + handle->fd = fd; + handle->paste_addr = (void *)((char *)addr + 0x400); + + rc = 0; +out: + close(fd); + return rc; +} + +void *nx_function_begin(int function, int pri) +{ + int rc; + char *devname = "/dev/crypto/nx-gzip"; + struct nx_handle *nxhandle; + + if (function != NX_FUNC_COMP_GZIP) { + errno = EINVAL; + fprintf(stderr, " NX_FUNC_COMP_GZIP not found\n"); + return NULL; + } + + + nxhandle = malloc(sizeof(*nxhandle)); + if (!nxhandle) { + errno = ENOMEM; + fprintf(stderr, " No memory\n"); + return NULL; + } + + nxhandle->function = function; + rc = open_device_nodes(devname, pri, nxhandle); + if (rc < 0) { + errno = -rc; + fprintf(stderr, " open_device_nodes failed\n"); + return NULL; + } + + return nxhandle; +} + +int nx_function_end(void *handle) +{ + int rc = 0; + struct nx_handle *nxhandle = handle; + + rc = munmap(nxhandle->paste_addr - 0x400, 4096); + if (rc < 0) { + fprintf(stderr, "munmap() failed, errno %d\n", errno); + return rc; + } + close(nxhandle->fd); + free(nxhandle); + + return rc; +} + +static int nx_wait_for_csb(struct nx_gzip_crb_cpb_t *cmdp) +{ + long poll = 0; + uint64_t t; + + /* Save power and let other threads use the h/w. top may show + * 100% but only because OS doesn't know we slowed the this + * h/w thread while polling. We're letting other threads have + * higher throughput on the core. + */ + cpu_pri_low(); + +#define CSB_MAX_POLL 200000000UL +#define USLEEP_TH 300000UL + + t = __ppc_get_timebase(); + + while (getnn(cmdp->crb.csb, csb_v) == 0) { + ++poll; + hwsync(); + + cpu_pri_low(); + + /* usleep(0) takes around 29000 ticks ~60 us. + * 300000 is spinning for about 600 us then + * start sleeping. + */ + if ((__ppc_get_timebase() - t) > USLEEP_TH) { + cpu_pri_default(); + usleep(1); + } + + if (poll > CSB_MAX_POLL) + break; + + /* Fault address from signal handler */ + if (nx_fault_storage_address) { + cpu_pri_default(); + return -EAGAIN; + } + + } + + cpu_pri_default(); + + /* hw has updated csb and output buffer */ + hwsync(); + + /* Check CSB flags. */ + if (getnn(cmdp->crb.csb, csb_v) == 0) { + fprintf(stderr, "CSB still not valid after %d polls.\n", + (int) poll); + prt_err("CSB still not valid after %d polls, giving up.\n", + (int) poll); + return -ETIMEDOUT; + } + + return 0; +} + +static int nxu_run_job(struct nx_gzip_crb_cpb_t *cmdp, void *handle) +{ + int i, ret, retries; + struct nx_handle *nxhandle = handle; + + assert(handle != NULL); + i = 0; + retries = 5000; + while (i++ < retries) { + hwsync(); + vas_copy(&cmdp->crb, 0); + ret = vas_paste(nxhandle->paste_addr, 0); + hwsync(); + + NXPRT(fprintf(stderr, "Paste attempt %d/%d returns 0x%x\n", + i, retries, ret)); + + if ((ret == 2) || (ret == 3)) { + + ret = nx_wait_for_csb(cmdp); + if (!ret) { + goto out; + } else if (ret == -EAGAIN) { + long x; + + prt_err("Touching address %p, 0x%lx\n", + nx_fault_storage_address, + *(long *) nx_fault_storage_address); + x = *(long *) nx_fault_storage_address; + *(long *) nx_fault_storage_address = x; + nx_fault_storage_address = 0; + continue; + } else { + prt_err("wait_for_csb() returns %d\n", ret); + break; + } + } else { + if (i < 10) { + /* spin for few ticks */ +#define SPIN_TH 500UL + uint64_t fail_spin; + + fail_spin = __ppc_get_timebase(); + while ((__ppc_get_timebase() - fail_spin) < + SPIN_TH) + ; + } else { + /* sleep */ + unsigned int pr = 0; + + if (pr++ % 100 == 0) { + prt_err("Paste attempt %d/", i); + prt_err("%d, failed pid= %d\n", retries, + getpid()); + } + usleep(1); + } + continue; + } + } + +out: + cpu_pri_default(); + + return ret; +} + +int nxu_submit_job(struct nx_gzip_crb_cpb_t *cmdp, void *handle) +{ + int cc; + + cc = nxu_run_job(cmdp, handle); + + if (!cc) + cc = getnn(cmdp->crb.csb, csb_cc); /* CC Table 6-8 */ + + return cc; +} + + +void nxu_sigsegv_handler(int sig, siginfo_t *info, void *ctx) +{ + fprintf(stderr, "%d: Got signal %d si_code %d, si_addr %p\n", getpid(), + sig, info->si_code, info->si_addr); + + nx_fault_storage_address = info->si_addr; +} + +/* + * Fault in pages prior to NX job submission. wr=1 may be required to + * touch writeable pages. System zero pages do not fault-in the page as + * intended. Typically set wr=1 for NX target pages and set wr=0 for NX + * source pages. + */ +int nxu_touch_pages(void *buf, long buf_len, long page_len, int wr) +{ + char *begin = buf; + char *end = (char *) buf + buf_len - 1; + volatile char t; + + assert(buf_len >= 0 && !!buf); + + NXPRT(fprintf(stderr, "touch %p %p len 0x%lx wr=%d\n", buf, + (buf + buf_len), buf_len, wr)); + + if (buf_len <= 0 || buf == NULL) + return -1; + + do { + t = *begin; + if (wr) + *begin = t; + begin = begin + page_len; + } while (begin < end); + + /* When buf_sz is small or buf tail is in another page */ + t = *end; + if (wr) + *end = t; + + return 0; +} + From patchwork Mon Apr 13 15:59:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Raphael Moreira Zinsly X-Patchwork-Id: 197876 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64B1DC2BBFD for ; Mon, 13 Apr 2020 16:00:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 41AD220732 for ; Mon, 13 Apr 2020 16:00:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731187AbgDMQAI (ORCPT ); Mon, 13 Apr 2020 12:00:08 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:55868 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731188AbgDMQAC (ORCPT ); Mon, 13 Apr 2020 12:00:02 -0400 Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 03DFYeLB113394; Mon, 13 Apr 2020 11:59:47 -0400 Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186]) by mx0b-001b2d01.pphosted.com with ESMTP id 30ba1gr6qb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 11:59:47 -0400 Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1]) by ppma03wdc.us.ibm.com (8.16.0.27/8.16.0.27) with SMTP id 03DFvFOw018491; Mon, 13 Apr 2020 15:59:46 GMT Received: from b03cxnp08028.gho.boulder.ibm.com (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20]) by ppma03wdc.us.ibm.com with ESMTP id 30b5h61ecj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Apr 2020 15:59:46 +0000 Received: from b03ledav005.gho.boulder.ibm.com (b03ledav005.gho.boulder.ibm.com [9.17.130.236]) by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 03DFxjIR57803236 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 Apr 2020 15:59:45 GMT Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2DE68BE053; Mon, 13 Apr 2020 15:59:45 +0000 (GMT) Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 97FCDBE054; Mon, 13 Apr 2020 15:59:44 +0000 (GMT) Received: from localhost (unknown [9.85.151.130]) by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 13 Apr 2020 15:59:44 +0000 (GMT) From: Raphael Moreira Zinsly To: linuxppc-dev@lists.ozlabs.org, linux-crypto@vger.kernel.org, dja@axtens.net Cc: rzinsly@linux.ibm.com, herbert@gondor.apana.org.au, mpe@ellerman.id.au, haren@linux.ibm.com, abali@us.ibm.com Subject: [PATCH V3 4/5] selftests/powerpc: Add NX-GZIP engine decompress testcase Date: Mon, 13 Apr 2020 12:59:15 -0300 Message-Id: <20200413155916.16900-5-rzinsly@linux.ibm.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20200413155916.16900-1-rzinsly@linux.ibm.com> References: <20200413155916.16900-1-rzinsly@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-13_07:2020-04-13,2020-04-13 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 lowpriorityscore=0 bulkscore=0 mlxscore=0 adultscore=0 phishscore=0 priorityscore=1501 mlxlogscore=999 suspectscore=2 impostorscore=0 clxscore=1015 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004130116 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Include a decompression testcase for the powerpc NX-GZIP engine. Signed-off-by: Bulent Abali Signed-off-by: Raphael Moreira Zinsly --- .../selftests/powerpc/nx-gzip/Makefile | 7 +- .../selftests/powerpc/nx-gzip/gunz_test.c | 1026 +++++++++++++++++ 2 files changed, 1030 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile b/tools/testing/selftests/powerpc/nx-gzip/Makefile index ab903f63bbbd..82abc19a49a0 100644 --- a/tools/testing/selftests/powerpc/nx-gzip/Makefile +++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile @@ -1,9 +1,9 @@ CC = gcc CFLAGS = -O3 INC = ./inc -SRC = gzfht_test.c +SRC = gzfht_test.c gunz_test.c OBJ = $(SRC:.c=.o) -TESTS = gzfht_test +TESTS = gzfht_test gunz_test EXTRA_SOURCES = gzip_vas.c all: $(TESTS) @@ -16,6 +16,7 @@ $(TESTS): $(OBJ) run_tests: $(TESTS) ./gzfht_test gzip_vas.c + ./gunz_test gzip_vas.c.nx.gz clean: - rm -f $(TESTS) *.o *~ *.gz + rm -f $(TESTS) *.o *~ *.gz *.gunzip diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c new file mode 100644 index 000000000000..94cb79616225 --- /dev/null +++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c @@ -0,0 +1,1026 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* P9 gunzip sample code for demonstrating the P9 NX hardware + * interface. Not intended for productive uses or for performance or + * compression ratio measurements. Note also that /dev/crypto/gzip, + * VAS and skiboot support are required + * + * Copyright 2020 IBM Corp. + * + * Author: Bulent Abali + * + * https://github.com/libnxz/power-gzip for zlib api and other utils + * Definitions of acronyms used here. See + * P9 NX Gzip Accelerator User's Manual for details: + * https://github.com/libnxz/power-gzip/blob/develop/doc/power_nx_gzip_um.pdf + * + * adler/crc: 32 bit checksums appended to stream tail + * ce: completion extension + * cpb: coprocessor parameter block (metadata) + * crb: coprocessor request block (command) + * csb: coprocessor status block (status) + * dht: dynamic huffman table + * dde: data descriptor element (address, length) + * ddl: list of ddes + * dh/fh: dynamic and fixed huffman types + * fc: coprocessor function code + * histlen: history/dictionary length + * history: sliding window of up to 32KB of data + * lzcount: Deflate LZ symbol counts + * rembytecnt: remaining byte count + * sfbt: source final block type; last block's type during decomp + * spbc: source processed byte count + * subc: source unprocessed bit count + * tebc: target ending bit count; valid bits in the last byte + * tpbc: target processed byte count + * vas: virtual accelerator switch; the user mode interface + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "nxu.h" +#include "nx.h" +#include "crb.h" + +int nx_dbg; +FILE *nx_gzip_log; + +#define NX_MIN(X, Y) (((X) < (Y))?(X):(Y)) +#define NX_MAX(X, Y) (((X) > (Y))?(X):(Y)) + +#define GETINPC(X) fgetc(X) +#define FNAME_MAX 1024 + +/* fifo queue management */ +#define fifo_used_bytes(used) (used) +#define fifo_free_bytes(used, len) ((len)-(used)) +/* amount of free bytes in the first and last parts */ +#define fifo_free_first_bytes(cur, used, len) ((((cur)+(used)) <= (len)) \ + ? (len)-((cur)+(used)) : 0) +#define fifo_free_last_bytes(cur, used, len) ((((cur)+(used)) <= (len)) \ + ? (cur) : (len)-(used)) +/* amount of used bytes in the first and last parts */ +#define fifo_used_first_bytes(cur, used, len) ((((cur)+(used)) <= (len)) \ + ? (used) : (len)-(cur)) +#define fifo_used_last_bytes(cur, used, len) ((((cur)+(used)) <= (len)) \ + ? 0 : ((used)+(cur))-(len)) +/* first and last free parts start here */ +#define fifo_free_first_offset(cur, used) ((cur)+(used)) +#define fifo_free_last_offset(cur, used, len) \ + fifo_used_last_bytes(cur, used, len) +/* first and last used parts start here */ +#define fifo_used_first_offset(cur) (cur) +#define fifo_used_last_offset(cur) (0) + +const int fifo_in_len = 1<<24; +const int fifo_out_len = 1<<24; +const int page_sz = 1<<16; +const int line_sz = 1<<7; +const int window_max = 1<<15; + +/* + * Adds an (address, len) pair to the list of ddes (ddl) and updates + * the base dde. ddl[0] is the only dde in a direct dde which + * contains a single (addr,len) pair. For more pairs, ddl[0] becomes + * the indirect (base) dde that points to a list of direct ddes. + * See Section 6.4 of the NX-gzip user manual for DDE description. + * Addr=NULL, len=0 clears the ddl[0]. Returns the total number of + * bytes in ddl. Caller is responsible for allocting the array of + * nx_dde_t *ddl. If N addresses are required in the scatter-gather + * list, the ddl array must have N+1 entries minimum. + */ +static inline uint32_t nx_append_dde(struct nx_dde_t *ddl, void *addr, + uint32_t len) +{ + uint32_t ddecnt; + uint32_t bytes; + + if (addr == NULL && len == 0) { + clearp_dde(ddl); + return 0; + } + + NXPRT(fprintf(stderr, "%d: %s addr %p len %x\n", __LINE__, addr, + __func__, len)); + + /* Number of ddes in the dde list ; == 0 when it is a direct dde */ + ddecnt = getpnn(ddl, dde_count); + bytes = getp32(ddl, ddebc); + + if (ddecnt == 0 && bytes == 0) { + /* First dde is unused; make it a direct dde */ + bytes = len; + putp32(ddl, ddebc, bytes); + putp64(ddl, ddead, (uint64_t) addr); + } else if (ddecnt == 0) { + /* Converting direct to indirect dde + * ddl[0] becomes head dde of ddl + * copy direct to indirect first. + */ + ddl[1] = ddl[0]; + + /* Add the new dde next */ + clear_dde(ddl[2]); + put32(ddl[2], ddebc, len); + put64(ddl[2], ddead, (uint64_t) addr); + + /* Ddl head points to 2 direct ddes */ + ddecnt = 2; + putpnn(ddl, dde_count, ddecnt); + bytes = bytes + len; + putp32(ddl, ddebc, bytes); + /* Pointer to the first direct dde */ + putp64(ddl, ddead, (uint64_t) &ddl[1]); + } else { + /* Append a dde to an existing indirect ddl */ + ++ddecnt; + clear_dde(ddl[ddecnt]); + put64(ddl[ddecnt], ddead, (uint64_t) addr); + put32(ddl[ddecnt], ddebc, len); + + putpnn(ddl, dde_count, ddecnt); + bytes = bytes + len; + putp32(ddl, ddebc, bytes); /* byte sum of all dde */ + } + return bytes; +} + +/* + * Touch specified number of pages represented in number bytes + * beginning from the first buffer in a dde list. + * Do not touch the pages past buf_sz-th byte's page. + * + * Set buf_sz = 0 to touch all pages described by the ddep. + */ +static int nx_touch_pages_dde(struct nx_dde_t *ddep, long buf_sz, long page_sz, + int wr) +{ + uint32_t indirect_count; + uint32_t buf_len; + long total; + uint64_t buf_addr; + struct nx_dde_t *dde_list; + int i; + + assert(!!ddep); + + indirect_count = getpnn(ddep, dde_count); + + NXPRT(fprintf(stderr, "%s dde_count %d request len ", __func__, + indirect_count)); + NXPRT(fprintf(stderr, "0x%lx\n", buf_sz)); + + if (indirect_count == 0) { + /* Direct dde */ + buf_len = getp32(ddep, ddebc); + buf_addr = getp64(ddep, ddead); + + NXPRT(fprintf(stderr, "touch direct ddebc 0x%x ddead %p\n", + buf_len, (void *)buf_addr)); + + if (buf_sz == 0) + nxu_touch_pages((void *)buf_addr, buf_len, page_sz, wr); + else + nxu_touch_pages((void *)buf_addr, NX_MIN(buf_len, + buf_sz), page_sz, wr); + + return ERR_NX_OK; + } + + /* Indirect dde */ + if (indirect_count > MAX_DDE_COUNT) + return ERR_NX_EXCESSIVE_DDE; + + /* First address of the list */ + dde_list = (struct nx_dde_t *) getp64(ddep, ddead); + + if (buf_sz == 0) + buf_sz = getp32(ddep, ddebc); + + total = 0; + for (i = 0; i < indirect_count; i++) { + buf_len = get32(dde_list[i], ddebc); + buf_addr = get64(dde_list[i], ddead); + total += buf_len; + + NXPRT(fprintf(stderr, "touch loop len 0x%x ddead %p total ", + buf_len, (void *)buf_addr)); + NXPRT(fprintf(stderr, "0x%lx\n", total)); + + /* Touching fewer pages than encoded in the ddebc */ + if (total > buf_sz) { + buf_len = NX_MIN(buf_len, total - buf_sz); + nxu_touch_pages((void *)buf_addr, buf_len, page_sz, wr); + NXPRT(fprintf(stderr, "touch loop break len 0x%x ", + buf_len)); + NXPRT(fprintf(stderr, "ddead %p\n", (void *)buf_addr)); + break; + } + nxu_touch_pages((void *)buf_addr, buf_len, page_sz, wr); + } + return ERR_NX_OK; +} + +/* + * Src and dst buffers are supplied in scatter gather lists. + * NX function code and other parameters supplied in cmdp. + */ +static int nx_submit_job(struct nx_dde_t *src, struct nx_dde_t *dst, + struct nx_gzip_crb_cpb_t *cmdp, void *handle) +{ + int cc; + uint64_t csbaddr; + + memset((void *)&cmdp->crb.csb, 0, sizeof(cmdp->crb.csb)); + + cmdp->crb.source_dde = *src; + cmdp->crb.target_dde = *dst; + + /* Status, output byte count in tpbc */ + csbaddr = ((uint64_t) &cmdp->crb.csb) & csb_address_mask; + put64(cmdp->crb, csb_address, csbaddr); + + /* NX reports input bytes in spbc; cleared */ + cmdp->cpb.out_spbc_comp_wrap = 0; + cmdp->cpb.out_spbc_comp_with_count = 0; + cmdp->cpb.out_spbc_decomp = 0; + + /* Clear output */ + put32(cmdp->cpb, out_crc, INIT_CRC); + put32(cmdp->cpb, out_adler, INIT_ADLER); + + /* Submit the crb, the job descriptor, to the accelerator. */ + return nxu_submit_job(cmdp, handle); +} + +int decompress_file(int argc, char **argv, void *devhandle) +{ + FILE *inpf; + FILE *outf; + + int c, expect, i, cc, rc = 0; + char gzfname[FNAME_MAX]; + + /* Queuing, file ops, byte counting */ + char *fifo_in, *fifo_out; + int used_in, cur_in, used_out, cur_out, read_sz, n; + int first_free, last_free, first_used, last_used; + int first_offset, last_offset; + int write_sz, free_space, source_sz; + int source_sz_estimate, target_sz_estimate; + uint64_t last_comp_ratio; /* 1000 max */ + uint64_t total_out; + int is_final, is_eof; + + /* nx hardware */ + int sfbt, subc, spbc, tpbc, nx_ce, fc, resuming = 0; + int history_len = 0; + struct nx_gzip_crb_cpb_t cmd, *cmdp; + struct nx_dde_t *ddl_in; + struct nx_dde_t dde_in[6] __aligned(128); + struct nx_dde_t *ddl_out; + struct nx_dde_t dde_out[6] __aligned(128); + int pgfault_retries; + + /* when using mmap'ed files */ + off_t input_file_offset; + + if (argc > 2) { + fprintf(stderr, "usage: %s or stdin\n", argv[0]); + fprintf(stderr, " writes to stdout or .nx.gunzip\n"); + return -1; + } + + if (argc == 1) { + inpf = stdin; + outf = stdout; + } else if (argc == 2) { + char w[1024]; + char *wp; + + inpf = fopen(argv[1], "r"); + if (inpf == NULL) { + perror(argv[1]); + return -1; + } + + /* Make a new file name to write to. Ignoring '.gz' */ + wp = (NULL != (wp = strrchr(argv[1], '/'))) ? (wp+1) : argv[1]; + strcpy(w, wp); + strcat(w, ".nx.gunzip"); + + outf = fopen(w, "w"); + if (outf == NULL) { + perror(w); + return -1; + } + } + + /* Decode the gzip header */ + c = GETINPC(inpf); expect = 0x1f; /* ID1 */ + if (c != expect) + goto err1; + + c = GETINPC(inpf); expect = 0x8b; /* ID2 */ + if (c != expect) + goto err1; + + c = GETINPC(inpf); expect = 0x08; /* CM */ + if (c != expect) + goto err1; + + int flg = GETINPC(inpf); /* FLG */ + + if (flg & 0xE0 || flg & 0x4 || flg == EOF) + goto err2; + + fprintf(stderr, "gzHeader FLG %x\n", flg); + + /* Read 6 bytes; ignoring the MTIME, XFL, OS fields in this + * sample code. + */ + for (i = 0; i < 6; i++) { + char tmp[10]; + + tmp[i] = GETINPC(inpf); + if (tmp[i] == EOF) + goto err3; + fprintf(stderr, "%02x ", tmp[i]); + if (i == 5) + fprintf(stderr, "\n"); + } + fprintf(stderr, "gzHeader MTIME, XFL, OS ignored\n"); + + /* FNAME */ + if (flg & 0x8) { + int k = 0; + + do { + c = GETINPC(inpf); + if (c == EOF || k >= FNAME_MAX) + goto err3; + gzfname[k++] = c; + } while (c); + fprintf(stderr, "gzHeader FNAME: %s\n", gzfname); + } + + /* FHCRC */ + if (flg & 0x2) { + c = GETINPC(inpf); + if (c == EOF) + goto err3; + c = GETINPC(inpf); + if (c == EOF) + goto err3; + fprintf(stderr, "gzHeader FHCRC: ignored\n"); + } + + used_in = cur_in = used_out = cur_out = 0; + is_final = is_eof = 0; + + /* Allocate one page larger to prevent page faults due to NX + * overfetching. + * Either do this (char*)(uintptr_t)aligned_alloc or use + * -std=c11 flag to make the int-to-pointer warning go away. + */ + assert((fifo_in = (char *)(uintptr_t)aligned_alloc(line_sz, + fifo_in_len + page_sz)) != NULL); + assert((fifo_out = (char *)(uintptr_t)aligned_alloc(line_sz, + fifo_out_len + page_sz + line_sz)) != NULL); + /* Leave unused space due to history rounding rules */ + fifo_out = fifo_out + line_sz; + nxu_touch_pages(fifo_out, fifo_out_len, page_sz, 1); + + ddl_in = &dde_in[0]; + ddl_out = &dde_out[0]; + cmdp = &cmd; + memset(&cmdp->crb, 0, sizeof(cmdp->crb)); + +read_state: + + /* Read from .gz file */ + + NXPRT(fprintf(stderr, "read_state:\n")); + + if (is_eof != 0) + goto write_state; + + /* We read in to fifo_in in two steps: first: read in to from + * cur_in to the end of the buffer. last: if free space wrapped + * around, read from fifo_in offset 0 to offset cur_in. + */ + + /* Reset fifo head to reduce unnecessary wrap arounds */ + cur_in = (used_in == 0) ? 0 : cur_in; + + /* Free space total is reduced by a gap */ + free_space = NX_MAX(0, fifo_free_bytes(used_in, fifo_in_len) + - line_sz); + + /* Free space may wrap around as first and last */ + first_free = fifo_free_first_bytes(cur_in, used_in, fifo_in_len); + last_free = fifo_free_last_bytes(cur_in, used_in, fifo_in_len); + + /* Start offsets of the free memory */ + first_offset = fifo_free_first_offset(cur_in, used_in); + last_offset = fifo_free_last_offset(cur_in, used_in, fifo_in_len); + + /* Reduce read_sz because of the line_sz gap */ + read_sz = NX_MIN(free_space, first_free); + n = 0; + if (read_sz > 0) { + /* Read in to offset cur_in + used_in */ + n = fread(fifo_in + first_offset, 1, read_sz, inpf); + used_in = used_in + n; + free_space = free_space - n; + assert(n <= read_sz); + if (n != read_sz) { + /* Either EOF or error; exit the read loop */ + is_eof = 1; + goto write_state; + } + } + + /* If free space wrapped around */ + if (last_free > 0) { + /* Reduce read_sz because of the line_sz gap */ + read_sz = NX_MIN(free_space, last_free); + n = 0; + if (read_sz > 0) { + n = fread(fifo_in + last_offset, 1, read_sz, inpf); + used_in = used_in + n; /* Increase used space */ + free_space = free_space - n; /* Decrease free space */ + assert(n <= read_sz); + if (n != read_sz) { + /* Either EOF or error; exit the read loop */ + is_eof = 1; + goto write_state; + } + } + } + + /* At this point we have used_in bytes in fifo_in with the + * data head starting at cur_in and possibly wrapping around. + */ + +write_state: + + /* Write decompressed data to output file */ + + NXPRT(fprintf(stderr, "write_state:\n")); + + if (used_out == 0) + goto decomp_state; + + /* If fifo_out has data waiting, write it out to the file to + * make free target space for the accelerator used bytes in + * the first and last parts of fifo_out. + */ + + first_used = fifo_used_first_bytes(cur_out, used_out, fifo_out_len); + last_used = fifo_used_last_bytes(cur_out, used_out, fifo_out_len); + + write_sz = first_used; + + n = 0; + if (write_sz > 0) { + n = fwrite(fifo_out + cur_out, 1, write_sz, outf); + used_out = used_out - n; + /* Move head of the fifo */ + cur_out = (cur_out + n) % fifo_out_len; + assert(n <= write_sz); + if (n != write_sz) { + fprintf(stderr, "error: write\n"); + rc = -1; + goto err5; + } + } + + if (last_used > 0) { /* If more data available in the last part */ + write_sz = last_used; /* Keep it here for later */ + n = 0; + if (write_sz > 0) { + n = fwrite(fifo_out, 1, write_sz, outf); + used_out = used_out - n; + cur_out = (cur_out + n) % fifo_out_len; + assert(n <= write_sz); + if (n != write_sz) { + fprintf(stderr, "error: write\n"); + rc = -1; + goto err5; + } + } + } + +decomp_state: + + /* NX decompresses input data */ + + NXPRT(fprintf(stderr, "decomp_state:\n")); + + if (is_final) + goto finish_state; + + /* Address/len lists */ + clearp_dde(ddl_in); + clearp_dde(ddl_out); + + /* FC, CRC, HistLen, Table 6-6 */ + if (resuming) { + /* Resuming a partially decompressed input. + * The key to resume is supplying the 32KB + * dictionary (history) to NX, which is basically + * the last 32KB of output produced. + */ + fc = GZIP_FC_DECOMPRESS_RESUME; + + cmdp->cpb.in_crc = cmdp->cpb.out_crc; + cmdp->cpb.in_adler = cmdp->cpb.out_adler; + + /* Round up the history size to quadword. Section 2.10 */ + history_len = (history_len + 15) / 16; + putnn(cmdp->cpb, in_histlen, history_len); + history_len = history_len * 16; /* bytes */ + + if (history_len > 0) { + /* Chain in the history buffer to the DDE list */ + if (cur_out >= history_len) { + nx_append_dde(ddl_in, fifo_out + + (cur_out - history_len), + history_len); + } else { + nx_append_dde(ddl_in, fifo_out + + ((fifo_out_len + cur_out) + - history_len), + history_len - cur_out); + /* Up to 32KB history wraps around fifo_out */ + nx_append_dde(ddl_in, fifo_out, cur_out); + } + + } + } else { + /* First decompress job */ + fc = GZIP_FC_DECOMPRESS; + + history_len = 0; + /* Writing 0 clears out subc as well */ + cmdp->cpb.in_histlen = 0; + total_out = 0; + + put32(cmdp->cpb, in_crc, INIT_CRC); + put32(cmdp->cpb, in_adler, INIT_ADLER); + put32(cmdp->cpb, out_crc, INIT_CRC); + put32(cmdp->cpb, out_adler, INIT_ADLER); + + /* Assuming 10% compression ratio initially; use the + * most recently measured compression ratio as a + * heuristic to estimate the input and output + * sizes. If we give too much input, the target buffer + * overflows and NX cycles are wasted, and then we + * must retry with smaller input size. 1000 is 100%. + */ + last_comp_ratio = 100UL; + } + cmdp->crb.gzip_fc = 0; + putnn(cmdp->crb, gzip_fc, fc); + + /* + * NX source buffers + */ + first_used = fifo_used_first_bytes(cur_in, used_in, fifo_in_len); + last_used = fifo_used_last_bytes(cur_in, used_in, fifo_in_len); + + if (first_used > 0) + nx_append_dde(ddl_in, fifo_in + cur_in, first_used); + + if (last_used > 0) + nx_append_dde(ddl_in, fifo_in, last_used); + + /* + * NX target buffers + */ + first_free = fifo_free_first_bytes(cur_out, used_out, fifo_out_len); + last_free = fifo_free_last_bytes(cur_out, used_out, fifo_out_len); + + /* Reduce output free space amount not to overwrite the history */ + int target_max = NX_MAX(0, fifo_free_bytes(used_out, fifo_out_len) + - (1<<16)); + + NXPRT(fprintf(stderr, "target_max %d (0x%x)\n", target_max, + target_max)); + + first_free = NX_MIN(target_max, first_free); + if (first_free > 0) { + first_offset = fifo_free_first_offset(cur_out, used_out); + nx_append_dde(ddl_out, fifo_out + first_offset, first_free); + } + + if (last_free > 0) { + last_free = NX_MIN(target_max - first_free, last_free); + if (last_free > 0) { + last_offset = fifo_free_last_offset(cur_out, used_out, + fifo_out_len); + nx_append_dde(ddl_out, fifo_out + last_offset, + last_free); + } + } + + /* Target buffer size is used to limit the source data size + * based on previous measurements of compression ratio. + */ + + /* source_sz includes history */ + source_sz = getp32(ddl_in, ddebc); + assert(source_sz > history_len); + source_sz = source_sz - history_len; + + /* Estimating how much source is needed to 3/4 fill a + * target_max size target buffer. If we overshoot, then NX + * must repeat the job with smaller input and we waste + * bandwidth. If we undershoot then we use more NX calls than + * necessary. + */ + + source_sz_estimate = ((uint64_t)target_max * last_comp_ratio * 3UL) + / 4000; + + if (source_sz_estimate < source_sz) { + /* Target might be small, therefore limiting the + * source data. + */ + source_sz = source_sz_estimate; + target_sz_estimate = target_max; + } else { + /* Source file might be small, therefore limiting target + * touch pages to a smaller value to save processor cycles. + */ + target_sz_estimate = ((uint64_t)source_sz * 1000UL) + / (last_comp_ratio + 1); + target_sz_estimate = NX_MIN(2 * target_sz_estimate, + target_max); + } + + source_sz = source_sz + history_len; + + /* Some NX condition codes require submitting the NX job again. + * Kernel doesn't handle NX page faults. Expects user code to + * touch pages. + */ + pgfault_retries = NX_MAX_FAULTS; + +restart_nx: + + putp32(ddl_in, ddebc, source_sz); + + /* Fault in pages */ + nxu_touch_pages(cmdp, sizeof(struct nx_gzip_crb_cpb_t), page_sz, 1); + nx_touch_pages_dde(ddl_in, 0, page_sz, 0); + nx_touch_pages_dde(ddl_out, target_sz_estimate, page_sz, 1); + + /* Send job to NX */ + cc = nx_submit_job(ddl_in, ddl_out, cmdp, devhandle); + + switch (cc) { + + case ERR_NX_TRANSLATION: + + /* We touched the pages ahead of time. In the most common case + * we shouldn't be here. But may be some pages were paged out. + * Kernel should have placed the faulting address to fsaddr. + */ + NXPRT(fprintf(stderr, "ERR_NX_TRANSLATION %p\n", + (void *)cmdp->crb.csb.fsaddr)); + + if (pgfault_retries == NX_MAX_FAULTS) { + /* Try once with exact number of pages */ + --pgfault_retries; + goto restart_nx; + } else if (pgfault_retries > 0) { + /* If still faulting try fewer input pages + * assuming memory outage + */ + if (source_sz > page_sz) + source_sz = NX_MAX(source_sz / 2, page_sz); + --pgfault_retries; + goto restart_nx; + } else { + fprintf(stderr, "cannot make progress; too many "); + fprintf(stderr, "page fault retries cc= %d\n", cc); + rc = -1; + goto err5; + } + + case ERR_NX_DATA_LENGTH: + + NXPRT(fprintf(stderr, "ERR_NX_DATA_LENGTH; ")); + NXPRT(fprintf(stderr, "stream may have trailing data\n")); + + /* Not an error in the most common case; it just says + * there is trailing data that we must examine. + * + * CC=3 CE(1)=0 CE(0)=1 indicates partial completion + * Fig.6-7 and Table 6-8. + */ + nx_ce = get_csb_ce_ms3b(cmdp->crb.csb); + + if (!csb_ce_termination(nx_ce) && + csb_ce_partial_completion(nx_ce)) { + /* Check CPB for more information + * spbc and tpbc are valid + */ + sfbt = getnn(cmdp->cpb, out_sfbt); /* Table 6-4 */ + subc = getnn(cmdp->cpb, out_subc); /* Table 6-4 */ + spbc = get32(cmdp->cpb, out_spbc_decomp); + tpbc = get32(cmdp->crb.csb, tpbc); + assert(target_max >= tpbc); + + goto ok_cc3; /* not an error */ + } else { + /* History length error when CE(1)=1 CE(0)=0. */ + rc = -1; + fprintf(stderr, "history length error cc= %d\n", cc); + goto err5; + } + + case ERR_NX_TARGET_SPACE: + + /* Target buffer not large enough; retry smaller input + * data; give at least 1 byte. SPBC/TPBC are not valid. + */ + assert(source_sz > history_len); + source_sz = ((source_sz - history_len + 2) / 2) + history_len; + NXPRT(fprintf(stderr, "ERR_NX_TARGET_SPACE; retry with ")); + NXPRT(fprintf(stderr, "smaller input data src %d hist %d\n", + source_sz, history_len)); + goto restart_nx; + + case ERR_NX_OK: + + /* This should not happen for gzip formatted data; + * we need trailing crc and isize + */ + fprintf(stderr, "ERR_NX_OK\n"); + spbc = get32(cmdp->cpb, out_spbc_decomp); + tpbc = get32(cmdp->crb.csb, tpbc); + assert(target_max >= tpbc); + assert(spbc >= history_len); + source_sz = spbc - history_len; + goto offsets_state; + + default: + fprintf(stderr, "error: cc= %d\n", cc); + rc = -1; + goto err5; + } + +ok_cc3: + + NXPRT(fprintf(stderr, "cc3: sfbt: %x\n", sfbt)); + + assert(spbc > history_len); + source_sz = spbc - history_len; + + /* Table 6-4: Source Final Block Type (SFBT) describes the + * last processed deflate block and clues the software how to + * resume the next job. SUBC indicates how many input bits NX + * consumed but did not process. SPBC indicates how many + * bytes of source were given to the accelerator including + * history bytes. + */ + + switch (sfbt) { + int dhtlen; + + case 0x0: /* Deflate final EOB received */ + + /* Calculating the checksum start position. */ + + source_sz = source_sz - subc / 8; + is_final = 1; + break; + + /* Resume decompression cases are below. Basically + * indicates where NX has suspended and how to resume + * the input stream. + */ + + case 0x8: /* Within a literal block; use rembytecount */ + case 0x9: /* Within a literal block; use rembytecount; bfinal=1 */ + + /* Supply the partially processed source byte again */ + source_sz = source_sz - ((subc + 7) / 8); + + /* SUBC LS 3bits: number of bits in the first source byte need + * to be processed. + * 000 means all 8 bits; Table 6-3 + * Clear subc, histlen, sfbt, rembytecnt, dhtlen + */ + cmdp->cpb.in_subc = 0; + cmdp->cpb.in_sfbt = 0; + putnn(cmdp->cpb, in_subc, subc % 8); + putnn(cmdp->cpb, in_sfbt, sfbt); + putnn(cmdp->cpb, in_rembytecnt, getnn(cmdp->cpb, + out_rembytecnt)); + break; + + case 0xA: /* Within a FH block; */ + case 0xB: /* Within a FH block; bfinal=1 */ + + source_sz = source_sz - ((subc + 7) / 8); + + /* Clear subc, histlen, sfbt, rembytecnt, dhtlen */ + cmdp->cpb.in_subc = 0; + cmdp->cpb.in_sfbt = 0; + putnn(cmdp->cpb, in_subc, subc % 8); + putnn(cmdp->cpb, in_sfbt, sfbt); + break; + + case 0xC: /* Within a DH block; */ + case 0xD: /* Within a DH block; bfinal=1 */ + + source_sz = source_sz - ((subc + 7) / 8); + + /* Clear subc, histlen, sfbt, rembytecnt, dhtlen */ + cmdp->cpb.in_subc = 0; + cmdp->cpb.in_sfbt = 0; + putnn(cmdp->cpb, in_subc, subc % 8); + putnn(cmdp->cpb, in_sfbt, sfbt); + + dhtlen = getnn(cmdp->cpb, out_dhtlen); + putnn(cmdp->cpb, in_dhtlen, dhtlen); + assert(dhtlen >= 42); + + /* Round up to a qword */ + dhtlen = (dhtlen + 127) / 128; + + while (dhtlen > 0) { /* Copy dht from cpb.out to cpb.in */ + --dhtlen; + cmdp->cpb.in_dht[dhtlen] = cmdp->cpb.out_dht[dhtlen]; + } + break; + + case 0xE: /* Within a block header; bfinal=0; */ + /* Also given if source data exactly ends (SUBC=0) with + * EOB code with BFINAL=0. Means the next byte will + * contain a block header. + */ + case 0xF: /* within a block header with BFINAL=1. */ + + source_sz = source_sz - ((subc + 7) / 8); + + /* Clear subc, histlen, sfbt, rembytecnt, dhtlen */ + cmdp->cpb.in_subc = 0; + cmdp->cpb.in_sfbt = 0; + putnn(cmdp->cpb, in_subc, subc % 8); + putnn(cmdp->cpb, in_sfbt, sfbt); + + /* Engine did not process any data */ + if (is_eof && (source_sz == 0)) + is_final = 1; + } + +offsets_state: + + /* Adjust the source and target buffer offsets and lengths */ + + NXPRT(fprintf(stderr, "offsets_state:\n")); + + /* Delete input data from fifo_in */ + used_in = used_in - source_sz; + cur_in = (cur_in + source_sz) % fifo_in_len; + input_file_offset = input_file_offset + source_sz; + + /* Add output data to fifo_out */ + used_out = used_out + tpbc; + + assert(used_out <= fifo_out_len); + + total_out = total_out + tpbc; + + /* Deflate history is 32KB max. No need to supply more + * than 32KB on a resume. + */ + history_len = (total_out > window_max) ? window_max : total_out; + + /* To estimate expected expansion in the next NX job; 500 means 50%. + * Deflate best case is around 1 to 1000. + */ + last_comp_ratio = (1000UL * ((uint64_t)source_sz + 1)) + / ((uint64_t)tpbc + 1); + last_comp_ratio = NX_MAX(NX_MIN(1000UL, last_comp_ratio), 1); + NXPRT(fprintf(stderr, "comp_ratio %ld source_sz %d spbc %d tpbc %d\n", + last_comp_ratio, source_sz, spbc, tpbc)); + + resuming = 1; + +finish_state: + + NXPRT(fprintf(stderr, "finish_state:\n")); + + if (is_final) { + if (used_out) + goto write_state; /* More data to write out */ + else if (used_in < 8) { + /* Need at least 8 more bytes containing gzip crc + * and isize. + */ + rc = -1; + goto err4; + } else { + /* Compare checksums and exit */ + int i; + unsigned char tail[8]; + uint32_t cksum, isize; + + for (i = 0; i < 8; i++) + tail[i] = fifo_in[(cur_in + i) % fifo_in_len]; + fprintf(stderr, "computed checksum %08x isize %08x\n", + cmdp->cpb.out_crc, (uint32_t) (total_out + % (1ULL<<32))); + cksum = ((uint32_t) tail[0] | (uint32_t) tail[1]<<8 + | (uint32_t) tail[2]<<16 + | (uint32_t) tail[3]<<24); + isize = ((uint32_t) tail[4] | (uint32_t) tail[5]<<8 + | (uint32_t) tail[6]<<16 + | (uint32_t) tail[7]<<24); + fprintf(stderr, "stored checksum %08x isize %08x\n", + cksum, isize); + + if (cksum == cmdp->cpb.out_crc && isize == (uint32_t) + (total_out % (1ULL<<32))) { + rc = 0; goto ok1; + } else { + rc = -1; goto err4; + } + } + } else + goto read_state; + + return -1; + +err1: + fprintf(stderr, "error: not a gzip file, expect %x, read %x\n", + expect, c); + return -1; + +err2: + fprintf(stderr, "error: the FLG byte is wrong or not being handled\n"); + return -1; + +err3: + fprintf(stderr, "error: gzip header\n"); + return -1; + +err4: + fprintf(stderr, "error: checksum missing or mismatch\n"); + +err5: +ok1: + fprintf(stderr, "decomp is complete: fclose\n"); + fclose(outf); + + return rc; +} + + +int main(int argc, char **argv) +{ + int rc; + struct sigaction act; + void *handle; + + nx_dbg = 0; + nx_gzip_log = NULL; + act.sa_handler = 0; + act.sa_sigaction = nxu_sigsegv_handler; + act.sa_flags = SA_SIGINFO; + act.sa_restorer = 0; + sigemptyset(&act.sa_mask); + sigaction(SIGSEGV, &act, NULL); + + handle = nx_function_begin(NX_FUNC_COMP_GZIP, 0); + if (!handle) { + fprintf(stderr, "Unable to init NX, errno %d\n", errno); + exit(-1); + } + + rc = decompress_file(argc, argv, handle); + + nx_function_end(handle); + + return rc; +}