From patchwork Mon Nov 12 07:58:02 2018
X-Patchwork-Submitter: Kenneth Lee
X-Patchwork-Id: 150787
From: Kenneth Lee
To: Alexander Shishkin, Tim Sell, Sanyog Kale, Randy Dunlap,
    Uwe Kleine-König, Vinod Koul, David Kershner, Sagar Dharia,
    Gavin Schenk, Jens Axboe, Philippe Ombredanne, Cyrille Pitchen,
    Johan Hovold, Zhou Wang, Hao Fang, Jonathan Cameron, Zaibo Xu,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-crypto@vger.kernel.org, linux-accelerators@lists.ozlabs.org
Cc: linuxarm@huawei.com, guodong.xu@linaro.org, zhangfei.gao@foxmail.com,
    haojian.zhuang@linaro.org, Kenneth Lee
Subject: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Date: Mon, 12 Nov 2018 15:58:02 +0800
Message-Id: <20181112075807.9291-2-nek.in.cn@gmail.com>
In-Reply-To: <20181112075807.9291-1-nek.in.cn@gmail.com>
References: <20181112075807.9291-1-nek.in.cn@gmail.com>
X-Mailing-List: linux-crypto@vger.kernel.org

From: Kenneth Lee

WarpDrive is a general accelerator framework that lets user applications
access hardware without going through the kernel in the data path. The
kernel component which provides the facilities a driver needs to expose
this user interface is called uacce.
It is a short name for "Unified/User-space-access-intended Accelerator
Framework". This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee
---
 Documentation/warpdrive/warpdrive.rst       | 260 +++++++
 Documentation/warpdrive/wd-arch.svg         | 764 ++++++++++++++++++++
 Documentation/warpdrive/wd.svg              | 526 ++++++++++++++
 Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
 4 files changed, 1909 insertions(+)
 create mode 100644 Documentation/warpdrive/warpdrive.rst
 create mode 100644 Documentation/warpdrive/wd-arch.svg
 create mode 100644 Documentation/warpdrive/wd.svg
 create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
--
2.17.1

diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
new file mode 100644
index 000000000000..ef84d3a2d462
--- /dev/null
+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,260 @@
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework for user applications to
+access hardware without going through the kernel in the data path.
+
+It can be used as a fast channel for accelerators, network adapters or
+other hardware serving applications in user space.
+
+This can make some implementations simpler. E.g. you can reuse most of the
+*netdev* driver in the kernel and just share some ring buffers with the
+user-space driver for *DPDK* [4] or *ODP* [5]. Or you can combine an RSA
+accelerator with a *netdev* in user space as an HTTPS reverse proxy, etc.
+
+*WarpDrive* treats the hardware accelerator as a heterogeneous processor
+which can take over a particular load from the CPU:
+
+.. image:: wd.svg
+        :alt: WarpDrive Concept
+
+The virtual concept of a queue is used to manage the requests sent to the
+accelerator. The application sends requests to the queue by writing to some
+particular address, while the hardware takes the requests directly from that
+address and sends feedback accordingly.
+
+The format of the queue may differ from hardware to hardware, but the
+application needs no system call for the communication.
+
+*WarpDrive* tries to create a shared virtual address space for all the
+involved accelerators. Within this space, a request sent to a queue can
+refer to any virtual address, which will be valid to the application and to
+all the involved accelerators.
+
+The name *WarpDrive* is simply a cool and general name, meaning that the
+framework makes the application faster. It includes a general user library,
+a kernel management module and drivers for the hardware. In the kernel, the
+management module is called *uacce*, standing for
+"Unified/User-space-access-intended Accelerator Framework".
+
+
+How does it work
+================
+
+*WarpDrive* uses *mmap* and the *IOMMU* to do the trick.
+
+*Uacce* creates a chrdev for every device registered to it. A "queue" is
+created when the chrdev is opened. The application accesses the queue by
+mmap-ing different address regions of the queue file.
+
+The following figure demonstrates the queue file address space:
+
+.. image:: wd_q_addr_space.svg
+        :alt: WarpDrive Queue Address Space
+
+The first region of the space, the device region, is used for the
+application to write requests to, and read answers from, the hardware.
+
+Normally a device region can be of two kinds: mmio and memory regions. It
+is recommended to use common memory for the request/answer descriptors and
+the mmio space for device notification, such as doorbells. But of course,
+this is all up to the interface designer.
+
+There can be two types of device memory regions, kernel-only and
+user-shared. This will be explained in the "The kernel API" section.
+
+The Static Share Virtual Memory region is necessary only when the device
+IOMMU does not support "Shared Virtual Memory". This will be explained
+after the *IOMMU* idea.
+
+
+Architecture
+------------
+
+The full *WarpDrive* architecture is represented in the following class
+diagram:
+
+.. image:: wd-arch.svg
+        :alt: WarpDrive Architecture
+
+
+The user API
+------------
+
+We adopt a polling-style interface in user space: ::
+
+        int wd_request_queue(struct wd_queue *q);
+        void wd_release_queue(struct wd_queue *q);
+
+        int wd_send(struct wd_queue *q, void *req);
+        int wd_recv(struct wd_queue *q, void **req);
+        int wd_recv_sync(struct wd_queue *q, void **req);
+        void wd_flush(struct wd_queue *q);
+
+wd_recv_sync() is a wrapper around its non-sync version. It traps into the
+kernel and waits until the queue becomes available.
+
+If the queue does not support SVA/SVM, the following helper function can be
+used to create the Static Virtual Share Memory: ::
+
+        void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
+
+The user API is not mandatory. It is simply a suggestion and a hint of what
+the kernel interface is supposed to support.
+
+
+The user driver
+---------------
+
+The queue file mmap space needs a user driver to wrap the communication
+protocol. *UACCE* provides some attributes in sysfs for the user driver to
+match the right accelerator accordingly.
+
+The *UACCE* device attributes are under the following directory:
+
+/sys/class/uacce/<dev_name>/params
+
+The following attributes are supported:
+
+nr_queue_remained (ro)
+        number of queues remained
+
+api_version (ro)
+        a string to identify the queue mmap space format and its version
+
+device_attr (ro)
+        attributes of the device, see the UACCE_DEV_xxx flags defined in
+        uacce.h
+
+numa_node (ro)
+        id of the numa node
+
+priority (rw)
+        priority of the device, bigger is higher
+
+(This is not yet implemented in the RFC version.)
+
+
+The kernel API
+--------------
+
+The *uacce* kernel API is defined in uacce.h. If the hardware supports
+SVM/SVA, the driver needs only the following API functions: ::
+
+        int uacce_register(uacce);
+        void uacce_unregister(uacce);
+        void uacce_wake_up(q);
+
+*uacce_wake_up* is used to notify the processes which epoll() on the queue
+file.
+
+According to the IOMMU capability, *uacce* categorizes the devices as
+follows:
+
+UACCE_DEV_NOIOMMU
+        The device has no IOMMU. The user process cannot use VA on the
+        hardware. This mode is not recommended.
+
+UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
+        The device has an IOMMU which can share the same page table with
+        the user process.
+
+UACCE_DEV_SHARE_DOMAIN
+        The device has an IOMMU which supports neither multiple page
+        tables nor device page fault.
+
+If the device works in a mode other than UACCE_DEV_NOIOMMU, *uacce* sets
+its IOMMU domain to IOMMU_DOMAIN_UNMANAGED. So the driver must not use the
+kernel DMA API, but the following ones from *uacce* instead: ::
+
+        uacce_dma_map(q, va, size, prot);
+        uacce_dma_unmap(q, va, size, prot);
+
+*uacce_dma_map/unmap* are valid only for a UACCE_DEV_SVA device. They
+create a particular PASID and page table for the kernel in the IOMMU. (Not
+yet implemented in this RFC.)
+
+For a UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
+*Uacce* calls back start_queue only when the DUS and DKO regions are
+mmapped. The accelerator driver must use those dma buffers, via
+uacce_queue->qfrs[], in the start_queue callback. The size of each queue
+file region is defined by uacce->ops->qf_pg_start[].
+
+We have to do it this way because most current IOMMUs cannot support the
+kernel and user virtual addresses at the same time, so we have to let them
+both share the same user virtual address space.
+
+If the device has to serve both the kernel and user space at the same
+time, both should use these DMA APIs. This is not convenient. A better
+solution is to change the future DMA/IOMMU design to separate the user and
+kernel address spaces, but that is not going to happen in the short term.
+
+
+Multiple processes support
+==========================
+
+As of the latest mainline kernel (4.19) at the time of writing, the IOMMU
+subsystem does not support multiple process page tables yet.
+
+Most IOMMU hardware implementations support multi-process with the concept
+of PASID, though they may use a different name; e.g. it is called
+sub-stream-id in the SMMU of ARM. With PASID or a similar design, multiple
+page tables can be added to the IOMMU and referred to by their PASIDs.
+
+*JPB* has a patchset to enable this [1]_. We have tested it with our
+hardware (which is known as *D06*), and it works well. *WarpDrive* relies
+on it to support UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can
+still work, but it supports only one process: the device will be set to
+UACCE_DEV_SHARE_DOMAIN even if it was set to UACCE_DEV_SVA initially.
+
+Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN
+devices.
+
+
+Legacy Mode Support
+===================
+For hardware without an IOMMU, WarpDrive can still work; the only problem
+is that VA cannot be used in the device. The driver should adopt another
+strategy for the shared memory. This is for testing only and is not
+recommended.
+
+
+The Fork Scenario
+=================
+For a process with allocated queues and shared memory, what happens if it
+forks a child?
+
+The fd of the queue is duplicated on fork, so the child could send
+requests to the same queue as its parent. But requests sent from any
+process other than the one which opened the queue will be blocked.
+
+It is recommended to add O_CLOEXEC to the queue file.
+
+The queue mmap space has VM_DONTCOPY set in its VMA, so the child will
+lose all those VMAs.
+
+This is why *WarpDrive* does not adopt the mode used in *VFIO* and
+*InfiniBand*. Both solutions can set any user pointer for hardware
+sharing, but they cannot support fork while DMA is in progress: otherwise
+the "Copy-On-Write" procedure would make the parent process lose its
+physical pages.
+
+
+The Sample Code
+===============
+There is a sample user-land implementation with a simple driver for the
+Hisilicon Hi1620 ZIP Accelerator.
+
+To test, do the following in samples/warpdrive (for the case of a PC
+host): ::
+        ./autogen.sh
+        ./conf.sh       # or simply ./configure if you build on target system
+        make
+
+Then you get test_hisi_zip in the test subdirectory. Copy it to the target
+system, make sure the hisi_zip driver is enabled (the major and minor
+numbers of the uacce chrdev can be taken from dmesg or sysfs), and run: ::
+        mknod /dev/ua1 c <major> <minor>
+        test/test_hisi_zip -z < data > data.zip
+        test/test_hisi_zip -g < data > data.gzip
+
+
+References
+==========
+.. [1] https://patchwork.kernel.org/patch/10394851/
+
+.. vim: tw=78

diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
new file mode 100644
index 000000000000..e59934188443
--- /dev/null
+++ b/Documentation/warpdrive/wd-arch.svg
@@ -0,0 +1,764 @@
[SVG markup elided from this excerpt. The figure is a class diagram: the
WarpDrive user_driver and "wd user api" sit on the mmapped memory r/w
interface; the <<lkm>> uacce register api is used by the Device Driver,
which also registers to other standard frameworks (crypto/nic/others); an
<<anom_file>> Queue FD (1..*) connects user and kernel; the Device
(Hardware) sits behind the IOMMU, whose iommu state the driver manages.]

diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
new file mode 100644
index 000000000000..87ab92ebfbc6
--- /dev/null
+++ b/Documentation/warpdrive/wd.svg
@@ -0,0 +1,526 @@
[SVG markup elided from this excerpt. The figure shows the user application
running on the CPU, reaching Memory through the MMU, while the Hardware
Accelerator reaches the same Memory through the IOMMU.]

diff --git a/Documentation/warpdrive/wd_q_addr_space.svg b/Documentation/warpdrive/wd_q_addr_space.svg
new file mode 100644
index 000000000000..5e6cf8e89908
--- /dev/null
+++ b/Documentation/warpdrive/wd_q_addr_space.svg
@@ -0,0 +1,359 @@
[SVG markup elided from this excerpt. The figure shows the queue file
address space: from offset 0, the device region (mapped to device mmio or
shared kernel driver memory; subdivided into the device mmio region, the
device kernel-only region and the device user-share region), followed by
the static share virtual memory region (for devices without shared virtual
memory).]