From patchwork Tue Oct 27 03:53:29 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Song Bao Hua \(Barry Song\)"
X-Patchwork-Id: 285828
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI,
	SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C6898C55179
	for ; Tue, 27 Oct 2020 03:57:29 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 936472080A
	for ; Tue, 27 Oct 2020 03:57:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2505303AbgJ0D53 (ORCPT ); Mon, 26 Oct 2020 23:57:29 -0400
Received: from szxga05-in.huawei.com ([45.249.212.191]:4589 "EHLO
	szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2505305AbgJ0D53 (ORCPT );
	Mon, 26 Oct 2020 23:57:29 -0400
Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.60])
	by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4CKyZz2rsMzhb70;
	Tue, 27 Oct 2020 11:57:31 +0800 (CST)
Received: from SWX921481.china.huawei.com (10.126.202.177) by
	DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id
	14.3.487.0; Tue, 27 Oct 2020 11:57:16 +0800
From: Barry Song
To: , , ,
CC: , , , , , Barry Song
Subject: [PATCH 1/2] dma-mapping: add benchmark support for streaming DMA APIs
Date: Tue, 27 Oct 2020 16:53:29 +1300
Message-ID: <20201027035330.29612-2-song.bao.hua@hisilicon.com>
X-Mailer: git-send-email 2.21.0.windows.1
In-Reply-To: <20201027035330.29612-1-song.bao.hua@hisilicon.com>
References:
	<20201027035330.29612-1-song.bao.hua@hisilicon.com>
MIME-Version: 1.0
X-Originating-IP: [10.126.202.177]
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.

This patch enables the support. Users can run a specified number of threads
to do dma_map_page and dma_unmap_page on a specific NUMA node for the
specified duration. Then dma_map_benchmark will calculate the average
latency for map and unmap.

A difficulty for this benchmark is that the dma_map/unmap APIs must run on
a particular device. Each device might have a different backend, IOMMU or
non-IOMMU.

So we use driver_override to bind dma_map_benchmark to a particular
device by:
echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

For the moment, it supports platform devices only; PCI devices will also
be supported afterwards.

Cc: Joerg Roedel
Cc: Will Deacon
Cc: Shuah Khan
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Signed-off-by: Barry Song
---
 kernel/dma/Kconfig         |   8 ++
 kernel/dma/Makefile        |   1 +
 kernel/dma/map_benchmark.c | 202 +++++++++++++++++++++++++++++++++++++
 3 files changed, 211 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..949c53da5991 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -225,3 +225,11 @@ config DMA_API_DEBUG_SG
 	  is technically out-of-spec.
 
 	  If unsure, say N.
+
+config DMA_MAP_BENCHMARK
+	bool "Enable benchmarking of streaming DMA mapping"
+	help
+	  Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
+	  performance of dma_(un)map_page.
+
+	  See tools/testing/selftests/dma/dma_map_benchmark.c
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aab..7aa6b26b1348 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)		+= pool.o
 obj-$(CONFIG_DMA_REMAP)			+= remap.o
+obj-$(CONFIG_DMA_MAP_BENCHMARK)		+= map_benchmark.o
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
new file mode 100644
index 000000000000..16a5d7779d67
--- /dev/null
+++ b/kernel/dma/map_benchmark.c
@@ -0,0 +1,202 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+
+struct map_benchmark {
+	__u64 map_nsec;
+	__u64 unmap_nsec;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+struct map_benchmark_data {
+	struct map_benchmark bparam;
+	struct device *dev;
+	struct dentry  *debugfs;
+	atomic64_t total_map_nsecs;
+	atomic64_t total_map_loops;
+	atomic64_t total_unmap_nsecs;
+	atomic64_t total_unmap_loops;
+};
+
+static int map_benchmark_thread(void *data)
+{
+	struct page *page;
+	dma_addr_t dma_addr;
+	struct map_benchmark_data *map = data;
+	int ret = 0;
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	while (!kthread_should_stop())  {
+		ktime_t map_stime, map_etime, unmap_stime, unmap_etime;
+
+		map_stime = ktime_get();
+		dma_addr = dma_map_page(map->dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (unlikely(dma_mapping_error(map->dev, dma_addr))) {
+			dev_err(map->dev, "dma_map_page failed\n");
+			ret = -ENOMEM;
+			goto out;
+		}
+		map_etime = ktime_get();
+
+		unmap_stime = ktime_get();
+		dma_unmap_page(map->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		unmap_etime = ktime_get();
+
+		atomic64_add((long long)ktime_to_ns(ktime_sub(map_etime, map_stime)),
+				&map->total_map_nsecs);
+		atomic64_add((long long)ktime_to_ns(ktime_sub(unmap_etime, unmap_stime)),
+				&map->total_unmap_nsecs);
+		atomic64_inc(&map->total_map_loops);
+		atomic64_inc(&map->total_unmap_loops);
+	}
+
+out:
+	__free_page(page);
+	return ret;
+}
+
+static int do_map_benchmark(struct map_benchmark_data *map)
+{
+	struct task_struct **tsk;
+	int threads = map->bparam.threads;
+	int node = map->bparam.node;
+	const cpumask_t *cpu_mask = cpumask_of_node(node);
+	int ret = 0;
+	int i;
+
+	tsk = kmalloc_array(threads, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk)
+		return -ENOMEM;
+
+	get_device(map->dev);
+
+	for (i = 0; i < threads; i++) {
+		tsk[i] = kthread_create_on_node(map_benchmark_thread, map,
+				map->bparam.node, "dma-map-benchmark/%d", i);
+		if (IS_ERR(tsk[i])) {
+			dev_err(map->dev, "create dma_map thread failed\n");
+			return PTR_ERR(tsk[i]);
+		}
+
+		if (node != NUMA_NO_NODE && node_online(node))
+			kthread_bind_mask(tsk[i], cpu_mask);
+
+		wake_up_process(tsk[i]);
+	}
+
+	ssleep(map->bparam.seconds);
+
+	/* wait for the completion of benchmark threads */
+	for (i = 0; i < threads; i++) {
+		ret = kthread_stop(tsk[i]);
+		if (ret)
+			goto out;
+	}
+
+	/* average map nsec and unmap nsec */
+	map->bparam.map_nsec = atomic64_read(&map->total_map_nsecs) /
+				atomic64_read(&map->total_map_loops);
+	map->bparam.unmap_nsec = atomic64_read(&map->total_unmap_nsecs) /
+				atomic64_read(&map->total_unmap_loops);
+
+out:
+	put_device(map->dev);
+	kfree(tsk);
+	return ret;
+}
+
+static long map_benchmark_ioctl(struct file *filep, unsigned int cmd,
+		unsigned long arg)
+{
+	struct map_benchmark_data *map = filep->private_data;
+	int ret;
+
+	if (copy_from_user(&map->bparam, (void __user *)arg, sizeof(map->bparam)))
+		return -EFAULT;
+
+	switch (cmd) {
+	case DMA_MAP_BENCHMARK:
+		ret = do_map_benchmark(map);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (copy_to_user((void __user *)arg, &map->bparam, sizeof(map->bparam)))
+		return -EFAULT;
+
+	return ret;
+}
+
+static const struct file_operations map_benchmark_fops = {
+	.open = simple_open,
+	.unlocked_ioctl = map_benchmark_ioctl,
+};
+
+static int map_benchmark_probe(struct platform_device *pdev)
+{
+	struct dentry *entry;
+	struct map_benchmark_data *map;
+
+	map = devm_kzalloc(&pdev->dev, sizeof(*map), GFP_KERNEL);
+	if (!map)
+		return -ENOMEM;
+
+	map->dev = &pdev->dev;
+	platform_set_drvdata(pdev, map);
+
+	/*
+	 * we only permit a device bound with this driver, 2nd probe
+	 * will fail
+	 */
+	entry = debugfs_create_file("dma_map_benchmark", 0600, NULL, map,
+			&map_benchmark_fops);
+	if (IS_ERR(entry))
+		return PTR_ERR(entry);
+	map->debugfs = entry;
+
+	return 0;
+}
+
+static int map_benchmark_remove(struct platform_device *pdev)
+{
+	struct map_benchmark_data *map = platform_get_drvdata(pdev);
+
+	debugfs_remove(map->debugfs);
+
+	return 0;
+}
+
+static struct platform_driver map_benchmark_driver = {
+	.driver		= {
+		.name	= "dma_map_benchmark",
+	},
+	.probe = map_benchmark_probe,
+	.remove = map_benchmark_remove,
+};
+
+module_platform_driver(map_benchmark_driver);
+
+MODULE_AUTHOR("Barry Song ");
+MODULE_DESCRIPTION("dma_map benchmark driver");
+MODULE_LICENSE("GPL");

From patchwork Tue Oct 27 03:53:30 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Song Bao Hua \(Barry Song\)"
X-Patchwork-Id: 310949
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI,
	SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org
	[198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 57C6CC55178
	for ; Tue, 27 Oct 2020 03:57:29 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 257A12080A
	for ; Tue, 27 Oct 2020 03:57:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2505306AbgJ0D53 (ORCPT ); Mon, 26 Oct 2020 23:57:29 -0400
Received: from szxga05-in.huawei.com ([45.249.212.191]:4588 "EHLO
	szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2505303AbgJ0D52 (ORCPT );
	Mon, 26 Oct 2020 23:57:28 -0400
Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.60])
	by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4CKyZz2bmKzhZsq;
	Tue, 27 Oct 2020 11:57:31 +0800 (CST)
Received: from SWX921481.china.huawei.com (10.126.202.177) by
	DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id
	14.3.487.0; Tue, 27 Oct 2020 11:57:18 +0800
From: Barry Song
To: , , ,
CC: , , , , , Barry Song
Subject: [PATCH 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK
Date: Tue, 27 Oct 2020 16:53:30 +1300
Message-ID: <20201027035330.29612-3-song.bao.hua@hisilicon.com>
X-Mailer: git-send-email 2.21.0.windows.1
In-Reply-To: <20201027035330.29612-1-song.bao.hua@hisilicon.com>
References: <20201027035330.29612-1-song.bao.hua@hisilicon.com>
MIME-Version: 1.0
X-Originating-IP: [10.126.202.177]
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-kselftest@vger.kernel.org

This patch provides the test application for DMA_MAP_BENCHMARK.

Before running the test application, we need to bind a device to the
dma_map_benchmark driver.
For example, unbind "xxx" from its original driver and bind it to
dma_map_benchmark:

echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

Then, run 10 threads on NUMA node 1 for 10 seconds on device "xxx":
./dma_map_benchmark -t 10 -s 10 -n 1
dma mapping benchmark: average map_nsec:3619 average unmap_nsec:2423

Cc: Joerg Roedel
Cc: Will Deacon
Cc: Shuah Khan
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Signed-off-by: Barry Song
---
 MAINTAINERS                                   |  6 ++
 tools/testing/selftests/dma/Makefile          |  6 ++
 tools/testing/selftests/dma/config            |  1 +
 .../testing/selftests/dma/dma_map_benchmark.c | 72 +++++++++++++++++++
 4 files changed, 85 insertions(+)
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f310f0a09904..552389874ca2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5220,6 +5220,12 @@ F:	include/linux/dma-mapping.h
 F:	include/linux/dma-map-ops.h
 F:	kernel/dma/
 
+DMA MAPPING BENCHMARK
+M:	Barry Song
+L:	iommu@lists.linux-foundation.org
+F:	kernel/dma/map_benchmark.c
+F:	tools/testing/selftests/dma/
+
 DMA-BUF HEAPS FRAMEWORK
 M:	Sumit Semwal
 R:	Andrew F. Davis
diff --git a/tools/testing/selftests/dma/Makefile b/tools/testing/selftests/dma/Makefile
new file mode 100644
index 000000000000..aa8e8b5b3864
--- /dev/null
+++ b/tools/testing/selftests/dma/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := dma_map_benchmark
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dma/config b/tools/testing/selftests/dma/config
new file mode 100644
index 000000000000..6102ee3c43cd
--- /dev/null
+++ b/tools/testing/selftests/dma/config
@@ -0,0 +1 @@
+CONFIG_DMA_MAP_BENCHMARK=y
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
new file mode 100644
index 000000000000..e03bd03e101e
--- /dev/null
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+
+struct map_benchmark {
+	__u64 map_nsec;
+	__u64 unmap_nsec;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+int main(int argc, char **argv)
+{
+	struct map_benchmark map;
+	int fd, opt, threads = 0, seconds = 0, node = -1;
+	int cmd = DMA_MAP_BENCHMARK;
+	char *p;
+
+	while ((opt = getopt(argc, argv, "t:s:n:")) != -1) {
+		switch (opt) {
+		case 't':
+			threads = atoi(optarg);
+			break;
+		case 's':
+			seconds = atoi(optarg);
+			break;
+		case 'n':
+			node = atoi(optarg);
+			break;
+		default:
+			return -1;
+		}
+	}
+
+	if (threads <= 0 || seconds <= 0) {
+		fprintf(stderr, "invalid number of threads or seconds\n");
+		exit(1);
+	}
+
+	fd = open("/sys/kernel/debug/dma_map_benchmark", O_RDWR);
+	if (fd == -1) {
+		perror("open");
+		exit(1);
+	}
+
+	map.seconds = seconds;
+	map.threads = threads;
+	map.node = node;
+	if (ioctl(fd, cmd, &map)) {
+		perror("ioctl");
+		exit(1);
+	}
+
+	printf("dma mapping benchmark: average map_nsec:%lld average unmap_nsec:%lld\n",
+			map.map_nsec,
+			map.unmap_nsec);
+
+	return 0;
+}
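
A note on the test program's option handling: it converts -t, -s and -n
with atoi(), which cannot tell "0" apart from malformed input such as
"12abc". The following is only a sketch of a stricter variant, not part of
the patch. It assumes two hypothetical helpers, parse_positive() and
init_request(), and a struct map_benchmark_req that mirrors struct
map_benchmark with stdint types standing in for __u64/__u32 so that it
builds outside the kernel tree.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Assumption: same layout as struct map_benchmark in the patch, with
 * stdint types replacing __u64/__u32 for a standalone build.
 */
struct map_benchmark_req {
	uint64_t map_nsec;
	uint64_t unmap_nsec;
	uint32_t threads;	/* how many threads map/unmap in parallel */
	uint32_t seconds;	/* how long the test lasts */
	int node;		/* NUMA node to run on, -1 for any */
	uint64_t expansion[10];	/* reserved */
};

/*
 * Hypothetical helper: parse a strictly positive decimal integer.
 * Returns 0 on success, -1 on malformed, non-positive or
 * out-of-range input. *out is only written on success.
 */
int parse_positive(const char *s, uint32_t *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0')
		return -1;
	if (v <= 0 || v > INT_MAX)
		return -1;

	*out = (uint32_t)v;
	return 0;
}

/*
 * Hypothetical helper: zero the whole request, including the result
 * fields and the reserved area, before filling in the inputs.
 */
void init_request(struct map_benchmark_req *req, uint32_t threads,
		  uint32_t seconds, int node)
{
	memset(req, 0, sizeof(*req));
	req->threads = threads;
	req->seconds = seconds;
	req->node = node;
}
```

In main(), the atoi() calls for -t and -s would become parse_positive()
calls. The -n option needs a separate path, since -1 is a meaningful value
for the node.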