From patchwork Sun Jan 28 23:44:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Linus Walleij X-Patchwork-Id: 126128 Delivered-To: patch@linaro.org Received: by 10.46.84.92 with SMTP id y28csp1988328ljd; Sun, 28 Jan 2018 15:47:08 -0800 (PST) X-Google-Smtp-Source: AH8x226N9gBfif62hTBWIw5PwMXz+tvU34RTfOKibIFzmSP/OipL4B0x0wcNaI0LQZvMQYGSPoK/ X-Received: by 2002:a17:902:33c1:: with SMTP id b59-v6mr10669037plc.111.1517183228823; Sun, 28 Jan 2018 15:47:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517183228; cv=none; d=google.com; s=arc-20160816; b=kwajL2zSaVotWJsZ7NYoqvOf2Ny8icpdYJ4AlKR+UpvnmTzP12W3wdfdDkrTSE9YWL ytPh7BakCEl1VpS+khdt5VegHj5y1uRr2xatYXzl/C0JLp+wIP8n8c/E6SPs7SKveMWC p8zwsjAvwfB3NMwl31iv+Zqx4qE2ygqMxNa3yTv9nPcqlQRJWM83Ckz/IiqpxWsIqo55 764xfSYArICpVN4t9zIQdtnu+ykG2FCuJs1nPLeRsoWQaIGwPrn3OZo7yhjxoZIneZJ6 Uk2XYUeor4ajJWjHz+kK45DgSyugtfT1Cd6T4nssGjaxKbsYKOe0HssiE6YAHAOsQgf+ 8BMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature :arc-authentication-results; bh=fXqxyXTqkJ7dnEPb/331yd3zGm1safjjfNf7YzUMgh8=; b=S0yb03YmujrmIN8MAzr38G9HH/AoGTOMsoyCROJTRd48DjO8uA5ybXx3hrIzbwFgJv snrK2Lbrimbz1mu+PCXSBMPSiFlwqk/mhhZGNDg9w3dX2enyIRuhJZT0QnLi8kzYxjPQ R0ZCqVf+iyToy0ssAg65XXbuw/WAfM4fCVgS9XI4i2YdHuE0RIRRp0GQUhttzAGFts8I 0FQaQJEhWtemXaPXsekhAh/Sd0+gWOc0TPH0zXCgm2aMIhBqWt8KrnhFk9Y8CnGx1/0t KJTlnu2DP+7wbT9rsLfYkb52r9v252jtLdNyhCRuB3WAc8YRvu+6lEIbpg+KdmHVYvkW ZCjg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=a6abJyXM; spf=pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 1-v6si7879332ply.348.2018.01.28.15.47.08; Sun, 28 Jan 2018 15:47:08 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=a6abJyXM; spf=pass (google.com: best guess record for domain of stable-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754016AbeA1XrH (ORCPT + 10 others); Sun, 28 Jan 2018 18:47:07 -0500 Received: from mail-lf0-f68.google.com ([209.85.215.68]:44322 "EHLO mail-lf0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753710AbeA1XrG (ORCPT ); Sun, 28 Jan 2018 18:47:06 -0500 Received: by mail-lf0-f68.google.com with SMTP id v188so7355639lfa.11 for ; Sun, 28 Jan 2018 15:47:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=fXqxyXTqkJ7dnEPb/331yd3zGm1safjjfNf7YzUMgh8=; b=a6abJyXMgUygGTLY9XjncKbaFqA648fO4Zc3M8BsCiM9gPkxcG7RE8oUYpbe2Z+pFz k4uQlQ8MV1gFnQxNvexMFV3VATs96HmQfOMXgy1z2dxjouq6OKE4RO6uWOiYMbBPKbm8 wOKguwCUUor6/6ZB2oixdc4qrF6oe1vmunY5w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=fXqxyXTqkJ7dnEPb/331yd3zGm1safjjfNf7YzUMgh8=; b=EVfPOVqRv1PPgN+Hcs+Nm9uc/3c3dXwxEzy5L0o+TTLwLu2qJWEKk86cxcr5UkZeX7 nZkrcMTeFCuQA/ZyVd8LX4nvzCSr/fBi/bs5aR0+YIC/AJoVSW4O9ZC13ccG0G2dt7W2 B3OS6fL383s3NNIer3dzVcZJcNns+zLbshQzemwRaoGViGu6g5UqLbCzRw4b7u2mwju9 vxOZWEqSlbI8QNCbClkTbqJwCX0z7z/IO2uJVHr57e6HZnClWak6PMFK8gev0xv0PM32 9rlDOuO7yXINGdtW2wBddaE8Ottg6ui7w2tX/sBGr22Nh0FWQHY5JFsJyrxbwDw7b38A mtPQ== X-Gm-Message-State: AKwxytePQGN1lh+a3TKRpHuIq+IwIA3e0rmG11AqCvvGWAhUgxwy3PiR gAo1dQ47e2gQfeiBifjptBhGvw== X-Received: by 10.25.1.23 with SMTP id 23mr13165060lfb.131.1517183224275; Sun, 28 Jan 2018 15:47:04 -0800 (PST) Received: from localhost.localdomain (c-cb7471d5.014-348-6c756e10.cust.bredbandsbolaget.se. [213.113.116.203]) by smtp.gmail.com with ESMTPSA id i23sm2619163lfh.4.2018.01.28.15.47.03 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 28 Jan 2018 15:47:03 -0800 (PST) From: Linus Walleij To: linux-mmc@vger.kernel.org, Ulf Hansson , Benjamin Beckmeyer , Adrian Hunter Cc: Linus Walleij , Pierre Ossman , =?utf-8?q?Beno=C3=AEt_Th=C3=A9baudeau?= , Fabio Estevam , stable@vger.kernel.org Subject: [PATCH v8] mmc: sdhci: Implement an SDHCI-specific bounce buffer Date: Mon, 29 Jan 2018 00:44:53 +0100 Message-Id: <20180128234453.2880-1-linus.walleij@linaro.org> X-Mailer: git-send-email 2.14.3 MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org The bounce buffer is gone from the MMC core, and now we found out that there are some (crippled) i.MX boards out there that have broken ADMA (cannot do scatter-gather), and also broken PIO so they must use SDMA. Closer examination shows a less significant slowdown also on SDMA-only capable Laptop hosts. SDMA sets down the number of segments to one, so that each segment gets turned into a singular request that ping-pongs to the block layer before the next request/segment is issued. Apparently it happens a lot that the block layer send requests that include a lot of physically discontiguous segments. My guess is that this phenomenon is coming from the file system. These devices that cannot handle scatterlists in hardware can see major benefits from a DMA-contiguous bounce buffer. This patch accumulates those fragmented scatterlists in a physically contiguous bounce buffer so that we can issue bigger DMA data chunks to/from the card. When tested with a PCI-integrated host (1217:8221) that only supports SDMA: 0b:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) This patch gave ~1Mbyte/s improved throughput on large reads and writes when testing using iozone than without the patch. dmesg: sdhci-pci 0000:0b:00.0: SDHCI controller found [1217:8221] (rev 5) mmc0 bounce up to 128 segments into one, max segment size 65536 bytes mmc0: SDHCI controller on PCI [0000:0b:00.0] using DMA On the i.MX SDHCI controllers on the crippled i.MX 25 and i.MX 35 the patch restores the performance to what it was before we removed the bounce buffers. Cc: Pierre Ossman Cc: Benoît Thébaudeau Cc: Fabio Estevam Cc: Benjamin Beckmeyer Cc: stable@vger.kernel.org # v4.14+ Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling") Tested-by: Benjamin Beckmeyer Acked-by: Adrian Hunter Signed-off-by: Linus Walleij --- ChangeLog v7->v8: - Fixed bad information and spelling mistakes in the commit message. - Use sdhci_sdma_address() in one more spot identified by Adrian. - Collected Adrian's ACK. ChangeLog v6->v7: - Fix the directions on dma_sync_for[device|cpu]() so the ownership of the buffer gets swapped properly and in the right direction for every transfer. Didn't see this because x86 PCI is DMA coherent... - Tested and greelighted on i.MX 25. - Also tested on the PCI version. ChangeLog v5->v6: - Again switch back to explicit sync of buffers. I want to get this solution to work because it gives more control and it's more elegant. - Update host->max_req_size as noted by Adrian, hopefully this fixes the i.MX. I was just lucky on my Intel laptop I guess: the block stack never requested anything bigger than 64KB and that was why it worked even if max_req_size was bigger than what would fit in the bounce buffer. - Copy the number of bytes in the mmc_data instead of the number of bytes in the bounce buffer. For RX this is blksize * blocks and for TX this is bytes_xfered. - Break out a sdhci_sdma_address() for getting the DMA address for either the raw sglist or the bounce buffer depending on configuration. - Add some explicit bounds check for the data so that we do not attempt to copy more than the bounce buffer size even if the block layer is erroneously configured. - Move allocation of bounce buffer out to its own function. - Use pr_[info|err] throughout so all debug prints from the driver come out in the same manner and style. - Use unsigned int for the bounce buffer size. - Re-tested with iozone: we still get the same nice performance improvements. - Request a text on i.MX (hi Benjamin) ChangeLog v4->v5: - Go back to dma_alloc_coherent() as this apparently works better. - Keep the other changes, cap for 64KB, fall back to single segments. - Requesting a test of this on i.MX. (Sorry Benjamin.) ChangeLog v3->v4: - Cap the bounce buffer to 64KB instead of the biggest segment as we experience diminishing returns with buffers > 64KB. - Instead of using dma_alloc_coherent(), use good old devm_kmalloc() and issue dma_sync_single_for*() to explicitly switch ownership between CPU and the device. This way we exercise the cache better and may consume less CPU. - Bail out with single segments if we cannot allocate a bounce buffer. - Tested on the PCI SDHCI on my laptop: requesting a new test on i.MX from Benjamin. (Please!) ChangeLog v2->v3: - Rewrite the commit message a bit - Add Benjamin's Tested-by - Add Fixes and stable tags ChangeLog v1->v2: - Skip the remapping and fiddling with the buffer, instead use dma_alloc_coherent() and use a simple, coherent bounce buffer. - Couple kernel messages to ->parent of the mmc_host as it relates to the hardware characteristics. --- drivers/mmc/host/sdhci.c | 164 ++++++++++++++++++++++++++++++++++++++++++++--- drivers/mmc/host/sdhci.h | 3 + 2 files changed, 159 insertions(+), 8 deletions(-) -- 2.14.3 diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index e9290a3439d5..d24306b2b839 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -502,8 +503,35 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host, if (data->host_cookie == COOKIE_PRE_MAPPED) return data->sg_count; - sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len, - mmc_get_dma_dir(data)); + /* Bounce write requests to the bounce buffer */ + if (host->bounce_buffer) { + unsigned int length = data->blksz * data->blocks; + + if (length > host->bounce_buffer_size) { + pr_err("%s: asked for transfer of %u bytes exceeds bounce buffer %u bytes\n", + mmc_hostname(host->mmc), length, + host->bounce_buffer_size); + return -EIO; + } + if (mmc_get_dma_dir(data) == DMA_TO_DEVICE) { + /* Copy the data to the bounce buffer */ + sg_copy_to_buffer(data->sg, data->sg_len, + host->bounce_buffer, + length); + } + /* Switch ownership to the DMA */ + dma_sync_single_for_device(host->mmc->parent, + host->bounce_addr, + host->bounce_buffer_size, + mmc_get_dma_dir(data)); + /* Just a dummy value */ + sg_count = 1; + } else { + /* Just access the data directly from memory */ + sg_count = dma_map_sg(mmc_dev(host->mmc), + data->sg, data->sg_len, + mmc_get_dma_dir(data)); + } if (sg_count == 0) return -ENOSPC; @@ -673,6 +701,14 @@ static void sdhci_adma_table_post(struct sdhci_host *host, } } +static u32 sdhci_sdma_address(struct sdhci_host *host) +{ + if (host->bounce_buffer) + return host->bounce_addr; + else + return sg_dma_address(host->data->sg); +} + static u8 sdhci_calc_timeout(struct sdhci_host *host, struct mmc_command *cmd) { u8 count; @@ -858,8 +894,8 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd) SDHCI_ADMA_ADDRESS_HI); } else { WARN_ON(sg_cnt != 1); - sdhci_writel(host, sg_dma_address(data->sg), - SDHCI_DMA_ADDRESS); + sdhci_writel(host, sdhci_sdma_address(host), + SDHCI_DMA_ADDRESS); } } @@ -2248,7 +2284,12 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq) mrq->data->host_cookie = COOKIE_UNMAPPED; - if (host->flags & SDHCI_REQ_USE_DMA) + /* + * No pre-mapping in the pre hook if we're using the bounce buffer, + * for that we would need two bounce buffers since one buffer is + * in flight when this is getting called. + */ + if (host->flags & SDHCI_REQ_USE_DMA && !host->bounce_buffer) sdhci_pre_dma_transfer(host, mrq->data, COOKIE_PRE_MAPPED); } @@ -2352,8 +2393,45 @@ static bool sdhci_request_done(struct sdhci_host *host) struct mmc_data *data = mrq->data; if (data && data->host_cookie == COOKIE_MAPPED) { - dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, - mmc_get_dma_dir(data)); + if (host->bounce_buffer) { + /* + * On reads, copy the bounced data into the + * sglist + */ + if (mmc_get_dma_dir(data) == DMA_FROM_DEVICE) { + unsigned int length = data->bytes_xfered; + + if (length > host->bounce_buffer_size) { + pr_err("%s: bounce buffer is %u bytes but DMA claims to have transferred %u bytes\n", + mmc_hostname(host->mmc), + host->bounce_buffer_size, + data->bytes_xfered); + /* Cap it down and continue */ + length = host->bounce_buffer_size; + } + dma_sync_single_for_cpu( + host->mmc->parent, + host->bounce_addr, + host->bounce_buffer_size, + DMA_FROM_DEVICE); + sg_copy_from_buffer(data->sg, + data->sg_len, + host->bounce_buffer, + length); + } else { + /* No copying, just switch ownership */ + dma_sync_single_for_cpu( + host->mmc->parent, + host->bounce_addr, + host->bounce_buffer_size, + mmc_get_dma_dir(data)); + } + } else { + /* Unmap the raw data */ + dma_unmap_sg(mmc_dev(host->mmc), data->sg, + data->sg_len, + mmc_get_dma_dir(data)); + } data->host_cookie = COOKIE_UNMAPPED; } } @@ -2636,7 +2714,8 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask) */ if (intmask & SDHCI_INT_DMA_END) { u32 dmastart, dmanow; - dmastart = sg_dma_address(host->data->sg); + + dmastart = sdhci_sdma_address(host); dmanow = dmastart + host->data->bytes_xfered; /* * Force update to the next DMA block boundary. @@ -3217,6 +3296,68 @@ void __sdhci_read_caps(struct sdhci_host *host, u16 *ver, u32 *caps, u32 *caps1) } EXPORT_SYMBOL_GPL(__sdhci_read_caps); +static int sdhci_allocate_bounce_buffer(struct sdhci_host *host) +{ + struct mmc_host *mmc = host->mmc; + unsigned int max_blocks; + unsigned int bounce_size; + int ret; + + /* + * Cap the bounce buffer at 64KB. Using a bigger bounce buffer + * has diminishing returns, this is probably because SD/MMC + * cards are usually optimized to handle this size of requests. + */ + bounce_size = SZ_64K; + /* + * Adjust downwards to maximum request size if this is less + * than our segment size, else hammer down the maximum + * request size to the maximum buffer size. + */ + if (mmc->max_req_size < bounce_size) + bounce_size = mmc->max_req_size; + max_blocks = bounce_size / 512; + + /* + * When we just support one segment, we can get significant + * speedups by the help of a bounce buffer to group scattered + * reads/writes together. + */ + host->bounce_buffer = devm_kmalloc(mmc->parent, + bounce_size, + GFP_KERNEL); + if (!host->bounce_buffer) { + pr_err("%s: failed to allocate %u bytes for bounce buffer, falling back to single segments\n", + mmc_hostname(mmc), + bounce_size); + /* + * Exiting with zero here makes sure we proceed with + * mmc->max_segs == 1. + */ + return 0; + } + + host->bounce_addr = dma_map_single(mmc->parent, + host->bounce_buffer, + bounce_size, + DMA_BIDIRECTIONAL); + ret = dma_mapping_error(mmc->parent, host->bounce_addr); + if (ret) + /* Again fall back to max_segs == 1 */ + return 0; + host->bounce_buffer_size = bounce_size; + + /* Lie about this since we're bouncing */ + mmc->max_segs = max_blocks; + mmc->max_seg_size = bounce_size; + mmc->max_req_size = bounce_size; + + pr_info("%s bounce up to %u segments into one, max segment size %u bytes\n", + mmc_hostname(mmc), max_blocks, bounce_size); + + return 0; +} + int sdhci_setup_host(struct sdhci_host *host) { struct mmc_host *mmc; @@ -3713,6 +3854,13 @@ int sdhci_setup_host(struct sdhci_host *host) */ mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ? 1 : 65535; + if (mmc->max_segs == 1) { + /* This may alter mmc->*_blk_* parameters */ + ret = sdhci_allocate_bounce_buffer(host); + if (ret) + return ret; + } + return 0; unreg: diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h index 54bc444c317f..1d7d61e25dbf 100644 --- a/drivers/mmc/host/sdhci.h +++ b/drivers/mmc/host/sdhci.h @@ -440,6 +440,9 @@ struct sdhci_host { int irq; /* Device IRQ */ void __iomem *ioaddr; /* Mapped address */ + char *bounce_buffer; /* For packing SDMA reads/writes */ + dma_addr_t bounce_addr; + unsigned int bounce_buffer_size; const struct sdhci_ops *ops; /* Low level hw interface */