From patchwork Wed Oct 26 09:27:56 2016
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 79379
Delivered-To: patch@linaro.org
From: Paolo Valente
To: Jens Axboe, Tejun Heo
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    ulf.hansson@linaro.org, linus.walleij@linaro.org, broonie@kernel.org,
    hare@suse.de, arnd@arndb.de, bart.vanassche@sandisk.com,
    grant.likely@secretlab.ca, jack@suse.cz,
    James.Bottomley@HansenPartnership.com, Paolo Valente, Arianna Avanzini
Subject: [PATCH 03/14] block, bfq: improve throughput boosting
Date: Wed, 26 Oct 2016 11:27:56 +0200
Message-Id: <1477474082-2846-4-git-send-email-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1477474082-2846-1-git-send-email-paolo.valente@linaro.org>
References: <1477474082-2846-1-git-send-email-paolo.valente@linaro.org>

The feedback-loop algorithm used by BFQ to compute queue (process)
budgets is basically a set of three update rules, one for each of the
main reasons why a queue may be expired. If many processes suddenly
switch from sporadic I/O to greedy and sequential I/O, then these
rules are quite slow to assign large budgets to those processes, and
hence slow to achieve a high throughput. On the opposite end, BFQ
assigns the maximum possible budget B_max to a just-created queue.
This allows a high throughput to be achieved immediately if the
associated process is I/O-bound and performs sequential I/O from the
beginning. But it also increases the worst-case latency experienced
by the first requests issued by the process, because the larger the
budget of a queue waiting for service is, the later the queue will be
served by B-WF2Q+ (Subsec 3.3 in [1]). This is detrimental to
interactive and soft real-time applications.

To tackle these throughput and latency problems, on one hand this
patch changes the initial budget value to B_max/2. On the other hand,
it re-tunes the three rules, adopting a more aggressive,
multiplicative increase/linear decrease scheme. This scheme trades
latency for throughput more than before, and tends to assign large
budgets quickly to processes that are or become I/O-bound. For two of
the expiration reasons, the new version of the rules also contains a
few further small improvements, briefly described below.

*No more backlog.* In this case, the budget was larger than the
number of sectors actually read/written by the process before it
stopped doing I/O. Hence, to reduce latency for the possible future
I/O requests of the process, the old rule simply set the next budget
to the number of sectors actually consumed by the process. However,
if there are still outstanding requests, then the process may not yet
have issued its next request just because it is still waiting for the
completion of some of the outstanding ones. In this sub-case the new
rule, instead of decreasing the budget, proactively doubles it, in
the hope that: 1) a larger budget will fit the actual needs of the
process, and 2) the process is sequential, and hence a higher
throughput will be achieved by serving the process for longer after
granting it access to the device.

*Budget timeout.* The original rule set the new budget to the maximum
value B_max, to maximize throughput and let all processes
experiencing budget timeouts receive the same share of the device
time.
In our experiments we verified that this sudden jump to B_max did not
provide appreciable benefits; rather, it increased the latency of
processes performing sporadic and short I/O. The new rule only
doubles the budget.

[1] P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012. Slightly extended version:
    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf

Signed-off-by: Paolo Valente
Signed-off-by: Arianna Avanzini
---
 block/bfq-iosched.c | 83 ++++++++++++++++++++++++++---------------------------
 1 file changed, 41 insertions(+), 42 deletions(-)

--
2.10.0

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e33f85e..fa1e531 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -692,9 +692,6 @@ struct kmem_cache *bfq_pool;
 #define BFQQ_SEEK_THR		(sector_t)(8 * 100)
 #define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 32/8)
 
-/* Budget feedback step. */
-#define BFQ_BUDGET_STEP		128
-
 /* Min samples used for peak rate estimation (for autotuning). */
 #define BFQ_PEAK_RATE_SAMPLES	32
 
@@ -3609,36 +3606,6 @@ static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
 	return bfqq;
 }
 
-/*
- * bfq_default_budget - return the default budget for @bfqq on @bfqd.
- * @bfqd: the device descriptor.
- * @bfqq: the queue to consider.
- *
- * We use 3/4 of the @bfqd maximum budget as the default value
- * for the max_budget field of the queues. This lets the feedback
- * mechanism to start from some middle ground, then the behavior
- * of the process will drive the heuristics towards high values, if
- * it behaves as a greedy sequential reader, or towards small values
- * if it shows a more intermittent behavior.
- */
-static unsigned long bfq_default_budget(struct bfq_data *bfqd,
-					struct bfq_queue *bfqq)
-{
-	unsigned long budget;
-
-	/*
-	 * When we need an estimate of the peak rate we need to avoid
-	 * to give budgets that are too short due to previous measurements.
-	 * So, in the first 10 assignments use a ``safe'' budget value.
-	 */
-	if (bfqd->budgets_assigned < 194 && bfqd->bfq_user_max_budget == 0)
-		budget = bfq_default_max_budget;
-	else
-		budget = bfqd->bfq_max_budget;
-
-	return budget - budget / 4;
-}
-
 static void bfq_arm_slice_timer(struct bfq_data *bfqd)
 {
 	struct bfq_queue *bfqq = bfqd->in_service_queue;
@@ -3780,13 +3747,47 @@ static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
 		 * for throughput.
 		 */
 		case BFQ_BFQQ_TOO_IDLE:
-			if (budget > min_budget + BFQ_BUDGET_STEP)
-				budget -= BFQ_BUDGET_STEP;
-			else
-				budget = min_budget;
+			/*
+			 * This is the only case where we may reduce
+			 * the budget: if there is no request of the
+			 * process still waiting for completion, then
+			 * we assume (tentatively) that the timer has
+			 * expired because the batch of requests of
+			 * the process could have been served with a
+			 * smaller budget.  Hence, betting that
+			 * process will behave in the same way when it
+			 * becomes backlogged again, we reduce its
+			 * next budget.  As long as we guess right,
+			 * this budget cut reduces the latency
+			 * experienced by the process.
+			 *
+			 * However, if there are still outstanding
+			 * requests, then the process may have not yet
+			 * issued its next request just because it is
+			 * still waiting for the completion of some of
+			 * the still outstanding ones.  So in this
+			 * subcase we do not reduce its budget, on the
+			 * contrary we increase it to possibly boost
+			 * the throughput, as discussed in the
+			 * comments to the BUDGET_TIMEOUT case.
+			 */
+			if (bfqq->dispatched > 0) /* still outstanding reqs */
+				budget = min(budget * 2, bfqd->bfq_max_budget);
+			else {
+				if (budget > 5 * min_budget)
+					budget -= 4 * min_budget;
+				else
+					budget = min_budget;
+			}
 			break;
 		case BFQ_BFQQ_BUDGET_TIMEOUT:
-			budget = bfq_default_budget(bfqd, bfqq);
+			/*
+			 * We double the budget here because it gives
+			 * the chance to boost the throughput if this
+			 * is not a seeky process (and has bumped into
+			 * this timeout because of, e.g., ZBR).
+			 */
+			budget = min(budget * 2, bfqd->bfq_max_budget);
 			break;
 		case BFQ_BFQQ_BUDGET_EXHAUSTED:
 			/*
@@ -3798,8 +3799,7 @@ static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
 			 * definitely increase the budget of this good
 			 * candidate to boost the disk throughput.
 			 */
-			budget = min(budget + 8 * BFQ_BUDGET_STEP,
-				     bfqd->bfq_max_budget);
+			budget = min(budget * 4, bfqd->bfq_max_budget);
 			break;
 		case BFQ_BFQQ_NO_MORE_REQUESTS:
 			/*
@@ -4533,9 +4533,8 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	bfqq->pid = pid;
 
 	/* Tentative initial value to trade off between thr and lat */
-	bfqq->max_budget = bfq_default_budget(bfqd, bfqq);
+	bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
 	bfqq->budget_timeout = bfq_smallest_from_now();
-	bfqq->pid = pid;
 
 	/* first request is almost certainly seeky */
 	bfqq->seek_history = 1;
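
[Editor's note] For readers skimming the patch, below is a tiny, self-contained
user-space sketch of the multiplicative-increase/linear-decrease feedback
described in the commit message. It is only an illustration, not part of the
patch: the names recalc_budget, budget_update_reason, MAX_BUDGET, MIN_BUDGET
and outstanding_reqs are invented for this example, and the constants are
arbitrary; the real logic lives in __bfq_bfqq_recalc_budget() in
block/bfq-iosched.c.

#include <stdio.h>

#define MAX_BUDGET 16384		/* stand-in for bfqd->bfq_max_budget (sectors) */
#define MIN_BUDGET (MAX_BUDGET / 32)	/* stand-in for the minimum budget */

enum budget_update_reason {		/* mirrors the expiration reasons above */
	TOO_IDLE,
	BUDGET_TIMEOUT,
	BUDGET_EXHAUSTED
};

static long min_l(long a, long b)
{
	return a < b ? a : b;
}

/*
 * Multiplicative increase / linear decrease, in the spirit of the new rules:
 * double (or quadruple) the budget on events that suggest the process is
 * greedy and sequential, subtract a few minimum budgets only when the
 * queue really looks idle.
 */
static long recalc_budget(long budget, enum budget_update_reason reason,
			  int outstanding_reqs)
{
	switch (reason) {
	case TOO_IDLE:
		if (outstanding_reqs > 0)
			/* idle only because completions are pending: grow */
			return min_l(budget * 2, MAX_BUDGET);
		/* linear decrease, never below the minimum budget */
		return budget > 5 * MIN_BUDGET ?
			budget - 4 * MIN_BUDGET : MIN_BUDGET;
	case BUDGET_TIMEOUT:
		/* double instead of jumping straight to the maximum */
		return min_l(budget * 2, MAX_BUDGET);
	case BUDGET_EXHAUSTED:
		/* likely a greedy sequential reader: increase aggressively */
		return min_l(budget * 4, MAX_BUDGET);
	}
	return budget;
}

int main(void)
{
	/* tentative initial value: a fraction of the maximum budget */
	long budget = (2 * MAX_BUDGET) / 3;
	int i;

	/* a queue that keeps exhausting its budget converges to MAX_BUDGET */
	for (i = 0; i < 4; i++) {
		budget = recalc_budget(budget, BUDGET_EXHAUSTED, 0);
		printf("after exhaustion %d: budget = %ld sectors\n",
		       i + 1, budget);
	}
	return 0;
}

With rules of this shape, a queue that keeps exhausting its budget reaches the
maximum budget within a handful of expirations, whereas the old additive steps
of BFQ_BUDGET_STEP sectors needed many more rounds to get there.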