From patchwork Mon Feb  5 19:05:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 126922
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
    bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
    alban.browaeys@gmail.com, ming.lei@redhat.com, ivan@ludios.org,
    169364@studenti.unimore.it, Paolo Valente, Serena Ziviani
Subject: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook
Date: Mon, 5 Feb 2018 20:05:10 +0100
Message-Id: <20180205190510.5499-2-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180205190510.5499-1-paolo.valente@linaro.org>
References: <20180205190510.5499-1-paolo.valente@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, without a finish_request invoked on that request
before each re-insertion.

This fact is the cause of the failure reported in [1]. For an I/O
scheduler, every re-insertion of the same re-prepared request is
equivalent to the insertion of a new request. For schedulers like
mq-deadline or kyber, this fact causes no harm.
In contrast, it confuses a stateful scheduler like BFQ, which keeps
state for an I/O request until the finish_request hook is invoked on
the request. In particular, BFQ may get stuck, waiting forever for the
number of dispatches of the same request to be balanced by an equal
number of completions, while only one completion will ever arrive for
that request. In this state, BFQ may refuse to serve I/O requests from
other bfq_queues. The hang reported in [1] then follows.

However, the above re-prepared requests do undergo a requeue, and thus
the requeue_request hook of the active elevator, if set, is invoked
for them. This commit then addresses the above issue by properly
implementing the hook requeue_request in BFQ.

[1] https://marc.info/?l=linux-block&m=151211117608676

Reported-by: Ivan Kozik
Reported-by: Alban Browaeys
Signed-off-by: Paolo Valente
Signed-off-by: Serena Ziviani
---
 block/bfq-iosched.c | 101 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 76 insertions(+), 25 deletions(-)

-- 
2.15.1

Tested-by: Oleksandr Natalenko

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 47e6ec7427c4..343452488515 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3823,24 +3823,26 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		}
 
 		/*
-		 * We exploit the bfq_finish_request hook to decrement
-		 * rq_in_driver, but bfq_finish_request will not be
-		 * invoked on this request. So, to avoid unbalance,
-		 * just start this request, without incrementing
-		 * rq_in_driver. As a negative consequence,
-		 * rq_in_driver is deceptively lower than it should be
-		 * while this request is in service. This may cause
-		 * bfq_schedule_dispatch to be invoked uselessly.
+		 * We exploit the bfq_finish_requeue_request hook to
+		 * decrement rq_in_driver, but
+		 * bfq_finish_requeue_request will not be invoked on
+		 * this request. So, to avoid unbalance, just start
+		 * this request, without incrementing rq_in_driver. As
+		 * a negative consequence, rq_in_driver is deceptively
+		 * lower than it should be while this request is in
+		 * service. This may cause bfq_schedule_dispatch to be
+		 * invoked uselessly.
 		 *
 		 * As for implementing an exact solution, the
-		 * bfq_finish_request hook, if defined, is probably
-		 * invoked also on this request. So, by exploiting
-		 * this hook, we could 1) increment rq_in_driver here,
-		 * and 2) decrement it in bfq_finish_request. Such a
-		 * solution would let the value of the counter be
-		 * always accurate, but it would entail using an extra
-		 * interface function. This cost seems higher than the
-		 * benefit, being the frequency of non-elevator-private
+		 * bfq_finish_requeue_request hook, if defined, is
+		 * probably invoked also on this request. So, by
+		 * exploiting this hook, we could 1) increment
+		 * rq_in_driver here, and 2) decrement it in
+		 * bfq_finish_requeue_request. Such a solution would
+		 * let the value of the counter be always accurate,
+		 * but it would entail using an extra interface
+		 * function. This cost seems higher than the benefit,
+		 * being the frequency of non-elevator-private
 		 * requests very low.
 		 */
 		goto start_rq;
@@ -4515,6 +4517,8 @@ static inline void bfq_update_insert_stats(struct request_queue *q,
 					   unsigned int cmd_flags) {}
 #endif
 
+static void bfq_prepare_request(struct request *rq, struct bio *bio);
+
 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       bool at_head)
 {
@@ -4541,6 +4545,20 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		else
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
 	} else {
+		if (!bfqq) {
+			/*
+			 * This should never happen. Most likely rq is
+			 * a requeued regular request, being
+			 * re-inserted without being first
+			 * re-prepared. Do a prepare, to avoid
+			 * failure.
+			 */
+			pr_warn("Regular request associated with no queue");
+			WARN_ON(1);
+			bfq_prepare_request(rq, rq->bio);
+			bfqq = RQ_BFQQ(rq);
+		}
+
 		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
 		/*
 		 * Update bfqq, because, if a queue merge has occurred
@@ -4697,22 +4715,36 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 		bfq_schedule_dispatch(bfqd);
 }
 
-static void bfq_finish_request_body(struct bfq_queue *bfqq)
+static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
 {
 	bfqq->allocated--;
 
 	bfq_put_queue(bfqq);
 }
 
-static void bfq_finish_request(struct request *rq)
+/*
+ * Handle either a requeue or a finish for rq. The things to do are
+ * the same in both cases: all references to rq are to be dropped. In
+ * particular, rq is considered completed from the point of view of
+ * the scheduler.
+ */
+static void bfq_finish_requeue_request(struct request *rq)
 {
-	struct bfq_queue *bfqq;
+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
 
-	if (!rq->elv.icq)
+	/*
+	 * The following checks make this function exit in case of
+	 * spurious invocations, for which there is nothing to do. One
+	 * example is if rq is an already requeued request, which has
+	 * not (yet) been re-inserted into a bfq_queue. In fact,
+	 * requeue and finish hooks of elevators are invoked in blk-mq
+	 * without checking whether the involved request is actually
+	 * still referenced in the scheduler.
+	 */
+	if (!rq->elv.icq || !bfqq)
 		return;
 
-	bfqq = RQ_BFQQ(rq);
 	bfqd = bfqq->bfqd;
 
 	if (rq->rq_flags & RQF_STARTED)
@@ -4727,13 +4759,14 @@ static void bfq_finish_request(struct request *rq)
 		spin_lock_irqsave(&bfqd->lock, flags);
 
 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_request_body(bfqq);
+		bfq_finish_requeue_request_body(bfqq);
 
 		spin_unlock_irqrestore(&bfqd->lock, flags);
 	} else {
 		/*
 		 * Request rq may be still/already in the scheduler,
-		 * in which case we need to remove it. And we cannot
+		 * in which case we need to remove it (this should
+		 * never happen in case of requeue). And we cannot
 		 * defer such a check and removal, to avoid
 		 * inconsistencies in the time interval from the end
 		 * of this function to the start of the deferred work.
@@ -4748,9 +4781,26 @@ static void bfq_finish_request(struct request *rq)
 			bfqg_stats_update_io_remove(bfqq_group(bfqq),
 						    rq->cmd_flags);
 		}
-		bfq_finish_request_body(bfqq);
+		bfq_finish_requeue_request_body(bfqq);
 	}
 
+	/*
+	 * Reset private fields. In case of a requeue, this allows
+	 * this function to correctly do nothing if it is spuriously
+	 * invoked again on this same request (see the check at the
+	 * beginning of the function). Probably, a better general
+	 * design would be to prevent blk-mq from invoking the requeue
+	 * or finish hooks of an elevator, for a request that is not
+	 * referred by that elevator.
+	 *
+	 * Resetting the following fields would break the
+	 * request-insertion logic if rq is re-inserted into a bfq
+	 * internal queue, without a re-preparation. Here we assume
+	 * that re-insertions of requeued requests, without
+	 * re-preparation, can happen only for pass_through or at_head
+	 * requests (which are not re-inserted into bfq internal
+	 * queues).
+	 */
 	rq->elv.priv[0] = NULL;
 	rq->elv.priv[1] = NULL;
 }
@@ -5426,7 +5476,8 @@ static struct elevator_type iosched_bfq_mq = {
 	.ops.mq = {
 		.limit_depth		= bfq_limit_depth,
 		.prepare_request	= bfq_prepare_request,
-		.finish_request		= bfq_finish_request,
+		.requeue_request	= bfq_finish_requeue_request,
+		.finish_request		= bfq_finish_requeue_request,
 		.exit_icq		= bfq_exit_icq,
 		.insert_requests	= bfq_insert_requests,
 		.dispatch_request	= bfq_dispatch_request,