[BUGFIX,RFC,0/2] reverting two commits causing freezes

Message ID 20190118115219.63576-1-paolo.valente@linaro.org

Message

Paolo Valente Jan. 18, 2019, 11:52 a.m. UTC
Hi Jens,
a user reported a warning, followed by freezes, when he increases
nr_requests to more than 64 [1]. After reproducing the issues, I
reverted the commit f0635b8a416e ("bfq: calculate shallow depths at
init time"), plus the related commit bd7d4ef6a4c9 ("bfq-iosched:
remove unused variable"). The problem went away.

Maybe the assumption in commit f0635b8a416e ("bfq: calculate shallow
depths at init time") does not hold true?

Thanks,
Paolo

[1] https://bugzilla.kernel.org/show_bug.cgi?id=200813
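
For reference, the trigger boils down to writing a value larger than
64 to the queue's nr_requests sysfs attribute. A minimal sketch of
doing that from C follows; the device name is only an example, any
queue using the bfq scheduler should do:

#include <stdio.h>

int main(void)
{
        /* example device; pick any queue using the bfq scheduler */
        FILE *f = fopen("/sys/block/sda/queue/nr_requests", "w");

        if (!f) {
                perror("nr_requests");
                return 1;
        }
        /* any value above 64 triggered the reported warning and freezes */
        fprintf(f, "128\n");
        fclose(f);
        return 0;
}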

Paolo Valente (2):
  Revert "bfq-iosched: remove unused variable"
  Revert "bfq: calculate shallow depths at init time"

 block/bfq-iosched.c | 116 ++++++++++++++++++++++----------------------
 block/bfq-iosched.h |   6 +++
 2 files changed, 63 insertions(+), 59 deletions(-)

--
2.20.1

Comments

Jens Axboe Jan. 18, 2019, 1:35 p.m. UTC | #1
On 1/18/19 4:52 AM, Paolo Valente wrote:
> Hi Jens,
> a user reported a warning, followed by freezes, when he increases
> nr_requests to more than 64 [1]. After reproducing the issues, I
> reverted the commit f0635b8a416e ("bfq: calculate shallow depths at
> init time"), plus the related commit bd7d4ef6a4c9 ("bfq-iosched:
> remove unused variable"). The problem went away.

For reverts, please put the justification into the actual revert
commit. With this series, if applied as-is, we'd have two patches
in the tree that just say "revert X" without any hint as to why
that was done.

> Maybe the assumption in commit f0635b8a416e ("bfq: calculate shallow

> depths at init time") does not hold true?


It apparently doesn't! But let's try and figure this out instead of
blindly reverting it. OK, I think I see it. For the sched_tags
case, when we grow the requests, we allocate a new set. Hence any
old cache would be stale at that point.
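
Purely to make that staleness concrete, here is a small userspace toy
model (not kernel code; the names are made-up stand-ins for the bfq
bits): a limit is derived from the tag depth once at init time, the
tag set is then replaced by a bigger one, and the cached limit only
becomes valid again once a depth-updated hook recomputes it.

#include <stdio.h>

struct tags { unsigned int depth; };

struct sched_data {
        struct tags *tags;
        unsigned int min_shallow;  /* cached, derived from tags->depth */
};

/* stand-in for the real depth computation: some fraction of the depth */
static unsigned int compute_min_shallow(const struct tags *t)
{
        unsigned int v = t->depth / 4;

        return v ? v : 1;
}

static void depth_updated(struct sched_data *s)
{
        s->min_shallow = compute_min_shallow(s->tags);
}

int main(void)
{
        struct tags small = { .depth = 64 };
        struct tags big = { .depth = 256 };
        struct sched_data s = { .tags = &small };

        depth_updated(&s);  /* init-time computation, fine for depth 64 */

        s.tags = &big;      /* nr_requests grown: a new tag set is used */
        printf("stale: min_shallow=%u, depth=%u\n", s.min_shallow, s.tags->depth);

        depth_updated(&s);  /* what a depth-updated hook would redo */
        printf("fresh: min_shallow=%u, depth=%u\n", s.min_shallow, s.tags->depth);
        return 0;
}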

How about something like this? It still keeps the work of updating
this out of the hot IO path, and only calls it when we
actually change the depths.

Totally untested...


diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index cd307767a134..b09589915667 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5342,7 +5342,7 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 	return min_shallow;
 }
 
-static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
+static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
@@ -5350,6 +5350,11 @@ static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
 
 	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
 	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+}
+
+static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
+{
+	bfq_depth_updated(hctx);
 	return 0;
 }
 
@@ -5772,6 +5777,7 @@ static struct elevator_type iosched_bfq_mq = {
 		.requests_merged	= bfq_requests_merged,
 		.request_merged		= bfq_request_merged,
 		.has_work		= bfq_has_work,
+		.depth_updated		= bfq_depth_updated,
 		.init_hctx		= bfq_init_hctx,
 		.init_sched		= bfq_init_queue,
 		.exit_sched		= bfq_exit_queue,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ba37b9e15e9..a047b297ade5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3101,6 +3101,8 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		}
 		if (ret)
 			break;
+		if (q->elevator && q->elevator->type->ops.depth_updated)
+			q->elevator->type->ops.depth_updated(hctx);
 	}
 
 	if (!ret)
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 2e9e2763bf47..6e8bc53740f0 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -31,6 +31,7 @@ struct elevator_mq_ops {
 	void (*exit_sched)(struct elevator_queue *);
 	int (*init_hctx)(struct blk_mq_hw_ctx *, unsigned int);
 	void (*exit_hctx)(struct blk_mq_hw_ctx *, unsigned int);
+	void (*depth_updated)(struct blk_mq_hw_ctx *);
 
 	bool (*allow_merge)(struct request_queue *, struct request *, struct bio *);
 	bool (*bio_merge)(struct blk_mq_hw_ctx *, struct bio *);

-- 
Jens Axboe
Jens Axboe Jan. 18, 2019, 4:29 p.m. UTC | #2
On 1/18/19 6:35 AM, Jens Axboe wrote:
> On 1/18/19 4:52 AM, Paolo Valente wrote:
>> Hi Jens,
>> a user reported a warning, followed by freezes, when he increases
>> nr_requests to more than 64 [1]. After reproducing the issues, I
>> reverted the commit f0635b8a416e ("bfq: calculate shallow depths at
>> init time"), plus the related commit bd7d4ef6a4c9 ("bfq-iosched:
>> remove unused variable"). The problem went away.
> 
> For reverts, please put the justification into the actual revert
> commit. With this series, if applied as-is, we'd have two patches
> in the tree that just say "revert X" without any hint as to why
> that was done.
> 
>> Maybe the assumption in commit f0635b8a416e ("bfq: calculate shallow
>> depths at init time") does not hold true?
> 
> It apparently doesn't! But let's try and figure this out instead of
> blindly reverting it. OK, I think I see it. For the sched_tags
> case, when we grow the requests, we allocate a new set. Hence any
> old cache would be stale at that point.
> 
> How about something like this? It still keeps the work of updating
> this out of the hot IO path, and only calls it when we
> actually change the depths.
> 
> Totally untested...

Now tested, and it seems to work for me. Note that I haven't tried to
reproduce the issue; I just verified that the patch functionally
does what it should: when depths are updated, the hook is invoked
and updates the internal BFQ depth map.

-- 
Jens Axboe
Paolo Valente Jan. 18, 2019, 5:24 p.m. UTC | #3
> On 18 Jan 2019, at 14:35, Jens Axboe <axboe@kernel.dk> wrote:
> 
> On 1/18/19 4:52 AM, Paolo Valente wrote:
>> Hi Jens,
>> a user reported a warning, followed by freezes, when he increases
>> nr_requests to more than 64 [1]. After reproducing the issues, I
>> reverted the commit f0635b8a416e ("bfq: calculate shallow depths at
>> init time"), plus the related commit bd7d4ef6a4c9 ("bfq-iosched:
>> remove unused variable"). The problem went away.
> 
> For reverts, please put the justification into the actual revert
> commit. With this series, if applied as-is, we'd have two patches
> in the tree that just say "revert X" without any hint as to why
> that was done.

I forgot to say explicitly that these patches were meant only to give
you and anybody else something concrete to test and check.

With me you're as safe as houses, in terms of the amount of comments
in the final patches :)

>> Maybe the assumption in commit f0635b8a416e ("bfq: calculate shallow
>> depths at init time") does not hold true?
> 
> It apparently doesn't! But let's try and figure this out instead of
> blindly reverting it.

Totally agree.

> OK, I think I see it. For the sched_tags
> case, when we grow the requests, we allocate a new set. Hence any
> old cache would be stale at that point.

ok

> How about something like this? It still keeps the work of updating
> this out of the hot IO path, and only calls it when we
> actually change the depths.

Looks rather clean and efficient.

> Totally untested...

It seems to work here too.

Thanks,
Paolo

Jens Axboe Jan. 18, 2019, 5:36 p.m. UTC | #4
On 1/18/19 10:24 AM, Paolo Valente wrote:
>> On 18 Jan 2019, at 14:35, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 1/18/19 4:52 AM, Paolo Valente wrote:
>>> Hi Jens,
>>> a user reported a warning, followed by freezes, when he increases
>>> nr_requests to more than 64 [1]. After reproducing the issues, I
>>> reverted the commit f0635b8a416e ("bfq: calculate shallow depths at
>>> init time"), plus the related commit bd7d4ef6a4c9 ("bfq-iosched:
>>> remove unused variable"). The problem went away.
>>
>> For reverts, please put the justification into the actual revert
>> commit. With this series, if applied as-is, we'd have two patches
>> in the tree that just say "revert X" without any hint as to why
>> that was done.
> 
> I forgot to say explicitly that these patches were meant only to give
> you and anybody else something concrete to test and check.
> 
> With me you're as safe as houses, in terms of the amount of comments
> in the final patches :)

It's almost an example of the classic case of "if you want a real
solution to a problem, post a knowingly bad and half-assed solution".
That always gets people out of the woodwork :-)

>>> Maybe the assumption in commit f0635b8a416e ("bfq: calculate shallow
>>> depths at init time") does not hold true?
>>
>> It apparently doesn't! But let's try and figure this out instead of
>> blindly reverting it.
> 
> Totally agree.
> 
>> OK, I think I see it. For the sched_tags
>> case, when we grow the requests, we allocate a new set. Hence any
>> old cache would be stale at that point.
> 
> ok
> 
>> How about something like this? It still keeps the work of updating
>> this out of the hot IO path, and only calls it when we
>> actually change the depths.
> 
> Looks rather clean and efficient.
> 
>> Totally untested...
> 
> It seems to work here too.

OK good, I've posted it "officially" now.
 
-- 
Jens Axboe