From patchwork Sat Jul 29 10:42:56 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valente X-Patchwork-Id: 108936 Delivered-To: patch@linaro.org Received: by 10.182.45.195 with SMTP id p3csp1458926obm; Sat, 29 Jul 2017 03:43:30 -0700 (PDT) X-Received: by 10.99.156.1 with SMTP id f1mr3088552pge.327.1501325010682; Sat, 29 Jul 2017 03:43:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1501325010; cv=none; d=google.com; s=arc-20160816; b=A++biLPhTdTvE1RHKW/3LzckCDKB3rJrT+Utyvupfcu9iJQ403lPxJD7uaj73Nnc2j LRmrwW16JueuzhPWiEfS4a+cafZa9oI5M+VV3ZNvd2I4HJ/s7JkFjE1li83R7EFLGFVn im+MqUzLqzi5lAJ3MFhuZCEXPgldgpS3qyqTOcnRjYwvsBxDPPTzZH0Lezb5Es+ZjUEm chFmNQlKPlwi8bN+cc/+DDPPbIMeCpg7qqaB7/qT9OkILBP3TBHg3bSSUmLgBp/XLxJH Twqw0Exxuh7kQIFEl5UoJrZIZVYp+47Pph6F0bSn6sF7H7eYkmWYZ8ieIzHp3j+FoFtf x4Xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from :dkim-signature:arc-authentication-results; bh=lLFUZlhMUTl4JofHhY3PEVFkEp5SqHUfsa9WeP818n4=; b=UkKUS9uOlRzBxYFKPcFtGT5jDh0W/7N5b+WSyuv5RdjfoFOwoXrSMHIocNzKCKtu26 IdrM4Pxk6Jr5CtaN2LJqLU4MWgWeofN4dTRyC7cSDlTbHKxszzQsPyj3IbCZFeJoUkFt 6zXFtxarwp9NvD9vGb7uSQ7FjAym6tqgjk9GlxQkakNZp5vuANpS4cCBKDYZl3qCd9G+ SGkEYxHNtC0KAMt9LRBN0flfeVbQWzb0oB4qjxZQl/LDFg6KGfwEFlVdDpQVO9kuCdMF r9KRxR/XWpbRqF0/iAp2uL3mAErd1odRphK2vPiWRxo4WH7/B/ha3K8vBUnKsgHQlDk2 pBGw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.b=kl7lErma; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b12si14236765plk.803.2017.07.29.03.43.30; Sat, 29 Jul 2017 03:43:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.b=kl7lErma; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753735AbdG2KnX (ORCPT + 26 others); Sat, 29 Jul 2017 06:43:23 -0400 Received: from mail-wr0-f182.google.com ([209.85.128.182]:38265 "EHLO mail-wr0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753709AbdG2KnV (ORCPT ); Sat, 29 Jul 2017 06:43:21 -0400 Received: by mail-wr0-f182.google.com with SMTP id f21so106457505wrf.5 for ; Sat, 29 Jul 2017 03:43:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id; bh=lLFUZlhMUTl4JofHhY3PEVFkEp5SqHUfsa9WeP818n4=; b=kl7lErmabTfyREMbEf1DvceJb07ArQqavpTY871sjzEJwhFw+4ihCJFopx72vTXS89 aGOQ6YinffrzL3wYO6rcMItjuTS+3iukKBsvxvbjpPspQBSOkEySDDVx5nahAth4Y9Vx H729LCQsH8WNkoebhHEFHgMrUMM4JYNMcuiCM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=lLFUZlhMUTl4JofHhY3PEVFkEp5SqHUfsa9WeP818n4=; b=Vy6uZkstnIlt8wXDuNUSZCSMdF2DUmC2PMpaY7QsTFae5kPqHalWDSL4mi0AHBft6G O5qlX+Hu2Dl1y1G2kJjy8eA3o9S5s4GLlyBc6UEGzIArRdmtaSvuJFajImWErQGBuSFS fOsZJQOveFSPdqInzN+sx68l0jm53zO0YaS8W3OUC6cPa+SxWAXfKK2jfBuoVZjFd0ql 6wgtVxdvoDwNnp4dH+qblnOZ+ACje+/Daok+Fkco2fq1F4VbSDIjh48QKx5TYCe+s9S/ c67bqdqyH2wfN5rR+ldHVxkz7nP46Ygoxspu0tTiHDAYpjRls8+gnr/2g7iuS54EFhbF Fagw== X-Gm-Message-State: AIVw1105wu9wFl2+UZhJgQ9Yp/CtnWl3UWUiRS0YVNbrBaOiJr4aIYkN haGumNyWgdvo3wUb X-Received: by 10.223.130.102 with SMTP id 93mr7645150wrb.253.1501324999565; Sat, 29 Jul 2017 03:43:19 -0700 (PDT) Received: from localhost.localdomain ([185.14.10.218]) by smtp.gmail.com with ESMTPSA id s17sm5996129wma.48.2017.07.29.03.43.17 (version=TLS1 cipher=AES128-SHA bits=128/128); Sat, 29 Jul 2017 03:43:18 -0700 (PDT) From: Paolo Valente To: Jens Axboe Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, ulf.hansson@linaro.org, broonie@kernel.org, lnicola@dend.ro, bfq-sched@lists.ewheeler.net, rick_yiu@htc.com, tom81094@gmail.com, Paolo Valente Subject: [PATCH BUGFIX] block, bfq: consider also in_service_entity to state whether an entity is active Date: Sat, 29 Jul 2017 12:42:56 +0200 Message-Id: <20170729104256.2002-1-paolo.valente@linaro.org> X-Mailer: git-send-email 2.10.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Groups of BFQ queues are represented by generic entities in BFQ. When a queue belonging to a parent entity is deactivated, the parent entity may need to be deactivated too, in case the deactivated queue was the only active queue for the parent entity. This deactivation may need to be propagated upwards if the entity belongs, in its turn, to a further higher-level entity, and so on. In particular, the upward propagation of deactivation stops at the first parent entity that remains active even if one of its child entities has been deactivated. To decide whether the last non-deactivation condition holds for a parent entity, BFQ checks whether the field next_in_service is still not NULL for the parent entity, after the deactivation of one of its child entity. If it is not NULL, then there are certainly other active entities in the parent entity, and deactivations can stop. Unfortunately, this check misses a corner case: if in_service_entity is not NULL, then next_in_service may happen to be NULL, although the parent entity is evidently active. This happens if: 1) the entity pointed by in_service_entity is the only active entity in the parent entity, and 2) according to the definition of next_in_service, the in_service_entity cannot be considered as next_in_service. See the comments on the definition of next_in_service for details on this second point. Hitting the above corner case causes crashes. To address this issue, this commit: 1) Extends the above check on only next_in_service to controlling both next_in_service and in_service_entity (if any of them is not NULL, then no further deactivation is performed) 2) Improves the (important) comments on how next_in_service is defined and updated; in particular it fixes a few rather obscure paragraphs Reported-by: Eric Wheeler Reported-by: Rick Yiu Reported-by: Tom X Nguyen Signed-off-by: Paolo Valente Tested-by: Eric Wheeler Tested-by: Rick Yiu Tested-by: Laurentiu Nicola Tested-by: Tom X Nguyen --- block/bfq-iosched.h | 22 ++++++-- block/bfq-wf2q.c | 142 +++++++++++++++++++++++++++++----------------------- 2 files changed, 95 insertions(+), 69 deletions(-) -- 2.10.0 diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index 63e771a..859f0a8 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -71,17 +71,29 @@ struct bfq_service_tree { * * bfq_sched_data is the basic scheduler queue. It supports three * ioprio_classes, and can be used either as a toplevel queue or as an - * intermediate queue on a hierarchical setup. @next_in_service - * points to the active entity of the sched_data service trees that - * will be scheduled next. It is used to reduce the number of steps - * needed for each hierarchical-schedule update. + * intermediate queue in a hierarchical setup. * * The supported ioprio_classes are the same as in CFQ, in descending * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE. * Requests from higher priority queues are served before all the * requests from lower priority queues; among requests of the same * queue requests are served according to B-WF2Q+. - * All the fields are protected by the queue lock of the containing bfqd. + * + * The schedule is implemented by the service trees, plus the field + * @next_in_service, which points to the entity on the active trees + * that will be served next, if 1) no changes in the schedule occurs + * before the current in-service entity is expired, 2) the in-service + * queue becomes idle when it expires, and 3) if the entity pointed by + * in_service_entity is not a queue, then the in-service child entity + * of the entity pointed by in_service_entity becomes idle on + * expiration. This peculiar definition allows for the following + * optimization, not yet exploited: while a given entity is still in + * service, we already know which is the best candidate for next + * service among the other active entitities in the same parent + * entity. We can then quickly compare the timestamps of the + * in-service entity with those of such best candidate. + * + * All fields are protected by the lock of the containing bfqd. */ struct bfq_sched_data { /* entity in service */ diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c index 881bbe5..911aa74 100644 --- a/block/bfq-wf2q.c +++ b/block/bfq-wf2q.c @@ -188,21 +188,23 @@ static bool bfq_update_parent_budget(struct bfq_entity *next_in_service) /* * This function tells whether entity stops being a candidate for next - * service, according to the following logic. + * service, according to the restrictive definition of the field + * next_in_service. In particular, this function is invoked for an + * entity that is about to be set in service. * - * This function is invoked for an entity that is about to be set in - * service. If such an entity is a queue, then the entity is no longer - * a candidate for next service (i.e, a candidate entity to serve - * after the in-service entity is expired). The function then returns - * true. + * If entity is a queue, then the entity is no longer a candidate for + * next service according to the that definition, because entity is + * about to become the in-service queue. This function then returns + * true if entity is a queue. * - * In contrast, the entity could stil be a candidate for next service - * if it is not a queue, and has more than one child. In fact, even if - * one of its children is about to be set in service, other children - * may still be the next to serve. As a consequence, a non-queue - * entity is not a candidate for next-service only if it has only one - * child. And only if this condition holds, then the function returns - * true for a non-queue entity. + * In contrast, entity could still be a candidate for next service if + * it is not a queue, and has more than one active child. In fact, + * even if one of its children is about to be set in service, other + * active children may still be the next to serve, for the parent + * entity, even according to the above definition. As a consequence, a + * non-queue entity is not a candidate for next-service only if it has + * only one active child. And only if this condition holds, then this + * function returns true for a non-queue entity. */ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity) { @@ -213,6 +215,18 @@ static bool bfq_no_longer_next_in_service(struct bfq_entity *entity) bfqg = container_of(entity, struct bfq_group, entity); + /* + * The field active_entities does not always contain the + * actual number of active children entities: it happens to + * not account for the in-service entity in case the latter is + * removed from its active tree (which may get done after + * invoking the function bfq_no_longer_next_in_service in + * bfq_get_next_queue). Fortunately, here, i.e., while + * bfq_no_longer_next_in_service is not yet completed in + * bfq_get_next_queue, bfq_active_extract has not yet been + * invoked, and thus active_entities still coincides with the + * actual number of active entities. + */ if (bfqg->active_entities == 1) return true; @@ -954,7 +968,7 @@ static void bfq_update_fin_time_enqueue(struct bfq_entity *entity, * one of its children receives a new request. * * Basically, this function updates the timestamps of entity and - * inserts entity into its active tree, ater possible extracting it + * inserts entity into its active tree, ater possibly extracting it * from its idle tree. */ static void __bfq_activate_entity(struct bfq_entity *entity, @@ -1048,7 +1062,7 @@ static void __bfq_requeue_entity(struct bfq_entity *entity) entity->start = entity->finish; /* * In addition, if the entity had more than one child - * when set in service, then was not extracted from + * when set in service, then it was not extracted from * the active tree. This implies that the position of * the entity in the active tree may need to be * changed now, because we have just updated the start @@ -1056,9 +1070,8 @@ static void __bfq_requeue_entity(struct bfq_entity *entity) * time in a moment (the requeueing is then, more * precisely, a repositioning in this case). To * implement this repositioning, we: 1) dequeue the - * entity here, 2) update the finish time and - * requeue the entity according to the new - * timestamps below. + * entity here, 2) update the finish time and requeue + * the entity according to the new timestamps below. */ if (entity->tree) bfq_active_extract(st, entity); @@ -1105,9 +1118,10 @@ static void __bfq_activate_requeue_entity(struct bfq_entity *entity, /** - * bfq_activate_entity - activate or requeue an entity representing a bfq_queue, - * and activate, requeue or reposition all ancestors - * for which such an update becomes necessary. + * bfq_activate_requeue_entity - activate or requeue an entity representing a + * bfq_queue, and activate, requeue or reposition + * all ancestors for which such an update becomes + * necessary. * @entity: the entity to activate. * @non_blocking_wait_rq: true if this entity was waiting for a request * @requeue: true if this is a requeue, which implies that bfqq is @@ -1135,9 +1149,9 @@ static void bfq_activate_requeue_entity(struct bfq_entity *entity, * @ins_into_idle_tree: if false, the entity will not be put into the * idle tree. * - * Deactivates an entity, independently from its previous state. Must + * Deactivates an entity, independently of its previous state. Must * be invoked only if entity is on a service tree. Extracts the entity - * from that tree, and if necessary and allowed, puts it on the idle + * from that tree, and if necessary and allowed, puts it into the idle * tree. */ bool __bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree) @@ -1179,7 +1193,7 @@ bool __bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree) /** * bfq_deactivate_entity - deactivate an entity representing a bfq_queue. * @entity: the entity to deactivate. - * @ins_into_idle_tree: true if the entity can be put on the idle tree + * @ins_into_idle_tree: true if the entity can be put into the idle tree */ static void bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree, @@ -1210,16 +1224,29 @@ static void bfq_deactivate_entity(struct bfq_entity *entity, */ bfq_update_next_in_service(sd, NULL); - if (sd->next_in_service) + if (sd->next_in_service || sd->in_service_entity) { /* - * The parent entity is still backlogged, - * because next_in_service is not NULL. So, no - * further upwards deactivation must be - * performed. Yet, next_in_service has - * changed. Then the schedule does need to be - * updated upwards. + * The parent entity is still active, because + * either next_in_service or in_service_entity + * is not NULL. So, no further upwards + * deactivation must be performed. Yet, + * next_in_service has changed. Then the + * schedule does need to be updated upwards. + * + * NOTE If in_service_entity is not NULL, then + * next_in_service may happen to be NULL, + * although the parent entity is evidently + * active. This happens if 1) the entity + * pointed by in_service_entity is the only + * active entity in the parent entity, and 2) + * according to the definition of + * next_in_service, the in_service_entity + * cannot be considered as + * next_in_service. See the comments on the + * definition of next_in_service for details. */ break; + } /* * If we get here, then the parent is no more @@ -1496,47 +1523,34 @@ struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd) /* * If entity is no longer a candidate for next - * service, then we extract it from its active tree, - * for the following reason. To further boost the - * throughput in some special case, BFQ needs to know - * which is the next candidate entity to serve, while - * there is already an entity in service. In this - * respect, to make it easy to compute/update the next - * candidate entity to serve after the current - * candidate has been set in service, there is a case - * where it is necessary to extract the current - * candidate from its service tree. Such a case is - * when the entity just set in service cannot be also - * a candidate for next service. Details about when - * this conditions holds are reported in the comments - * on the function bfq_no_longer_next_in_service() - * invoked below. + * service, then it must be extracted from its active + * tree, so as to make sure that it won't be + * considered when computing next_in_service. See the + * comments on the function + * bfq_no_longer_next_in_service() for details. */ if (bfq_no_longer_next_in_service(entity)) bfq_active_extract(bfq_entity_service_tree(entity), entity); /* - * For the same reason why we may have just extracted - * entity from its active tree, we may need to update - * next_in_service for the sched_data of entity too, - * regardless of whether entity has been extracted. - * In fact, even if entity has not been extracted, a - * descendant entity may get extracted. Such an event - * would cause a change in next_in_service for the - * level of the descendant entity, and thus possibly - * back to upper levels. + * Even if entity is not to be extracted according to + * the above check, a descendant entity may get + * extracted in one of the next iterations of this + * loop. Such an event could cause a change in + * next_in_service for the level of the descendant + * entity, and thus possibly back to this level. * - * We cannot perform the resulting needed update - * before the end of this loop, because, to know which - * is the correct next-to-serve candidate entity for - * each level, we need first to find the leaf entity - * to set in service. In fact, only after we know - * which is the next-to-serve leaf entity, we can - * discover whether the parent entity of the leaf - * entity becomes the next-to-serve, and so on. + * However, we cannot perform the resulting needed + * update of next_in_service for this level before the + * end of the whole loop, because, to know which is + * the correct next-to-serve candidate entity for each + * level, we need first to find the leaf entity to set + * in service. In fact, only after we know which is + * the next-to-serve leaf entity, we can discover + * whether the parent entity of the leaf entity + * becomes the next-to-serve, and so on. */ - } bfqq = bfq_entity_to_bfqq(entity);