From patchwork Thu May 31 14:45:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 137420
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 sapienza.dav@gmail.com, 177992@studenti.unimore.it, Paolo Valente
Subject: [PATCH BUGFIX/IMPROVEMENTS 2/4] block, bfq: remove slow-system class
Date: Thu, 31 May 2018 16:45:06 +0200
Message-Id: <20180531144508.3927-3-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.16.1
In-Reply-To: <20180531144508.3927-1-paolo.valente@linaro.org>
References: <20180531144508.3927-1-paolo.valente@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

BFQ computes the duration of weight raising for interactive
applications automatically, using some reference parameters. In
particular, BFQ uses the best durations (see comments in the code for
how these durations have been assessed) for two classes of systems:
slow and fast ones. Examples of slow systems are old phones or systems
using micro HDDs. Fast systems are all the remaining ones. Using these
parameters, BFQ computes the actual duration of the weight raising,
for the system at hand, as a function of the relative speed of the
system w.r.t.
the speed of a reference system, belonging to the same class of
systems as the system at hand.

This slow vs fast differentiation proved to be useful in the past, but
has little meaning with current hardware. Even worse, it causes
problems in virtual systems, where the speed of the system can vary
frequently, and so widely as to confuse the class-detection mechanism
and, as we have verified experimentally, to cause BFQ to compute
nonsensical weight-raising durations.

This commit addresses this issue by removing the slow class and the
class-detection mechanism.

Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 137 ++++++++++++++++------------------------------------
 block/bfq-iosched.h |  14 ++----
 2 files changed, 46 insertions(+), 105 deletions(-)

-- 
2.16.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f3703e7431aa..262c929e24ee 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -251,55 +251,43 @@ static struct kmem_cache *bfq_pool;
  * When configured for computing the duration of the weight-raising
  * for interactive queues automatically (see the comments at the
  * beginning of this file), BFQ does it using the following formula:
- * duration = (R / r) * T,
- * where r is the peak rate of the device, and R
- * and T are two reference parameters. In particular,
- * R is the peak rate of the reference device (see below), and
- * T is a reference time: given the systems that are likely
- * to be installed on the reference device according to its speed
- * class, T is about the maximum time needed, under BFQ and
- * while reading two files in parallel, to load typical large
- * applications on these systems (see the comments on
- * max_service_from_wr below, for more details on how T is
- * obtained). In practice, the slower/faster the device at hand is,
- * the more/less it takes to load applications with respect to the
+ * duration = (ref_rate / r) * ref_wr_duration,
+ * where r is the peak rate of the device, and ref_rate and
+ * ref_wr_duration are two reference parameters. In particular,
+ * ref_rate is the peak rate of the reference storage device (see
+ * below), and ref_wr_duration is about the maximum time needed, with
+ * BFQ and while reading two files in parallel, to load typical large
+ * applications on the reference device (see the comments on
+ * max_service_from_wr below, for more details on how ref_wr_duration
+ * is obtained). In practice, the slower/faster the device at hand
+ * is, the more/less it takes to load applications with respect to the
  * reference device. Accordingly, the longer/shorter BFQ grants
  * weight raising to interactive applications.
  *
- * BFQ uses four different reference pairs (R, T), depending on:
- * . whether the device is rotational or non-rotational;
- * . whether the device is slow, such as old or portable HDDs, as well as
- *   SD cards, or fast, such as newer HDDs and SSDs.
+ * BFQ uses two different reference pairs (ref_rate, ref_wr_duration),
+ * depending on whether the device is rotational or non-rotational.
  *
- * The device's speed class is dynamically (re)detected in
- * bfq_update_peak_rate() every time the estimated peak rate is updated.
+ * In the following definitions, ref_rate[0] and ref_wr_duration[0]
+ * are the reference values for a rotational device, whereas
+ * ref_rate[1] and ref_wr_duration[1] are the reference values for a
+ * non-rotational device. The reference rates are not the actual peak
+ * rates of the devices used as a reference, but slightly lower
+ * values. The reason for using slightly lower values is that the
+ * peak-rate estimator tends to yield slightly lower values than the
+ * actual peak rate (it can yield the actual peak rate only if there
+ * is only one process doing I/O, and the process does sequential
+ * I/O).
  *
- * In the following definitions, R_slow[0]/R_fast[0] and
- * T_slow[0]/T_fast[0] are the reference values for a slow/fast
- * rotational device, whereas R_slow[1]/R_fast[1] and
- * T_slow[1]/T_fast[1] are the reference values for a slow/fast
- * non-rotational device. Finally, device_speed_thresh are the
- * thresholds used to switch between speed classes. The reference
- * rates are not the actual peak rates of the devices used as a
- * reference, but slightly lower values. The reason for using these
- * slightly lower values is that the peak-rate estimator tends to
- * yield slightly lower values than the actual peak rate (it can yield
- * the actual peak rate only if there is only one process doing I/O,
- * and the process does sequential I/O).
- *
- * Both the reference peak rates and the thresholds are measured in
- * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
+ * The reference peak rates are measured in sectors/usec, left-shifted
+ * by BFQ_RATE_SHIFT.
  */
-static int R_slow[2] = {1000, 10700};
-static int R_fast[2] = {14000, 33000};
+static int ref_rate[2] = {14000, 33000};
 /*
- * To improve readability, a conversion function is used to initialize the
- * following arrays, which entails that they can be initialized only in a
- * function.
+ * To improve readability, a conversion function is used to initialize
+ * the following array, which entails that the array can be
+ * initialized only in a function.
  */
-static int T_slow[2];
-static int T_fast[2];
-static int device_speed_thresh[2];
+static int ref_wr_duration[2];
 
 /*
  * BFQ uses the above-detailed, time-based weight-raising mechanism to
@@ -938,7 +926,7 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
 	if (bfqd->bfq_wr_max_time > 0)
 		return bfqd->bfq_wr_max_time;
 
-	dur = bfqd->RT_prod;
+	dur = bfqd->rate_dur_prod;
 	do_div(dur, bfqd->peak_rate);
 
 	/*
@@ -2543,37 +2531,15 @@ static unsigned long bfq_calc_max_budget(struct bfq_data *bfqd)
 /*
  * Update parameters related to throughput and responsiveness, as a
  * function of the estimated peak rate. See comments on
- * bfq_calc_max_budget(), and on T_slow and T_fast arrays.
+ * bfq_calc_max_budget(), and on the ref_wr_duration array.
  */
 static void update_thr_responsiveness_params(struct bfq_data *bfqd)
 {
-	int dev_type = blk_queue_nonrot(bfqd->queue);
-
-	if (bfqd->bfq_user_max_budget == 0)
+	if (bfqd->bfq_user_max_budget == 0) {
 		bfqd->bfq_max_budget =
 			bfq_calc_max_budget(bfqd);
-
-	if (bfqd->device_speed == BFQ_BFQD_FAST &&
-	    bfqd->peak_rate < device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_SLOW;
-		bfqd->RT_prod = R_slow[dev_type] *
-			T_slow[dev_type];
-	} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
-		   bfqd->peak_rate > device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_FAST;
-		bfqd->RT_prod = R_fast[dev_type] *
-			T_fast[dev_type];
+		bfq_log(bfqd, "new max_budget = %d", bfqd->bfq_max_budget);
 	}
-
-	bfq_log(bfqd,
-"dev_type %s dev_speed_class = %s (%llu sects/sec), thresh %llu setcs/sec",
-		dev_type == 0 ? "ROT" : "NONROT",
-		bfqd->device_speed == BFQ_BFQD_FAST ? "FAST" : "SLOW",
-		bfqd->device_speed == BFQ_BFQD_FAST ?
-		(USEC_PER_SEC*(u64)R_fast[dev_type])>>BFQ_RATE_SHIFT :
-		(USEC_PER_SEC*(u64)R_slow[dev_type])>>BFQ_RATE_SHIFT,
-		(USEC_PER_SEC*(u64)device_speed_thresh[dev_type])>>
-		BFQ_RATE_SHIFT);
 }
 
 static void bfq_reset_rate_computation(struct bfq_data *bfqd,
@@ -5279,14 +5245,12 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfqd->wr_busy_queues = 0;
 
 	/*
-	 * Begin by assuming, optimistically, that the device is a
-	 * high-speed one, and that its peak rate is equal to 2/3 of
-	 * the highest reference rate.
+	 * Begin by assuming, optimistically, that the device peak
+	 * rate is equal to 2/3 of the highest reference rate.
 	 */
-	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
-			T_fast[blk_queue_nonrot(bfqd->queue)];
-	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
-	bfqd->device_speed = BFQ_BFQD_FAST;
+	bfqd->rate_dur_prod = ref_rate[blk_queue_nonrot(bfqd->queue)] *
+		ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
+	bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
 
 	spin_lock_init(&bfqd->lock);
 
@@ -5593,8 +5557,8 @@ static int __init bfq_init(void)
 	/*
 	 * Times to load large popular applications for the typical
 	 * systems installed on the reference devices (see the
-	 * comments before the definitions of the next two
-	 * arrays). Actually, we use slightly slower values, as the
+	 * comments before the definition of the next
+	 * array). Actually, we use slightly lower values, as the
 	 * estimated peak rate tends to be smaller than the actual
 	 * peak rate. The reason for this last fact is that estimates
 	 * are computed over much shorter time intervals than the long
@@ -5603,25 +5567,8 @@ static int __init bfq_init(void)
 	 * scheduler cannot rely on a peak-rate-evaluation workload to
 	 * be run for a long time.
 	 */
-	T_slow[0] = msecs_to_jiffies(3500); /* actually 4 sec */
-	T_slow[1] = msecs_to_jiffies(6000); /* actually 6.5 sec */
-	T_fast[0] = msecs_to_jiffies(7000); /* actually 8 sec */
-	T_fast[1] = msecs_to_jiffies(2500); /* actually 3 sec */
-
-	/*
-	 * Thresholds that determine the switch between speed classes
-	 * (see the comments before the definition of the array
-	 * device_speed_thresh). These thresholds are biased towards
-	 * transitions to the fast class. This is safer than the
-	 * opposite bias. In fact, a wrong transition to the slow
-	 * class results in short weight-raising periods, because the
-	 * speed of the device then tends to be higher that the
-	 * reference peak rate. On the opposite end, a wrong
-	 * transition to the fast class tends to increase
-	 * weight-raising periods, because of the opposite reason.
-	 */
-	device_speed_thresh[0] = (4 * R_slow[0]) / 3;
-	device_speed_thresh[1] = (4 * R_slow[1]) / 3;
+	ref_wr_duration[0] = msecs_to_jiffies(7000); /* actually 8 sec */
+	ref_wr_duration[1] = msecs_to_jiffies(2500); /* actually 3 sec */
 
 	ret = elv_register(&iosched_bfq_mq);
 	if (ret)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index ae2f3dadec44..b5678cc8cfa1 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -399,11 +399,6 @@ struct bfq_io_cq {
 	struct bfq_ttime saved_ttime;
 };
 
-enum bfq_device_speed {
-	BFQ_BFQD_FAST,
-	BFQ_BFQD_SLOW,
-};
-
 /**
  * struct bfq_data - per-device data structure.
  *
@@ -611,12 +606,11 @@ struct bfq_data {
 	/* Max service-rate for a soft real-time queue, in sectors/sec */
 	unsigned int bfq_wr_max_softrt_rate;
 	/*
-	 * Cached value of the product R*T, used for computing the
-	 * maximum duration of weight raising automatically.
+	 * Cached value of the product ref_rate*ref_wr_duration, used
+	 * for computing the maximum duration of weight raising
+	 * automatically.
 	 */
-	u64 RT_prod;
-	/* device-speed class for the low-latency heuristic */
-	enum bfq_device_speed device_speed;
+	u64 rate_dur_prod;
 
 	/* fallback dummy bfqq for extreme OOM conditions */
 	struct bfq_queue oom_bfqq;
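
For illustration only (this is not part of the patch), the computation
that remains once the slow class is gone can be sketched in user-space C.
The reference values are those introduced by the patch. The function
names wr_duration_ms() and initial_peak_rate() are hypothetical, and
durations are kept in milliseconds instead of jiffies as a simplifying
assumption. Whatever bfq_wr_duration() does after the division is left
out, because the hunk does not show it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference values taken from the patch.
 * Index 0: rotational device, index 1: non-rotational device.
 * Rates are in the kernel's fixed-point unit (sectors/usec, left-shifted
 * by BFQ_RATE_SHIFT); the scaling cancels out in the ratio below.
 */
static const uint64_t ref_rate[2] = {14000, 33000};

/* Assumption: plain milliseconds; the kernel stores jiffies here. */
static const uint64_t ref_wr_duration_ms[2] = {7000, 2500};

/*
 * Hypothetical helper:
 *   duration = (ref_rate / peak_rate) * ref_wr_duration
 * evaluated as (ref_rate * ref_wr_duration) / peak_rate, which mirrors
 * the cached rate_dur_prod being divided by peak_rate.
 */
uint64_t wr_duration_ms(int nonrot, uint64_t peak_rate)
{
	uint64_t rate_dur_prod;

	assert(nonrot == 0 || nonrot == 1);
	assert(peak_rate > 0);

	rate_dur_prod = ref_rate[nonrot] * ref_wr_duration_ms[nonrot];
	return rate_dur_prod / peak_rate;
}

/* Hypothetical helper: initial estimate, 2/3 of the reference rate. */
uint64_t initial_peak_rate(int nonrot)
{
	assert(nonrot == 0 || nonrot == 1);
	return ref_rate[nonrot] * 2 / 3;
}
```

With these values, a rotational device whose estimated peak rate equals
ref_rate[0] gets 7000 ms of weight raising, and one measured at half that
rate gets 14000 ms. A non-rotational device starts at 2/3 of ref_rate[1],
which yields 3750 ms until the peak-rate estimate is updated.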