diff mbox series

[IMPROVEMENT] block, bfq: increase threshold to deem I/O as random

Message ID 20171220162736.5067-1-paolo.valente@linaro.org
State Accepted
Commit f0ba5ea2fe45c0ad24a7dedae84a97f7aa046494
Headers show
Series [IMPROVEMENT] block, bfq: increase threshold to deem I/O as random | expand

Commit Message

Paolo Valente Dec. 20, 2017, 4:27 p.m. UTC
If two processes do I/O close to each other, i.e., are cooperating
processes in BFQ (and CFQ'S) nomenclature, then BFQ merges their
associated bfq_queues, so as to get sequential I/O from the union of
the I/O requests of the processes, and thus reach a higher
throughput. A merged queue is then split if its I/O stops being
sequential. In this respect, BFQ deems the I/O of a bfq_queue as
(mostly) sequential only if less than 4 I/O requests are random, out
of the last 32 requests inserted into the queue.

Unfortunately, extensive testing (with the interleaved_io benchmark of
the S suite [1], and with real applications spawning cooperating
processes) has clearly shown that, with such a low threshold, only a
rather low I/O throughput may be reached when several cooperating
processes do I/O. In particular, the outcome of each test run was
bimodal: if queue merging occurred and was stable during the test,
then the throughput was close to the peak rate of the storage device,
otherwise the throughput was arbitrarily low (usually around 1/10 of
the peak rate with a rotational device). The probability to get the
unlucky outcomes grew with the number of cooperating processes: it was
already significant with 5 processes, and close to one with 7 or more
processes.

The cause of the low throughput in the unlucky runs was that the
merged queues containing the I/O of these cooperating processes were
soon split, because they contained more random I/O requests than those
tolerated by the 4/32 threshold, but
- that I/O would have however allowed the storage device to reach
  peak throughput or almost peak throughput;
- in contrast, the I/O of these processes, if served individually
  (from separate queues) yielded a rather low throughput.

So we repeated our tests with increasing values of the threshold,
until we found the minimum value (19) for which we obtained maximum
throughput, reliably, with at least up to 9 cooperating
processes. Then we checked that the use of that higher threshold value
did not cause any regression for any other benchmark in the suite [1].
This commit raises the threshold to such a higher value.

[1] https://github.com/Algodev-github/S

Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>

---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.15.1

Comments

Jens Axboe Jan. 5, 2018, 4:24 p.m. UTC | #1
On 12/20/17 9:27 AM, Paolo Valente wrote:
> If two processes do I/O close to each other, i.e., are cooperating

> processes in BFQ (and CFQ'S) nomenclature, then BFQ merges their

> associated bfq_queues, so as to get sequential I/O from the union of

> the I/O requests of the processes, and thus reach a higher

> throughput. A merged queue is then split if its I/O stops being

> sequential. In this respect, BFQ deems the I/O of a bfq_queue as

> (mostly) sequential only if less than 4 I/O requests are random, out

> of the last 32 requests inserted into the queue.

> 

> Unfortunately, extensive testing (with the interleaved_io benchmark of

> the S suite [1], and with real applications spawning cooperating

> processes) has clearly shown that, with such a low threshold, only a

> rather low I/O throughput may be reached when several cooperating

> processes do I/O. In particular, the outcome of each test run was

> bimodal: if queue merging occurred and was stable during the test,

> then the throughput was close to the peak rate of the storage device,

> otherwise the throughput was arbitrarily low (usually around 1/10 of

> the peak rate with a rotational device). The probability to get the

> unlucky outcomes grew with the number of cooperating processes: it was

> already significant with 5 processes, and close to one with 7 or more

> processes.

> 

> The cause of the low throughput in the unlucky runs was that the

> merged queues containing the I/O of these cooperating processes were

> soon split, because they contained more random I/O requests than those

> tolerated by the 4/32 threshold, but

> - that I/O would have however allowed the storage device to reach

>   peak throughput or almost peak throughput;

> - in contrast, the I/O of these processes, if served individually

>   (from separate queues) yielded a rather low throughput.

> 

> So we repeated our tests with increasing values of the threshold,

> until we found the minimum value (19) for which we obtained maximum

> throughput, reliably, with at least up to 9 cooperating

> processes. Then we checked that the use of that higher threshold value

> did not cause any regression for any other benchmark in the suite [1].

> This commit raises the threshold to such a higher value.


Applied for 4.16, thanks.

-- 
Jens Axboe
diff mbox series

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c66578592c9e..e33c5c4c9856 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -192,7 +192,7 @@  static struct kmem_cache *bfq_pool;
 #define BFQQ_SEEK_THR		(sector_t)(8 * 100)
 #define BFQQ_SECT_THR_NONROT	(sector_t)(2 * 32)
 #define BFQQ_CLOSE_THR		(sector_t)(8 * 1024)
-#define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 32/8)
+#define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 19)
 
 /* Min number of samples required to perform peak-rate update */
 #define BFQ_RATE_MIN_SAMPLES	32