@@ -6442,15 +6442,25 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
* The value of the variable is computed considering that
* idling is usually beneficial for the throughput if:
* (a) the device is not NCQ-capable, or
- * (b) regardless of the presence of NCQ, the request pattern
- * for bfqq is I/O-bound (possible throughput losses
- * caused by granting idling to seeky queues are mitigated
- * by the fact that, in all scenarios where boosting
- * throughput is the best thing to do, i.e., in all
- * symmetric scenarios, only a minimal idle time is
- * allowed to seeky queues).
+ * (b) regardless of the presence of NCQ, the device is rotational
+ * and the request pattern for bfqq is I/O-bound (possible
+ * throughput losses caused by granting idling to seeky queues
+ * are mitigated by the fact that, in all scenarios where
+ * boosting throughput is the best thing to do, i.e., in all
+ * symmetric scenarios, only a minimal idle time is allowed to
+ * seeky queues).
+ *
+ * Secondly, and in contrast to the above item (b), idling an
+ * NCQ-capable flash-based device would not boost the
+ * throughput even with intense I/O; rather, it would lower
+ * the throughput in proportion to how fast the device
+ * is. Accordingly, the next variable is true if either of the
+ * above conditions (a) or (b) is true, and, in particular,
+ * happens to be false if bfqd is an NCQ-capable flash-based
+ * device.
*/
- idling_boosts_thr = !bfqd->hw_tag || bfq_bfqq_IO_bound(bfqq);
+ idling_boosts_thr = !bfqd->hw_tag ||
+ (!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq));
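To make the condition built by the hunk above concrete, here is a minimal user-space sketch, not kernel code: the struct and its fields are invented stand-ins for bfqd->hw_tag, blk_queue_nonrot() and bfq_bfqq_IO_bound(), and the sketch only evaluates the same boolean for a few device/queue combinations.

#include <stdbool.h>
#include <stdio.h>

/*
 * Invented stand-ins for bfqd->hw_tag, blk_queue_nonrot() and
 * bfq_bfqq_IO_bound(); they only model the inputs of the condition.
 */
struct dev_model {
	bool hw_tag;	/* device is NCQ-capable */
	bool nonrot;	/* device is non-rotational (flash-based) */
};

static bool idling_boosts_thr(const struct dev_model *d, bool io_bound)
{
	/*
	 * Same shape as the new assignment above: idling pays off if
	 * the device cannot queue internally (a), or if it is
	 * rotational and the queue is I/O-bound (b).
	 */
	return !d->hw_tag || (!d->nonrot && io_bound);
}

int main(void)
{
	struct dev_model ncq_ssd = { .hw_tag = true,  .nonrot = true  };
	struct dev_model ncq_hdd = { .hw_tag = true,  .nonrot = false };
	struct dev_model no_ncq  = { .hw_tag = false, .nonrot = false };

	printf("NCQ SSD, I/O-bound queue  : %d\n", idling_boosts_thr(&ncq_ssd, true));
	printf("NCQ HDD, I/O-bound queue  : %d\n", idling_boosts_thr(&ncq_hdd, true));
	printf("non-NCQ device, any queue : %d\n", idling_boosts_thr(&no_ncq, false));
	return 0;
}

With these inputs the value is false only for the NCQ-capable flash-based device, which is exactly the case singled out by the comment above.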
/*
* The value of the next variable,
@@ -6491,14 +6501,16 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
bfqd->wr_busy_queues == 0;
/*
- * There is then a case where idling must be performed not for
- * throughput concerns, but to preserve service guarantees. To
- * introduce it, we can note that allowing the drive to
- * enqueue more than one request at a time, and hence
+ * There is then a case where idling must be performed not
+ * for throughput concerns, but to preserve service
+ * guarantees.
+ *
+ * To introduce this case, we can note that allowing the drive
+ * to enqueue more than one request at a time, and hence
* delegating de facto final scheduling decisions to the
- * drive's internal scheduler, causes loss of control on the
+ * drive's internal scheduler, entails loss of control over the
* actual request service order. In particular, the critical
- * situation is when requests from different processes happens
+ * situation is when requests from different processes happen
* to be present, at the same time, in the internal queue(s)
* of the drive. In such a situation, the drive, by deciding
* the service order of the internally-queued requests, does
@@ -6509,51 +6521,97 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
* the service distribution enforced by the drive's internal
* scheduler is likely to coincide with the desired
* device-throughput distribution only in a completely
- * symmetric scenario where: (i) each of these processes must
- * get the same throughput as the others; (ii) all these
- * processes have the same I/O pattern (either sequential or
- * random). In fact, in such a scenario, the drive will tend
- * to treat the requests of each of these processes in about
- * the same way as the requests of the others, and thus to
- * provide each of these processes with about the same
- * throughput (which is exactly the desired throughput
- * distribution). In contrast, in any asymmetric scenario,
- * device idling is certainly needed to guarantee that bfqq
- * receives its assigned fraction of the device throughput
- * (see [1] for details).
+ * symmetric scenario where:
+ * (i) each of these processes must get the same throughput as
+ * the others;
+ * (ii) all these processes have the same I/O pattern
+ * (either sequential or random).
+ * In fact, in such a scenario, the drive will tend to treat
+ * the requests of each of these processes in about the same
+ * way as the requests of the others, and thus to provide
+ * each of these processes with about the same throughput
+ * (which is exactly the desired throughput distribution). In
+ * contrast, in any asymmetric scenario, device idling is
+ * certainly needed to guarantee that bfqq receives its
+ * assigned fraction of the device throughput (see [1] for
+ * details).
+ *
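To see why the drive's internal scheduler matches the desired distribution only in symmetric scenarios, the following toy user-space model (assumed numbers, not BFQ code) lets a drive serve two always-backlogged processes while knowing nothing about their weights.

#include <stdio.h>

/*
 * Toy model, not BFQ code: two greedy processes always keep one request
 * pending in the drive's internal queue, and the drive serves whatever
 * is queued, ignoring weights entirely. The 2:1 weights are made up for
 * the example.
 */
int main(void)
{
	const int weight[2] = { 2, 1 };		/* desired service ratio */
	const int req_sectors[2] = { 64, 64 };	/* same I/O pattern for both */
	long served[2] = { 0, 0 };
	int round, p;

	for (round = 0; round < 1000; round++)
		for (p = 0; p < 2; p++)
			served[p] += req_sectors[p]; /* drive ignores weights */

	printf("desired shares : %.2f / %.2f\n",
	       weight[0] / 3.0, weight[1] / 3.0);
	printf("observed shares: %.2f / %.2f\n",
	       (double)served[0] / (served[0] + served[1]),
	       (double)served[1] / (served[0] + served[1]));
	return 0;
}

The drive delivers a 50/50 split even though the desired split is 2:1, which is the loss of control described above; in such an asymmetric scenario only device idling restores the intended distribution.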
+ * We address this issue by controlling, actually, only the
+ * symmetry sub-condition (i), i.e., provided that
+ * sub-condition (i) holds, idling is not performed,
+ * regardless of whether sub-condition (ii) holds. In other
+ * words, only if sub-condition (i) does not hold is idling
+ * allowed, and the device tends to be prevented from queueing
+ * many requests, possibly of several processes. The reason
+ * for not controlling also sub-condition (ii) is that we
+ * exploit preemption to preserve guarantees in case of
+ * symmetric scenarios, even if (ii) does not hold, as
+ * explained in the next two paragraphs.
+ *
+ * Even if a queue, say Q, is expired when it remains idle, Q
+ * can still preempt the new in-service queue if the next
+ * request of Q arrives soon (see the comments on
+ * bfq_bfqq_update_budg_for_activation). If all queues and
+ * groups have the same weight, this form of preemption,
+ * combined with the hole-recovery heuristic described in the
+ * comments on function bfq_bfqq_update_budg_for_activation,
+ * is enough to preserve a correct bandwidth distribution in
+ * the mid term, even without idling. In fact, even if not
+ * idling lets the internal queues of the device contain many
+ * requests, and thus reorder them, we can rather safely
+ * assume that the internal scheduler still preserves a
+ * minimum of mid-term fairness. The motivation for using
+ * preemption instead of idling is that, by not idling,
+ * service guarantees are preserved without sacrificing even
+ * a minimal amount of throughput. In other words, both a high
+ * throughput and its desired distribution are obtained.
+ *
+ * More precisely, this preemption-based, idleless approach
+ * provides fairness in terms of IOPS, and not sectors per
+ * second. This can be seen with a simple example. Suppose
+ * that there are two queues with the same weight, but that
+ * the first queue receives requests of 8 sectors, while the
+ * second queue receives requests of 1024 sectors. In
+ * addition, suppose that each of the two queues contains at
+ * most one request at a time, which implies that each queue
+ * always remains idle after it is served. Finally, after
+ * remaining idle, each queue receives very quickly a new
+ * request. It follows that the two queues are served
+ * alternately, preempting each other if needed. This
+ * implies that, although both queues have the same weight,
+ * the queue with large requests receives a service that is
+ * 1024/8 times as high as the service received by the other
+ * queue.
*
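The arithmetic of this example can be checked with a short user-space sketch; the numbers are the toy values from the paragraph above, and the loop is not BFQ code, only a model of the alternate service.

#include <stdio.h>

/*
 * Two equal-weight queues served alternately, one request each per
 * round, with request sizes of 8 and 1024 sectors.
 */
int main(void)
{
	const long size_a = 8, size_b = 1024;	/* sectors per request */
	const int rounds = 1000;
	long served_a = 0, served_b = 0;
	int round;

	for (round = 0; round < rounds; round++) {
		served_a += size_a;	/* queue A serves one request */
		served_b += size_b;	/* then queue B serves one request */
	}

	/*
	 * Equal IOPS (one request each per round), but the sector-domain
	 * service ratio is 1024/8 = 128 in favour of the large-request
	 * queue.
	 */
	printf("requests: %d vs %d\n", rounds, rounds);
	printf("sectors : %ld vs %ld (ratio %.0f)\n",
	       served_a, served_b, (double)served_b / served_a);
	return 0;
}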
- * As for sub-condition (i), actually we check only whether
- * bfqq is being weight-raised. In fact, if bfqq is not being
- * weight-raised, we have that:
- * - if the process associated with bfqq is not I/O-bound, then
- * it is not either latency- or throughput-critical; therefore
- * idling is not needed for bfqq;
- * - if the process asociated with bfqq is I/O-bound, then
- * idling is already granted with bfqq (see the comments on
- * idling_boosts_thr).
+ * On the other hand, device idling is performed, and thus
+ * pure sector-domain guarantees are provided, for the
+ * following queues, which are likely to need stronger
+ * throughput guarantees: weight-raised queues, and queues
+ * with a higher weight than other queues. When such queues
+ * are active, sub-condition (i) is false, which triggers
+ * device idling.
*
- * We do not check sub-condition (ii) at all, i.e., the next
- * variable is true if and only if bfqq is being
- * weight-raised. We do not need to control sub-condition (ii)
- * for the following reason:
- * - if bfqq is being weight-raised, then idling is already
- * guaranteed to bfqq by sub-condition (i);
- * - if bfqq is not being weight-raised, then idling is
- * already guaranteed to bfqq (only) if it matters, i.e., if
- * bfqq is associated to a currently I/O-bound process (see
- * the above comment on sub-condition (i)).
+ * According to the above considerations, the next variable is
+ * true (only) if sub-condition (i) does not hold. To compute
+ * the value of this variable, we not only use the return
+ * value of the function bfq_symmetric_scenario(), but also
+ * check whether bfqq is being weight-raised, because
+ * bfq_symmetric_scenario() does not take weight-raised
+ * queues into account (see comments on
+ * bfq_weights_tree_add()).
*
* As a side note, it is worth considering that the above
* device-idling countermeasures may however fail in the
* following unlucky scenario: if idling is (correctly)
- * disabled in a time period during which the symmetry
- * sub-condition holds, and hence the device is allowed to
+ * disabled in a time period during which all symmetry
+ * sub-conditions hold, and hence the device is allowed to
* enqueue many requests, but at some later point in time some
* sub-condition stops to hold, then it may become impossible
* to let requests be served in the desired order until all
* the requests already queued in the device have been served.
*/
- asymmetric_scenario = bfqq->wr_coeff > 1;
+ asymmetric_scenario = bfqq->wr_coeff > 1 ||
+ !bfq_symmetric_scenario(bfqd);
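As an illustration of how the new assignment behaves, the sketch below uses hypothetical stand-ins for bfqq->wr_coeff and the return value of bfq_symmetric_scenario(); it is not kernel code, it only evaluates an expression of the same shape for a few inputs.

#include <stdbool.h>
#include <stdio.h>

/*
 * Stand-ins for the two inputs of the assignment above; the parameter
 * names mirror the kernel identifiers, but types and values are
 * invented for the example.
 */
static bool is_asymmetric(unsigned int wr_coeff, bool symmetric_scenario)
{
	/* Same shape as: bfqq->wr_coeff > 1 || !bfq_symmetric_scenario(bfqd) */
	return wr_coeff > 1 || !symmetric_scenario;
}

int main(void)
{
	/*
	 * A weight-raised queue (wr_coeff > 1) forces idling even if the
	 * weight tree looks symmetric, because bfq_symmetric_scenario()
	 * does not see weight raising.
	 */
	printf("wr_coeff=30, symmetric weights : %d\n", is_asymmetric(30, true));
	printf("wr_coeff=1,  symmetric weights : %d\n", is_asymmetric(1, true));
	printf("wr_coeff=1,  differing weights : %d\n", is_asymmetric(1, false));
	return 0;
}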
/*
* We have now all the components we need to compute the return