[RFC,v2,0/1] Introduce per-task io utilization boost

There is a feature in both schedutil and intel_pstate called iowait boosting, which tries to avoid selecting a low frequency during IO workloads when doing so would hurt throughput. It is implemented by checking for task wakeups that have the in_iowait flag set and boosting the CPU of the rq accordingly (through cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT)).

The feature is justified by the potentially low utilization of a task that is frequently in_iowait: such a task is not enqueued on any rq most of the time and therefore cannot build up utilization.

This RFC focuses on the schedutil implementation. An intel_pstate implementation is possible, but judging by the reviews of v1 a governor-based implementation seems to be preferred.

Current schedutil iowait boosting has several issues:

1. Boosting happens even in scenarios where it doesn't improve throughput. [1]
2. The boost is not accounted for in EAS:
   a) feec() only considers the actual task utilization for task placement, but another CPU might be more energy-efficient at that capacity than the boosted one.
   b) When placing a non-IO task while a CPU is boosted, compute_energy() assumes a lower OPP than what is actually applied. This leads to wrong EAS decisions.
3. Actual IO-heavy workloads are hardly distinguished from infrequent in_iowait wakeups.
4. The boost isn't associated with a task, so it isn't considered for task placement, potentially missing out on higher-capacity CPUs on heterogeneous CPU topologies.
5. The boost isn't associated with a task, so it lingers on the rq even after the responsible task has migrated or stopped.
6. The boost isn't associated with a task, so it needs to ramp up again after a migration.
7. Since schedutil doesn't know which task is being woken up, multiple unrelated in_iowait tasks might lead to boosting.
8. Boosting is hard to control with UCLAMP_MAX.
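To make issue 7 more concrete, here is a rough user-space model of a boost that is kept per rq and knows nothing about tasks. It is only a sketch: struct rq_boost, rq_boost_wakeup() and rq_boost_after() are hypothetical names, the doubling steps and the tick-based reset are a simplification of what sugov does, and TICK_NS assumes CONFIG_HZ=1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u                /* full-capacity boost */
#define BOOST_MIN      (CAPACITY_SCALE / 8) /* first boost step */
#define TICK_NS        1000000ull           /* assumption: CONFIG_HZ=1000 */

/* Hypothetical model type: one boost value per rq, no task identity. */
struct rq_boost {
	unsigned int boost;      /* current boost, 0..CAPACITY_SCALE */
	uint64_t last_wakeup_ns; /* time of the last in_iowait wakeup */
};

/* Called for every wakeup on this rq; in_iowait is the woken task's flag. */
static void rq_boost_wakeup(struct rq_boost *rq, uint64_t now_ns, bool in_iowait)
{
	assert(now_ns >= rq->last_wakeup_ns);

	/* No in_iowait wakeup for more than a tick: drop the boost. */
	if (now_ns - rq->last_wakeup_ns > TICK_NS)
		rq->boost = 0;

	if (!in_iowait)
		return;

	rq->last_wakeup_ns = now_ns;
	if (rq->boost == 0)
		rq->boost = BOOST_MIN;
	else if (rq->boost < CAPACITY_SCALE)
		rq->boost *= 2; /* 128 -> 256 -> 512 -> 1024 */
}

/* Boost left on a fresh rq after n in_iowait wakeups spaced period_ns apart. */
static unsigned int rq_boost_after(uint64_t period_ns, int n)
{
	struct rq_boost rq = { 0, 0 };

	for (int i = 1; i <= n; i++)
		rq_boost_wakeup(&rq, (uint64_t)i * period_ns, true);
	return rq.boost;
}
```

In this model a single task waking every 1.5 ms never gets past BOOST_MIN, because each gap exceeds a tick and resets the boost. Two such unrelated tasks interleaved on the same rq produce a wakeup every 0.75 ms, which the rq cannot tell apart from one IO-heavy task, so rq_boost_after(750000, 8) ramps to the full CAPACITY_SCALE.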
We attempt to mitigate all of the above by reworking the way iowait boosting (io boosting from here on) works in two major ways:

- Carry the boost in task_struct, so it is a per-task attribute and behaves similarly to the utilization of the task in some ways.
- Employ a counting-based tracking strategy that only boosts as long as it sees benefits and returns to minimal boosting dynamically.

Note that some of the issues (1, 3) can be solved by using a counting-based strategy on a per-rq basis, i.e. entirely in sugov. Experiments, with Android in particular, showed that such a strategy (which necessarily needs longer intervals to be reasonably stable) is too sensitive to migrations to be generally useful. We therefore consider the additional complexity of a per-task approach as proposed to be worth it.

We require a minimum of 1000 iowait wakeups per second to start boosting. This isn't too far off from what sugov currently does, since it resets the boost if it hasn't seen an iowait wakeup for TICK_NSEC. For CONFIG_HZ=1000 we are on par, for anything below we are stricter. We justify this by the small improvement that boosting can bring in the first place when iowait wakeups are 'rare'.

When IO actually leads to a task being in iowait isn't straightforward to explain. If the issued IO can be served by the page cache (e.g. on reads because the pages are already present in it, on writes because they can be marked dirty and the writeback takes care of them later), the issuing task is usually not in iowait. We consider this the good case, since whenever the scheduler and a potential userspace / kernel switch are in the critical path for IO, there is possibly overhead impacting throughput. We therefore focus on random read from here on, because (on synchronous IO [3]) this leads to the task being set in iowait for every IO. This is where iowait boosting shows its biggest throughput improvement.
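The threshold comparison and the counting idea above can be sketched in user space as follows. This is not the code from the patch: struct task_io_boost, io_boost_interval_end(), sugov_min_rate(), the step count IO_BOOST_MAX_LEVEL and the "rate went up, so the boost helped" heuristic are all assumptions made for illustration. Only the 1000 wakeups/s threshold and the TICK_NSEC reset are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC       1000000000ull
#define IO_BOOST_MIN_RATE  1000u /* iowait wakeups per second, from the text */
#define IO_BOOST_MAX_LEVEL 8u    /* hypothetical number of boost steps */

/* Lowest steady wakeup rate that avoids the sugov reset: one per TICK_NSEC. */
static unsigned int sugov_min_rate(unsigned int hz)
{
	assert(hz > 0);
	uint64_t tick_ns = NSEC_PER_SEC / hz;

	return (unsigned int)(NSEC_PER_SEC / tick_ns);
}

/* Hypothetical per-task state; the proposal carries it in task_struct. */
struct task_io_boost {
	unsigned int level;     /* current boost step, 0 = no boost */
	unsigned int prev_rate; /* iowait wakeups/s in the previous interval */
};

/* Close one tracking interval in which the task saw `wakeups` iowait wakeups. */
static void io_boost_interval_end(struct task_io_boost *tb,
				  unsigned int wakeups, uint64_t interval_ns)
{
	assert(interval_ns > 0);
	unsigned int rate =
		(unsigned int)((uint64_t)wakeups * NSEC_PER_SEC / interval_ns);

	if (rate < IO_BOOST_MIN_RATE) {
		tb->level = 0;           /* too rare to be worth boosting */
	} else if (tb->level == 0) {
		tb->level = 1;           /* start at the minimal boost */
	} else if (rate > tb->prev_rate) {
		if (tb->level < IO_BOOST_MAX_LEVEL)
			tb->level++;     /* last step paid off, try one more */
	} else if (tb->level > 1) {
		tb->level--;             /* no benefit seen, back off */
	}
	tb->prev_rate = rate;
}
```

sugov_min_rate(1000) gives 1000 wakeups/s, matching the proposed threshold, while sugov_min_rate(250) gives 250, which is where the proposal is stricter. Because the state sits with the task rather than the rq, it would move along on migration instead of having to ramp up again.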

Message ID	20240518113947.2127802-1-christian.loehle@arm.com
Headers
Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B7622C8C0; Sat, 18 May 2024 11:40:23 +0000 (UTC)
From: Christian Loehle <christian.loehle@arm.com>
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, mingo@redhat.com, rafael@kernel.org, dietmar.eggemann@arm.com, vschneid@redhat.com, vincent.guittot@linaro.org, Johannes.Thumshirn@wdc.com, adrian.hunter@intel.com, ulf.hansson@linaro.org, bvanassche@acm.org, andres@anarazel.de, asml.silence@gmail.com, linux-pm@vger.kernel.org, linux-block@vger.kernel.org, io-uring@vger.kernel.org, qyousef@layalina.io, Christian Loehle <christian.loehle@arm.com>
Subject: [RFC PATCH v2 0/1] Introduce per-task io utilization boost
Date: Sat, 18 May 2024 12:39:46 +0100
Message-Id: <20240518113947.2127802-1-christian.loehle@arm.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series	Introduce per-task io utilization boost
  [RFC,v2,0/1] Introduce per-task io utilization boost
  [RFC,v2,1/1] sched/fair: sugov: Introduce per-task io util boost


Message