diff mbox series

[1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat

Message ID 20210217120004.7984-1-arbn@yandex-team.com
State New
Headers show
Series [1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat | expand

Commit Message

Andrey Ryabinin Feb. 17, 2021, noon UTC
cpuacct.stat in no-root cgroups shows user time without guest time
included int it. This doesn't match with user time shown in root
cpuacct.stat and /proc/<pid>/stat.

Make account_guest_time() to add user time to cgroup's cpustat to
fix this.

Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Cc: <stable@vger.kernel.org>
---
 kernel/sched/cputime.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Daniel Jordan March 17, 2021, 10:09 p.m. UTC | #1
Andrey Ryabinin <arbn@yandex-team.com> writes:

> cpuacct.stat in no-root cgroups shows user time without guest time

> included int it. This doesn't match with user time shown in root

> cpuacct.stat and /proc/<pid>/stat.


Yeah, that's inconsistent.

> Make account_guest_time() to add user time to cgroup's cpustat to

> fix this.


Yep.

cgroup2's cpu.stat is broken the same way for child cgroups, and this
happily fixes it.  Probably deserves a mention in the changelog.

The problem with cgroup2 was, if the workload was mostly guest time,
cpu.stat's user and system together reflected it, but it was split
unevenly across the two.  I think guest time wasn't actually included in
either bucket, it was just that the little user and system time there
was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to
match sum_exec_runtime, which did have it.

The stats look ok now for both cgroup1 and 2.  Just slightly unsure
whether we want to change the way both interfaces expose the accounting
in case something out there depends on it.  Seems like we should, but
it'd be good to hear more opinions.

> @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime)

>  

>  	/* Add guest time to cpustat. */

>  	if (task_nice(p) > 0) {

> -		cpustat[CPUTIME_NICE] += cputime;

> -		cpustat[CPUTIME_GUEST_NICE] += cputime;

> +		task_group_account_field(p, CPUTIME_NICE, cputime);

> +		task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);

>  	} else {

> -		cpustat[CPUTIME_USER] += cputime;

> -		cpustat[CPUTIME_GUEST] += cputime;

> +		task_group_account_field(p, CPUTIME_USER, cputime);

> +		task_group_account_field(p, CPUTIME_GUEST, cputime);

>  	}


Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2
actually use _GUEST and _GUEST_NICE.

Could go either way.  Consistency is nice, but I probably wouldn't
change the GUEST ones so people aren't confused about why they're
accounted.  It's also extra cycles for nothing, even though most of the
data is probably in the cache.
Andrey Ryabinin Aug. 20, 2021, 9:37 a.m. UTC | #2
Sorry for abandoning this, got distracted by lots of other stuff.


On 3/18/21 1:09 AM, Daniel Jordan wrote:
> Andrey Ryabinin <arbn@yandex-team.com> writes:

> 

>> cpuacct.stat in no-root cgroups shows user time without guest time

>> included int it. This doesn't match with user time shown in root

>> cpuacct.stat and /proc/<pid>/stat.

> 

> Yeah, that's inconsistent.

> 

>> Make account_guest_time() to add user time to cgroup's cpustat to

>> fix this.

> 

> Yep.

> 

> cgroup2's cpu.stat is broken the same way for child cgroups, and this

> happily fixes it.  Probably deserves a mention in the changelog.

> 


Sure.

> The problem with cgroup2 was, if the workload was mostly guest time,

> cpu.stat's user and system together reflected it, but it was split

> unevenly across the two.  I think guest time wasn't actually included in

> either bucket, it was just that the little user and system time there

> was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to

> match sum_exec_runtime, which did have it.

> 

> The stats look ok now for both cgroup1 and 2.  Just slightly unsure

> whether we want to change the way both interfaces expose the accounting

> in case something out there depends on it.  Seems like we should, but

> it'd be good to hear more opinions.

> 

>> @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime)

>>  

>>  	/* Add guest time to cpustat. */

>>  	if (task_nice(p) > 0) {

>> -		cpustat[CPUTIME_NICE] += cputime;

>> -		cpustat[CPUTIME_GUEST_NICE] += cputime;

>> +		task_group_account_field(p, CPUTIME_NICE, cputime);

>> +		task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);

>>  	} else {

>> -		cpustat[CPUTIME_USER] += cputime;

>> -		cpustat[CPUTIME_GUEST] += cputime;

>> +		task_group_account_field(p, CPUTIME_USER, cputime);

>> +		task_group_account_field(p, CPUTIME_GUEST, cputime);

>>  	}

> 

> Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2

> actually use _GUEST and _GUEST_NICE.

> 

> Could go either way.  Consistency is nice, but I probably wouldn't

> change the GUEST ones so people aren't confused about why they're

> accounted.  It's also extra cycles for nothing, even though most of the

> data is probably in the cache.

> 


Agreed, will live the _GUEST* as is.
diff mbox series

Patch

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..95a9c5603d29 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -139,8 +139,6 @@  void account_user_time(struct task_struct *p, u64 cputime)
  */
 void account_guest_time(struct task_struct *p, u64 cputime)
 {
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
-
 	/* Add guest time to process. */
 	p->utime += cputime;
 	account_group_user_time(p, cputime);
@@ -148,11 +146,11 @@  void account_guest_time(struct task_struct *p, u64 cputime)
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
-		cpustat[CPUTIME_NICE] += cputime;
-		cpustat[CPUTIME_GUEST_NICE] += cputime;
+		task_group_account_field(p, CPUTIME_NICE, cputime);
+		task_group_account_field(p, CPUTIME_GUEST_NICE, cputime);
 	} else {
-		cpustat[CPUTIME_USER] += cputime;
-		cpustat[CPUTIME_GUEST] += cputime;
+		task_group_account_field(p, CPUTIME_USER, cputime);
+		task_group_account_field(p, CPUTIME_GUEST, cputime);
 	}
 }