diff mbox series

[5.4,32/80] taskstats: Cleanup the use of task->exit_code

Message ID 20220221084916.628257481@linuxfoundation.org
State New
Headers show
Series None | expand

Commit Message

Greg KH Feb. 21, 2022, 8:49 a.m. UTC
From: "Eric W. Biederman" <ebiederm@xmission.com>

commit 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 upstream.

In the function bacct_add_task the code reading task->exit_code was
introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over
taskstats"), and it is not entirely clear what the taskstats interface
is trying to return as only returning the exit_code of the first task
in a process doesn't make a lot of sense.

As best as I can figure the intent is to return task->exit_code after
a task exits.  The field is returned with per task fields, so the
exit_code of the entire process is not wanted.  Only the value of the
first task is returned so this is not a useful way to get the per task
ptrace stop code.  The ordinary case of returning this value is
returning after a task exits, which also precludes use for getting
a ptrace value.

It is common to for the first task of a process to also be the last
task of a process so this field may have done something reasonable by
accident in testing.

Make ac_exitcode a reliable per task value by always returning it for
every exited task.

Setting ac_exitcode in a sensible mannter makes it possible to continue
to provide this value going forward.

Cc: Balbir Singh <bsingharora@gmail.com>
Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats")
Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/tsacct.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Eric W. Biederman Feb. 22, 2022, 11:53 p.m. UTC | #1
"Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de> writes:

> Am Mon, 21 Feb 2022 09:49:12 +0100
> schrieb Greg Kroah-Hartman <gregkh@linuxfoundation.org>: 
>
>> As best as I can figure the intent is to return task->exit_code after
>> a task exits.  The field is returned with per task fields, so the
>> exit_code of the entire process is not wanted.
>
> I wondered about the use of exit_code, too, when preparing my patch
> that introduces ac_tgid and the AGROUP flag to identify the first and
> last tasks of a task group/process, see
>
> 	https://lkml.org/lkml/2022/2/18/887
>
> With the information about the position of this task in the group,
> users can take some meaning from the exit code (individual kills?). The
> old style ensured that you got one exit code per process.

How do you figure?

For single-threaded processes ac_exitcode would always be reasonable,
and be what userspace passed to exit(3).

For multi-threaded processes ac_exitcode before my change was set to
some completely arbitrary value for the thread whose tgid == tid.

Frequently the thread whose tgid == tid is the last thread to
exit and is brought down by a call to group_exit so it makes sense.
Unfortunately there is no requirement for that to be the case.

If the thread whose tgid == tid happens to call pthread_exit the value
in ac_exitcode for that thread is pretty much undefined.

The ac_exitcode for the other threads would be the useless value of 0
that the field was initialized to.  With my change the value returned is
at least well defined.

But thread_group_leader in this context does nothing except limit the
value that is returned.

> I addressing ac_exitcode fits together with my patch, while increasing
> the version of taskstats helps clients that then can know that
> ac_exitcode now has a different meaning. Right now this is a change
> under the hood and you can just guess (or have to know from the kernel
> version).

As best as I can tell I did not change the meaning of the field.  I
change buggy code, and removed an arbitrary and senseless filter.

Now maybe it would have been better to flag the bug fix with a version
number.  Unfortunately I did not even realize taskstats had a version
number.  I just know the code made no sense.

Eric
Dr. Thomas Orgis Feb. 23, 2022, 10:40 p.m. UTC | #2
Am Tue, 22 Feb 2022 17:53:12 -0600
schrieb "Eric W. Biederman" <ebiederm@xmission.com>: 

> How do you figure?

I admit that I am struggling with understanding where exit codes come
from in the non-usual cases. During my taskstats tests, I played with
writing a multithreaded application that does call pthread_exit() in
the main thread (pid==tgid), for example. I slowly had to learn just
how messy this can be …

Is it clearly defined what the exitcode of a task as part of a process
is/should/can mean, as opposed to the process as a whole?

> For single-threaded processes ac_exitcode would always be reasonable,
> and be what userspace passed to exit(3).

Yes. That is the one case where we all know what we are dealing with;-)

> For multi-threaded processes ac_exitcode before my change was set to
> some completely arbitrary value for the thread whose tgid == tid.

Isn't the only place where it really makes sense to set the exitcode
when the last task of the process exits? I guess that was the intention
of the earlier code — with the same wrong assumption that I fell victim
to for quite some time: That the group leader (first task, tgid == pid)
always exits last.

I do not know in which cases group member threads have meaningful exit
codes different from the last one (which is the one returned for the
process in whole … ?). I'd love to see the exact reasoning on how
multithreading got mapped into kernel tasks which used to track only
single-threaded processes before.

> With my change the value returned
> is at least well defined.

But defined to what?

> Now maybe it would have been better to flag the bug fix with a version
> number.  Unfortunately I did not even realize taskstats had a version
> number.  I just know the code made no sense.

Well, fixing a bug that has been there from the beginning (of adding
multithreading, at least) is a significant change that one might want
to know about. And I do think that it fits to thouroughly fix these
issues that relate to identifying threads and processes (the shameless
plug of my taskstats patch that I'm working on since 2018, and only got
right in 2022, finally — I hope), while at that.


Alrighty then,

Thomas
Eric W. Biederman Feb. 25, 2022, 12:23 a.m. UTC | #3
"Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de> writes:

> Am Tue, 22 Feb 2022 17:53:12 -0600
> schrieb "Eric W. Biederman" <ebiederm@xmission.com>: 
>
>> How do you figure?
>
> I admit that I am struggling with understanding where exit codes come
> from in the non-usual cases. During my taskstats tests, I played with
> writing a multithreaded application that does call pthread_exit() in
> the main thread (pid==tgid), for example. I slowly had to learn just
> how messy this can be …
>
> Is it clearly defined what the exitcode of a task as part of a process
> is/should/can mean, as opposed to the process as a whole?

In the code it is clearly defined.  The decoding is exactly the same
as from an entire process and for a single threaded process there is no
difference.

Linux has a system 2 system calls "exit(2)" and "exit_group(2)" if a
thread exits by itself whatever is passed to exit(2) is the exit code.

What pthread_exit passes to exit(2) I don't know.  I have not been able
to trace glibc that far, and I have not instrumented up a kernel to see.

For threads that are alive when exit_group(2) is called they all get the
same final exit code.

>> For single-threaded processes ac_exitcode would always be reasonable,
>> and be what userspace passed to exit(3).
>
> Yes. That is the one case where we all know what we are dealing with;-)
>
>> For multi-threaded processes ac_exitcode before my change was set to
>> some completely arbitrary value for the thread whose tgid == tid.
>
> Isn't the only place where it really makes sense to set the exitcode
> when the last task of the process exits? I guess that was the intention
> of the earlier code — with the same wrong assumption that I fell victim
> to for quite some time: That the group leader (first task, tgid == pid)
> always exits last.
>
> I do not know in which cases group member threads have meaningful exit
> codes different from the last one (which is the one returned for the
> process in whole … ?). I'd love to see the exact reasoning on how
> multithreading got mapped into kernel tasks which used to track only
> single-threaded processes before.

The internal model in the kernel is there are tasks (which pthreads are
mapped to in a 1-1 fashion).  These tasks were the original process
abstraction.  In the case of CLONE_THREAD these tasks are glued together
into a POSIX process, with shared signal handling.

So from a kernel standpoint as it basically the original process
abstraction it is all well defined what happens when an individual task
exits.

>> With my change the value returned
>> is at least well defined.
>
> But defined to what?

See above.

>> Now maybe it would have been better to flag the bug fix with a version
>> number.  Unfortunately I did not even realize taskstats had a version
>> number.  I just know the code made no sense.
>
> Well, fixing a bug that has been there from the beginning (of adding
> multithreading, at least) is a significant change that one might want
> to know about. And I do think that it fits to thouroughly fix these
> issues that relate to identifying threads and processes (the shameless
> plug of my taskstats patch that I'm working on since 2018, and only got
> right in 2022, finally — I hope), while at that.

It looks like the bug was in commit f3cef7a99469 ("[PATCH] csa: basic
accounting over taskstats") in 2006 in 2.6.19-rc1 when taskstats were
added.  That is long after CLONE_THREAD support was added in the 2.5
development kernel.

I have been working to get a single place that code can look to find the
process exit status.  AKA so that the code can always set
SIGNAL_GROUP_EXIT, and look at signal->group_exit_code.  Fixing this was
just part of sorting out the misconceptions, and I didn't realize there
was anyone that paying attention and cared.

I will see if I can find some time to give your taskstats patch a
review.

Eric
diff mbox series

Patch

--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -35,11 +35,10 @@  void bacct_add_tsk(struct user_namespace
 	/* Convert to seconds for btime */
 	do_div(delta, USEC_PER_SEC);
 	stats->ac_btime = get_seconds() - delta;
-	if (thread_group_leader(tsk)) {
+	if (tsk->flags & PF_EXITING)
 		stats->ac_exitcode = tsk->exit_code;
-		if (tsk->flags & PF_FORKNOEXEC)
-			stats->ac_flag |= AFORK;
-	}
+	if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC))
+		stats->ac_flag |= AFORK;
 	if (tsk->flags & PF_SUPERPRIV)
 		stats->ac_flag |= ASU;
 	if (tsk->flags & PF_DUMPCORE)