[5.4,32/80] taskstats: Cleanup the use of task->exit_code

Message ID	20220221084916.628257481@linuxfoundation.org
State	New
Headers	show Return-Path: <stable-owner@kernel.org> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Balbir Singh <bsingharora@gmail.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Sudip Mukherjee <sudipm.mukherjee@gmail.com> Subject: [PATCH 5.4 32/80] taskstats: Cleanup the use of task->exit_code Date: Mon, 21 Feb 2022 09:49:12 +0100 Message-Id: <20220221084916.628257481@linuxfoundation.org> In-Reply-To: <20220221084915.554151737@linuxfoundation.org> References: <20220221084915.554151737@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	None \| expand [5.4,02/80] HID:Add support for UGTABLET WP5540 [5.4,05/80] parisc: Drop __init from map_pages declaration [5.4,06/80] parisc: Fix data TLB miss in sba_unmap_sg [5.4,07/80] parisc: Fix sglist access in ccio-dma.c [5.4,08/80] btrfs: send: in case of IO error log it [5.4,09/80] platform/x86: ISST: Fix possible circular locking dependency detected [5.4,10/80] selftests: rtc: Increase test timeout so that all tests run [5.4,14/80] selftests/zram: Adapt the situation that /dev/zram0 is being used [5.4,17/80] quota: make dquot_quota_sync return errors from ->sync_fs [5.4,19/80] nvme-tcp: fix possible use-after-free in transport error_recovery work [5.4,20/80] nvme-rdma: fix possible use-after-free in transport error_recovery work [5.4,21/80] drm/amdgpu: fix logic inversion in check [5.4,22/80] Revert "module, async: async_synchronize_full() on module init iff async is used" [5.4,23/80] ftrace: add ftrace_init_nop() [5.4,26/80] arm64: module/ftrace: intialize PLT at load time [5.4,31/80] ext4: prevent partial update of the extent blocks [5.4,32/80] taskstats: Cleanup the use of task->exit_code [5.4,33/80] dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending [5.4,35/80] mmc: block: fix read single on recovery logic [5.4,37/80] iwlwifi: pcie: gen2: fix locking when "HW not ready" [5.4,39/80] net: dsa: lan9303: fix reset on probe [5.4,40/80] net: ieee802154: ca8210: Fix lifs/sifs periods [5.4,42/80] bonding: force carrier update when releasing slave [5.4,44/80] bonding: fix data-races around agg_select_timer [5.4,45/80] libsubcmd: Fix use-after-free for realloc(..., 0) [5.4,48/80] ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw() [5.4,50/80] powerpc/lib/sstep: fix ptesync build error [5.4,51/80] mtd: rawnand: gpmi: dont leak PM reference in error path [5.4,53/80] optee: use driver internal tee_context for some rpc [5.4,54/80] block/wbt: fix negative inflight counter when remove scsi device [5.4,56/80] NFS: Do not report writeback errors in nfs_getattr() [5.4,59/80] scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop [5.4,61/80] net: sched: limit TC_ACT_REPEAT loops [5.4,64/80] i2c: brcmstb: fix support for DSL and CM variants [5.4,66/80] KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW [5.4,67/80] ARM: OMAP2+: hwmod: Add of_node_put() before break [5.4,69/80] irqchip/sifive-plic: Add missing thead,c900-plic match string [5.4,70/80] netfilter: conntrack: dont refresh sctp entries in closed state [5.4,71/80] arm64: dts: meson-gx: add ATF BL32 reserved-memory region [5.4,74/80] kconfig: let shell return enough output for deep path names [5.4,77/80] tracing: Fix tp_printk option related with tp_printk_stop_on_boot

Message ID

20220221084916.628257481@linuxfoundation.org

State

New

Headers

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        stable@vger.kernel.org, Balbir Singh <bsingharora@gmail.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: [PATCH 5.4 32/80] taskstats: Cleanup the use of task->exit_code
Date: Mon, 21 Feb 2022 09:49:12 +0100
Message-Id: <20220221084916.628257481@linuxfoundation.org>
In-Reply-To: <20220221084915.554151737@linuxfoundation.org>
References: <20220221084915.554151737@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

None | expand

Commit Message

Greg KH Feb. 21, 2022, 8:49 a.m. UTC

From: "Eric W. Biederman" <ebiederm@xmission.com>

commit 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 upstream.

In the function bacct_add_task the code reading task->exit_code was
introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over
taskstats"), and it is not entirely clear what the taskstats interface
is trying to return as only returning the exit_code of the first task
in a process doesn't make a lot of sense.

As best as I can figure the intent is to return task->exit_code after
a task exits.  The field is returned with per task fields, so the
exit_code of the entire process is not wanted.  Only the value of the
first task is returned so this is not a useful way to get the per task
ptrace stop code.  The ordinary case of returning this value is
returning after a task exits, which also precludes use for getting
a ptrace value.

It is common to for the first task of a process to also be the last
task of a process so this field may have done something reasonable by
accident in testing.

Make ac_exitcode a reliable per task value by always returning it for
every exited task.

Setting ac_exitcode in a sensible mannter makes it possible to continue
to provide this value going forward.

Cc: Balbir Singh <bsingharora@gmail.com>
Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats")
Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/tsacct.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Eric W. Biederman Feb. 22, 2022, 11:53 p.m. UTC | #1

"Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de> writes:

> Am Mon, 21 Feb 2022 09:49:12 +0100
> schrieb Greg Kroah-Hartman <gregkh@linuxfoundation.org>: 
>
>> As best as I can figure the intent is to return task->exit_code after
>> a task exits.  The field is returned with per task fields, so the
>> exit_code of the entire process is not wanted.
>
> I wondered about the use of exit_code, too, when preparing my patch
> that introduces ac_tgid and the AGROUP flag to identify the first and
> last tasks of a task group/process, see
>
> 	https://lkml.org/lkml/2022/2/18/887
>
> With the information about the position of this task in the group,
> users can take some meaning from the exit code (individual kills?). The
> old style ensured that you got one exit code per process.

How do you figure?

For single-threaded processes ac_exitcode would always be reasonable,
and be what userspace passed to exit(3).

For multi-threaded processes ac_exitcode before my change was set to
some completely arbitrary value for the thread whose tgid == tid.

Frequently the thread whose tgid == tid is the last thread to
exit and is brought down by a call to group_exit so it makes sense.
Unfortunately there is no requirement for that to be the case.

If the thread whose tgid == tid happens to call pthread_exit the value
in ac_exitcode for that thread is pretty much undefined.

The ac_exitcode for the other threads would be the useless value of 0
that the field was initialized to.  With my change the value returned is
at least well defined.

But thread_group_leader in this context does nothing except limit the
value that is returned.

> I addressing ac_exitcode fits together with my patch, while increasing
> the version of taskstats helps clients that then can know that
> ac_exitcode now has a different meaning. Right now this is a change
> under the hood and you can just guess (or have to know from the kernel
> version).

As best as I can tell I did not change the meaning of the field.  I
change buggy code, and removed an arbitrary and senseless filter.

Now maybe it would have been better to flag the bug fix with a version
number.  Unfortunately I did not even realize taskstats had a version
number.  I just know the code made no sense.

Eric

Dr. Thomas Orgis Feb. 23, 2022, 10:40 p.m. UTC | #2

Am Tue, 22 Feb 2022 17:53:12 -0600
schrieb "Eric W. Biederman" <ebiederm@xmission.com>: 

> How do you figure?

I admit that I am struggling with understanding where exit codes come
from in the non-usual cases. During my taskstats tests, I played with
writing a multithreaded application that does call pthread_exit() in
the main thread (pid==tgid), for example. I slowly had to learn just
how messy this can be …

Is it clearly defined what the exitcode of a task as part of a process
is/should/can mean, as opposed to the process as a whole?

> For single-threaded processes ac_exitcode would always be reasonable,
> and be what userspace passed to exit(3).

Yes. That is the one case where we all know what we are dealing with;-)

> For multi-threaded processes ac_exitcode before my change was set to
> some completely arbitrary value for the thread whose tgid == tid.

Isn't the only place where it really makes sense to set the exitcode
when the last task of the process exits? I guess that was the intention
of the earlier code — with the same wrong assumption that I fell victim
to for quite some time: That the group leader (first task, tgid == pid)
always exits last.

I do not know in which cases group member threads have meaningful exit
codes different from the last one (which is the one returned for the
process in whole … ?). I'd love to see the exact reasoning on how
multithreading got mapped into kernel tasks which used to track only
single-threaded processes before.

> With my change the value returned
> is at least well defined.

But defined to what?

> Now maybe it would have been better to flag the bug fix with a version
> number.  Unfortunately I did not even realize taskstats had a version
> number.  I just know the code made no sense.

Well, fixing a bug that has been there from the beginning (of adding
multithreading, at least) is a significant change that one might want
to know about. And I do think that it fits to thouroughly fix these
issues that relate to identifying threads and processes (the shameless
plug of my taskstats patch that I'm working on since 2018, and only got
right in 2022, finally — I hope), while at that.


Alrighty then,

Thomas

Eric W. Biederman Feb. 25, 2022, 12:23 a.m. UTC | #3

"Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de> writes:

> Am Tue, 22 Feb 2022 17:53:12 -0600
> schrieb "Eric W. Biederman" <ebiederm@xmission.com>: 
>
>> How do you figure?
>
> I admit that I am struggling with understanding where exit codes come
> from in the non-usual cases. During my taskstats tests, I played with
> writing a multithreaded application that does call pthread_exit() in
> the main thread (pid==tgid), for example. I slowly had to learn just
> how messy this can be …
>
> Is it clearly defined what the exitcode of a task as part of a process
> is/should/can mean, as opposed to the process as a whole?

In the code it is clearly defined.  The decoding is exactly the same
as from an entire process and for a single threaded process there is no
difference.

Linux has a system 2 system calls "exit(2)" and "exit_group(2)" if a
thread exits by itself whatever is passed to exit(2) is the exit code.

What pthread_exit passes to exit(2) I don't know.  I have not been able
to trace glibc that far, and I have not instrumented up a kernel to see.

For threads that are alive when exit_group(2) is called they all get the
same final exit code.

>> For single-threaded processes ac_exitcode would always be reasonable,
>> and be what userspace passed to exit(3).
>
> Yes. That is the one case where we all know what we are dealing with;-)
>
>> For multi-threaded processes ac_exitcode before my change was set to
>> some completely arbitrary value for the thread whose tgid == tid.
>
> Isn't the only place where it really makes sense to set the exitcode
> when the last task of the process exits? I guess that was the intention
> of the earlier code — with the same wrong assumption that I fell victim
> to for quite some time: That the group leader (first task, tgid == pid)
> always exits last.
>
> I do not know in which cases group member threads have meaningful exit
> codes different from the last one (which is the one returned for the
> process in whole … ?). I'd love to see the exact reasoning on how
> multithreading got mapped into kernel tasks which used to track only
> single-threaded processes before.

The internal model in the kernel is there are tasks (which pthreads are
mapped to in a 1-1 fashion).  These tasks were the original process
abstraction.  In the case of CLONE_THREAD these tasks are glued together
into a POSIX process, with shared signal handling.

So from a kernel standpoint as it basically the original process
abstraction it is all well defined what happens when an individual task
exits.

>> With my change the value returned
>> is at least well defined.
>
> But defined to what?

See above.

>> Now maybe it would have been better to flag the bug fix with a version
>> number.  Unfortunately I did not even realize taskstats had a version
>> number.  I just know the code made no sense.
>
> Well, fixing a bug that has been there from the beginning (of adding
> multithreading, at least) is a significant change that one might want
> to know about. And I do think that it fits to thouroughly fix these
> issues that relate to identifying threads and processes (the shameless
> plug of my taskstats patch that I'm working on since 2018, and only got
> right in 2022, finally — I hope), while at that.

It looks like the bug was in commit f3cef7a99469 ("[PATCH] csa: basic
accounting over taskstats") in 2006 in 2.6.19-rc1 when taskstats were
added.  That is long after CLONE_THREAD support was added in the 2.5
development kernel.

I have been working to get a single place that code can look to find the
process exit status.  AKA so that the code can always set
SIGNAL_GROUP_EXIT, and look at signal->group_exit_code.  Fixing this was
just part of sorting out the misconceptions, and I didn't realize there
was anyone that paying attention and cared.

I will see if I can find some time to give your taskstats patch a
review.

Eric

--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -35,11 +35,10 @@  void bacct_add_tsk(struct user_namespace
 	/* Convert to seconds for btime */
 	do_div(delta, USEC_PER_SEC);
 	stats->ac_btime = get_seconds() - delta;
-	if (thread_group_leader(tsk)) {
+	if (tsk->flags & PF_EXITING)
 		stats->ac_exitcode = tsk->exit_code;
-		if (tsk->flags & PF_FORKNOEXEC)
-			stats->ac_flag |= AFORK;
-	}
+	if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC))
+		stats->ac_flag |= AFORK;
 	if (tsk->flags & PF_SUPERPRIV)
 		stats->ac_flag |= ASU;
 	if (tsk->flags & PF_DUMPCORE)

[5.4,32/80] taskstats: Cleanup the use of task->exit_code

Commit Message

Comments

Patch