Message ID | 11e1777296b7d06085c9fd341bafc4b9d82e6e4e.1730883229.git.namcao@linutronix.de |
---|---|
State | New |
Headers | show |
Series | fix reading ESP during coredump | expand |
On Wed, Nov 06 2024 at 10:22, Nam Cao wrote: > Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in > /proc/PID/stat") disabled stack pointer reading, because it is generally > dangerous to do so. > > Commit fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for > coredumping") made an exception for coredumping thread, because for this > case it is safe. > > The exception was later extended to all threads in a coredumping process by > commit cb8f381f1613 ("fs/proc/array.c: allow reporting eip/esp for all > coredumping threads"). > > The above two commits determine if a task is core dumping by checking the > PF_EXITING and PF_DUMPCORE flags. > > However, commit 92307383082d ("coredump: Don't perform any cleanups before > dumping core") moved coredump to happen earlier and before PF_EXITING is > set. Thus, the check of the PF_EXITING flag no longer works. > > Instead, use task->signal->core_state to determine if coredump is > happening. This pointer is set at the beginning of coredump and is cleared > once coredump is done. Thus, while this pointer is not NULL, it is safe to > read ESP. > > Fixes: 92307383082d ("coredump: Don't perform any cleanups before dumping core") Can we please make progress with that? It's a user space visible change which causes a regression in core dumper tools. Thanks, tglx
On 11/06, Nam Cao wrote: > > @@ -534,6 +517,23 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, > ppid = task_tgid_nr_ns(task->real_parent, ns); > pgid = task_pgrp_nr_ns(task, ns); > > + /* > + * esp and eip are intentionally zeroed out. There is no > + * non-racy way to read them without freezing the task. > + * Programs that need reliable values can use ptrace(2). OK, but then: > + * The only exception is if the task is core dumping because > + * a program is not able to use ptrace(2) in that case. It is > + * safe because the task has stopped executing permanently. > + */ > + if (permitted && task->signal->core_state) { > + if (try_get_task_stack(task)) { > + eip = KSTK_EIP(task); > + esp = KSTK_ESP(task); > + put_task_stack(task); How can the task->signal->core_state check help ? Suppose we have a task T1 with T1-pid == 100 and you read /proc/100/stat. It is possible that the T1's sub-thread T2 starts the coredumping and sets signal->core_state != NULL. But read(/proc/100/stat) can run before T1 gets SIGKILL from T2 and enters the kernel mode? Oleg.
On 12/17, Oleg Nesterov wrote: > > On 11/06, Nam Cao wrote: > > > > @@ -534,6 +517,23 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, > > ppid = task_tgid_nr_ns(task->real_parent, ns); > > pgid = task_pgrp_nr_ns(task, ns); > > > > + /* > > + * esp and eip are intentionally zeroed out. There is no > > + * non-racy way to read them without freezing the task. > > + * Programs that need reliable values can use ptrace(2). > > OK, > > but then: > > > + * The only exception is if the task is core dumping because > > + * a program is not able to use ptrace(2) in that case. It is > > + * safe because the task has stopped executing permanently. > > + */ > > + if (permitted && task->signal->core_state) { > > + if (try_get_task_stack(task)) { > > + eip = KSTK_EIP(task); > > + esp = KSTK_ESP(task); > > + put_task_stack(task); > > How can the task->signal->core_state check help ? > > Suppose we have a task T1 with T1-pid == 100 and you read /proc/100/stat. > It is possible that the T1's sub-thread T2 starts the coredumping and sets > signal->core_state != NULL. > > But read(/proc/100/stat) can run before T1 gets SIGKILL from T2 and enters > the kernel mode? Can't the trivial patch below fix the problem? Oleg. --- xfs/proc/array.c +++ x/fs/proc/array.c @@ -500,7 +500,7 @@ * a program is not able to use ptrace(2) in that case. It is * safe because the task has stopped executing permanently. */ - if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) { + if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE|PF_POSTCOREDUMP))) { if (try_get_task_stack(task)) { eip = KSTK_EIP(task); esp = KSTK_ESP(task);
Hi Oleg, On Tue, Dec 17, 2024 at 04:09:14PM +0100, Oleg Nesterov wrote: > On 12/17, Oleg Nesterov wrote: > > > > On 11/06, Nam Cao wrote: > > > > > > @@ -534,6 +517,23 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, > > > ppid = task_tgid_nr_ns(task->real_parent, ns); > > > pgid = task_pgrp_nr_ns(task, ns); > > > > > > + /* > > > + * esp and eip are intentionally zeroed out. There is no > > > + * non-racy way to read them without freezing the task. > > > + * Programs that need reliable values can use ptrace(2). > > > > OK, > > > > but then: > > > > > + * The only exception is if the task is core dumping because > > > + * a program is not able to use ptrace(2) in that case. It is > > > + * safe because the task has stopped executing permanently. > > > + */ > > > + if (permitted && task->signal->core_state) { > > > + if (try_get_task_stack(task)) { > > > + eip = KSTK_EIP(task); > > > + esp = KSTK_ESP(task); > > > + put_task_stack(task); > > > > How can the task->signal->core_state check help ? > > > > Suppose we have a task T1 with T1-pid == 100 and you read /proc/100/stat. > > It is possible that the T1's sub-thread T2 starts the coredumping and sets > > signal->core_state != NULL. > > > > But read(/proc/100/stat) can run before T1 gets SIGKILL from T2 and enters > > the kernel mode? Right, I missed that race, thanks for pointing it out. > Can't the trivial patch below fix the problem? It can. In fact this is the original fix we had. I thought that checking a single "core_state" is simpler than checking 3 flags, oh well.. Can you send a proper patch, or should I do it? Best regards, Nam > > Oleg. > > > --- xfs/proc/array.c > +++ x/fs/proc/array.c > @@ -500,7 +500,7 @@ > * a program is not able to use ptrace(2) in that case. It is > * safe because the task has stopped executing permanently. > */ > - if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) { > + if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE|PF_POSTCOREDUMP))) { > if (try_get_task_stack(task)) { > eip = KSTK_EIP(task); > esp = KSTK_ESP(task); >
Hi Nam, On 12/20, Nam Cao wrote: > > > Can't the trivial patch below fix the problem? > > It can. In fact this is the original fix we had. I thought that checking a > single "core_state" is simpler than checking 3 flags, oh well.. > > Can you send a proper patch, or should I do it? Can you send V2 please? It was you who found/investigated the problem, and the patch is trivial. Feel free to include my acked-by. Thanks, Oleg.
diff --git a/fs/proc/array.c b/fs/proc/array.c index 34a47fb0c57f..2f1dbfcf143d 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -489,25 +489,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, vsize = eip = esp = 0; permitted = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT); mm = get_task_mm(task); - if (mm) { + if (mm) vsize = task_vsize(mm); - /* - * esp and eip are intentionally zeroed out. There is no - * non-racy way to read them without freezing the task. - * Programs that need reliable values can use ptrace(2). - * - * The only exception is if the task is core dumping because - * a program is not able to use ptrace(2) in that case. It is - * safe because the task has stopped executing permanently. - */ - if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) { - if (try_get_task_stack(task)) { - eip = KSTK_EIP(task); - esp = KSTK_ESP(task); - put_task_stack(task); - } - } - } sigemptyset(&sigign); sigemptyset(&sigcatch); @@ -534,6 +517,23 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, ppid = task_tgid_nr_ns(task->real_parent, ns); pgid = task_pgrp_nr_ns(task, ns); + /* + * esp and eip are intentionally zeroed out. There is no + * non-racy way to read them without freezing the task. + * Programs that need reliable values can use ptrace(2). + * + * The only exception is if the task is core dumping because + * a program is not able to use ptrace(2) in that case. It is + * safe because the task has stopped executing permanently. + */ + if (permitted && task->signal->core_state) { + if (try_get_task_stack(task)) { + eip = KSTK_EIP(task); + esp = KSTK_ESP(task); + put_task_stack(task); + } + } + unlock_task_sighand(task, &flags); }
Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") disabled stack pointer reading, because it is generally dangerous to do so. Commit fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") made an exception for coredumping thread, because for this case it is safe. The exception was later extended to all threads in a coredumping process by commit cb8f381f1613 ("fs/proc/array.c: allow reporting eip/esp for all coredumping threads"). The above two commits determine if a task is core dumping by checking the PF_EXITING and PF_DUMPCORE flags. However, commit 92307383082d ("coredump: Don't perform any cleanups before dumping core") moved coredump to happen earlier and before PF_EXITING is set. Thus, the check of the PF_EXITING flag no longer works. Instead, use task->signal->core_state to determine if coredump is happening. This pointer is set at the beginning of coredump and is cleared once coredump is done. Thus, while this pointer is not NULL, it is safe to read ESP. Fixes: 92307383082d ("coredump: Don't perform any cleanups before dumping core") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> --- fs/proc/array.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)