mbox series

[RFC,0/3] Checkpoint Support for Syscall User Dispatch

Message ID 20230109153348.5625-1-gregory.price@memverge.com
Headers show
Series Checkpoint Support for Syscall User Dispatch | expand

Message

Gregory Price Jan. 9, 2023, 3:33 p.m. UTC
Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

This patch set implements 3 features to enable software such as CRIU
to cleanly interpose upon software leveraging syscall user dispatch.

- Implement PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH, akin to a similar
  feature for SECCOMP.  This allows a ptracer to temporarily disable
  syscall user dispatch, making syscall injection possible.

- Implement an fs/proc extension that reports whether Syscall User
  Dispatch is being used in proc/status.  A similar value is present
  for SECCOMP, and is used to determine whether special logic is
  needed during checkpoint/resume.

- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.


Signed-off-by: Gregory Price <gregory.price@memverge.com>  


Gregory Price (3):
  ptrace,syscall_user_dispatch: Implement Syscall User Dispatch
    Suspension
  fs/proc/array: Add Syscall User Dispatch to proc status
  prctl,syscall_user_dispatch: add a getter for configuration info

 .../admin-guide/syscall-user-dispatch.rst     | 18 +++++++
 fs/proc/array.c                               |  8 +++
 include/linux/ptrace.h                        |  2 +
 include/linux/syscall_user_dispatch.h         |  7 +++
 include/uapi/linux/prctl.h                    |  3 ++
 include/uapi/linux/ptrace.h                   |  6 ++-
 kernel/entry/syscall_user_dispatch.c          | 19 +++++++
 kernel/ptrace.c                               |  5 ++
 kernel/sys.c                                  |  4 ++
 .../syscall_user_dispatch/sud_test.c          | 54 +++++++++++++++++++
 10 files changed, 125 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Jan. 18, 2023, 5:16 p.m. UTC | #1
On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
>  	char state;
>  
> +	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> +			unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> +		return false;
> +
>  	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
>  		return false;
>  

So by making syscall_user_dispatch() return false, we'll make
syscall_trace_enter() continue to handle things, and supposedly you want
to land in ptrace_report_syscall_entry(), right?

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 54482193e1ed..a6ad815bd4be 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -370,6 +370,11 @@ static int check_ptrace_options(unsigned long data)
>  	if (data & ~(unsigned long)PTRACE_O_MASK)
>  		return -EINVAL;
>  
> +	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) {
> +		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTART))
> +			return -EINVAL;
> +	}

Should setting this then not also depend on having
SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
things.
Gregory Price Jan. 18, 2023, 7:49 p.m. UTC | #2
On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> ---------- Forwarded message ---------
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, Jan 18, 2023 at 12:16 PM
> Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> User Dispatch Suspension
> To: Gregory Price <gourry.memverge@gmail.com>
> 
> 
> On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> >       char state;
> >
> > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > +                     unlikely(current->ptrace &
> PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > +             return false;
> > +
> >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> >               return false;
> >
> 
> So by making syscall_user_dispatch() return false, we'll make
> syscall_trace_enter() continue to handle things, and supposedly you want
> to land in ptrace_report_syscall_entry(), right?
>
> ... snip ...
> 
> Should setting this then not also depend on having
> SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> things.

Hm, this is an interesting question.  My thoughts are that I want the
process to handle the syscall as-if syscall user dispatch was not
present at all, regardless of SYSCALL_TRACE.

This is because some software, like CRIU, actually injects syscalls to
run in the context of the software in an effort to collect resources.

So I actually *want* those 'funny' things to occur, because they're most
likely intentional.  I don't necessarily want to intercept system calls
that subsequently occur (although i might).

So if this feature required SYSCALL_TRACE, you would no longer be able
to inject system calls ala CRIU.

That's also my understanding of the SECCOMP_SUSPEND feature as well,
it's intended specifically to allow *otherwise disallowed* syscalls to
be injected into the process and SECCOMP bypassed. (in this case,
SECCOMP_SUSPEND requires root for exactly this reason).
Peter Zijlstra Jan. 18, 2023, 8:40 p.m. UTC | #3
On Wed, Jan 18, 2023 at 02:49:31PM -0500, Gregory Price wrote:
> On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> > ---------- Forwarded message ---------
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed, Jan 18, 2023 at 12:16 PM
> > Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> > User Dispatch Suspension
> > To: Gregory Price <gourry.memverge@gmail.com>
> > 
> > 
> > On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> > >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> > >       char state;
> > >
> > > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > > +                     unlikely(current->ptrace &
> > PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > > +             return false;
> > > +
> > >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> > >               return false;
> > >
> > 
> > So by making syscall_user_dispatch() return false, we'll make
> > syscall_trace_enter() continue to handle things, and supposedly you want
> > to land in ptrace_report_syscall_entry(), right?
> >
> > ... snip ...
> > 
> > Should setting this then not also depend on having
> > SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> > things.
> 
> Hm, this is an interesting question.  My thoughts are that I want the
> process to handle the syscall as-if syscall user dispatch was not
> present at all, regardless of SYSCALL_TRACE.
> 
> This is because some software, like CRIU, actually injects syscalls to
> run in the context of the software in an effort to collect resources.

Oh, right. I used to know that.

> So I actually *want* those 'funny' things to occur, because they're most
> likely intentional.  I don't necessarily want to intercept system calls
> that subsequently occur (although i might).
> 
> So if this feature required SYSCALL_TRACE, you would no longer be able
> to inject system calls ala CRIU.

Yeah, I suppose you're right. It makes it a very sharp instrument, but I
suppose you get what you asked for.