[3/3] ptrace,syscall_user_dispatch: add a getter/setter for sud configuration

Message ID 20230123032942.18263-4-gregory.price@memverge.com
State New
Series Checkpoint Support for Syscall User Dispatch

Commit Message

Gregory Price Jan. 23, 2023, 3:29 a.m. UTC
Implement ptrace getter/setter interface for syscall user dispatch.

Presently, these settings are write-only via prctl, making it impossible
to implement transparent checkpoint (coordination with the software is
required).

This is modeled after a similar interface for SECCOMP, which can have
its configuration dumped by ptrace for software like CRIU.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  5 +-
 include/linux/syscall_user_dispatch.h         | 19 ++++++++
 include/uapi/linux/ptrace.h                   | 10 ++++
 kernel/entry/syscall_user_dispatch.c          | 46 +++++++++++++++++++
 kernel/ptrace.c                               |  9 ++++
 5 files changed, 88 insertions(+), 1 deletion(-)
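
A minimal sketch of how a checkpointer might call the getter, assuming the request number and structure layout from this revision of the patch (a released kernel header may define them differently). The names `SUD_PTRACE_GET_CONFIG`, `struct sud_config_v5`, and `sud_get_config` are hypothetical and exist only for illustration. As in the `kernel/ptrace.c` hunk of this patch, `addr` carries the structure size and `data` the user buffer.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical local name; the value is the one proposed in this patch
 * revision for PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG. */
#define SUD_PTRACE_GET_CONFIG 0x4211

/* Hypothetical local mirror of struct syscall_user_dispatch_config
 * as laid out in this patch revision. */
struct sud_config_v5 {
	uint64_t mode;
	int8_t  *selector;
	uint64_t offset;
	uint64_t len;
	uint8_t  on_dispatch;
};

/*
 * Fetch the syscall user dispatch configuration of a tracee.
 * The tracee must already be attached and stopped by the caller.
 * Returns 0 on success, -1 on failure (errno is set by ptrace).
 */
static int sud_get_config(pid_t pid, struct sud_config_v5 *out)
{
	memset(out, 0, sizeof(*out));

	/* addr = size of the structure, data = destination buffer. */
	if (ptrace(SUD_PTRACE_GET_CONFIG, pid,
		   (void *)sizeof(*out), out) == -1)
		return -1;

	return 0;
}
```

Because the kernel side rejects any size other than `sizeof(struct syscall_user_dispatch_config)` with `-EINVAL`, a mirror structure like the one above only works if its layout matches the kernel's exactly.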

Comments

Andrei Vagin Jan. 24, 2023, 2:51 a.m. UTC | #1
On Sun, Jan 22, 2023 at 8:22 PM Gregory Price <gourry.memverge@gmail.com> wrote:
<snip>
>
> +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> +struct syscall_user_dispatch_config {
> +       __u64 mode;
> +       __s8 *selector;
> +       __u64 offset;
> +       __u64 len;
> +       __u8 on_dispatch;

Sorry, I didn't notice this in the previous version. on_dispatch looks like
an internal property, and I don't see how we can stop a process with ptrace
when on_dispatch is set to a non-zero value. I am not sure that we need to
expose it to user-space.

Other than that, the patch looks good to me.

Thanks,
Andrei
Gregory Price Jan. 24, 2023, 3:30 a.m. UTC | #2
On Mon, Jan 23, 2023 at 06:51:07PM -0800, Andrei Vagin wrote:
> On Sun, Jan 22, 2023 at 8:22 PM Gregory Price <gourry.memverge@gmail.com> wrote:
> <snip>
> >
> > +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> > +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> > +struct syscall_user_dispatch_config {
> > +       __u64 mode;
> > +       __s8 *selector;
> > +       __u64 offset;
> > +       __u64 len;
> > +       __u8 on_dispatch;
> 
> Sorry, I didn't notice this in the previous version. on_dispatch looks like
> an internal property, and I don't see how we can stop a process with ptrace
> when on_dispatch is set to a non-zero value. I am not sure that we need to
> expose it to user-space.
> 
> Other than that, the patch looks good to me.
> 
> Thanks,
> Andrei

I tried tracing down the exit routes, but wasn't sure if there was a
no-return somewhere in the stack i hadn't accounted for, so i left it in
just in case.

I'll take one more look then i'll drop it before shipping out a v6.

May I add your Reviewed-by?

Thanks
~Gregory
Gregory Price Jan. 24, 2023, 3:59 p.m. UTC | #3
On Mon, Jan 23, 2023 at 08:52:29PM +0100, Oleg Nesterov wrote:
> On 01/23, Gregory Price wrote:
> >
> > So i think dropping 2/3 in the list is good.  If you concur i'll do
> > that.
> 
> Well I obviously think that 2/3 should be dropped ;)
> 
> As for 1/3 and 3/3, feel free to add my reviewed-by.
> 
> Oleg.
>

I'm actually going to walk my agreement back.

After one more review, the need for the proc/status entry is not to
decide whether to dump SUD settings, but for use in deciding whether to
set the SUSPEND_SYSCALL_DISPATCH option from patch 1/3.

For SECCOMP, CRIU's `compel` does the following:

1. ptrace attach / halt
2. examine proc/status for seccomp usage
3. if seccomp in use, set PTRACE_O_SUSPEND_SECCOMP
4. proceed with further operations

The same pattern would be used for syscall dispatch.

Technically I think setting the flag unconditionally would be safe, but
it would lead to unclear system state (i.e. did i actually suspend
something? was the process actually using it?)

To me it seems better to leave it explicit and keep the second commit.

Thoughts?

(cc: @avagin if you happen to have any input on this particular pattern)

~Gregory
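
Steps 2 and 3 of the list above can be sketched as follows. This is not CRIU's actual code: `status_seccomp_mode` and `should_suspend_seccomp` are hypothetical helpers. The only facts relied on are that /proc/<pid>/status contains a `Seccomp:` line holding the current mode, and that a value of 0 means seccomp is disabled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value of the "Seccomp:" field found in the text of a
 * /proc/<pid>/status file, or -1 if the field is absent.
 */
static int status_seccomp_mode(const char *status_text)
{
	const char *line = status_text;

	while (line && *line) {
		int mode;

		/* Matches "Seccomp:" only at the start of a line, so
		 * "Seccomp_filters:" is not mistaken for it. */
		if (sscanf(line, "Seccomp: %d", &mode) == 1)
			return mode;

		/* Advance to the character after the next newline. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Request suspension only when seccomp is actually in use. */
static int should_suspend_seccomp(const char *status_text)
{
	return status_seccomp_mode(status_text) > 0;
}
```

A caller would read the status file of the stopped task into a buffer, pass it to `should_suspend_seccomp`, and set `PTRACE_O_SUSPEND_SECCOMP` only when it returns non-zero.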
Oleg Nesterov Jan. 24, 2023, 4:43 p.m. UTC | #4
I won't really argue, but...

On 01/24, Gregory Price wrote:
>
> On Mon, Jan 23, 2023 at 08:52:29PM +0100, Oleg Nesterov wrote:
> > On 01/23, Gregory Price wrote:
> > >
> > > So i think dropping 2/3 in the list is good.  If you concur i'll do
> > > that.
> >
> > Well I obviously think that 2/3 should be dropped ;)
> >
> > As for 1/3 and 3/3, feel free to add my reviewed-by.
> >
> > Oleg.
> >
>
> I'm actually going to walk my agreement back.
>
> After one more review, the need for the proc/status entry is not to
> decide whether to dump SUD settings, but for use in deciding whether to
> set the SUSPEND_SYSCALL_DISPATCH option from patch 1/3.

Rather than read /proc/pid/status, CRIU can just do
PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG unconditionally
and check syscall_user_dispatch_config.mode ?

Why do you want to expose SYSCALL_USER_DISPATCH in /proc/status? If this task
is not stopped you can't trust this value anyway. If it is stopped, I don't
think ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG) is slower than reading
/proc.

but perhaps I missed something?

Oleg.
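
The check suggested above can be sketched as a small predicate over the structure returned by the getter. This is an illustration only: `struct sud_config_view`, `SUD_MODE_OFF`, `SUD_MODE_ON`, and `sud_in_use` are hypothetical names. The two mode values mirror `PR_SYS_DISPATCH_OFF` (0) and `PR_SYS_DISPATCH_ON` (1) from <linux/prctl.h>, which are the values the getter in this patch reports.

```c
#include <stdint.h>

/* Hypothetical local names mirroring PR_SYS_DISPATCH_OFF / PR_SYS_DISPATCH_ON. */
#define SUD_MODE_OFF 0
#define SUD_MODE_ON  1

/* Hypothetical local mirror of the structure in this patch revision. */
struct sud_config_view {
	uint64_t mode;
	int8_t  *selector;
	uint64_t offset;
	uint64_t len;
	uint8_t  on_dispatch;
};

/*
 * Decide from an already fetched configuration whether the stopped
 * task has syscall user dispatch enabled. No /proc access is needed.
 */
static int sud_in_use(const struct sud_config_view *cfg)
{
	return cfg->mode == SUD_MODE_ON;
}
```

With this approach the checkpointer issues the getter unconditionally after stopping the task and decides whether to suspend dispatch based on the result.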
Gregory Price Jan. 24, 2023, 4:54 p.m. UTC | #5
On Tue, Jan 24, 2023 at 05:43:47PM +0100, Oleg Nesterov wrote:
> I won't really argue, but...
> 
> On 01/24, Gregory Price wrote:
> >
> > On Mon, Jan 23, 2023 at 08:52:29PM +0100, Oleg Nesterov wrote:
> > > On 01/23, Gregory Price wrote:
> > > >
> > > > So i think dropping 2/3 in the list is good.  If you concur i'll do
> > > > that.
> > >
> > > Well I obviously think that 2/3 should be dropped ;)
> > >
> > > As for 1/3 and 3/3, feel free to add my reviewed-by.
> > >
> > > Oleg.
> > >
> >
> > I'm actually going to walk my agreement back.
> >
> > After one more review, the need for the proc/status entry is not to
> > decide whether to dump SUD settings, but for use in deciding whether to
> > set the SUSPEND_SYSCALL_DISPATCH option from patch 1/3.
> 
> Rather than read /proc/pid/status, CRIU can just do
> PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG unconditionally
> and check syscall_user_dispatch_config.mode ?
> 
> Why do you want to expose SYSCALL_USER_DISPATCH in /proc/status? If this task
> is not stopped you can't trust this value anyway. If it is stopped, I don't
> think ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG) is slower than reading
> /proc.
> 
> but perhaps I missed something?
> 
> Oleg.
> 

*facepalm* good point, i'm wondering if there's a reason CRIU doesn't do
the same for SECCOMP.

either way, going to drop it
Andrei Vagin Jan. 24, 2023, 5:58 p.m. UTC | #6
On Tue, Jan 24, 2023 at 8:54 AM Gregory Price
<gregory.price@memverge.com> wrote:
>
> On Tue, Jan 24, 2023 at 05:43:47PM +0100, Oleg Nesterov wrote:
> > I won't really argue, but...
> >
> > On 01/24, Gregory Price wrote:
> > >
> > > On Mon, Jan 23, 2023 at 08:52:29PM +0100, Oleg Nesterov wrote:
> > > > On 01/23, Gregory Price wrote:
> > > > >
> > > > > So i think dropping 2/3 in the list is good.  If you concur i'll do
> > > > > that.
> > > >
> > > > Well I obviously think that 2/3 should be dropped ;)
> > > >
> > > > As for 1/3 and 3/3, feel free to add my reviewed-by.
> > > >
> > > > Oleg.
> > > >
> > >
> > > I'm actually going to walk my agreement back.
> > >
> > > After one more review, the need for the proc/status entry is not to
> > > decide whether to dump SUD settings, but for use in deciding whether to
> > > set the SUSPEND_SYSCALL_DISPATCH option from patch 1/3.
> >
> > Rather than read /proc/pid/status, CRIU can just do
> > PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG unconditionally
> > and check syscall_user_dispatch_config.mode ?
> >
> > Why do you want to expose SYSCALL_USER_DISPATCH in /proc/status? If this task
> > is not stopped you can't trust this value anyway. If it is stopped, I don't
> > think ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG) is slower than reading
> > /proc.
> >
> > but perhaps I missed something?
> >
> > Oleg.
> >
>
> *facepalm* good point, i'm wondering if there's a reason CRIU doesn't do
> the same for SECCOMP.

Because information about seccomp was in /proc/pid/status forever and we
started using it before the ptrace interface was merged. I am not sure that
this is the only reason, but it is definitely one of them.

>
> either way, going to drop it
Gregory Price Jan. 24, 2023, 9:39 p.m. UTC | #7
On Tue, Jan 24, 2023 at 09:58:02AM -0800, Andrei Vagin wrote:
> >
> > *facepalm* good point, i'm wondering if there's a reason CRIU doesn't do
> > the same for SECCOMP.
> 
> Because information about seccomp was in /proc/pid/status forever and we
> started using it before the ptrace interface was merged. I am not sure that
> this is the only reason, but it is definitely one of them.
> 

Even better reason to drop it.  I'll send out (hopefully) the final
configuration here shortly.

Glad this simplified down as much as it did.

Patch

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..a23ae21a1d5b 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -43,7 +43,10 @@  doesn't rely on any of the syscall ABI to make the filtering.  It uses
 only the syscall dispatcher address and the userspace key.
 
 As the ABI of these intercepted syscalls is unknown to Linux, these
-syscalls are not instrumentable via ptrace or the syscall tracepoints.
+syscalls are not instrumentable via ptrace or the syscall tracepoints,
+however an interface to suspend, checkpoint, and restore syscall user
+dispatch configuration has been added to ptrace to assist userland
+checkpoint/restart software.
 
 Interface
 ---------
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..9e1bd0d87c1e 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,13 @@  int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+	void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+	void __user *data);
+
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +42,18 @@  static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+	void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+	void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index ba9e3f19a22c..8b93c78189b5 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,16 @@  struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+struct syscall_user_dispatch_config {
+	__u64 mode;
+	__s8 *selector;
+	__u64 offset;
+	__u64 len;
+	__u8 on_dispatch;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index b5ec75164805..a303c8de59af 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -111,3 +111,49 @@  int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 
 	return 0;
 }
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+		void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct syscall_user_dispatch_config config;
+
+	if (size != sizeof(struct syscall_user_dispatch_config))
+		return -EINVAL;
+
+	if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+		config.mode = PR_SYS_DISPATCH_ON;
+	else
+		config.mode = PR_SYS_DISPATCH_OFF;
+
+	config.offset = sd->offset;
+	config.len = sd->len;
+	config.selector = sd->selector;
+	config.on_dispatch = sd->on_dispatch;
+
+	if (copy_to_user(data, &config, sizeof(config)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+		void __user *data)
+{
+	struct syscall_user_dispatch_config config;
+	int ret;
+
+	if (size != sizeof(struct syscall_user_dispatch_config))
+		return -EINVAL;
+
+	if (copy_from_user(&config, data, sizeof(config)))
+		return -EFAULT;
+
+	ret = set_syscall_user_dispatch(config.mode, config.offset, config.len,
+			config.selector);
+	if (ret)
+		return ret;
+
+	task->syscall_dispatch.on_dispatch = config.on_dispatch;
+	return 0;
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a348b68d07a2..76de46e080e2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@ 
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1263,6 +1264,14 @@  int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}