Message ID | 20210616125157.438837-1-paul.gortmaker@windriver.com |
---|---|
State | Accepted |
Commit | 1e7107c5ef44431bc1ebbd4c353f1d7c22e5f2ec |
Headers | show |
Series | cgroup1: fix leaked context root causing sporadic NULL deref in LTP | expand |
On Wed, Jun 16, 2021 at 11:23:34AM -0400, Tejun Heo wrote: > On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote: > > A fix would be to not leave the stale reference in fc->root as follows: > > -------------- > > dput(fc->root); > > + fc->root = NULL; > > deactivate_locked_super(sb); > > -------------- > > ...but then we are just open-coding a duplicate of fc_drop_locked() so we > > simply use that instead. > As this is unlikely to be a real-world problem both in probability and > circumstances, I'm applying this to cgroup/for-5.14 instead of > cgroup/for-5.13-fixes. FWIW at Arm we've started seeing what appears to be this issue blow up very frequently in some of our internal LTP CI runs against -next, seems to be mostly on lower end platforms. We seem to have started finding it at roughly the same time that the Yocto people did, I guess some other change made it more likely to trigger. Not exactly real world usage obviously but it's creating quite a bit of noise in testing which is disruptive so it'd be good to get it into -next as a fix.
On Wed, 2021-06-30 at 17:10 +0100, Mark Brown wrote: > On Wed, Jun 16, 2021 at 11:23:34AM -0400, Tejun Heo wrote: > > On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote: > > > > A fix would be to not leave the stale reference in fc->root as follows: > > > > -------------- > > > dput(fc->root); > > > + fc->root = NULL; > > > deactivate_locked_super(sb); > > > -------------- > > > > ...but then we are just open-coding a duplicate of fc_drop_locked() so we > > > simply use that instead. > > > As this is unlikely to be a real-world problem both in probability and > > circumstances, I'm applying this to cgroup/for-5.14 instead of > > cgroup/for-5.13-fixes. > > FWIW at Arm we've started seeing what appears to be this issue blow up > very frequently in some of our internal LTP CI runs against -next, seems > to be mostly on lower end platforms. We seem to have started finding it > at roughly the same time that the Yocto people did, I guess some other > change made it more likely to trigger. Not exactly real world usage > obviously but it's creating quite a bit of noise in testing which is > disruptive so it'd be good to get it into -next as a fix. It is a horrible bug to debug as you end up with "random" failures on the systems which are hard to pin down. Along with the RCU stall hangs it was all a bit of a nightmare. Out of interest are you also seeing the proc01 test hang on a non-blocking read of /proc/kmsg periodically? https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460 I've not figured out a way to reproduce it at will yet and it seems strace was enough to unblock it. It seems arm specific. Cheers, Richard
On Wed, Jun 30, 2021 at 11:31:06PM +0100, Richard Purdie wrote: > Out of interest are you also seeing the proc01 test hang on a non-blocking > read of /proc/kmsg periodically? > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460 > I've not figured out a way to reproduce it at will yet and it seems strace I've been talking to Ross, he mentioned it but no, rings no bells. > was enough to unblock it. It seems arm specific. If it's 32 bit I'm not really doing anything that looks at it.
On Thu, 2021-07-01 at 13:11 +0100, Mark Brown wrote: > On Wed, Jun 30, 2021 at 11:31:06PM +0100, Richard Purdie wrote: > > > Out of interest are you also seeing the proc01 test hang on a non-blocking > > read of /proc/kmsg periodically? > > > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14460 > > > I've not figured out a way to reproduce it at will yet and it seems strace > > I've been talking to Ross, he mentioned it but no, rings no bells. > > > was enough to unblock it. It seems arm specific. > > If it's 32 bit I'm not really doing anything that looks at it. Its aarch64 in qemu running on aarch64 hardware with KVM. If you do happen to see anything let us know! Cheers, Richard
On Wed, Jun 16, 2021 at 08:51:57AM -0400, Paul Gortmaker wrote: > Richard reported sporadic (roughly one in 10 or so) null dereferences and > other strange behaviour for a set of automated LTP tests. Things like: > > BUG: kernel NULL pointer dereference, address: 0000000000000008 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 > RIP: 0010:kernfs_sop_show_path+0x1b/0x60 > > ...or these others: > > RIP: 0010:do_mkdirat+0x6a/0xf0 > RIP: 0010:d_alloc_parallel+0x98/0x510 > RIP: 0010:do_readlinkat+0x86/0x120 > > There were other less common instances of some kind of a general scribble > but the common theme was mount and cgroup and a dubious dentry triggering > the NULL dereference. I was only able to reproduce it under qemu by > replicating Richard's setup as closely as possible - I never did get it > to happen on bare metal, even while keeping everything else the same. > > In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions") > we see this as a part of the overall change: > > -------------- > struct cgroup_subsys *ss; > - struct dentry *dentry; > > [...] > > - dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root, > - CGROUP_SUPER_MAGIC, ns); > > [...] > > - if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) { > - struct super_block *sb = dentry->d_sb; > - dput(dentry); > + ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns); > + if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) { > + struct super_block *sb = fc->root->d_sb; > + dput(fc->root); > deactivate_locked_super(sb); > msleep(10); > return restart_syscall(); > } > -------------- > > In changing from the local "*dentry" variable to using fc->root, we now > export/leave that dentry pointer in the file context after doing the dput() > in the unlikely "is_dying" case. With LTP doing a crazy amount of back to > back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely > becomes slightly likely and then bad things happen. > > A fix would be to not leave the stale reference in fc->root as follows: > > -------------- > dput(fc->root); > + fc->root = NULL; > deactivate_locked_super(sb); > -------------- > > ...but then we are just open-coding a duplicate of fc_drop_locked() so we > simply use that instead. > > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Tejun Heo <tj@kernel.org> > Cc: Zefan Li <lizefan.x@bytedance.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: stable@vger.kernel.org # v5.1+ > Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org> > Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions") > Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> I dropped the ball on this and this didn't get pushed. Re-applied to for-5.14-fixes. Will send out in a few days. Thanks. -- tejun
diff --git a/fs/internal.h b/fs/internal.h index 6aeae7ef3380..728f8d70d7f1 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -61,7 +61,6 @@ extern void __init chrdev_init(void); */ extern const struct fs_context_operations legacy_fs_context_ops; extern int parse_monolithic_mount_data(struct fs_context *, void *); -extern void fc_drop_locked(struct fs_context *); extern void vfs_clean_context(struct fs_context *fc); extern int finish_clean_context(struct fs_context *fc); diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index 37e1e8f7f08d..5b44b0195a28 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -139,6 +139,7 @@ extern int vfs_parse_fs_string(struct fs_context *fc, const char *key, extern int generic_parse_monolithic(struct fs_context *fc, void *data); extern int vfs_get_tree(struct fs_context *fc); extern void put_fs_context(struct fs_context *fc); +extern void fc_drop_locked(struct fs_context *fc); /* * sget() wrappers to be called from the ->get_tree() op. diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 1f274d7fc934..6e554743cf2b 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -1223,9 +1223,7 @@ int cgroup1_get_tree(struct fs_context *fc) ret = cgroup_do_get_tree(fc); if (!ret && percpu_ref_is_dying(&ctx->root->cgrp.self.refcnt)) { - struct super_block *sb = fc->root->d_sb; - dput(fc->root); - deactivate_locked_super(sb); + fc_drop_locked(fc); ret = 1; }
Richard reported sporadic (roughly one in 10 or so) null dereferences and other strange behaviour for a set of automated LTP tests. Things like: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_sop_show_path+0x1b/0x60 ...or these others: RIP: 0010:do_mkdirat+0x6a/0xf0 RIP: 0010:d_alloc_parallel+0x98/0x510 RIP: 0010:do_readlinkat+0x86/0x120 There were other less common instances of some kind of a general scribble but the common theme was mount and cgroup and a dubious dentry triggering the NULL dereference. I was only able to reproduce it under qemu by replicating Richard's setup as closely as possible - I never did get it to happen on bare metal, even while keeping everything else the same. In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions") we see this as a part of the overall change: -------------- struct cgroup_subsys *ss; - struct dentry *dentry; [...] - dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root, - CGROUP_SUPER_MAGIC, ns); [...] - if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) { - struct super_block *sb = dentry->d_sb; - dput(dentry); + ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns); + if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) { + struct super_block *sb = fc->root->d_sb; + dput(fc->root); deactivate_locked_super(sb); msleep(10); return restart_syscall(); } -------------- In changing from the local "*dentry" variable to using fc->root, we now export/leave that dentry pointer in the file context after doing the dput() in the unlikely "is_dying" case. With LTP doing a crazy amount of back to back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely becomes slightly likely and then bad things happen. A fix would be to not leave the stale reference in fc->root as follows: -------------- dput(fc->root); + fc->root = NULL; deactivate_locked_super(sb); -------------- ...but then we are just open-coding a duplicate of fc_drop_locked() so we simply use that instead. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org # v5.1+ Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org> Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>