Message ID: 20180613121711.5018-5-juri.lelli@redhat.com
State: New
Series: [v4,1/5] sched/topology: Add check to backup comment about hotplug lock
On Wed, 13 Jun 2018 14:17:10 +0200
Juri Lelli <juri.lelli@redhat.com> wrote:

> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index b42037e6e81d..d26fd4795aa3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2409,6 +2409,22 @@ void __init cpuset_init_smp(void)
>  	BUG_ON(!cpuset_migrate_mm_wq);
>  }
>  
> +/**
> + * cpuset_lock - Grab the cpuset_mutex from another subsysytem
> + */
> +int cpuset_lock(void)

Shouldn't this be called "cpuset_trylock()"? Otherwise one may think
that it will always return with the cpuset_mutex locked.

-- Steve

> +{
> +	return mutex_trylock(&cpuset_mutex);
> +}
> +
> +/**
> + * cpuset_unlock - Release the cpuset_mutex from another subsysytem
> + */
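For reference, mutex_trylock() returns 1 when it acquires the mutex and 0 when the
mutex is already held, so a wrapper built on it can fail and every caller must check
the result. Below is a minimal sketch, purely illustrative and not part of the patch
under review (dl_admission_work() is a made-up caller), of what using the proposed
helper looks like:

/* Illustration only -- not from the patch; dl_admission_work() is hypothetical. */
static int dl_admission_work(void)
{
	if (!cpuset_lock())	/* mutex_trylock() failed: cpuset_mutex NOT held */
		return -EBUSY;	/* caller has to cope with the failure */

	/* ... work that relies on cpuset_mutex being held ... */

	cpuset_unlock();
	return 0;
}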
On 14/06/18 09:45, Steven Rostedt wrote:
> On Wed, 13 Jun 2018 14:17:10 +0200
> Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > index b42037e6e81d..d26fd4795aa3 100644
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -2409,6 +2409,22 @@ void __init cpuset_init_smp(void)
> >  	BUG_ON(!cpuset_migrate_mm_wq);
> >  }
> >  
> > +/**
> > + * cpuset_lock - Grab the cpuset_mutex from another subsysytem
> > + */
> > +int cpuset_lock(void)
> 
> Shouldn't this be called "cpuset_trylock()"? Otherwise one may think
> that it will always return with the cpuset_mutex locked.

Sure.

Thanks,

- Juri
On Wed, 13 Jun 2018 14:17:10 +0200
Juri Lelli <juri.lelli@redhat.com> wrote:

> +/**
> + * cpuset_lock - Grab the cpuset_mutex from another subsysytem
> + */
> +int cpuset_lock(void)
> +{
> +	return mutex_trylock(&cpuset_mutex);
> +}
> +
> +/**
> + * cpuset_unlock - Release the cpuset_mutex from another subsysytem
> + */
> +void cpuset_unlock(void)
> +{
> +	mutex_unlock(&cpuset_mutex);
> +}
> +
>  /**
>   * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
>   * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ca788f74259d..a5b0c6c25b44 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4218,6 +4218,14 @@ static int __sched_setscheduler(struct task_struct *p,
>  	if (attr->sched_flags & SCHED_FLAG_SUGOV)
>  		return -EINVAL;
>  
> +	/*
> +	 * Make sure we don't race with the cpuset subsystem where root
> +	 * domains can be rebuilt or modified while operations like DL
> +	 * admission checks are carried out.
> +	 */
> +	if (!cpuset_lock())
> +		return -EBUSY;
> +

How important is this for being a trylock? Reason I am asking is that
it is making my deadline test fail to get the resources. What my test
does is to create a bunch of threads, and they try to get the deadline
resources as they are created. This happens in parallel. Thus, when one
gets the cpuset_lock, the others are hitting this and returning -EBUSY.
At first I thought I was overcommitting somehow, but after adding a
trace_printk() in cpuset_lock() and in this fail case, it became obvious
what was happening:

 deadline_test-1376  [004]   107.179693: sys_enter_sched_setattr: pid: 0x00000000, uattr: 0x7f017fceeed0, flags: 0x00000000
 deadline_test-1376  [004]   107.179697: bputs: cpuset_lock: cpuset_mutex trylock
 deadline_test-1377  [003]   107.179763: sys_exit_futex: 0x0
 deadline_test-1377  [003]   107.179767: sys_enter_sched_setattr: pid: 0x00000000, uattr: 0x7f017f4eded0, flags: 0x00000000
 deadline_test-1377  [003]   107.179771: bputs: cpuset_lock: cpuset_mutex trylock
 deadline_test-1377  [003]   107.179771: bputs: __sched_setscheduler: cpuset_lock
 deadline_test-1377  [003]   107.179773: sys_exit_sched_setattr: 0xfffffffffffffff0

-- Steve

>  	retval = security_task_setscheduler(p);
>  	if (retval)
>  		return retval;
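The scenario Steven describes can be approximated from userspace with something
along the lines of the sketch below. This is not his actual deadline_test; the
thread count and the runtime/deadline/period values are arbitrary assumptions.
Build with -pthread; with the trylock in place, concurrent callers may see
sched_setattr() fail with EBUSY.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE	6
#endif

/* sched_setattr() has no glibc wrapper; define the uapi attr layout locally. */
struct sched_attr {
	__u32 size;
	__u32 sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
			 unsigned int flags)
{
	return syscall(SYS_sched_setattr, pid, attr, flags);
}

static void *worker(void *arg)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  =  2 * 1000 * 1000;	/*  2 ms */
	attr.sched_deadline = 10 * 1000 * 1000;	/* 10 ms */
	attr.sched_period   = 10 * 1000 * 1000;	/* 10 ms */

	/* Threads racing here may hit -EBUSY with the trylock-based patch. */
	if (sched_setattr(0, &attr, 0))
		perror("sched_setattr");

	return NULL;
}

int main(void)
{
	pthread_t t[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < 8; i++)
		pthread_join(t[i], NULL);
	return 0;
}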
On 14/06/18 16:11, Steven Rostedt wrote:
> On Wed, 13 Jun 2018 14:17:10 +0200
> Juri Lelli <juri.lelli@redhat.com> wrote:

[...]

> > +	/*
> > +	 * Make sure we don't race with the cpuset subsystem where root
> > +	 * domains can be rebuilt or modified while operations like DL
> > +	 * admission checks are carried out.
> > +	 */
> > +	if (!cpuset_lock())
> > +		return -EBUSY;
> > +
> 
> How important is this for being a trylock? Reason I am asking is that
> it is making my deadline test fail to get the resources. What my test
> does is to create a bunch of threads, and they try to get the deadline
> resources as they are created. This happens in parallel. Thus, when one
> gets the cpuset_lock, the others are hitting this and returning -EBUSY.

[...]

Right, not nice.

So, even if we can rework sched_setattr and do_sched_setscheduler to
end up calling __sched_setscheduler outside of RCU read_lock sections
(as Peter suggested), it seems that we might still get into trouble,
e.g.:

[   39.890366] ======================================================
[   39.892741] WARNING: possible circular locking dependency detected
[   39.894723] 4.17.0-rc6-before+ #169 Not tainted
[   39.896380] ------------------------------------------------------
[   39.898369] bash/2206 is trying to acquire lock:
[   39.899862] 0000000083b596bc (cpu_hotplug_lock.rw_sem){++++}, at: rebuild_sched_domains_locked+0x29/0x660
[   39.902931] 
[   39.902931] but task is already holding lock:
[   39.904790] 000000004fc7a567 (cpuset_mutex){+.+.}, at: cpuset_write_u64+0x23/0x140
[   39.907263] 
[   39.907263] which lock already depends on the new lock.
[   39.907263] 
[   39.909936] 
[   39.909936] the existing dependency chain (in reverse order) is:
[   39.912479] 
[   39.912479] -> #2 (cpuset_mutex){+.+.}:
[   39.914249]        __sched_setscheduler+0xc6/0x880
[   39.915779]        _sched_setscheduler+0x77/0xa0
[   39.917270]        __kthread_create_on_node+0x122/0x1a0
[   39.918948]        kthread_create_on_node+0x46/0x70
[   39.920512]        init_rescuer.part.8+0x49/0xa0
[   39.921688]        workqueue_init+0x1ec/0x2d5
[   39.922659]        kernel_init_freeable+0x23b/0x5b9
[   39.923756]        kernel_init+0xa/0x110
[   39.924656]        ret_from_fork+0x3a/0x50
[   39.925570] 
[   39.925570] -> #1 (wq_pool_mutex){+.+.}:
[   39.926591]        apply_workqueue_attrs+0x20/0x50
[   39.927554]        __alloc_workqueue_key+0x1c7/0x490
[   39.928700]        workqueue_init_early+0x257/0x33f
[   39.929685]        start_kernel+0x2fe/0x531
[   39.930530]        secondary_startup_64+0xa5/0xb0
[   39.931336] 
[   39.931336] -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[   39.932195]        cpus_read_lock+0x39/0xa0
[   39.932772]        rebuild_sched_domains_locked+0x29/0x660
[   39.933513]        update_flag+0x1fd/0x210
[   39.934075]        cpuset_write_u64+0xe7/0x140
[   39.934708]        cgroup_file_write+0x174/0x230
[   39.935350]        kernfs_fop_write+0x113/0x1a0
[   39.935991]        __vfs_write+0x36/0x180
[   39.936473]        vfs_write+0xc5/0x1b0
[   39.936991]        ksys_write+0x55/0xc0
[   39.937504]        do_syscall_64+0x73/0x4d0
[   39.938061]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   39.938803] 
[   39.938803] other info that might help us debug this:
[   39.938803] 
[   39.939708] Chain exists of:
[   39.939708]   cpu_hotplug_lock.rw_sem --> wq_pool_mutex --> cpuset_mutex
[   39.939708] 
[   39.940960]  Possible unsafe locking scenario:
[   39.940960] 
[   39.941607]        CPU0                    CPU1
[   39.942056]        ----                    ----
[   39.942523]   lock(cpuset_mutex);
[   39.942858]                                lock(wq_pool_mutex);
[   39.943436]                                lock(cpuset_mutex);
[   39.944028]   lock(cpu_hotplug_lock.rw_sem);
[   39.944449] 
[   39.944449]  *** DEADLOCK ***
[   39.944449] 
[   39.945113] 4 locks held by bash/2206:
[   39.945485]  #0: 000000003cc231db (sb_writers#10){.+.+}, at: vfs_write+0x193/0x1b0
[   39.946246]  #1: 00000000fdc2c059 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[   39.946980]  #2: 00000000e12e3a5d (kn->count#11){.+.+}, at: kernfs_fop_write+0xeb/0x1a0
[   39.947736]  #3: 000000004fc7a567 (cpuset_mutex){+.+.}, at: cpuset_write_u64+0x23/0x140
[   39.948463] 
[   39.948463] stack backtrace:
[   39.948871] CPU: 0 PID: 2206 Comm: bash Not tainted 4.17.0-rc6-before+ #169
[   39.949489] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   39.950271] Call Trace:
[   39.950502]  dump_stack+0x67/0x9b
[   39.950810]  print_circular_bug.isra.24+0x1c8/0x2b0
[   39.951252]  __lock_acquire+0x1813/0x18e0
[   39.951618]  ? __lock_acquire+0x1219/0x18e0
[   39.952016]  ? lock_acquire+0x8e/0x240
[   39.952357]  lock_acquire+0x8e/0x240
[   39.952683]  ? rebuild_sched_domains_locked+0x29/0x660
[   39.953179]  cpus_read_lock+0x39/0xa0
[   39.953516]  ? rebuild_sched_domains_locked+0x29/0x660
[   39.953981]  rebuild_sched_domains_locked+0x29/0x660
[   39.954431]  ? mark_held_locks+0x56/0x80
[   39.954790]  ? _raw_spin_unlock_irq+0x29/0x50
[   39.955186]  update_flag+0x1fd/0x210
[   39.955541]  cpuset_write_u64+0xe7/0x140
[   39.955900]  cgroup_file_write+0x174/0x230
[   39.956274]  kernfs_fop_write+0x113/0x1a0
[   39.956640]  __vfs_write+0x36/0x180
[   39.956963]  ? rcu_read_lock_sched_held+0x74/0x80
[   39.957482]  ? rcu_sync_lockdep_assert+0x2e/0x60
[   39.957969]  ? __sb_start_write+0x154/0x1f0
[   39.958415]  ? __sb_start_write+0x16a/0x1f0
[   39.958877]  vfs_write+0xc5/0x1b0
[   39.959310]  ksys_write+0x55/0xc0
[   39.959696]  do_syscall_64+0x73/0x4d0
[   39.960054]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   39.960533]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   39.961071] RIP: 0033:0x7f534866d780
[   39.961477] RSP: 002b:00007fff54dfa398 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   39.962380] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f534866d780
[   39.963166] RDX: 0000000000000002 RSI: 0000000000d97408 RDI: 0000000000000001
[   39.964008] RBP: 0000000000d97408 R08: 000000000000000a R09: 00007f5348f6d700
[   39.964780] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f53489237a0
[   39.965588] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000

Happening the moment one disables sched_load_balance at root level.

I'll try harder to find alternatives, but suggestions are welcome! :-)

Best,

- Juri
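For completeness, the trigger mentioned above (disabling sched_load_balance at the
root cpuset, which per the splat reaches rebuild_sched_domains_locked() via
cpuset_write_u64() with cpuset_mutex held) boils down to a single write. A sketch
follows; the path assumes a cgroup v1 cpuset hierarchy mounted at
/sys/fs/cgroup/cpuset, which is an assumption about the local setup.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumed v1 cpuset mount point; adjust for your system. */
	const char *path = "/sys/fs/cgroup/cpuset/cpuset.sched_load_balance";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Writing "0" disables load balancing at the root cpuset. */
	if (write(fd, "0", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}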
On 15/06/18 09:01, Juri Lelli wrote:
[...]
> I'll try harder to find alternatives, but suggestions are welcome! :-)
I wonder if something like the following might actually work. IIUC the
cpuset.c comment [1], callback_lock is the one to take if one only
needs to query cpusets.
[1] https://elixir.bootlin.com/linux/latest/source/kernel/cgroup/cpuset.c#L266
--->8---
include/linux/cpuset.h | 4 +--
kernel/cgroup/cpuset.c | 72 +++++++++++++++++++++++++-------------------------
kernel/sched/core.c | 24 +++++++----------
3 files changed, 48 insertions(+), 52 deletions(-)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index a1970862ab8e..4bbb3f5a3020 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -55,7 +55,7 @@ extern void cpuset_init_smp(void);
extern void cpuset_force_rebuild(void);
extern void cpuset_update_active_cpus(void);
extern void cpuset_wait_for_hotplug(void);
-extern int cpuset_lock(void);
+extern void cpuset_lock(void);
extern void cpuset_unlock(void);
extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
@@ -178,7 +178,7 @@ static inline void cpuset_update_active_cpus(void)
static inline void cpuset_wait_for_hotplug(void) { }
-static inline int cpuset_lock(void) { return 1; }
+static inline void cpuset_lock(void) { }
static inline void cpuset_unlock(void) { }
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d26fd4795aa3..d5a0b4ec31af 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -288,7 +288,7 @@ static struct cpuset top_cpuset = {
*/
static DEFINE_MUTEX(cpuset_mutex);
-static DEFINE_SPINLOCK(callback_lock);
+static DEFINE_RAW_SPINLOCK(callback_lock);
static struct workqueue_struct *cpuset_migrate_mm_wq;
@@ -921,9 +921,9 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
continue;
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cp->effective_cpus, new_cpus);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
@@ -988,9 +988,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (retval < 0)
return retval;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/* use trialcs->cpus_allowed as a temp variable */
update_cpumasks_hier(cs, trialcs->cpus_allowed);
@@ -1174,9 +1174,9 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
continue;
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cp->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
WARN_ON(!is_in_v2_mode() &&
!nodes_equal(cp->mems_allowed, cp->effective_mems));
@@ -1244,9 +1244,9 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
if (retval < 0)
goto done;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->mems_allowed = trialcs->mems_allowed;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/* use trialcs->mems_allowed as a temp variable */
update_nodemasks_hier(cs, &trialcs->mems_allowed);
@@ -1337,9 +1337,9 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
|| (is_spread_page(cs) != is_spread_page(trialcs)));
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->flags = trialcs->flags;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
rebuild_sched_domains_locked();
@@ -1754,7 +1754,7 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
cpuset_filetype_t type = seq_cft(sf)->private;
int ret = 0;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
switch (type) {
case FILE_CPULIST:
@@ -1773,7 +1773,7 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
ret = -EINVAL;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
return ret;
}
@@ -1988,12 +1988,12 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpuset_inc();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) {
cpumask_copy(cs->effective_cpus, parent->effective_cpus);
cs->effective_mems = parent->effective_mems;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
goto out_unlock;
@@ -2020,12 +2020,12 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
}
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->mems_allowed = parent->mems_allowed;
cs->effective_mems = parent->mems_allowed;
cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
out_unlock:
mutex_unlock(&cpuset_mutex);
return 0;
@@ -2064,7 +2064,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
static void cpuset_bind(struct cgroup_subsys_state *root_css)
{
mutex_lock(&cpuset_mutex);
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) {
cpumask_copy(top_cpuset.cpus_allowed, cpu_possible_mask);
@@ -2075,7 +2075,7 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
top_cpuset.mems_allowed = top_cpuset.effective_mems;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
mutex_unlock(&cpuset_mutex);
}
@@ -2173,12 +2173,12 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
{
bool is_empty;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->cpus_allowed, new_cpus);
cpumask_copy(cs->effective_cpus, new_cpus);
cs->mems_allowed = *new_mems;
cs->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/*
* Don't call update_tasks_cpumask() if the cpuset becomes empty,
@@ -2215,10 +2215,10 @@ hotplug_update_tasks(struct cpuset *cs,
if (nodes_empty(*new_mems))
*new_mems = parent_cs(cs)->effective_mems;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->effective_cpus, new_cpus);
cs->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (cpus_updated)
update_tasks_cpumask(cs);
@@ -2311,21 +2311,21 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
/* synchronize cpus_allowed to cpu_active_mask */
if (cpus_updated) {
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (!on_dfl)
cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/* we don't mess with cpumasks of tasks in top_cpuset */
}
/* synchronize mems_allowed to N_MEMORY */
if (mems_updated) {
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (!on_dfl)
top_cpuset.mems_allowed = new_mems;
top_cpuset.effective_mems = new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
update_tasks_nodemask(&top_cpuset);
}
@@ -2412,9 +2412,9 @@ void __init cpuset_init_smp(void)
/**
* cpuset_lock - Grab the cpuset_mutex from another subsysytem
*/
-int cpuset_lock(void)
+void cpuset_lock(void)
{
- return mutex_trylock(&cpuset_mutex);
+ raw_spin_lock(&callback_lock);
}
/**
@@ -2422,7 +2422,7 @@ int cpuset_lock(void)
*/
void cpuset_unlock(void)
{
- mutex_unlock(&cpuset_mutex);
+ raw_spin_unlock(&callback_lock);
}
/**
@@ -2440,11 +2440,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
{
unsigned long flags;
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
guarantee_online_cpus(task_cs(tsk), pmask);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
}
void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
@@ -2492,11 +2492,11 @@ nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
nodemask_t mask;
unsigned long flags;
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
guarantee_online_mems(task_cs(tsk), &mask);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
return mask;
}
@@ -2588,14 +2588,14 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
return true;
/* Not hardwall and node outside mems_allowed: scan up cpusets */
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
cs = nearest_hardwall_ancestor(task_cs(current));
allowed = node_isset(node, cs->mems_allowed);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
return allowed;
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a5b0c6c25b44..9c5285cc082c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4218,14 +4218,6 @@ static int __sched_setscheduler(struct task_struct *p,
if (attr->sched_flags & SCHED_FLAG_SUGOV)
return -EINVAL;
- /*
- * Make sure we don't race with the cpuset subsystem where root
- * domains can be rebuilt or modified while operations like DL
- * admission checks are carried out.
- */
- if (!cpuset_lock())
- return -EBUSY;
-
retval = security_task_setscheduler(p);
if (retval)
return retval;
@@ -4241,6 +4233,13 @@ static int __sched_setscheduler(struct task_struct *p,
rq = task_rq_lock(p, &rf);
update_rq_clock(rq);
+ /*
+ * Make sure we don't race with the cpuset subsystem where root
+ * domains can be rebuilt or modified while operations like DL
+ * admission checks are carried out.
+ */
+ cpuset_lock();
+
/*
* Changing the policy of the stop threads its a very bad idea:
*/
@@ -4302,9 +4301,8 @@ static int __sched_setscheduler(struct task_struct *p,
/* Re-check policy now with rq lock held: */
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;
+ cpuset_unlock();
task_rq_unlock(rq, p, &rf);
- if (user)
- cpuset_unlock();
goto recheck;
}
@@ -4361,9 +4359,8 @@ static int __sched_setscheduler(struct task_struct *p,
/* Avoid rq from going away on us: */
preempt_disable();
+ cpuset_unlock();
task_rq_unlock(rq, p, &rf);
- if (user)
- cpuset_unlock();
if (pi)
rt_mutex_adjust_pi(p);
@@ -4375,9 +4372,8 @@ static int __sched_setscheduler(struct task_struct *p,
return 0;
unlock:
+ cpuset_unlock();
task_rq_unlock(rq, p, &rf);
- if (user)
- cpuset_unlock();
return retval;
}
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 934633a05d20..a1970862ab8e 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -55,6 +55,8 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
+extern int cpuset_lock(void);
+extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -176,6 +178,10 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
+static inline int cpuset_lock(void) { return 1; }
+
+static inline void cpuset_unlock(void) { }
+
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e6e81d..d26fd4795aa3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2409,6 +2409,22 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
 
+/**
+ * cpuset_lock - Grab the cpuset_mutex from another subsysytem
+ */
+int cpuset_lock(void)
+{
+	return mutex_trylock(&cpuset_mutex);
+}
+
+/**
+ * cpuset_unlock - Release the cpuset_mutex from another subsysytem
+ */
+void cpuset_unlock(void)
+{
+	mutex_unlock(&cpuset_mutex);
+}
+
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ca788f74259d..a5b0c6c25b44 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4218,6 +4218,14 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (attr->sched_flags & SCHED_FLAG_SUGOV)
 		return -EINVAL;
 
+	/*
+	 * Make sure we don't race with the cpuset subsystem where root
+	 * domains can be rebuilt or modified while operations like DL
+	 * admission checks are carried out.
+	 */
+	if (!cpuset_lock())
+		return -EBUSY;
+
 	retval = security_task_setscheduler(p);
 	if (retval)
 		return retval;
@@ -4295,6 +4303,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;
 		task_rq_unlock(rq, p, &rf);
+		if (user)
+			cpuset_unlock();
 		goto recheck;
 	}
@@ -4352,6 +4362,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	/* Avoid rq from going away on us: */
 	preempt_disable();
 	task_rq_unlock(rq, p, &rf);
+	if (user)
+		cpuset_unlock();
 
 	if (pi)
 		rt_mutex_adjust_pi(p);
@@ -4364,6 +4376,8 @@ static int __sched_setscheduler(struct task_struct *p,
 
 unlock:
 	task_rq_unlock(rq, p, &rf);
+	if (user)
+		cpuset_unlock();
 	return retval;
 }