diff mbox series

[2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated

Message ID 20230306200849.376804-3-longman@redhat.com
State Accepted
Commit 6667439f51c446fead5d991ff49b842a811a6195
Headers show
Series cgroup/cpuset: Miscellaneous updates | expand

Commit Message

Waiman Long March 6, 2023, 8:08 p.m. UTC
Similar to commit 3fb906e7fabb ("group/cpuset: Don't filter offline
CPUs in cpuset_cpus_allowed() for top cpuset tasks"), the whole set of
possible CPUs including offline ones should be used for setting cpumasks
for tasks in the top cpuset when a cpuset partition is modified.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Michal Koutný March 14, 2023, 5:34 p.m. UTC | #1
Hello Waiman.

On Mon, Mar 06, 2023 at 03:08:46PM -0500, Waiman Long <longman@redhat.com> wrote:
> -		/*
> -		 * Percpu kthreads in top_cpuset are ignored
> -		 */
> -		if (top_cs && (task->flags & PF_KTHREAD) &&
> -		    kthread_is_per_cpu(task))
> -			continue;
> +		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>  
> -		cpumask_and(new_cpus, cs->effective_cpus,
> -			    task_cpu_possible_mask(task));
> +		if (top_cs) {
> +			/*
> +			 * Percpu kthreads in top_cpuset are ignored
> +			 */
> +			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
> +				continue;
> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
> +		} else {
> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
> +		}

I'm wrapping my head around this slightly.
1) I'd suggest swapping args in of cpumask_and() to have possible_mask
   consistently first.
2) Then I'm wondering whether two branches are truly different when
   effective_cpus := cpus_allowed - subparts_cpus
   top_cpuset.cpus_allowed == possible_mask        (1)

IOW, can you see a difference in what affinities are set to eligible
top_cpuset tasks before and after this patch upon CPU hotplug?
(Hm, (1) holds only in v2. So is this a fix for v1 only?)

Thanks,
Michal
Waiman Long March 14, 2023, 7:02 p.m. UTC | #2
On 3/14/23 13:34, Michal Koutný wrote:
> Hello Waiman.
>
> On Mon, Mar 06, 2023 at 03:08:46PM -0500, Waiman Long <longman@redhat.com> wrote:
>> -		/*
>> -		 * Percpu kthreads in top_cpuset are ignored
>> -		 */
>> -		if (top_cs && (task->flags & PF_KTHREAD) &&
>> -		    kthread_is_per_cpu(task))
>> -			continue;
>> +		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>>   
>> -		cpumask_and(new_cpus, cs->effective_cpus,
>> -			    task_cpu_possible_mask(task));
>> +		if (top_cs) {
>> +			/*
>> +			 * Percpu kthreads in top_cpuset are ignored
>> +			 */
>> +			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
>> +				continue;
>> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
>> +		} else {
>> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>> +		}
> I'm wrapping my head around this slightly.
> 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
>     consistently first.
I don't quite understand what you meant by "swapping args". It is 
effective new_cpus = cs->effective_cpus ∩ possible_mask. What is the 
point of swapping cs->effective_cpus and possible_mask.
> 2) Then I'm wondering whether two branches are truly different when
>     effective_cpus := cpus_allowed - subparts_cpus
>     top_cpuset.cpus_allowed == possible_mask        (1)
effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some 
of the CPUs are offline as effective_cpus contains only online CPUs. 
subparts_cpu can include offline cpus too. That is why I choose that 
expression. I will add a comment to clarify that.
>
> IOW, can you see a difference in what affinities are set to eligible
> top_cpuset tasks before and after this patch upon CPU hotplug?
> (Hm, (1) holds only in v2. So is this a fix for v1 only?)

This is due to the fact that cpu hotplug code currently doesn't update 
the cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset 
can rely on the hotplug code to update the cpu affinity appropriately. 
For the tasks in the top cpuset, we have to make sure that all the 
offline CPUs are included.

Cheers,
Longman
Michal Koutný March 15, 2023, 10:06 a.m. UTC | #3
On Tue, Mar 14, 2023 at 03:02:53PM -0400, Waiman Long <longman@redhat.com> wrote:
> > > +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
> > > +		} else {
> > > +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
> > > +		}
> > I'm wrapping my head around this slightly.
> > 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
> >     consistently first.
> I don't quite understand what you meant by "swapping args". It is effective
> new_cpus = cs->effective_cpus ∩ possible_mask. What is the point of swapping
> cs->effective_cpus and possible_mask.

No effect except better readability (possible_mask comes first in the
other branch above too, left hand arg as the "base" that's modified).


> > 2) Then I'm wondering whether two branches are truly different when
> >     effective_cpus := cpus_allowed - subparts_cpus
> >     top_cpuset.cpus_allowed == possible_mask        (1)
> effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some of
> the CPUs are offline as effective_cpus contains only online CPUs.
> subparts_cpu can include offline cpus too. That is why I choose that
> expression. I will add a comment to clarify that.

I see now that it returns offlined cpus to top cpuset's tasks.

> > 
> > IOW, can you see a difference in what affinities are set to eligible
> > top_cpuset tasks before and after this patch upon CPU hotplug?
> > (Hm, (1) holds only in v2. So is this a fix for v1 only?)
> 
> This is due to the fact that cpu hotplug code currently doesn't update the
> cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset can
> rely on the hotplug code to update the cpu affinity appropriately.

Oh, I mistook this for hotplug changing behavior but it's actually for
updating top_cpuset when its children becomes a partition root.

	IIUC, top cpuset + hotplug has been treated specially because
	hotplug must have taken care of affinity regardless of cpuset.
	v1 allowed modification of top cpuset's mask (not sure if that
	worked), v2 won't allow direct top cpuset's mask modificiation
	but indirectly via partition root children.

So this is a continuation for 3fb906e7fabb ("cgroup/cpuset: Don't filter
offline CPUs in cpuset_cpus_allowed() for top cpuset tasks") to ensure
hotplug offline/online cycle won't overwrite top_cpuset tasks'
affinities (when partition change during offlined period).
This looks correct in this regard then.
(I wish it were simpler but that's for a different/broader top
cpuset+hotplug approach.)

Thanks,
Michal
Waiman Long March 15, 2023, 2:39 p.m. UTC | #4
On 3/15/23 06:06, Michal Koutný wrote:
> On Tue, Mar 14, 2023 at 03:02:53PM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
>>>> +		} else {
>>>> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>>>> +		}
>>> I'm wrapping my head around this slightly.
>>> 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
>>>      consistently first.
>> I don't quite understand what you meant by "swapping args". It is effective
>> new_cpus = cs->effective_cpus ∩ possible_mask. What is the point of swapping
>> cs->effective_cpus and possible_mask.
> No effect except better readability (possible_mask comes first in the
> other branch above too, left hand arg as the "base" that's modified).
OK, I got it now. I will swap it as suggested.
>
>>> 2) Then I'm wondering whether two branches are truly different when
>>>      effective_cpus := cpus_allowed - subparts_cpus
>>>      top_cpuset.cpus_allowed == possible_mask        (1)
>> effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some of
>> the CPUs are offline as effective_cpus contains only online CPUs.
>> subparts_cpu can include offline cpus too. That is why I choose that
>> expression. I will add a comment to clarify that.
> I see now that it returns offlined cpus to top cpuset's tasks.
>
>>> IOW, can you see a difference in what affinities are set to eligible
>>> top_cpuset tasks before and after this patch upon CPU hotplug?
>>> (Hm, (1) holds only in v2. So is this a fix for v1 only?)
>> This is due to the fact that cpu hotplug code currently doesn't update the
>> cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset can
>> rely on the hotplug code to update the cpu affinity appropriately.
> Oh, I mistook this for hotplug changing behavior but it's actually for
> updating top_cpuset when its children becomes a partition root.
>
> 	IIUC, top cpuset + hotplug has been treated specially because
> 	hotplug must have taken care of affinity regardless of cpuset.
> 	v1 allowed modification of top cpuset's mask (not sure if that
> 	worked), v2 won't allow direct top cpuset's mask modificiation
> 	but indirectly via partition root children.
>
> So this is a continuation for 3fb906e7fabb ("cgroup/cpuset: Don't filter
> offline CPUs in cpuset_cpus_allowed() for top cpuset tasks") to ensure
> hotplug offline/online cycle won't overwrite top_cpuset tasks'
> affinities (when partition change during offlined period).
> This looks correct in this regard then.
> (I wish it were simpler but that's for a different/broader top
> cpuset+hotplug approach.)

You can't change the content of "cpuset.cpus" in the top cpuset (both v1 
& v2). I believe the CPU hotplug does not attempt to update the cpumask 
of tasks in the top cpuset mainly due to potential locking issue as 
hotplug is triggered by hardware event. Partition change, however, is a 
userspace event. So there shouldn't be any locking implication other 
than the fact per-cpu kthreads should not have their cpumasks changed.

To be consistent with commit 3fb906e7fabb ("cgroup/cpuset: Don't filter 
offline CPUs in cpuset_cpus_allowed() for top cpuset tasks"), similar 
logic needs to be applied in the later case.

Cheers,
Longman
diff mbox series

Patch

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a801abad3bac..bbf57dcb2f68 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1209,7 +1209,8 @@  void rebuild_sched_domains(void)
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
- * cpuset membership stays stable.
+ * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
+ * is used instead of effective_cpus.
  */
 static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
@@ -1219,15 +1220,18 @@  static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
-		/*
-		 * Percpu kthreads in top_cpuset are ignored
-		 */
-		if (top_cs && (task->flags & PF_KTHREAD) &&
-		    kthread_is_per_cpu(task))
-			continue;
+		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
 
-		cpumask_and(new_cpus, cs->effective_cpus,
-			    task_cpu_possible_mask(task));
+		if (top_cs) {
+			/*
+			 * Percpu kthreads in top_cpuset are ignored
+			 */
+			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
+				continue;
+			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
+		} else {
+			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
+		}
 		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);