mbox series

[v10,0/8] cgroup/cpuset: Major cpu partition code restructuring

Message ID 20220503162149.1764245-1-longman@redhat.com
Headers show
Series cgroup/cpuset: Major cpu partition code restructuring | expand

Message

Waiman Long May 3, 2022, 4:21 p.m. UTC
v10:
 - Relax constraints for changes made to "cpuset.cpus"
   and "cpuset.cpus.partition" as suggested. Now almost all changes
   are allowed.

v9:
 - Add a new patch 1 to remove the child cpuset restriction on parent's
   "cpuset.cpus".
 - Relax initial root partition entry limitation to allow cpuset.cpus to
   overlap that of parent's.
 - An "isolated invalid" displayed type is added to
   cpuset.cpus.partition.
 - Resetting partition root to "member" will leave child partition root
   as invalid.
 - Update documentation and test accordingly.

v8:
 - Reorganize the patch series and rationalize the features and
   constraints of a partition.
 - Update patch descriptions and documentation accordingly.

This patchset include the following enhancements to the cpuset v2
partition code.
 1) Allow partitions that have no task to have empty effective cpus.
 2) Relax the constraints on what changes are allowed in cpuset.cpus
    and cpuset.cpus.partition. However, the partition remain invalid
    until the constraints of a valid partition root is satisfied.
 3) Add a new "isolated" partition type for partitions with no load
    balancing which is available in v1 but not yet in v2.
 4) Allow the reading of cpuset.cpus.partition to include a reason
    string as to why the partition remain invalid.

 In addition, the cgroup-v2.rst documentation file is updated and
 a self test is added to verify the correctness the partition code.

Waiman Long (8):
  cgroup/cpuset: Add top_cpuset check in update_tasks_cpumask()
  cgroup/cpuset: Miscellaneous cleanups & add helper functions
  cgroup/cpuset: Allow no-task partition to have empty
    cpuset.cpus.effective
  cgroup/cpuset: Relax constraints to partition & cpus changes
  cgroup/cpuset: Add a new isolated cpus.partition type
  cgroup/cpuset: Show invalid partition reason string
  cgroup/cpuset: Update description of cpuset.cpus.partition in
    cgroup-v2.rst
  kselftest/cgroup: Add cpuset v2 partition root state test

 Documentation/admin-guide/cgroup-v2.rst       | 145 ++--
 kernel/cgroup/cpuset.c                        | 712 +++++++++++-------
 tools/testing/selftests/cgroup/Makefile       |   5 +-
 .../selftests/cgroup/test_cpuset_prs.sh       | 674 +++++++++++++++++
 tools/testing/selftests/cgroup/wait_inotify.c |  87 +++
 5 files changed, 1295 insertions(+), 328 deletions(-)
 create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
 create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c

Comments

Phil Auld May 3, 2022, 5:54 p.m. UTC | #1
Hi Waiman

On Tue, May 03, 2022 at 12:21:44PM -0400 Waiman Long wrote:
> Currently, a partition root cannot have empty "cpuset.cpus.effective".
> As a result, a parent partition root cannot distribute out all its CPUs
>  to child partitions with no CPUs left. However in most cases, there
> shouldn't be any tasks associated with intermediate nodes of the default
>  hierarchy. So the current rule is too restrictive and can waste valuable
>  CPU resource.
> 
> To address this issue, we are now allowing a partition to have empty
> "cpuset.cpus.effective" as long as it has no task. Therefore, a parent
> partition with no task can now have all its CPUs distributed out to its
> child partitions. The top cpuset always have some house-keeping tasks
> running and so its list of effective cpu can't never be empty.

s/never/ever/

> 
> Once a partition with empty "cpuset.cpus.effective" is formed, no
> new task can be moved into it until "cpuset.cpus.effective" becomes
> non-empty.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 111 +++++++++++++++++++++++++++++++----------
>  1 file changed, 84 insertions(+), 27 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d156a39d7a08..7d9abd50a1b9 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -412,6 +412,41 @@ static inline bool is_in_v2_mode(void)
>  	      (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
>  }
>  
> +/**
> + * partition_is_populated - check if partition has tasks
> + * @cs: partition root to be checked
> + * @excluded_child: a child cpuset to be excluded in task checking
> + * Return: true if there are tasks, false otherwise
> + *
> + * It is assumed that @cs is a valid partition root. @excluded_child should
> + * be non-NULL when this cpuset is going to become a partition itself.
> + */
> +static inline bool partition_is_populated(struct cpuset *cs,
> +					  struct cpuset *excluded_child)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct cpuset *child;
> +
> +	if (cs->css.cgroup->nr_populated_csets)
> +		return true;
> +	if (!excluded_child && !cs->nr_subparts_cpus)
> +		return cgroup_is_populated(cs->css.cgroup);
> +
> +	rcu_read_lock();
> +	cpuset_for_each_child(child, css, cs) {
> +		if (child == excluded_child)
> +			continue;
> +		if (is_partition_valid(child))
> +			continue;
> +		if (cgroup_is_populated(child->css.cgroup)) {
> +			rcu_read_unlock();
> +			return true;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return false;
> +}
> +
>  /*
>   * Return in pmask the portion of a task's cpusets's cpus_allowed that
>   * are online and are capable of running the task.  If none are found,
> @@ -1252,22 +1287,25 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>  	if ((cmd != partcmd_update) && css_has_online_children(&cs->css))
>  		return -EBUSY;
>  
> -	/*
> -	 * Enabling partition root is not allowed if not all the CPUs
> -	 * can be granted from parent's effective_cpus or at least one
> -	 * CPU will be left after that.
> -	 */
> -	if ((cmd == partcmd_enable) &&
> -	   (!cpumask_subset(cs->cpus_allowed, parent->effective_cpus) ||
> -	     cpumask_equal(cs->cpus_allowed, parent->effective_cpus)))
> -		return -EINVAL;
> -
> -	/*
> -	 * A cpumask update cannot make parent's effective_cpus become empty.
> -	 */
>  	adding = deleting = false;
>  	old_prs = new_prs = cs->partition_root_state;
>  	if (cmd == partcmd_enable) {
> +		/*
> +		 * Enabling partition root is not allowed if not all the CPUs
> +		 * can be granted from parent's effective_cpus.
> +		 */
> +		if (!cpumask_subset(cs->cpus_allowed, parent->effective_cpus))
> +			return -EINVAL;
> +
> +		/*
> +		 * A parent can be left with no CPU as long as there is no
> +		 * task directly associated with the parent partition. For
> +		 * such a parent, no new task can be moved into it.
> +		 */
> +		if (partition_is_populated(parent, cs) &&
> +		    cpumask_equal(cs->cpus_allowed, parent->effective_cpus))
> +			return -EINVAL;
> +

You might consider switching these around to check the cpumasks first.


>  		cpumask_copy(tmp->addmask, cs->cpus_allowed);
>  		adding = true;
>  	} else if (cmd == partcmd_disable) {
> @@ -1289,9 +1327,10 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>  		adding = cpumask_andnot(tmp->addmask, tmp->addmask,
>  					parent->subparts_cpus);
>  		/*
> -		 * Return error if the new effective_cpus could become empty.
> +		 * Return error if the new effective_cpus could become empty
> +		 * and there are tasks in the parent.
>  		 */
> -		if (adding &&
> +		if (adding && partition_is_populated(parent, cs) &&
>  		    cpumask_equal(parent->effective_cpus, tmp->addmask)) {

Same.



Cheers,
Phil


>  			if (!deleting)
>  				return -EINVAL;
> @@ -1317,8 +1356,8 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>  		 */
>  		adding = cpumask_and(tmp->addmask, cs->cpus_allowed,
>  				     parent->effective_cpus);
> -		part_error = cpumask_equal(tmp->addmask,
> -					   parent->effective_cpus);
> +		part_error = cpumask_equal(tmp->addmask, parent->effective_cpus) &&
> +			     partition_is_populated(parent, cs);
>  	}
>  
>  	if (cmd == partcmd_update) {
> @@ -1420,9 +1459,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
>  
>  		/*
>  		 * If it becomes empty, inherit the effective mask of the
> -		 * parent, which is guaranteed to have some CPUs.
> +		 * parent, which is guaranteed to have some CPUs unless
> +		 * it is a partition root that has explicitly distributed
> +		 * out all its CPUs.
>  		 */
>  		if (is_in_v2_mode() && cpumask_empty(tmp->new_cpus)) {
> +			if (is_partition_valid(cp) &&
> +			    cpumask_equal(cp->cpus_allowed, cp->subparts_cpus))
> +				goto update_parent_subparts;
> +
>  			cpumask_copy(tmp->new_cpus, parent->effective_cpus);
>  			if (!cp->use_parent_ecpus) {
>  				cp->use_parent_ecpus = true;
> @@ -1444,6 +1489,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
>  			continue;
>  		}
>  
> +update_parent_subparts:
>  		/*
>  		 * update_parent_subparts_cpumask() should have been called
>  		 * for cs already in update_cpumask(). We should also call
> @@ -2249,6 +2295,13 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>  	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
>  		goto out_unlock;
>  
> +	/*
> +	 * On default hierarchy, task cannot be moved to a cpuset with empty
> +	 * effective cpus.
> +	 */
> +	if (is_in_v2_mode() && cpumask_empty(cs->effective_cpus))
> +		goto out_unlock;
> +
>  	cgroup_taskset_for_each(task, css, tset) {
>  		ret = task_can_attach(task, cs->cpus_allowed);
>  		if (ret)
> @@ -3115,7 +3168,8 @@ hotplug_update_tasks(struct cpuset *cs,
>  		     struct cpumask *new_cpus, nodemask_t *new_mems,
>  		     bool cpus_updated, bool mems_updated)
>  {
> -	if (cpumask_empty(new_cpus))
> +	/* A partition root is allowed to have empty effective cpus */
> +	if (cpumask_empty(new_cpus) && !is_partition_valid(cs))
>  		cpumask_copy(new_cpus, parent_cs(cs)->effective_cpus);
>  	if (nodes_empty(*new_mems))
>  		*new_mems = parent_cs(cs)->effective_mems;
> @@ -3184,10 +3238,11 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
>  
>  	/*
>  	 * In the unlikely event that a partition root has empty
> -	 * effective_cpus or its parent becomes invalid, we have to
> -	 * transition it to the invalid state.
> +	 * effective_cpus with tasks or its parent becomes invalid, we
> +	 * have to transition it to the invalid state.
>  	 */
> -	if (is_partition_valid(cs) && (cpumask_empty(&new_cpus) ||
> +	if (is_partition_valid(cs) &&
> +	   ((cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)) ||
>  	    is_partition_invalid(parent))) {
>  		if (cs->nr_subparts_cpus) {
>  			spin_lock_irq(&callback_lock);
> @@ -3198,13 +3253,15 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
>  		}
>  
>  		/*
> -		 * If the effective_cpus is empty because the child
> -		 * partitions take away all the CPUs, we can keep
> -		 * the current partition and let the child partitions
> -		 * fight for available CPUs.
> +		 * Force the partition to become invalid if either one of
> +		 * the following conditions hold:
> +		 * 1) empty effective cpus but not valid empty partition.
> +		 * 2) parent is invalid or doesn't grant any cpus to child
> +		 *    partitions.
>  		 */
>  		if (is_partition_invalid(parent) ||
> -		     cpumask_empty(&new_cpus)) {
> +		    (cpumask_empty(&new_cpus) &&
> +		     partition_is_populated(cs, NULL))) {
>  			int old_prs;
>  
>  			update_parent_subparts_cpumask(cs, partcmd_disable,
> -- 
> 2.27.0
> 


--
Michal Koutný May 4, 2022, 11:28 a.m. UTC | #2
Hello.

On Tue, May 03, 2022 at 12:21:41PM -0400, Waiman Long <longman@redhat.com> wrote:
> v10:
>  - Relax constraints for changes made to "cpuset.cpus"
>    and "cpuset.cpus.partition" as suggested. Now almost all changes
>    are allowed.

I see there were also some other changes from v9 (like the first patches
of series).
Any chance you have a public git repo with both versions for a
convenient range-diff?

Thanks,
Michal
Waiman Long May 4, 2022, 6:33 p.m. UTC | #3
On 5/4/22 07:28, Michal Koutný wrote:
> Hello.
>
> On Tue, May 03, 2022 at 12:21:41PM -0400, Waiman Long <longman@redhat.com> wrote:
>> v10:
>>   - Relax constraints for changes made to "cpuset.cpus"
>>     and "cpuset.cpus.partition" as suggested. Now almost all changes
>>     are allowed.
> I see there were also some other changes from v9 (like the first patches
> of series).
> Any chance you have a public git repo with both versions for a
> convenient range-diff?

That is true. Both patches 1 and 2 are new and the changes are pretty 
straight forward. Patch 1 of v9 has been merged with a latent bug. Patch 
4 of this series is a replacement of patch 3 "cgroup/cpuset: Refining 
features and constraints of a  partition" of v9. The other patches are 
similar to their versions in v9 with some adjustment based on the 
different code base.

I don't have a public repo. Attached is the file diff between v9 and v10 
in cpuset.c with some other unrelated cpuset patches included.

Cheers,
Longman