Message ID | 1543325060-1599-1-git-send-email-daniel.lezcano@linaro.org |
---|---|
State | New |
Headers | show |
Series | [V5,1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE | expand |
Hi Daniel, On 27/11/18 14:24, Daniel Lezcano wrote: > The mutex protects a per_cpu variable access. The potential race can > happen only when the cpufreq governor module is loaded and at the same > time the cpu capacity is changed in the sysfs. > > There is no real interest of using a mutex to protect a variable > assignation when there is no situation where a task can take the lock > and block. > > Replace the mutex by READ_ONCE / WRITE_ONCE. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > Cc: Sudeep Holla <sudeep.holla@arm.com> > Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > drivers/base/arch_topology.c | 7 +------ > include/linux/arch_topology.h | 2 +- > 2 files changed, 2 insertions(+), 7 deletions(-) > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > index edfcf8d..fd5325b 100644 > --- a/drivers/base/arch_topology.c > +++ b/drivers/base/arch_topology.c > @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, > per_cpu(freq_scale, i) = scale; > } > > -static DEFINE_MUTEX(cpu_scale_mutex); > DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; > > void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity) > { > - per_cpu(cpu_scale, cpu) = capacity; > + WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity); > } > > static ssize_t cpu_capacity_show(struct device *dev, > @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev, > if (new_capacity > SCHED_CAPACITY_SCALE) > return -EINVAL; > > - mutex_lock(&cpu_scale_mutex); > for_each_cpu(i, &cpu_topology[this_cpu].core_sibling) > topology_set_cpu_scale(i, new_capacity); > - mutex_unlock(&cpu_scale_mutex); IIRC this was meant to ensure atomic updates of all siblings with the new capacity value. I actually now wonder if readers should not grab the mutex as well (cpu_capacity_show()). Can't we get into a situation where a reader might see siblings with intermediate values (while the loop above is performing an update)? BTW, please update my email address. :-) Best, - Juri
On 28/11/2018 12:44, Juri Lelli wrote: > Hi Daniel, > > On 27/11/18 14:24, Daniel Lezcano wrote: >> The mutex protects a per_cpu variable access. The potential race can >> happen only when the cpufreq governor module is loaded and at the same >> time the cpu capacity is changed in the sysfs. >> >> There is no real interest of using a mutex to protect a variable >> assignation when there is no situation where a task can take the lock >> and block. >> >> Replace the mutex by READ_ONCE / WRITE_ONCE. >> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> >> Cc: Sudeep Holla <sudeep.holla@arm.com> >> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> >> --- >> drivers/base/arch_topology.c | 7 +------ >> include/linux/arch_topology.h | 2 +- >> 2 files changed, 2 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c >> index edfcf8d..fd5325b 100644 >> --- a/drivers/base/arch_topology.c >> +++ b/drivers/base/arch_topology.c >> @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, >> per_cpu(freq_scale, i) = scale; >> } >> >> -static DEFINE_MUTEX(cpu_scale_mutex); >> DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; >> >> void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity) >> { >> - per_cpu(cpu_scale, cpu) = capacity; >> + WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity); >> } >> >> static ssize_t cpu_capacity_show(struct device *dev, >> @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev, >> if (new_capacity > SCHED_CAPACITY_SCALE) >> return -EINVAL; >> >> - mutex_lock(&cpu_scale_mutex); >> for_each_cpu(i, &cpu_topology[this_cpu].core_sibling) >> topology_set_cpu_scale(i, new_capacity); >> - mutex_unlock(&cpu_scale_mutex); > > IIRC this was meant to ensure atomic updates of all siblings with the new > capacity value. I actually now wonder if readers should not grab the > mutex as well (cpu_capacity_show()). Can't we get into a situation where > a reader might see siblings with intermediate values (while the loop > above is performing an update)? With or without this patch, it is the case: task1 task2 | | read("/sys/.../cpu1/cpu_capacity) | | write("/sys/.../cpu1/cpu_capacity") read("/sys/.../cpu2/cpu_capacity) | There is no guarantee userspace can have a consistent view of the capacity. As soon as it reads a capacity, it can be changed in its back. > BTW, please update my email address. :-) Sure. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On 28/11/18 18:54, Daniel Lezcano wrote: > On 28/11/2018 12:44, Juri Lelli wrote: > > Hi Daniel, > > > > On 27/11/18 14:24, Daniel Lezcano wrote: > >> The mutex protects a per_cpu variable access. The potential race can > >> happen only when the cpufreq governor module is loaded and at the same > >> time the cpu capacity is changed in the sysfs. > >> > >> There is no real interest of using a mutex to protect a variable > >> assignation when there is no situation where a task can take the lock > >> and block. > >> > >> Replace the mutex by READ_ONCE / WRITE_ONCE. > >> > >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > >> Cc: Sudeep Holla <sudeep.holla@arm.com> > >> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> > >> --- > >> drivers/base/arch_topology.c | 7 +------ > >> include/linux/arch_topology.h | 2 +- > >> 2 files changed, 2 insertions(+), 7 deletions(-) > >> > >> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > >> index edfcf8d..fd5325b 100644 > >> --- a/drivers/base/arch_topology.c > >> +++ b/drivers/base/arch_topology.c > >> @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, > >> per_cpu(freq_scale, i) = scale; > >> } > >> > >> -static DEFINE_MUTEX(cpu_scale_mutex); > >> DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; > >> > >> void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity) > >> { > >> - per_cpu(cpu_scale, cpu) = capacity; > >> + WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity); > >> } > >> > >> static ssize_t cpu_capacity_show(struct device *dev, > >> @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev, > >> if (new_capacity > SCHED_CAPACITY_SCALE) > >> return -EINVAL; > >> > >> - mutex_lock(&cpu_scale_mutex); > >> for_each_cpu(i, &cpu_topology[this_cpu].core_sibling) > >> topology_set_cpu_scale(i, new_capacity); > >> - mutex_unlock(&cpu_scale_mutex); > > > > IIRC this was meant to ensure atomic updates of all siblings with the new > > capacity value. I actually now wonder if readers should not grab the > > mutex as well (cpu_capacity_show()). Can't we get into a situation where > > a reader might see siblings with intermediate values (while the loop > > above is performing an update)? > > With or without this patch, it is the case: > > task1 task2 > | | > read("/sys/.../cpu1/cpu_capacity) | > | write("/sys/.../cpu1/cpu_capacity") > read("/sys/.../cpu2/cpu_capacity) | > > > There is no guarantee userspace can have a consistent view of the > capacity. As soon as it reads a capacity, it can be changed in its back. True, but w/o the mutex task1 could read different cpu_capacity values for a cluster (it actually can also with current implementation, we should grab the mutex in the read path as well if we want to avoid this). Just pointing this out, not sure it is a problem, though, as even the notion of strictly equal capacities clusters is going to disappear AFAIK.
On 29/11/2018 08:04, Juri Lelli wrote: [ ... ] >> With or without this patch, it is the case: >> >> task1 task2 >> | | >> read("/sys/.../cpu1/cpu_capacity) | >> | write("/sys/.../cpu1/cpu_capacity") >> read("/sys/.../cpu2/cpu_capacity) | >> >> >> There is no guarantee userspace can have a consistent view of the >> capacity. As soon as it reads a capacity, it can be changed in its back. > > True, but w/o the mutex task1 could read different cpu_capacity values > for a cluster (it actually can also with current implementation, we > should grab the mutex in the read path as well if we want to avoid > this). Even if the mutex is on the read path, the userspace can see different capacities because it will read the cpu_capacity per cpu directory. The mutex will be take when reading cpu0/cpu_capacity, not for cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the lock is released in between. Do you agree with the patch ? Or do you want me to drop it ? -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On 29/11/18 10:18, Daniel Lezcano wrote: > On 29/11/2018 08:04, Juri Lelli wrote: > > [ ... ] > > >> With or without this patch, it is the case: > >> > >> task1 task2 > >> | | > >> read("/sys/.../cpu1/cpu_capacity) | > >> | write("/sys/.../cpu1/cpu_capacity") > >> read("/sys/.../cpu2/cpu_capacity) | > >> > >> > >> There is no guarantee userspace can have a consistent view of the > >> capacity. As soon as it reads a capacity, it can be changed in its back. > > > > True, but w/o the mutex task1 could read different cpu_capacity values > > for a cluster (it actually can also with current implementation, we > > should grab the mutex in the read path as well if we want to avoid > > this). > > Even if the mutex is on the read path, the userspace can see different > capacities because it will read the cpu_capacity per cpu directory. > > The mutex will be take when reading cpu0/cpu_capacity, not for > cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the > lock is released in between. > > Do you agree with the patch ? Or do you want me to drop it ? I don't actually have cases at hand that are showing regression with it, I was just trying to understand if we might potentially hit problems in the future. So, I'm not against this patch. :-)
On 29/11/2018 10:58, Juri Lelli wrote: > On 29/11/18 10:18, Daniel Lezcano wrote: >> On 29/11/2018 08:04, Juri Lelli wrote: >> >> [ ... ] >> >>>> With or without this patch, it is the case: >>>> >>>> task1 task2 >>>> | | >>>> read("/sys/.../cpu1/cpu_capacity) | >>>> | write("/sys/.../cpu1/cpu_capacity") >>>> read("/sys/.../cpu2/cpu_capacity) | >>>> >>>> >>>> There is no guarantee userspace can have a consistent view of the >>>> capacity. As soon as it reads a capacity, it can be changed in its back. >>> >>> True, but w/o the mutex task1 could read different cpu_capacity values >>> for a cluster (it actually can also with current implementation, we >>> should grab the mutex in the read path as well if we want to avoid >>> this). >> >> Even if the mutex is on the read path, the userspace can see different >> capacities because it will read the cpu_capacity per cpu directory. >> >> The mutex will be take when reading cpu0/cpu_capacity, not for >> cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the >> lock is released in between. >> >> Do you agree with the patch ? Or do you want me to drop it ? > > I don't actually have cases at hand that are showing regression with it, > I was just trying to understand if we might potentially hit problems in > the future. So, I'm not against this patch. :-) not-not-acked-by ? :) -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On 29/11/18 11:02, Daniel Lezcano wrote: > On 29/11/2018 10:58, Juri Lelli wrote: > > On 29/11/18 10:18, Daniel Lezcano wrote: > >> On 29/11/2018 08:04, Juri Lelli wrote: > >> > >> [ ... ] > >> > >>>> With or without this patch, it is the case: > >>>> > >>>> task1 task2 > >>>> | | > >>>> read("/sys/.../cpu1/cpu_capacity) | > >>>> | write("/sys/.../cpu1/cpu_capacity") > >>>> read("/sys/.../cpu2/cpu_capacity) | > >>>> > >>>> > >>>> There is no guarantee userspace can have a consistent view of the > >>>> capacity. As soon as it reads a capacity, it can be changed in its back. > >>> > >>> True, but w/o the mutex task1 could read different cpu_capacity values > >>> for a cluster (it actually can also with current implementation, we > >>> should grab the mutex in the read path as well if we want to avoid > >>> this). > >> > >> Even if the mutex is on the read path, the userspace can see different > >> capacities because it will read the cpu_capacity per cpu directory. > >> > >> The mutex will be take when reading cpu0/cpu_capacity, not for > >> cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the > >> lock is released in between. > >> > >> Do you agree with the patch ? Or do you want me to drop it ? > > > > I don't actually have cases at hand that are showing regression with it, > > I was just trying to understand if we might potentially hit problems in > > the future. So, I'm not against this patch. :-) > > not-not-acked-by ? :) :-) I'm not maintaining this, so, Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index edfcf8d..fd5325b 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq, per_cpu(freq_scale, i) = scale; } -static DEFINE_MUTEX(cpu_scale_mutex); DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity) { - per_cpu(cpu_scale, cpu) = capacity; + WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity); } static ssize_t cpu_capacity_show(struct device *dev, @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev, if (new_capacity > SCHED_CAPACITY_SCALE) return -EINVAL; - mutex_lock(&cpu_scale_mutex); for_each_cpu(i, &cpu_topology[this_cpu].core_sibling) topology_set_cpu_scale(i, new_capacity); - mutex_unlock(&cpu_scale_mutex); schedule_work(&update_topology_flags_work); @@ -141,7 +138,6 @@ void topology_normalize_cpu_scale(void) return; pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale); - mutex_lock(&cpu_scale_mutex); for_each_possible_cpu(cpu) { pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n", cpu, raw_capacity[cpu]); @@ -151,7 +147,6 @@ void topology_normalize_cpu_scale(void) pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n", cpu, topology_get_cpu_scale(NULL, cpu)); } - mutex_unlock(&cpu_scale_mutex); } bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu) diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index d9bdc1a..12c439f 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -20,7 +20,7 @@ struct sched_domain; static inline unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu) { - return per_cpu(cpu_scale, cpu); + return READ_ONCE(per_cpu(cpu_scale, cpu)); } void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);