Message ID | e54ce908bb964d643a2bbb945ed4a8dcb3ea00bf.1375809311.git.viresh.kumar@linaro.org |
---|---|
State | New |
Headers | show |
On Tuesday, August 06, 2013 10:53:09 PM Viresh Kumar wrote: > For iterating over all policies currently we are iterating over all CPUs and > then finding their policies. Lets use the newly created infrastructure > cpufreq_policy_list. > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> I noticed that the current linux-next branch of linux-pm.git caused the BUG_ON() in lock_policy_rwsem_##mode() to trigger when user space tried to access cpufreq sysfs attributes before system suspend and after system resume. I tried to debug that and it turned out that this patch caused resume to block indefinitely on one of my test machines and after reverting it the BUG_ON() stopped triggering, so I've just reverted it in my tree (it is not an important change). I don't have the time to figure out why this change breaks things and I would appreciate it if you tested stuff like suspend/resume on an x86 laptop or similar with your patches applied before posting them for merging. Thanks, Rafael > --- > drivers/cpufreq/cpufreq.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index f5999c4..fe04b79 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -984,8 +984,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > struct cpufreq_policy *policy; > unsigned long flags; > #ifdef CONFIG_HOTPLUG_CPU > + struct cpufreq_policy *tpolicy; > struct cpufreq_governor *gov; > - int sibling; > #endif > > if (cpu_is_offline(cpu)) > @@ -1005,11 +1005,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > #ifdef CONFIG_HOTPLUG_CPU > /* Check if this cpu was hot-unplugged earlier and has siblings */ > read_lock_irqsave(&cpufreq_driver_lock, flags); > - for_each_online_cpu(sibling) { > - struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling); > - if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) { > + list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) { > + if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) { > read_unlock_irqrestore(&cpufreq_driver_lock, flags); > - return cpufreq_add_policy_cpu(cp, cpu, dev, frozen); > + return cpufreq_add_policy_cpu(tpolicy, cpu, dev, > + frozen); > } > } > read_unlock_irqrestore(&cpufreq_driver_lock, flags); >
On 18 August 2013 19:36, Rafael J. Wysocki <rjw@sisk.pl> wrote: > I noticed that the current linux-next branch of linux-pm.git caused the > BUG_ON() in lock_policy_rwsem_##mode() to trigger when user space tried to > access cpufreq sysfs attributes before system suspend and after system > resume. Hmm... > I tried to debug that and it turned out that this patch caused resume > to block indefinitely on one of my test machines and after reverting it the > BUG_ON() stopped triggering, so I've just reverted it in my tree (it is not an > important change). > > I don't have the time to figure out why this change breaks things It wasn't my patch actually.. It only made it visible that's it :) The problem is: - On suspend all CPUs are removed and so governors are stopped. - On resume, handle_update() is called for the boot cpu and cpu_add_dev for all others. handle_update() doesn't start governor but only plays with CPUFREQ_GOV_LIMITS.. when we start adding other CPUs and call: cpufreq_add_policy_cpu() which fails in following call: __cpufreq_governor(policy, CPUFREQ_GOV_STOP); and so cpufreq_policy_cpu never gets initialized to policy->cpu and stays at -1, and hence the crash. So, there are few problems with core at this point: - I don't understand how does the work done in cpufreq_add_dev() gets done for boot cpu during resume ? And so how does Srivatsa's "frozen" solution really works (I haven't had time to investigate, its not that I couldn't understand it :) ).. - We need to start governor boot cpu in handle_update() and things would be solved... > and I would > appreciate it if you tested stuff like suspend/resume on an x86 laptop or > similar with your patches applied before posting them for merging. suspend/resume is broken on my ARM board and that's why didn't test it.. Testing anything on my thinkpad (with ubuntu) is a pain.. it takes more than an hour to compile/test a single image... I currently follow below steps for doing that, don't know if something much simpler/faster is available :) https://wiki.ubuntu.com/KernelTeam/GitKernelBuild Whole day I was able to boot test only 4-5 kernel builds. Its too slow :(
On Mon, Aug 19, 2013 at 4:57 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > Testing anything on my thinkpad (with ubuntu) is a pain.. it takes > more than an hour to compile/test a single image... I currently follow > below steps for doing that, don't know if something much > simpler/faster is available :) > > https://wiki.ubuntu.com/KernelTeam/GitKernelBuild > > Whole day I was able to boot test only 4-5 kernel builds. > Its too slow :( Why do you create .deb packages to test development kernels? It _is_ slow. Instead I'd just add a new grub menu entry to /boot/grub/grub.cfg and point it to /boot/MyzImage. After a 'make install; make modules_install' just link /boot/MyzImage to the installed kernel in /boot. And then generate a new initramfs using 'update-initramfs -c -k <ver>'
On Monday, August 19, 2013 04:57:19 PM Viresh Kumar wrote: > On 18 August 2013 19:36, Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > I noticed that the current linux-next branch of linux-pm.git caused the > > BUG_ON() in lock_policy_rwsem_##mode() to trigger when user space tried to > > access cpufreq sysfs attributes before system suspend and after system > > resume. > > Hmm... > > > I tried to debug that and it turned out that this patch caused resume > > to block indefinitely on one of my test machines and after reverting it the > > BUG_ON() stopped triggering, so I've just reverted it in my tree (it is not an > > important change). > > > > I don't have the time to figure out why this change breaks things > > It wasn't my patch actually.. It only made it visible that's it :) > The problem is: > - On suspend all CPUs are removed and so governors are > stopped. > - On resume, handle_update() is called for the boot cpu and > cpu_add_dev for all others. > > handle_update() doesn't start governor but only plays with > CPUFREQ_GOV_LIMITS.. when we start adding other > CPUs and call: cpufreq_add_policy_cpu() which fails in > following call: > > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > > and so cpufreq_policy_cpu never gets initialized to > policy->cpu and stays at -1, and hence the crash. > > So, there are few problems with core at this point: > - I don't understand how does the work done in > cpufreq_add_dev() gets done for boot cpu during > resume ? And so how does Srivatsa's "frozen" solution > really works (I haven't had time to investigate, its not > that I couldn't understand it :) ).. > > - We need to start governor boot cpu in handle_update() > and things would be solved... > > > and I would > > appreciate it if you tested stuff like suspend/resume on an x86 laptop or > > similar with your patches applied before posting them for merging. > > suspend/resume is broken on my ARM board and that's why > didn't test it.. > > Testing anything on my thinkpad (with ubuntu) is a pain.. it takes > more than an hour to compile/test a single image... I currently follow > below steps for doing that, don't know if something much > simpler/faster is available :) > > https://wiki.ubuntu.com/KernelTeam/GitKernelBuild > > Whole day I was able to boot test only 4-5 kernel builds. > Its too slow :( Well, that's a matter of setting up a workspace, basically. As a general rule, you should be able test the changes you're making on hardware that they are supposed to run on. If that's something not very easy to acquire, like s390, you can make an excuse of not having access to that hardware and hope that someone will test the changes for you (or ACKs them without testing, because they are "obviously" correct). However, if that's ACPI-compatible x86, that excuse pretty much doesn't work, because you can find those things everywhere. I have no experience with building kernels on Ubuntu to be honest, as I've been using openSUSE as my testbed distro for several years. From my experience, however, it is good to figure out what needs to be included into your test kernel and configure away everything else. Also, I'd recommend to build as much as possible into the kernel image and avoid compiling too many modules, because installing modules takes time too (ideally, if you can get away without any modules at all, that's the best option timing-wise). Just select only the drivers that you're going to use and unset all of the options that don't apply to your test machine. Thanks, Rafael
On 19 August 2013 17:15, Amit Kucheria <amit.kucheria@linaro.org> wrote: > Why do you create .deb packages to test development kernels? It _is_ slow. > > Instead I'd just add a new grub menu entry to /boot/grub/grub.cfg and > point it to /boot/MyzImage. After a 'make install; make > modules_install' just link /boot/MyzImage to the installed kernel in > /boot. And then generate a new initramfs using 'update-initramfs -c -k > <ver>' initramfs was fixed automatically for me with make install.. Thanks it saved lots of my time :)
On 20 August 2013 04:52, Rafael J. Wysocki <rjw@sisk.pl> wrote: > From my experience, however, it is good to figure out what needs to be included > into your test kernel and configure away everything else. Also, I'd recommend > to build as much as possible into the kernel image and avoid compiling too many > modules, because installing modules takes time too (ideally, if you can get > away without any modules at all, that's the best option timing-wise). Just > select only the drivers that you're going to use and unset all of the options > that don't apply to your test machine. I haven't done this exercise yet but I have a workable solution with default .config as suggested by Amit :)
On 19 August 2013 16:57, Viresh Kumar <viresh.kumar@linaro.org> wrote: > It wasn't my patch actually.. It only made it visible that's it :) > The problem is: > - On suspend all CPUs are removed and so governors are > stopped. > - On resume, handle_update() is called for the boot cpu and > cpu_add_dev for all others. > > handle_update() doesn't start governor but only plays with > CPUFREQ_GOV_LIMITS.. when we start adding other > CPUs and call: cpufreq_add_policy_cpu() which fails in > following call: > > __cpufreq_governor(policy, CPUFREQ_GOV_STOP); > > and so cpufreq_policy_cpu never gets initialized to > policy->cpu and stays at -1, and hence the crash. > > So, there are few problems with core at this point: > - I don't understand how does the work done in > cpufreq_add_dev() gets done for boot cpu during > resume ? And so how does Srivatsa's "frozen" solution > really works (I haven't had time to investigate, its not > that I couldn't understand it :) ).. > > - We need to start governor boot cpu in handle_update() > and things would be solved... Whatever I wrote here was simply _Bullshit_ :( I am about to send you a fixup patchset that fixes this issue, and yes it was my patch which introduced this problem :(, but because of some mishandling of cpufreq_policy_list :) -- viresh
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index f5999c4..fe04b79 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -984,8 +984,8 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, struct cpufreq_policy *policy; unsigned long flags; #ifdef CONFIG_HOTPLUG_CPU + struct cpufreq_policy *tpolicy; struct cpufreq_governor *gov; - int sibling; #endif if (cpu_is_offline(cpu)) @@ -1005,11 +1005,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, #ifdef CONFIG_HOTPLUG_CPU /* Check if this cpu was hot-unplugged earlier and has siblings */ read_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_online_cpu(sibling) { - struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling); - if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) { + list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) { + if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) { read_unlock_irqrestore(&cpufreq_driver_lock, flags); - return cpufreq_add_policy_cpu(cp, cpu, dev, frozen); + return cpufreq_add_policy_cpu(tpolicy, cpu, dev, + frozen); } } read_unlock_irqrestore(&cpufreq_driver_lock, flags);
For iterating over all policies currently we are iterating over all CPUs and then finding their policies. Lets use the newly created infrastructure cpufreq_policy_list. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- drivers/cpufreq/cpufreq.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)