diff mbox

cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Message ID ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422580135.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar Jan. 30, 2015, 1:13 a.m. UTC
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
has missed this. And as a result we get this:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
 kobject_get+0x41/0x50()
 Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
 lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
 ...
 Call Trace:
  [<ffffffff81661b14>] dump_stack+0x46/0x58
  [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
  [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff812e16d1>] kobject_get+0x41/0x50
  [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
  [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
  [<ffffffff810b8cb2>] ? up+0x32/0x50
  [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
  [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
  [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
  [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
  [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
  [<ffffffff81089566>] ? move_linked_works+0x66/0x90
  [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
  [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
  [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
  [<ffffffff8108c910>] process_one_work+0x160/0x410
  [<ffffffff8108d05b>] worker_thread+0x11b/0x520
  [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
  [<ffffffff81092421>] kthread+0xe1/0x100
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
  [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
 ---[ end trace 89e66eb9795efdf7 ]---

And here is the race:

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_POST_DEAD..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_finish()
                 cpufreq_policy_put_kobj()
                   kobject_put()

cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
freed until cpufreq_cpu_put() is called.

But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data later and that too without these locks. And so the first thread
gets a valid policy structure and before it does kobject_get() on it, the second
one does kobject_put(). And so this WARN().

Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.

Cc: <stable@vger.kernel.org>        # 3.12+
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
@Santosh: I have changed read locks to write locks here and so you need to test
again.

 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Viresh Kumar Jan. 30, 2015, 2:05 a.m. UTC | #1
On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote:

>  It is another bug.

Hmm, lets see..

>  Yes, here is one bug should be fix. but seems not enough to avoid the issue
> completely,

Did you test it ?

>  how about the Thread B running here
>
>  Thread B: xenbus_thread()
>
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_POST_DEAD..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_prepare
>                  update_policy_cpu(){
>                    ...
>                    down_write(&policy->rwsem);
>                    policy->last_cpu = policy->cpu;
>                    policy->cpu = cpu;
>                    up_write(&policy->rwsem);
>                    ---->
>                 }
>
> And thread A run to the same position as above, don't ignore my work
> blindly, that piece of bandage
> could save your time

Oh, please don't misunderstand me. I didn't had any intention of showing
any kind of disrespect.

Okay, why do you say that the above thread you shown has a bug in there?
Its juse updating policy->cpu and that shouldn't make anything unstable.

Please explain again one more time, in details why do you think you hit a
different bug. Also look at kref.h to see which piece of code has hit that
WARN() and it will happen only in the case I have shown. Lets see if I
missed something :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Jan. 30, 2015, 2:13 a.m. UTC | #2
On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
> For a PPC notification and xen-bus thread race, could you tell me a way how
> to reproduce it by trigger the PPC notification and xen-bus events manually
> ?
> You really want me write some code into a test kernel to flood the PPC and
> xen-bus at the same time ? if we could analysis code and get the issue
> clearly, we wouldn't wait the users to yell out.

I thought you already have a test where you are hitting the issue you originally
reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel
during boot..

My reasoning of why your observation doesn't fit here:

Copying from your earlier mail..

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

This tries to increment the count and the warning you have mentioned
happen because:

WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);

i.e. even after incrementing the count, it is < 2. Which I believe will be
1. Which means that we have tried to do kobject_get() on a kobject
for which kobject_put() is already done.

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_DOWN_PREPARE..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_prepare()
                 update_policy_cpu()
                   kobject_move()


Okay, where is the race or kobject_put() here ? We are just moving
the kobject and it has nothing to do with the refcount of kobject.

Why do you see its a race ?
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Jan. 30, 2015, 3:14 a.m. UTC | #3
On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote:
>> My reasoning of why your observation doesn't fit here:
>>
>> Copying from your earlier mail..
>>
>>   Thread A: Workqueue: kacpi_notify
>>
>>   acpi_processor_notify()
>>     acpi_processor_ppc_has_changed()
>>           cpufreq_update_policy()
>>             cpufreq_cpu_get()
>>               kobject_get()
>>
>> This tries to increment the count and the warning you have mentioned
>> happen because:
>>
>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>
>> i.e. even after incrementing the count, it is < 2. Which I believe will be
>> 1. Which means that we have tried to do kobject_get() on a kobject
>> for which kobject_put() is already done.
>>
>>   Thread B: xenbus_thread()
>>
>>   xenbus_thread()
>>     msg->u.watch.handle->callback()
>>       handle_vcpu_hotplug_event()
>>         vcpu_hotplug()
>>           cpu_down()
>>             __cpu_notify(CPU_DOWN_PREPARE..)
>>               cpufreq_cpu_callback()
>>                 __cpufreq_remove_dev_prepare()
>>                   update_policy_cpu()
>>                     kobject_move()
>>
>>
>> Okay, where is the race or kobject_put() here ? We are just moving
>> the kobject and it has nothing to do with the refcount of kobject.
>>
>> Why do you see its a race ?
>
>  I mean the policy->cpu has been changed, that CPU is about to be down,
>  Thread A continue to get and update the policy for it blindly, that is
>  what I Say 'race', not the refcount itself.

First of all, the WARN you had in your patch doesn't have anything to do
with the so-called race you just define. Its because of the reason I defined
earlier.

Second, what if policy->cpu is getting updated? A policy manages a group
of CPUs, not a single cpu. And there still are other CPUs online for that
policy and so kobject_get() for that policy->kobj is perfectly valid.
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Jan. 30, 2015, 4:14 a.m. UTC | #4
On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote:
>  You mean the policy is shared by all CPUs, so PPC notification about one

A policy may or maynot be shared by multiple CPUs. It all depends on your
systems configuration. All CPUs which share clock line, share a policy.

You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This
gives the CPUs that share policy.

>  CPU should update all CPU's policy, right ? even the requested CPU is
> shutting  down.

CPUs sharing policy have a single policy structure. And so policy would
be updated for all CPUs that share the poilcy. Even if some cpu goes down,
the policy might still be up and running.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Jan. 31, 2015, 12:31 a.m. UTC | #5
On 31 January 2015 at 04:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> No, it isn't.  cpufreq_cpu_data is a pointer and doesn't need any locks to
> protect it.  What the lock does is to guarantee the callers of cpufreq_cpu_get()
> that the policy object won't go away from under them (it is used for some other
> purposes too, but they are unrelated).

Yeah, its not just locking. Otherwise the locking only at the bottom
of this routine
should have fixed it.

> That's almost correct.  In addition to the above (albeit maybe unintentionally)
> the patch also fixes the possible race between two instances of
> __cpufreq_remove_dev_finish() with the same arguments running in parallel with
> each other.  The proof is left as an exercise to the reader. :-)

Two __cpufreq_remove_dev_finish() can't get called in parallel, so it
doesn't fix
any race there :).

> Please try to improve the changelog ...

Yes.
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:20 a.m. UTC | #6
On 2 February 2015 at 07:24, ethan zhao <ethan.zhao@oracle.com> wrote:
>  Policy will be freed only when that's last CPU shutting down, how does it
>  happen when system booting ?

I couldn't understand what you are asking here. Though policy will be allocated
for the first CPU of every policy during boot..

>  The description of the patch would be wrong (the Xen_bus call stack) --
>  Did the xen_bus shut down every CPU till the last during booting ?

I don't know about that..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	return 0;
 }