Message ID | 1350387889-15324-2-git-send-email-hongbo.zhang@linaro.com |
---|---|
State | New |
Headers | show |
Hi, On 10/16/2012 01:44 PM, hongbo.zhang wrote: > From: "hongbo.zhang" <hongbo.zhang@linaro.com> > > In the previous bind function, cdev->get_max_state(cdev, &max_state) is called > before the registration function finishes, but at this moment, the parameter > cdev at thermal driver layer isn't ready--it will get ready only after its > registration, so the the get_max_state callback cannot tell the max_state > according to the cdev input. > This problem can be fixed by separating the bind operation out of registration > and doing it when registration completely finished. When thermal_cooling_device_register() is called, the thermal framework assumes the cooling device is "ready", i.e. all of its ops callbacks return meaningful results. If the cooling device is not ready at this point, then this is a bug in the code that registers it. Specifically, the faulty code in your case is in the cpufreq cooling implementation, where the cooling device is registered before being added to the internal list of cpufreq cooling devices. So, IMHO the fix is needed there. -- Francesco
Hi Francesco, I found out more points about this issue. [1] cdev should be ready when get_max_state callback be called, otherwise parameter cdev is useless, imagine there may be cases that get_max_state call back is shared by more than one cooling devices of same kind, like this: common_get_max_state(*cdev, *state) { if (cdev == cdev1) *state = 3; else if (cdev == cdev) *state = 5; else } [2] In my cpufreq cooling case(in fact cdev is not used to calculate max_state), the cooling_cpufreq_list should be ready when get_max_state call back is called. In this patch I defer the binding when registration finished, but this is not perfect now, see this routine in cpufreq_cooling_register: thermal_cooling_device_register; at this time: thermal_bind_work -> get_max_state -> get NULL cooling_cpufreq_list and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list) This is due to we cannot know exactly when the bind work is executed. (and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock) aheadof thermal_cooling_device_register and other corresponding modifications, but I found another way as below) [3] Root cause of this problem is calling get_max_state in register -> bind routine. Better solution is to add another parameter in cooling device register function, also add a max_state member in struct cdev, like: thermal_cooling_device_register(char *type, void *devdata, const struct thermal_cooling_device_ops *ops, unsigned long max_state) and then in the bind function: if(cdev->max_state) max_state = cdev->max_state; else cdev->get_max_state(cdev, &max_state) It is common sense that the cooling driver should know its cooling max_state(ability) before registration, and it can be offered when register. I think this way doesn't change both thermal and cooling layer much, it is more clear. I will update this patch soon. On 21 October 2012 18:05, Francesco Lavra <francescolavra.fl@gmail.com> wrote: > Hi, > > On 10/16/2012 01:44 PM, hongbo.zhang wrote: >> From: "hongbo.zhang" <hongbo.zhang@linaro.com> >> >> In the previous bind function, cdev->get_max_state(cdev, &max_state) is called >> before the registration function finishes, but at this moment, the parameter >> cdev at thermal driver layer isn't ready--it will get ready only after its >> registration, so the the get_max_state callback cannot tell the max_state >> according to the cdev input. >> This problem can be fixed by separating the bind operation out of registration >> and doing it when registration completely finished. > > When thermal_cooling_device_register() is called, the thermal framework > assumes the cooling device is "ready", i.e. all of its ops callbacks > return meaningful results. If the cooling device is not ready at this > point, then this is a bug in the code that registers it. > Specifically, the faulty code in your case is in the cpufreq cooling > implementation, where the cooling device is registered before being > added to the internal list of cpufreq cooling devices. So, IMHO the fix > is needed there. > > -- > Francesco
Hi, On 10/23/2012 10:23 AM, Hongbo Zhang wrote: > Hi Francesco, > I found out more points about this issue. > > [1] > cdev should be ready when get_max_state callback be called, otherwise > parameter cdev is useless, imagine there may be cases that > get_max_state call back is shared by more than one cooling devices of > same kind, like this: > common_get_max_state(*cdev, *state) > { > if (cdev == cdev1) *state = 3; > else if (cdev == cdev) *state = 5; > else > } > > [2] > In my cpufreq cooling case(in fact cdev is not used to calculate > max_state), the cooling_cpufreq_list should be ready when > get_max_state call back is called. In this patch I defer the binding > when registration finished, but this is not perfect now, see this > routine in cpufreq_cooling_register: > > thermal_cooling_device_register; > at this time: thermal_bind_work -> get_max_state -> get NULL > cooling_cpufreq_list > and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list) > This is due to we cannot know exactly when the bind work is executed. > (and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock) > aheadof thermal_cooling_device_register and other corresponding > modifications, but I found another way as below) > > [3] > Root cause of this problem is calling get_max_state in register -> bind routine. > Better solution is to add another parameter in cooling device register > function, also add a max_state member in struct cdev, like: > thermal_cooling_device_register(char *type, void *devdata, const > struct thermal_cooling_device_ops *ops, unsigned long max_state) > and then in the bind function: > if(cdev->max_state) > max_state = cdev->max_state; > else > cdev->get_max_state(cdev, &max_state) > > It is common sense that the cooling driver should know its cooling > max_state(ability) before registration, and it can be offered when > register. > I think this way doesn't change both thermal and cooling layer much, > it is more clear. > I will update this patch soon. I still believe the thermal layer doesn't need any change to work around this problem, and I still believe that when a cooling device is being registered, all of its ops should be fully functional. The problem with the cpufreq cooling device driver is that its callbacks use the internal list of devices to retrieve the struct cpufreq_cooling_device instance corresponding to a given struct thermal_cooling_device. This is not necessary, because the struct thermal_cooling_device has a private data pointer (devdata) which in this case is exactly a reference to the struct cpufreq_cooling_device instance the callbacks are looking for. In fact, I think the cooling_cpufreq_list is not necessary at all, and should be removed from the cpufreq cooling driver. So the 3 callbacks, instead of iterating through the device list, should have something like: struct cpufreq_cooling_device *cpufreq_device = cdev->devdata; That would do the trick. -- Francesco
On 24 October 2012 06:13, Francesco Lavra <francescolavra.fl@gmail.com> wrote: > Hi, > On 10/23/2012 10:23 AM, Hongbo Zhang wrote: >> Hi Francesco, >> I found out more points about this issue. >> >> [1] >> cdev should be ready when get_max_state callback be called, otherwise >> parameter cdev is useless, imagine there may be cases that >> get_max_state call back is shared by more than one cooling devices of >> same kind, like this: >> common_get_max_state(*cdev, *state) >> { >> if (cdev == cdev1) *state = 3; >> else if (cdev == cdev) *state = 5; >> else >> } >> >> [2] >> In my cpufreq cooling case(in fact cdev is not used to calculate >> max_state), the cooling_cpufreq_list should be ready when >> get_max_state call back is called. In this patch I defer the binding >> when registration finished, but this is not perfect now, see this >> routine in cpufreq_cooling_register: >> >> thermal_cooling_device_register; >> at this time: thermal_bind_work -> get_max_state -> get NULL >> cooling_cpufreq_list >> and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list) >> This is due to we cannot know exactly when the bind work is executed. >> (and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock) >> aheadof thermal_cooling_device_register and other corresponding >> modifications, but I found another way as below) >> >> [3] >> Root cause of this problem is calling get_max_state in register -> bind routine. >> Better solution is to add another parameter in cooling device register >> function, also add a max_state member in struct cdev, like: >> thermal_cooling_device_register(char *type, void *devdata, const >> struct thermal_cooling_device_ops *ops, unsigned long max_state) >> and then in the bind function: >> if(cdev->max_state) >> max_state = cdev->max_state; >> else >> cdev->get_max_state(cdev, &max_state) >> >> It is common sense that the cooling driver should know its cooling >> max_state(ability) before registration, and it can be offered when >> register. >> I think this way doesn't change both thermal and cooling layer much, >> it is more clear. >> I will update this patch soon. > > I still believe the thermal layer doesn't need any change to work around > this problem, and I still believe that when a cooling device is being > registered, all of its ops should be fully functional. > The problem with the cpufreq cooling device driver is that its callbacks > use the internal list of devices to retrieve the struct > cpufreq_cooling_device instance corresponding to a given struct > thermal_cooling_device. This is not necessary, because the struct > thermal_cooling_device has a private data pointer (devdata) which in > this case is exactly a reference to the struct cpufreq_cooling_device > instance the callbacks are looking for. In fact, I think the > cooling_cpufreq_list is not necessary at all, and should be removed from > the cpufreq cooling driver. > So the 3 callbacks, instead of iterating through the device list, should > have something like: > struct cpufreq_cooling_device *cpufreq_device = cdev->devdata; > That would do the trick. Hi Francesco, When I found out this issue, I was hesitating to select the best solution among several ideas. It is clear now after talk with you, I will send patch for cpufreq cooling layer. Thank you. > > -- > Francesco
diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c index 9ee42ca..dd3d024 100644 --- a/drivers/thermal/thermal_sys.c +++ b/drivers/thermal/thermal_sys.c @@ -70,6 +70,8 @@ static LIST_HEAD(thermal_tz_list); static LIST_HEAD(thermal_cdev_list); static DEFINE_MUTEX(thermal_list_lock); +static struct work_struct thermal_bind; + static int get_idr(struct idr *idr, struct mutex *lock, int *id) { int err; @@ -777,7 +779,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, dev->lower = lower; dev->target = THERMAL_NO_TARGET; - result = get_idr(&tz->idr, &tz->lock, &dev->id); + result = get_idr(&tz->idr, NULL, &dev->id); if (result) goto free_mem; @@ -796,7 +798,7 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, if (result) goto remove_symbol_link; - mutex_lock(&tz->lock); + /* tz->lock should have been locked outside this function */ mutex_lock(&cdev->lock); list_for_each_entry(pos, &tz->thermal_instances, tz_node) if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) { @@ -808,7 +810,6 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, list_add_tail(&dev->cdev_node, &cdev->thermal_instances); } mutex_unlock(&cdev->lock); - mutex_unlock(&tz->lock); if (!result) return 0; @@ -895,7 +896,6 @@ thermal_cooling_device_register(char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { struct thermal_cooling_device *cdev; - struct thermal_zone_device *pos; int result; if (type && strlen(type) >= THERMAL_NAME_LENGTH) @@ -947,16 +947,10 @@ thermal_cooling_device_register(char *type, void *devdata, mutex_lock(&thermal_list_lock); list_add(&cdev->node, &thermal_cdev_list); - list_for_each_entry(pos, &thermal_tz_list, node) { - if (!pos->ops->bind) - continue; - result = pos->ops->bind(pos, cdev); - if (result) - break; - - } mutex_unlock(&thermal_list_lock); + schedule_work(&thermal_bind); + if (!result) return cdev; @@ -1141,19 +1135,13 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, return; } -/** - * thermal_zone_device_update - force an update of a thermal zone's state - * @ttz: the thermal zone to update - */ -void thermal_zone_device_update(struct thermal_zone_device *tz) +void __thermal_zone_device_update(struct thermal_zone_device *tz) { int count, ret = 0; long temp, trip_temp; enum thermal_trip_type trip_type; - mutex_lock(&tz->lock); - if (tz->ops->get_temp(tz, &temp)) { /* get_temp failed - retry it later */ pr_warn("failed to read out thermal zone %d\n", tz->id); @@ -1206,10 +1194,56 @@ leave: thermal_zone_device_set_polling(tz, tz->polling_delay); else thermal_zone_device_set_polling(tz, 0); +} + +/** + * thermal_zone_device_update - force an update of a thermal zone's state + * @tz: the thermal zone to update + */ +void thermal_zone_device_update(struct thermal_zone_device *tz) +{ + mutex_lock(&tz->lock); + + __thermal_zone_device_update(tz); + mutex_unlock(&tz->lock); } EXPORT_SYMBOL(thermal_zone_device_update); +static void thermal_zone_do_bind_work(struct work_struct *work) +{ + struct thermal_instance *instance; + struct thermal_zone_device *tz; + struct thermal_cooling_device *cdev; + + mutex_lock(&thermal_list_lock); + + list_for_each_entry(tz, &thermal_tz_list, node) + list_for_each_entry(cdev, &thermal_cdev_list, node) { + + mutex_lock(&tz->lock); + + if (list_empty(&tz->thermal_instances) + && tz->ops->bind) { + tz->ops->bind(tz, cdev); + __thermal_zone_device_update(tz); + mutex_unlock(&tz->lock); + break; + } + + list_for_each_entry(instance, &tz->thermal_instances, + tz_node) + if (instance->cdev != cdev && tz->ops->bind) { + tz->ops->bind(tz, cdev); + __thermal_zone_device_update(tz); + } + + mutex_unlock(&tz->lock); + } + + mutex_unlock(&thermal_list_lock); +} + /** * create_trip_attrs - create attributes for trip points * @tz: the thermal zone device @@ -1335,7 +1369,6 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type, int passive_delay, int polling_delay) { struct thermal_zone_device *tz; - struct thermal_cooling_device *pos; enum thermal_trip_type trip_type; int result; int count; @@ -1419,17 +1452,12 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type, mutex_lock(&thermal_list_lock); list_add_tail(&tz->node, &thermal_tz_list); - if (ops->bind) - list_for_each_entry(pos, &thermal_cdev_list, node) { - result = ops->bind(tz, pos); - if (result) - break; - } mutex_unlock(&thermal_list_lock); - INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check); + if (ops->bind) + schedule_work(&thermal_bind); - thermal_zone_device_update(tz); + INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check); if (!result) return tz; @@ -1588,6 +1616,7 @@ static int __init thermal_init(void) { int result = 0; + INIT_WORK(&thermal_bind, thermal_zone_do_bind_work); result = class_register(&thermal_class); if (result) { idr_destroy(&thermal_tz_idr); @@ -1601,6 +1630,7 @@ static int __init thermal_init(void) static void __exit thermal_exit(void) { + cancel_work_sync(&thermal_bind); class_unregister(&thermal_class); idr_destroy(&thermal_tz_idr); idr_destroy(&thermal_cdev_idr);