diff mbox

[4/8] opp: Introduce APIs to remove OPPs

Message ID f815d0855c1bd28d8389554176acd2620e56b3c8.1416910766.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar Nov. 25, 2014, 10:34 a.m. UTC
OPPs are created statically (from DT) or dynamically. Currently we don't free
OPPs that are created statically, when the module unloads. And so if the module
is inserted back again, we get warning for duplicate OPPs as the same were
already present.

Also, there might be a need to remove dynamic OPPs in future and so API for that
is also added.

This patch adds helper APIs to remove/free existing static and dynamic OPPs.

Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
@Paul/Rafael: Do we need to use call_srcu() instead of kfree_rcu() in
opp_set_availability()? I left it as it looked a bit different.
srcu_notifier_call_chain() is getting called after deleting the node.

 drivers/base/power/opp.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_opp.h   |  12 +++++-
 2 files changed, 113 insertions(+), 1 deletion(-)

Comments

Viresh Kumar Nov. 25, 2014, 5:16 p.m. UTC | #1
On 25 November 2014 at 21:54, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> If the data is alway accessed under an SRCU read-side critical section,
> then you do need call_srcu() when freeing it -- as you pointed out,
> -with- the srcu_struct included.  ;-)

:)

> If the data is accessed under both SRCU and RCU, then things get a bit
> more involved.

Yes, that is always the case here.

>> +void __dev_pm_opp_remove(struct device_opp *dev_opp, struct dev_pm_opp *opp)
>> +{
>> +     /*
>> +      * Notify the changes in the availability of the operable
>> +      * frequency/voltage list.
>> +      */
>> +     srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp);
>> +     list_del_rcu(&opp->node);
>> +     call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, kfree_opp_rcu);
>
> I am guessing that opp->node is being removed from the list traversed
> by srcu_notifier_call_chain()...  (Looks that way below.)

I couldn't find any code in kernel which is registered for this notifier chain
yet. And so don't know what that code will do. It may not traverse the list
or it may :)

I couldn't understand this comment of yours, sorry: (Looks that way below.)

>> +
>> +     if (list_empty(&dev_opp->opp_list)) {
>
> Does the above want to be "if (!list_empty(&dev_opp->opp_list)) {"?

No. dev_opp contains list of all OPPs. When all OPPs are gone, we don't
want dev_opp anymore and so freeing it.

>> +             list_del_rcu(&dev_opp->node);
>> +             call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head,
>> +                       kfree_device_rcu);
>> +     }
>> +}

So, I am still not sure what we need to do here as we have readers with
both rcu and srcu critical regions.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Nov. 26, 2014, 6:29 a.m. UTC | #2
On 25 November 2014 at 23:09, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Nov 25, 2014 at 10:46:29PM +0530, Viresh Kumar wrote:

>> I couldn't find any code in kernel which is registered for this notifier chain
>> yet. And so don't know what that code will do. It may not traverse the list
>> or it may :)
>
> I would guess that the srcu_notifier_chain_register() in
> devfreq_register_opp_notifier() is adding to the call chain, but I
> do not claim to understand this code.  The srcu_notifier_call_chain()
> above is traversing it.

Actually I searched for who is calling devfreq_register_opp_notifier() and
couldn't notice any callers. But there was one from within devfreq.c and
that was called from exynos code..

So yes, it is used and I was wrong. Ultimately update_devfreq() would
be called from the notifier chain and that may try to change the freq and
so is sleep-able.

>> So, I am still not sure what we need to do here as we have readers with
>> both rcu and srcu critical regions.
>
> Well, the most straightforward approach from an RCU perspective would
> be to make all the readers use the same flavor of RCU.  From Rafael's
> earlier note, I am guessing that some of the code paths absolutely
> require SRCU.  Of course, if all the readers use SRCU, then you can
> simply free using SRCU.

Yeah, so the update_devfreq() call surely requires SRCU.

> You -can- handle situations having more than one type of reader, but
> this requires waiting for all corresponding flavors of grace period.
> This turns out to be fairly simple in this case, for example, have your
> kfree_opp_rcu() function invoke kfree_rcu() instead of kfree().

Hmm, I see.

> But I strongly recommend gaining a full understanding of the code first.
> You can dig yourself a really deep hole really quickly by patching code
> without understanding it!  And apologies if you really do already fully
> understand the code, but your statement above leads me to believe that
> you do not yet fully understand it.

I wouldn't claim to know every part of these frameworks (OPP, Devfreq),
but the maintenance of the nodes/lists is what I understand well.

But yeah, I understand you are worried about. Pushing more bugs for
solving one.

>         I couldn't find any code in kernel which is registered for this
>         notifier chain yet. And so don't know what that code will do. It
>         may not traverse the list or it may :)
>
> One thing that can help internalize code (relatively) quickly is to
> get a large piece of paper and to draw the relationships of the data
> structures first and the relationship of the code later.  When the
> drawing gets too messy, start over with a clean sheet of paper.

Thanks for your suggestions Paul. I was able to do it without a pen
and paper this time as it wasn't that complex. But yes this pen-paper
things really works while working on complex stuff. I found a memory
leak Bug in scheduling domains creation code earlier that way :)

Also it looks like I should use call_srcu() with kfree_rcu() for
opp_set_availability() case as well to avoid any corner cases.

Thanks for your time and help. Really appreciate that :)

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index b249b01..7ae4db5 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -79,6 +79,7 @@  struct dev_pm_opp {
  *		however addition is possible and is secured by dev_opp_list_lock
  * @dev:	device pointer
  * @srcu_head:	notifier head to notify the OPP availability changes.
+ * @rcu_head:	RCU callback head used for deferred freeing
  * @opp_list:	list of opps
  *
  * This is an internal data structure maintaining the link to opps attached to
@@ -90,6 +91,7 @@  struct device_opp {
 
 	struct device *dev;
 	struct srcu_notifier_head srcu_head;
+	struct rcu_head rcu_head;
 	struct list_head opp_list;
 };
 
@@ -498,6 +500,76 @@  int dev_pm_opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_add);
 
+static void kfree_opp_rcu(struct rcu_head *head)
+{
+	struct dev_pm_opp *opp = container_of(head, struct dev_pm_opp, rcu_head);
+
+	kfree(opp);
+}
+
+static void kfree_device_rcu(struct rcu_head *head)
+{
+	struct device_opp *device_opp = container_of(head, struct device_opp, rcu_head);
+
+	kfree(device_opp);
+}
+
+void __dev_pm_opp_remove(struct device_opp *dev_opp, struct dev_pm_opp *opp)
+{
+	/*
+	 * Notify the changes in the availability of the operable
+	 * frequency/voltage list.
+	 */
+	srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp);
+	list_del_rcu(&opp->node);
+	call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, kfree_opp_rcu);
+
+	if (list_empty(&dev_opp->opp_list)) {
+		list_del_rcu(&dev_opp->node);
+		call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head,
+			  kfree_device_rcu);
+	}
+}
+
+/**
+ * dev_pm_opp_remove()  - Remove an OPP from OPP list
+ * @dev:	device for which we do this operation
+ * @freq:	OPP to remove with matching 'freq'
+ *
+ * This function removes an opp from the opp list.
+ */
+void dev_pm_opp_remove(struct device *dev, unsigned long freq)
+{
+	struct dev_pm_opp *opp;
+	struct device_opp *dev_opp;
+	bool found = false;
+
+	/* Hold our list modification lock here */
+	mutex_lock(&dev_opp_list_lock);
+
+	dev_opp = find_device_opp(dev);
+	if (IS_ERR(dev_opp))
+		goto unlock;
+
+	list_for_each_entry(opp, &dev_opp->opp_list, node) {
+		if (opp->rate == freq) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		dev_warn(dev, "%s: Couldn't find OPP with freq: %lu\n",
+			 __func__, freq);
+		goto unlock;
+	}
+
+	__dev_pm_opp_remove(dev_opp, opp);
+unlock:
+	mutex_unlock(&dev_opp_list_lock);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_remove);
+
 /**
  * opp_set_availability() - helper to set the availability of an opp
  * @dev:		device for which we do this operation
@@ -687,4 +759,34 @@  int of_init_opp_table(struct device *dev)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(of_init_opp_table);
+
+/**
+ * of_free_opp_table() - Free OPP table entries created from static DT entries
+ * @dev:	device pointer used to lookup device OPPs.
+ *
+ * Free OPPs created using static entries present in DT.
+ */
+void of_free_opp_table(struct device *dev)
+{
+	struct device_opp *dev_opp = find_device_opp(dev);
+	struct dev_pm_opp *opp, *tmp;
+
+	/* Check for existing list for 'dev' */
+	dev_opp = find_device_opp(dev);
+	if (WARN(IS_ERR(dev_opp), "%s: dev_opp: %ld\n", dev_name(dev),
+		 PTR_ERR(dev_opp)))
+		return;
+
+	/* Hold our list modification lock here */
+	mutex_lock(&dev_opp_list_lock);
+
+	/* Free static OPPs */
+	list_for_each_entry_safe(opp, tmp, &dev_opp->opp_list, node) {
+		if (!opp->dynamic)
+			__dev_pm_opp_remove(dev_opp, opp);
+	}
+
+	mutex_unlock(&dev_opp_list_lock);
+}
+EXPORT_SYMBOL_GPL(of_free_opp_table);
 #endif
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 0330217..cec2d45 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -21,7 +21,7 @@  struct dev_pm_opp;
 struct device;
 
 enum dev_pm_opp_event {
-	OPP_EVENT_ADD, OPP_EVENT_ENABLE, OPP_EVENT_DISABLE,
+	OPP_EVENT_ADD, OPP_EVENT_REMOVE, OPP_EVENT_ENABLE, OPP_EVENT_DISABLE,
 };
 
 #if defined(CONFIG_PM_OPP)
@@ -44,6 +44,7 @@  struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev,
 
 int dev_pm_opp_add(struct device *dev, unsigned long freq,
 		   unsigned long u_volt);
+void dev_pm_opp_remove(struct device *dev, unsigned long freq);
 
 int dev_pm_opp_enable(struct device *dev, unsigned long freq);
 
@@ -90,6 +91,10 @@  static inline int dev_pm_opp_add(struct device *dev, unsigned long freq,
 	return -EINVAL;
 }
 
+static inline void dev_pm_opp_remove(struct device *dev, unsigned long freq)
+{
+}
+
 static inline int dev_pm_opp_enable(struct device *dev, unsigned long freq)
 {
 	return 0;
@@ -109,11 +114,16 @@  static inline struct srcu_notifier_head *dev_pm_opp_get_notifier(
 
 #if defined(CONFIG_PM_OPP) && defined(CONFIG_OF)
 int of_init_opp_table(struct device *dev);
+void of_free_opp_table(struct device *dev);
 #else
 static inline int of_init_opp_table(struct device *dev)
 {
 	return -EINVAL;
 }
+
+static inline void of_free_opp_table(struct device *dev)
+{
+}
 #endif
 
 #endif		/* __LINUX_OPP_H__ */