Message ID | 20200422145408.v4.1.Ic7096b3b9b7828cdd41cd5469a6dee5eb6abf549@changeid |
---|---|
State | New |
Headers | show |
Series | [v4,1/5] soc: qcom: rpmh-rsc: Corrently ignore CPU_CLUSTER_PM notifications | expand |
Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Thanks, Maulik On 4/23/2020 3:25 AM, Douglas Anderson wrote: > When a PM Notifier returns NOTIFY_BAD it doesn't get called with > CPU_PM_ENTER_FAILED. It only get called for CPU_PM_ENTER_FAILED if > someone else (further down the notifier chain) returns NOTIFY_BAD. > > Handle this case by taking our CPU out of the list of ones that have > entered PM. Without this it's possible we could detect that the last > CPU went down (and we would flush) even if some CPU was alive. That's > not good since our flushing routines currently assume they're running > on the last CPU for mutual exclusion. > > Fixes: 985427f997b6 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > Changes in v4: > - ("...We aren't notified of our own failure...") split out for v4. > > Changes in v3: None > Changes in v2: None > > drivers/soc/qcom/rpmh-rsc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c > index 3571a99fc839..e540e49fd61c 100644 > --- a/drivers/soc/qcom/rpmh-rsc.c > +++ b/drivers/soc/qcom/rpmh-rsc.c > @@ -823,6 +823,10 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb, > ret = NOTIFY_OK; > > exit: > + if (ret == NOTIFY_BAD) > + /* We won't be called w/ CPU_PM_ENTER_FAILED */ > + cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm); > + > spin_unlock(&drv->pm_lock); > return ret; > }
Quoting Douglas Anderson (2020-04-22 14:55:00) > When a PM Notifier returns NOTIFY_BAD it doesn't get called with > CPU_PM_ENTER_FAILED. It only get called for CPU_PM_ENTER_FAILED if > someone else (further down the notifier chain) returns NOTIFY_BAD. > > Handle this case by taking our CPU out of the list of ones that have > entered PM. Without this it's possible we could detect that the last > CPU went down (and we would flush) even if some CPU was alive. That's > not good since our flushing routines currently assume they're running > on the last CPU for mutual exclusion. > > Fixes: 985427f997b6 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- Reported-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Quoting Stephen Boyd (2020-04-23 19:38:24) > Quoting Douglas Anderson (2020-04-22 14:55:00) > > When a PM Notifier returns NOTIFY_BAD it doesn't get called with > > CPU_PM_ENTER_FAILED. It only get called for CPU_PM_ENTER_FAILED if > > someone else (further down the notifier chain) returns NOTIFY_BAD. > > > > Handle this case by taking our CPU out of the list of ones that have > > entered PM. Without this it's possible we could detect that the last > > CPU went down (and we would flush) even if some CPU was alive. That's > > not good since our flushing routines currently assume they're running > > on the last CPU for mutual exclusion. > > > > Fixes: 985427f997b6 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches") > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > Reported-by: Stephen Boyd <swboyd@chromium.org> Scratch that one! Copy/paste for the lose.
Hi, On Thu, Apr 23, 2020 at 7:48 PM Stephen Boyd <swboyd@chromium.org> wrote: > > Quoting Douglas Anderson (2020-04-22 14:55:02) > > The rpmh-rsc code had both a driver-level lock (sometimes referred to > > in comments as drv->lock) and a lock per-TCS. The idea was supposed > > to be that there would be times where you could get by with just > > locking a TCS lock and therefor other RPMH users wouldn't be blocked. > > > > The above didn't work out so well. > > > > Looking at tcs_write() the bigger drv->lock was held for most of the > > function anyway. Only the __tcs_buffer_write() and > > __tcs_set_trigger() calls were called without it the drv->lock. It > > without holding the drv->lock > > > actually turns out that in tcs_write() we don't need to hold the > > drv->lock for those function calls anyway even if the per-TCS lock > > isn't there anymore. > > Why? It's in the commit as comments, but I'll copy it here too for clarity. > > Thus, from a tcs_write() point of view, the > > per-TCS lock was useless. > > > > Looking at rpmh_rsc_write_ctrl_data(), only the per-TCS lock was held. > > It turns out, though, that this function already needs to be called > > with the equivalent of the drv->lock held anyway (we either need to > > hold drv->lock as we will in a future patch or we need to know no > > other CPUs could be running as happens today). Specifically > > rpmh_rsc_write_ctrl_data() might be writing to a TCS that has been > > borrowed for writing an active transation but it never checks this. > > > > Let's eliminate this extra overhead and avoid possible AB BA locking > > headaches. > > > > Suggested-by: Maulik Shah <mkshah@codeaurora.org> > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c > > index e540e49fd61c..71cebe7fd452 100644 > > --- a/drivers/soc/qcom/rpmh-rsc.c > > +++ b/drivers/soc/qcom/rpmh-rsc.c > > @@ -581,24 +575,19 @@ static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg) > > if (IS_ERR(tcs)) > > return PTR_ERR(tcs); > > > > - spin_lock_irqsave(&tcs->lock, flags); > > - spin_lock(&drv->lock); > > + spin_lock_irqsave(&drv->lock, flags); > > /* > > * The h/w does not like if we send a request to the same address, > > * when one is already in-flight or being processed. > > */ > > ret = check_for_req_inflight(drv, tcs, msg); > > - if (ret) { > > - spin_unlock(&drv->lock); > > - goto done_write; > > - } > > + if (ret) > > + goto err; > > Nitpick: Usually 'goto err' is used for error paths, not unlock paths. > Use 'goto unlock' for that. I'm confused why this isn't an error path. We're about to return a non-zero value from the function (indicating an error) and we didn't actually do the write. Isn't this an error path? In any case, "unlock" is fine with me so I'll change it, but maybe you can help clarify so I don't make the same mistake next time. > > - tcs_id = find_free_tcs(tcs); > > - if (tcs_id < 0) { > > - ret = tcs_id; > > - spin_unlock(&drv->lock); > > - goto done_write; > > - } > > + ret = find_free_tcs(tcs); > > + if (ret < 0) > > + goto err; > > + tcs_id = ret; > > > > tcs->req[tcs_id - tcs->offset] = msg; > > set_bit(tcs_id, drv->tcs_in_use); > > @@ -612,13 +601,21 @@ static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg) > > write_tcs_reg_sync(drv, RSC_DRV_CMD_WAIT_FOR_CMPL, tcs_id, 0); > > enable_tcs_irq(drv, tcs_id, true); > > } > > - spin_unlock(&drv->lock); > > + spin_unlock_irqrestore(&drv->lock, flags); > > > > + /* > > + * These two can be done after the lock is released because: > > + * - We marked "tcs_in_use" under lock. > > + * - Once "tcs_in_use" has been marked nobody else could be writing > > + * to these registers until the interrupt goes off. > > + * - The interrupt can't go off until we trigger. > > trigger via some function? I was referring to the __tcs_set_trigger() function call two lines below. I'll clarify. > > + */ > > __tcs_buffer_write(drv, tcs_id, 0, msg); > > __tcs_set_trigger(drv, tcs_id, true); > > > > -done_write: > > - spin_unlock_irqrestore(&tcs->lock, flags); > > + return 0; > > +err: > > + spin_unlock_irqrestore(&drv->lock, flags); > > return ret; > > } > >
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c index a9e15699f55f..3571a99fc839 100644 --- a/drivers/soc/qcom/rpmh-rsc.c +++ b/drivers/soc/qcom/rpmh-rsc.c @@ -806,6 +806,8 @@ static int rpmh_rsc_cpu_pm_callback(struct notifier_block *nfb, case CPU_PM_EXIT: cpumask_clear_cpu(smp_processor_id(), &drv->cpus_entered_pm); goto exit; + default: + return NOTIFY_DONE; } ret = rpmh_rsc_ctrlr_is_busy(drv);
Our switch statement doesn't have entries for CPU_CLUSTER_PM_ENTER, CPU_CLUSTER_PM_ENTER_FAILED, and CPU_CLUSTER_PM_EXIT and doesn't have a default. This means that we'll try to do a flush in those cases but we won't necessarily be the last CPU down. That's not so ideal since our (lack of) locking assumes we're on the last CPU. Luckily this isn't as big a problem as you'd think since (at least on the SoC I tested) we don't get these notifications except on full system suspend. ...and on full system suspend we get them on the last CPU down. That means that the worst problem we hit is flushing twice. Still, it's good to make it correct. Fixes: 985427f997b6 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches") Reported-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v4: - ("...Corrently ignore CPU_CLUSTER_PM notifications") split out for v4. Changes in v3: None Changes in v2: None drivers/soc/qcom/rpmh-rsc.c | 2 ++ 1 file changed, 2 insertions(+)