Message ID: 1580796831-18996-1-git-send-email-mkshah@codeaurora.org
Series: Misc stability fixes and optimization for rpmh driver
On Mon, Feb 3, 2020 at 10:14 PM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> Currently rpmh ctrlr dirty flag is set for all cases regardless
> of data is really changed or not.
>
> Add changes to update it when data is updated to new values.
>
> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> ---
>  drivers/soc/qcom/rpmh.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> index 035091f..c3d6f00 100644
> --- a/drivers/soc/qcom/rpmh.c
> +++ b/drivers/soc/qcom/rpmh.c
> @@ -139,20 +139,27 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>  existing:
>         switch (state) {
>         case RPMH_ACTIVE_ONLY_STATE:
> -               if (req->sleep_val != UINT_MAX)
> +               if (req->sleep_val != UINT_MAX) {
>                         req->wake_val = cmd->data;
> +                       ctrlr->dirty = true;
> +               }

Don't you need to set dirty = true for ACTIVE_ONLY state always? The
conditional is just saying "if nobody set a sleep vote, then maintain
this vote when we wake back up".
On 2/5/2020 6:05 AM, Evan Green wrote:
> On Mon, Feb 3, 2020 at 10:14 PM Maulik Shah <mkshah@codeaurora.org> wrote:
>> Currently rpmh ctrlr dirty flag is set for all cases regardless
>> of data is really changed or not.
>>
>> Add changes to update it when data is updated to new values.
>>
>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
>> ---
>>  drivers/soc/qcom/rpmh.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>> index 035091f..c3d6f00 100644
>> --- a/drivers/soc/qcom/rpmh.c
>> +++ b/drivers/soc/qcom/rpmh.c
>> @@ -139,20 +139,27 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>  existing:
>>         switch (state) {
>>         case RPMH_ACTIVE_ONLY_STATE:
>> -               if (req->sleep_val != UINT_MAX)
>> +               if (req->sleep_val != UINT_MAX) {
>>                         req->wake_val = cmd->data;
>> +                       ctrlr->dirty = true;
>> +               }
> Don't you need to set dirty = true for ACTIVE_ONLY state always? The
> conditional is just saying "if nobody set a sleep vote, then maintain
> this vote when we wake back up".

The ACTIVE_ONLY vote is cached as wake_val to be applied when wakeup happens.

In case the value didn't change, wake_val is still the same as the older
value and there is no need to mark the entire cache as dirty.
On Tue, Feb 4, 2020 at 8:14 PM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> On 2/5/2020 6:05 AM, Evan Green wrote:
> > On Mon, Feb 3, 2020 at 10:14 PM Maulik Shah <mkshah@codeaurora.org> wrote:
> >> Currently rpmh ctrlr dirty flag is set for all cases regardless
> >> of data is really changed or not.
> >>
> >> Add changes to update it when data is updated to new values.
> >>
> >> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> >> ---
> >>  drivers/soc/qcom/rpmh.c | 15 +++++++++++----
> >>  1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> >> index 035091f..c3d6f00 100644
> >> --- a/drivers/soc/qcom/rpmh.c
> >> +++ b/drivers/soc/qcom/rpmh.c
> >> @@ -139,20 +139,27 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
> >>  existing:
> >>         switch (state) {
> >>         case RPMH_ACTIVE_ONLY_STATE:
> >> -               if (req->sleep_val != UINT_MAX)
> >> +               if (req->sleep_val != UINT_MAX) {
> >>                         req->wake_val = cmd->data;
> >> +                       ctrlr->dirty = true;
> >> +               }
> > Don't you need to set dirty = true for ACTIVE_ONLY state always? The
> > conditional is just saying "if nobody set a sleep vote, then maintain
> > this vote when we wake back up".
>
> The ACTIVE_ONLY vote is cached as wake_val to be applied when wakeup happens.
>
> In case the value didn't change, wake_val is still the same as the older
> value and there is no need to mark the entire cache as dirty.

Ah, I see it now. We don't actually cache active_only votes anywhere,
since they're one time requests. The sleep/wake votes seem to be the
only thing that gets cached.

I was thinking it might be safer to also set dirty = true just after
list_add_tail, since in the non-existing case this is a new batch that
RPMh has never seen before and should always be written. But I suppose
your checks here should cover that case, since sleep_val and wake_val
are initialized to UINT_MAX. If you think the code might evolve, it
might still be nice to add it.
While I'm looking at that, why do we have this needless INIT_LIST_HEAD?

        INIT_LIST_HEAD(&req->list);
        list_add_tail(&req->list, &ctrlr->cache);

-Evan

> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
On Wed, Feb 12, 2020 at 4:15 AM Maulik Shah <mkshah@codeaurora.org> wrote:
>
> On 2/5/2020 11:51 PM, Evan Green wrote:
> > On Tue, Feb 4, 2020 at 9:12 PM Maulik Shah <mkshah@codeaurora.org> wrote:
> >>
> >> On 2/5/2020 6:01 AM, Evan Green wrote:
> >>> On Mon, Feb 3, 2020 at 10:14 PM Maulik Shah <mkshah@codeaurora.org> wrote:
> >>>> rpm_msgs are copied in continuously allocated memory during write_batch.
> >>>> Update request pointer to correctly point to designated area for rpm_msgs.
> >>>>
> >>>> While at this also add missing list_del before freeing rpm_msgs.
> >>>>
> >>>> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
> >>>> ---
> >>>>  drivers/soc/qcom/rpmh.c | 9 ++++++---
> >>>>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
> >>>> index c3d6f00..04c7805 100644
> >>>> --- a/drivers/soc/qcom/rpmh.c
> >>>> +++ b/drivers/soc/qcom/rpmh.c
> >>>> @@ -65,7 +65,7 @@ struct cache_req {
> >>>>  struct batch_cache_req {
> >>>>         struct list_head list;
> >>>>         int count;
> >>>> -       struct rpmh_request rpm_msgs[];
> >>>> +       struct rpmh_request *rpm_msgs;
> >>>>  };
> >>>>
> >>>>  static struct rpmh_ctrlr *get_rpmh_ctrlr(const struct device *dev)
> >>>> @@ -327,8 +327,10 @@ static void invalidate_batch(struct rpmh_ctrlr *ctrlr)
> >>>>         unsigned long flags;
> >>>>
> >>>>         spin_lock_irqsave(&ctrlr->cache_lock, flags);
> >>>> -       list_for_each_entry_safe(req, tmp, &ctrlr->batch_cache, list)
> >>>> +       list_for_each_entry_safe(req, tmp, &ctrlr->batch_cache, list) {
> >>>> +               list_del(&req->list);
> >>>>                 kfree(req);
> >>>> +       }
> >>>>         INIT_LIST_HEAD(&ctrlr->batch_cache);
> >>> Hm, I don't get it. list_for_each_entry_safe ensures you can traverse
> >>> the list while freeing it behind you. ctrlr->batch_cache is now a
> >>> bogus list, but is re-inited with the lock held. From my reading,
> >>> there doesn't seem to be anything wrong with the current code. Can you
> >>> elaborate on the bug you found?
> >> Hi Evan,
> >>
> >> when we don't do list_del, there might be access to already freed memory.
> >> Even after current item free via kfree(req), without list_del, the next
> >> and prev item's pointer are still pointing to this freed region.
> >> it seem best to call list_del to ensure that before freeing this area,
> >> no other item in list refer to this.
> > I don't think that's true. the "_safe" part of
> > list_for_each_entry_safe ensures that we don't touch the ->next member
> > of any node after freeing it. So I don't think there's any case where
> > we could touch freed memory. The list_del still seems like needless
> > code to me.
>
> Hmm, ok. i can drop list_del.
>
> see the reason below to include list_del.
>
> >>>>         spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
> >>>>  }
> >>>> @@ -377,10 +379,11 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
> >>>>                 return -ENOMEM;
> >>>>
> >>>>         req = ptr;
> >>>> +       rpm_msgs = ptr + sizeof(*req);
> >>>>         compls = ptr + sizeof(*req) + count * sizeof(*rpm_msgs);
> >>>>
> >>>>         req->count = count;
> >>>> -       rpm_msgs = req->rpm_msgs;
> >>>> +       req->rpm_msgs = rpm_msgs;
> >>> I don't really understand what this is fixing either, can you explain?
> >> the continous memory allocated via below is for 3 items,
> >>
> >> ptr = kzalloc(sizeof(*req) + count * (sizeof(req->rpm_msgs[0]) +
> >>               sizeof(*compls)), GFP_ATOMIC);
> >>
> >> 1. batch_cache_req, followed by
> >> 2. total count of rpmh_request, followed by
> >> 3. total count of compls
> >>
> >> current code starts using (3) compls from proper offset in memory
> >> compls = ptr + sizeof(*req) + count * sizeof(*rpm_msgs);
> >>
> >> however for (2) rpmh_request it does
> >>
> >> rpm_msgs = req->rpm_msgs;
> >>
> >> because of this it starts 8 byte before its designated area and overlaps
> >> with (1) batch_cache_req struct's last entry.
> >> this patch corrects it via below to ensure rpmh_request uses correct
> >> start address in memory.
> >>
> >> rpm_msgs = ptr + sizeof(*req);
> > I don't follow that either. The empty array declaration (or the
> > GCC-specific version of it would be "struct rpmh_request
> > rpm_msgs[0];") is a flexible array member, meaning the member itself
> > doesn't take up any space in the struct. So, for instance, it holds
> > true that &(req->rpm_msgs[0]) == (req + 1). By my reading the existing
> > code is correct, and your patch just adds a needless pointer
> > indirection. Check out this wikipedia entry:
> >
> > https://en.wikipedia.org/wiki/Flexible_array_member
>
> Thanks Evan,
>
> Agree that code works even without this.
>
> However from the same wiki,
>
> >> It is common to allocate sizeof(struct) + array_len*sizeof(array
> >> element) bytes.
>
> >> This is not wrong, however it may allocate a few more bytes than
> >> necessary:
>
> this is what i wanted to convey above, currently it allocated 8 more
> bytes than necessary.
>
> The reason for the change was one use after free reported in rpmh driver.
>
> After including this change, we have not seen this reported again.

Hm, I would not expect that an allocation of too many bytes would
result in a use-after-free warning. If you still have the warning and
are able to share it, I'm happy to take a look.

> I can drop this change in new revision if we don't want it.

Yes, let's drop it for now.
-Evan