Message ID | 20230916113327.85693-1-bo.ye@mediatek.com |
---|---|
State | Superseded |
Headers | show |
Series | Subject: thermal: Fix potential race condition in suspend/resume | expand |
On Sat, 2023-09-16 at 19:33 +0800, Bo Ye wrote: Correct mail title format: remove "Subject:" from mail title. > From: "yugang.wang" <yugang.wang@mediatek.com> > > Body: > This patch fixes a race condition during system resume. It occurs if > the system is exiting a suspend state and a user is trying to > register/unregister a thermal zone concurrently. The root cause is > that both actions access the `thermal_tz_list`. > > In detail: > > 1. At PM_POST_SUSPEND during the resume, the system reads all thermal > zones in `thermal_tz_list`, then resets and updates their > temperatures. > 2. When registering/unregistering a thermal zone, the > `thermal_tz_list` gets manipulated. > > These two actions might occur concurrently, causing a race condition. > To solve this issue, we introduce a mutex lock to protect > `thermal_tz_list` from being modified while it's being read and > updated during the resume from suspend. > > Kernel oops excerpt related to this fix: > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] > mutex_lock+0x34/0x170 > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] > thermal_pm_notify+0xd4/0x26c > [... cut for brevity ...] > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 > > Change-Id: Ifdbdecba17093f91eab7e36ce04b46d311ca6568 > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > --- > drivers/thermal/thermal_core.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/thermal/thermal_core.c > b/drivers/thermal/thermal_core.c > index 8717a3343512..a7a18ed57b6d 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct > notifier_block *nb, > case PM_POST_HIBERNATION: > case PM_POST_RESTORE: > case PM_POST_SUSPEND: > + mutex_lock(&thermal_list_lock); > atomic_set(&in_suspend, 0); > list_for_each_entry(tz, &thermal_tz_list, node) { > thermal_zone_device_init(tz); > thermal_zone_device_update(tz, > THERMAL_EVENT_UNSPEC > IFIED); > } > + mutex_unlock(&thermal_list_lock); > break; > default: > break;
On 16/09/2023 13:33, Bo Ye wrote: > From: "yugang.wang" <yugang.wang@mediatek.com> > > Body: > This patch fixes a race condition during system resume. It occurs if > the system is exiting a suspend state and a user is trying to > register/unregister a thermal zone concurrently. The root cause is > that both actions access the `thermal_tz_list`. I'm not sure the tasks are already thawed during POST_RESTORE, so no user can unload a driver and then reaching the race window. Is that an observed issue? > In detail: > > 1. At PM_POST_SUSPEND during the resume, the system reads all thermal > zones in `thermal_tz_list`, then resets and updates their > temperatures. > 2. When registering/unregistering a thermal zone, the > `thermal_tz_list` gets manipulated. > > These two actions might occur concurrently, causing a race condition. > To solve this issue, we introduce a mutex lock to protect > `thermal_tz_list` from being modified while it's being read and > updated during the resume from suspend. > > Kernel oops excerpt related to this fix: > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] mutex_lock+0x34/0x170 > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] thermal_pm_notify+0xd4/0x26c > [... cut for brevity ...] > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 > > Change-Id: Ifdbdecba17093f91eab7e36ce04b46d311ca6568 > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > --- > drivers/thermal/thermal_core.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index 8717a3343512..a7a18ed57b6d 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct notifier_block *nb, > case PM_POST_HIBERNATION: > case PM_POST_RESTORE: > case PM_POST_SUSPEND: > + mutex_lock(&thermal_list_lock); > atomic_set(&in_suspend, 0); > list_for_each_entry(tz, &thermal_tz_list, node) { > thermal_zone_device_init(tz); > thermal_zone_device_update(tz, > THERMAL_EVENT_UNSPECIFIED); > } > + mutex_unlock(&thermal_list_lock); > break; > default: > break;
On Thu, Oct 12, 2023 at 5:39 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 16/09/2023 13:33, Bo Ye wrote: > > From: "yugang.wang" <yugang.wang@mediatek.com> > > > > Body: > > This patch fixes a race condition during system resume. It occurs if > > the system is exiting a suspend state and a user is trying to > > register/unregister a thermal zone concurrently. The root cause is > > that both actions access the `thermal_tz_list`. > > I'm not sure the tasks are already thawed during POST_RESTORE, so no > user can unload a driver and then reaching the race window. Yes, they are. > Is that an observed issue? Good question, but the patch looks correct to me. > > In detail: > > > > 1. At PM_POST_SUSPEND during the resume, the system reads all thermal > > zones in `thermal_tz_list`, then resets and updates their > > temperatures. > > 2. When registering/unregistering a thermal zone, the > > `thermal_tz_list` gets manipulated. > > > > These two actions might occur concurrently, causing a race condition. > > To solve this issue, we introduce a mutex lock to protect > > `thermal_tz_list` from being modified while it's being read and > > updated during the resume from suspend. > > > > Kernel oops excerpt related to this fix: > > > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] mutex_lock+0x34/0x170 > > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] thermal_pm_notify+0xd4/0x26c > > [... cut for brevity ...] > > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 > > > > Change-Id: Ifdbdecba17093f91eab7e36ce04b46d311ca6568 > > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > > --- > > drivers/thermal/thermal_core.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > > index 8717a3343512..a7a18ed57b6d 100644 > > --- a/drivers/thermal/thermal_core.c > > +++ b/drivers/thermal/thermal_core.c > > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct notifier_block *nb, > > case PM_POST_HIBERNATION: > > case PM_POST_RESTORE: > > case PM_POST_SUSPEND: > > + mutex_lock(&thermal_list_lock); > > atomic_set(&in_suspend, 0); > > list_for_each_entry(tz, &thermal_tz_list, node) { > > thermal_zone_device_init(tz); > > thermal_zone_device_update(tz, > > THERMAL_EVENT_UNSPECIFIED); > > } > > + mutex_unlock(&thermal_list_lock); > > break; > > default: > > break; > > --
On Mon, Oct 23, 2023 at 3:20 AM Bo Ye (叶波) <Bo.Ye@mediatek.com> wrote: > > Yes, it is observed issue. It does happen, so it's not just "potential" and the subject of the patch is slightly misleading. Please adjust it. > Firstly, it needs to be clarified that this issue occurs in a real- > world environment. By analyzing the logs, we inferred that the issue > occurred just as the system was entering suspend mode, and the user was > switching the thermal policy (this action causes all thermal zones to > unregister/register). In addition, we conducted degradation tests and > also reproduced this issue. The specific method is to first switch the > thermal policy through a command, and then immediately put the system > into suspend state through another command. This method can also > reproduce the issue. OK, so please add this information to the patch changelog. > On Thu, 2023-10-12 at 07:35 +0000, Bo Ye (叶波) wrote: > > On Sat, 2023-09-16 at 19:33 +0800, Bo Ye wrote: > > > > Correct mail title format: remove "Subject:" from mail title. > > > > > From: "yugang.wang" <yugang.wang@mediatek.com> > > > > > > Body: > > > This patch fixes a race condition during system resume. It occurs > > > if > > > the system is exiting a suspend state and a user is trying to > > > register/unregister a thermal zone concurrently. The root cause is > > > that both actions access the `thermal_tz_list`. > > > > > > In detail: > > > > > > 1. At PM_POST_SUSPEND during the resume, the system reads all > > > thermal > > > zones in `thermal_tz_list`, then resets and updates their > > > temperatures. > > > 2. When registering/unregistering a thermal zone, the > > > `thermal_tz_list` gets manipulated. > > > > > > These two actions might occur concurrently, causing a race > > > condition. > > > To solve this issue, we introduce a mutex lock to protect > > > `thermal_tz_list` from being modified while it's being read and > > > updated during the resume from suspend. > > > > > > Kernel oops excerpt related to this fix: > > > > > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] > > > mutex_lock+0x34/0x170 > > > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] > > > thermal_pm_notify+0xd4/0x26c > > > [... cut for brevity ...] > > > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > > > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > > > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 Well, the connection between the above log snippet and the issue addressed by the patch is rather hard to establish. Please include more of the oops information. > > > > > > Change-Id: Ifdbdecba17093f91eab7e36ce04b46d311ca6568 > > > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > > > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > > > --- > > > drivers/thermal/thermal_core.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/thermal/thermal_core.c > > > b/drivers/thermal/thermal_core.c > > > index 8717a3343512..a7a18ed57b6d 100644 > > > --- a/drivers/thermal/thermal_core.c > > > +++ b/drivers/thermal/thermal_core.c > > > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct > > > notifier_block *nb, > > > case PM_POST_HIBERNATION: > > > case PM_POST_RESTORE: > > > case PM_POST_SUSPEND: > > > + mutex_lock(&thermal_list_lock); > > > atomic_set(&in_suspend, 0); It is not clear to me why the above statement needs to be under the lock. > > > list_for_each_entry(tz, &thermal_tz_list, node) { > > > thermal_zone_device_init(tz); > > > thermal_zone_device_update(tz, > > > THERMAL_EVENT_UNSPEC > > > IFIED); > > > } > > > + mutex_unlock(&thermal_list_lock); > > > break; > > > default: > > > break;
Hi Rafael sir, Could you help review this patch, thanks a lot. Best Regards Bo Ye On Wed, 2023-11-01 at 22:58 +0800, Bo Ye wrote: > On Wed, 2023-10-25 at 20:21 +0200, Rafael J. Wysocki wrote: > > > > External email : Please do not click links or open attachments > > until > > you have verified the sender or the content. > > On Mon, Oct 23, 2023 at 3:20 AM Bo Ye (叶波) <Bo.Ye@mediatek.com> > > wrote: > > > > > > Yes, it is observed issue. > > > > It does happen, so it's not just "potential" and the subject of the > > patch is slightly misleading. Please adjust it. > > Done > > > > > > Firstly, it needs to be clarified that this issue occurs in a > > > real- > > > world environment. By analyzing the logs, we inferred that the > > > > issue > > > occurred just as the system was entering suspend mode, and the > > > user > > > > was > > > switching the thermal policy (this action causes all thermal > > > zones > > > > to > > > unregister/register). In addition, we conducted degradation tests > > > > and > > > also reproduced this issue. The specific method is to first > > > switch > > > > the > > > thermal policy through a command, and then immediately put the > > > > system > > > into suspend state through another command. This method can also > > > reproduce the issue. > > > > OK, so please add this information to the patch changelog. > > > > > On Thu, 2023-10-12 at 07:35 +0000, Bo Ye (叶波) wrote: > > > > On Sat, 2023-09-16 at 19:33 +0800, Bo Ye wrote: > > > > > > > > Correct mail title format: remove "Subject:" from mail title. > > > > > > > > > From: "yugang.wang" <yugang.wang@mediatek.com> > > > > > > > > > > Body: > > > > > This patch fixes a race condition during system resume. It > > > > occurs > > > > > if > > > > > the system is exiting a suspend state and a user is trying to > > > > > register/unregister a thermal zone concurrently. The root > > > > > cause > > > > is > > > > > that both actions access the `thermal_tz_list`. > > > > > > > > > > In detail: > > > > > > > > > > 1. At PM_POST_SUSPEND during the resume, the system reads all > > > > > thermal > > > > > zones in `thermal_tz_list`, then resets and updates their > > > > > temperatures. > > > > > 2. When registering/unregistering a thermal zone, the > > > > > `thermal_tz_list` gets manipulated. > > > > > > > > > > These two actions might occur concurrently, causing a race > > > > > condition. > > > > > To solve this issue, we introduce a mutex lock to protect > > > > > `thermal_tz_list` from being modified while it's being read > > > > > and > > > > > updated during the resume from suspend. > > > > > > > > > > Kernel oops excerpt related to this fix: > > > > > > > > > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] > > > > > mutex_lock+0x34/0x170 > > > > > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] > > > > > thermal_pm_notify+0xd4/0x26c > > > > > [... cut for brevity ...] > > > > > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > > > > > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > > > > > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 > > > > Well, the connection between the above log snippet and the issue > > addressed by the patch is rather hard to establish. Please include > > more of the oops information. > > Thank you very much for reviewing the additional explanations. > > 1.Enable thermal policy operation will unregister/register all > thermal > zones > 10-21 06:13:59.280 854 922 I libMtcLoader: enable thermal policy > thermal_policy_09. > > 2.System suspend entry time is 2023-10-20 22:13:59.242 > [ 4106.364175][T609387] binder:534_2: [name:spm&][SPM] PM: suspend > entry 2023-10-20 22:13:59.242898243 UTC > [ 4106.366185][T609387] binder:534_2: PM: [name:wakeup&]PM: Pending > Wakeup Sources: NETLINK > > 3. It can be proven that the absence of a switch strategy will > perform > unregister/register operations on thermal zones (android time is > 2023- > 10-20 22:13:59.282), > Because the logs for other thermal zones switching are not enabled by > default, we cannot see the logs related to other thermal zones. > [ 4106.404167][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > [ 4106.404215][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > [ 4106.404225][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > [ 4106.404504][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 0 > [ 4106.404545][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 2 > [ 4106.404566][T600922] mtkPowerMsgHdl: > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 1 > > 4. thermal_pm_notify trigger KE(android time: 2023-10-20 22:13:59. > 315894) > [ 4106.437171][T209387] binder:534_2: [name:mrdump&]Kernel Offset: > 0x289cc80000 from 0xffffffc008000000 > [ 4106.437182][T209387] binder:534_2: [name:mrdump&]PHYS_OFFSET: > 0x40000000 > [ 4106.437191][T209387] binder:534_2: [name:mrdump&]pstate: 80400005 > (Nzcv daif +PAN -UAO) > [ 4106.437204][T209387] binder:534_2: [name:mrdump&]pc : > [0xffffffe8a6688200] mutex_lock+0x34/0x184 > [ 4106.437214][T209387] binder:534_2: [name:mrdump&]lr : > [0xffffffe8a5ce66bc] thermal_pm_notify+0xd4/0x26c > [ 4106.437220][T209387] binder:534_2: [name:mrdump&]sp : > ffffffc01bab3ae0 > [ 4106.437226][T209387] binder:534_2: [name:mrdump&]x29: > ffffffc01bab3af0 x28: 0000000000000001 > > > > > > > > > > > > > Change-Id: Ifdbdecba17093f91eab7e36ce04b46d311ca6568 > > > > > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > > > > > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > > > > > --- > > > > > drivers/thermal/thermal_core.c | 2 ++ > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > diff --git a/drivers/thermal/thermal_core.c > > > > > b/drivers/thermal/thermal_core.c > > > > > index 8717a3343512..a7a18ed57b6d 100644 > > > > > --- a/drivers/thermal/thermal_core.c > > > > > +++ b/drivers/thermal/thermal_core.c > > > > > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct > > > > > notifier_block *nb, > > > > > case PM_POST_HIBERNATION: > > > > > case PM_POST_RESTORE: > > > > > case PM_POST_SUSPEND: > > > > > + mutex_lock(&thermal_list_lock); > > > > > atomic_set(&in_suspend, 0); > > > > It is not clear to me why the above statement needs to be under the > > lock. > > > > > > > list_for_each_entry(tz, &thermal_tz_list, node) { > > > > > thermal_zone_device_init(tz); > > > > > thermal_zone_device_update(tz, > > > > > > THERMAL_EVENT_UNSP > > > EC > > > > > IFIED); > > > > > } > > > > > + mutex_unlock(&thermal_list_lock); > > > > > break; > > > > > default: > > > > > break;
On Thu, 2023-11-16 at 23:06 +0800, Bo Ye wrote: Hi Reviewers, Please help review this change, or provide your suggestion. Thanks a lot > Hi Rafael sir, > > Could you help review this patch, thanks a lot. > > Best Regards > Bo Ye > > On Wed, 2023-11-01 at 22:58 +0800, Bo Ye wrote: > > On Wed, 2023-10-25 at 20:21 +0200, Rafael J. Wysocki wrote: > > > > > > External email : Please do not click links or open attachments > > > until > > > you have verified the sender or the content. > > > On Mon, Oct 23, 2023 at 3:20 AM Bo Ye (叶波) <Bo.Ye@mediatek.com> > > > wrote: > > > > > > > > Yes, it is observed issue. > > > > > > It does happen, so it's not just "potential" and the subject of > > > the > > > patch is slightly misleading. Please adjust it. > > > > Done > > > > > > > > > Firstly, it needs to be clarified that this issue occurs in a > > > > real- > > > > world environment. By analyzing the logs, we inferred that the > > > > > > issue > > > > occurred just as the system was entering suspend mode, and the > > > > user > > > > > > was > > > > switching the thermal policy (this action causes all thermal > > > > zones > > > > > > to > > > > unregister/register). In addition, we conducted degradation > > > > tests > > > > > > and > > > > also reproduced this issue. The specific method is to first > > > > switch > > > > > > the > > > > thermal policy through a command, and then immediately put the > > > > > > system > > > > into suspend state through another command. This method can > > > > also > > > > reproduce the issue. > > > > > > OK, so please add this information to the patch changelog. > > > > > > > On Thu, 2023-10-12 at 07:35 +0000, Bo Ye (叶波) wrote: > > > > > On Sat, 2023-09-16 at 19:33 +0800, Bo Ye wrote: > > > > > > > > > > Correct mail title format: remove "Subject:" from mail title. > > > > > > > > > > > From: "yugang.wang" <yugang.wang@mediatek.com> > > > > > > > > > > > > Body: > > > > > > This patch fixes a race condition during system resume. It > > > > > > occurs > > > > > > if > > > > > > the system is exiting a suspend state and a user is trying > > > > > > to > > > > > > register/unregister a thermal zone concurrently. The root > > > > > > cause > > > > > > is > > > > > > that both actions access the `thermal_tz_list`. > > > > > > > > > > > > In detail: > > > > > > > > > > > > 1. At PM_POST_SUSPEND during the resume, the system reads > > > > > > all > > > > > > thermal > > > > > > zones in `thermal_tz_list`, then resets and updates > > > > > > their > > > > > > temperatures. > > > > > > 2. When registering/unregistering a thermal zone, the > > > > > > `thermal_tz_list` gets manipulated. > > > > > > > > > > > > These two actions might occur concurrently, causing a race > > > > > > condition. > > > > > > To solve this issue, we introduce a mutex lock to protect > > > > > > `thermal_tz_list` from being modified while it's being read > > > > > > and > > > > > > updated during the resume from suspend. > > > > > > > > > > > > Kernel oops excerpt related to this fix: > > > > > > > > > > > > [ 5201.869845] [T316822] pc: [0xffffffeb7d4876f0] > > > > > > mutex_lock+0x34/0x170 > > > > > > [ 5201.869856] [T316822] lr: [0xffffffeb7ca98a84] > > > > > > thermal_pm_notify+0xd4/0x26c > > > > > > [... cut for brevity ...] > > > > > > [ 5201.871061] [T316822] suspend_prepare+0x150/0x470 > > > > > > [ 5201.871067] [T316822] enter_state+0x84/0x6f4 > > > > > > [ 5201.871076] [T316822] state_store+0x15c/0x1e8 > > > > > > Well, the connection between the above log snippet and the issue > > > addressed by the patch is rather hard to establish. Please > > > include > > > more of the oops information. > > > > Thank you very much for reviewing the additional explanations. > > > > 1.Enable thermal policy operation will unregister/register all > > thermal > > zones > > 10-21 06:13:59.280 854 922 I libMtcLoader: enable thermal > > policy > > thermal_policy_09. > > > > 2.System suspend entry time is 2023-10-20 22:13:59.242 > > [ 4106.364175][T609387] binder:534_2: [name:spm&][SPM] PM: suspend > > entry 2023-10-20 22:13:59.242898243 UTC > > [ 4106.366185][T609387] binder:534_2: PM: [name:wakeup&]PM: Pending > > Wakeup Sources: NETLINK > > > > 3. It can be proven that the absence of a switch strategy will > > perform > > unregister/register operations on thermal zones (android time is > > 2023- > > 10-20 22:13:59.282), > > Because the logs for other thermal zones switching are not enabled > > by > > default, we cannot see the logs related to other thermal zones. > > [ 4106.404167][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > > [ 4106.404215][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > > [ 4106.404225][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_unbind unbinding OK > > [ 4106.404504][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 0 > > [ 4106.404545][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 2 > > [ 4106.404566][T600922] mtkPowerMsgHdl: > > [name:thermal_monitor&][Thermal/TZ/CPU]tscpu_bind binding OK, 1 > > > > 4. thermal_pm_notify trigger KE(android time: 2023-10-20 22:13:59. > > 315894) > > [ 4106.437171][T209387] binder:534_2: [name:mrdump&]Kernel Offset: > > 0x289cc80000 from 0xffffffc008000000 > > [ 4106.437182][T209387] binder:534_2: [name:mrdump&]PHYS_OFFSET: > > 0x40000000 > > [ 4106.437191][T209387] binder:534_2: [name:mrdump&]pstate: > > 80400005 > > (Nzcv daif +PAN -UAO) > > [ 4106.437204][T209387] binder:534_2: [name:mrdump&]pc : > > [0xffffffe8a6688200] mutex_lock+0x34/0x184 > > [ 4106.437214][T209387] binder:534_2: [name:mrdump&]lr : > > [0xffffffe8a5ce66bc] thermal_pm_notify+0xd4/0x26c > > [ 4106.437220][T209387] binder:534_2: [name:mrdump&]sp : > > ffffffc01bab3ae0 > > [ 4106.437226][T209387] binder:534_2: [name:mrdump&]x29: > > ffffffc01bab3af0 x28: 0000000000000001 > > > > > > > > > > > > > > Signed-off-by: yugang.wang <yugang.wang@mediatek.com> > > > > > > Signed-off-by: Bo Ye <bo.ye@mediatek.com> > > > > > > --- > > > > > > drivers/thermal/thermal_core.c | 2 ++ > > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > > > diff --git a/drivers/thermal/thermal_core.c > > > > > > b/drivers/thermal/thermal_core.c > > > > > > index 8717a3343512..a7a18ed57b6d 100644 > > > > > > --- a/drivers/thermal/thermal_core.c > > > > > > +++ b/drivers/thermal/thermal_core.c > > > > > > @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct > > > > > > notifier_block *nb, > > > > > > case PM_POST_HIBERNATION: > > > > > > case PM_POST_RESTORE: > > > > > > case PM_POST_SUSPEND: > > > > > > + mutex_lock(&thermal_list_lock); > > > > > > atomic_set(&in_suspend, 0); > > > > > > It is not clear to me why the above statement needs to be under > > > the > > > lock. > > > > > > > > > list_for_each_entry(tz, &thermal_tz_list, node) > > > > > > { > > > > > > thermal_zone_device_init(tz); > > > > > > thermal_zone_device_update(tz, > > > > > > > > THERMAL_EVENT_UN > > > > SP > > > > EC > > > > > > IFIED); > > > > > > } > > > > > > + mutex_unlock(&thermal_list_lock); > > > > > > break; > > > > > > default: > > > > > > break;
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 8717a3343512..a7a18ed57b6d 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1529,12 +1529,14 @@ static int thermal_pm_notify(struct notifier_block *nb, case PM_POST_HIBERNATION: case PM_POST_RESTORE: case PM_POST_SUSPEND: + mutex_lock(&thermal_list_lock); atomic_set(&in_suspend, 0); list_for_each_entry(tz, &thermal_tz_list, node) { thermal_zone_device_init(tz); thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED); } + mutex_unlock(&thermal_list_lock); break; default: break;