Message ID | 20230826095836.1138608-1-liaochang1@huawei.com |
---|---|
State | Superseded |
Headers | show |
Series | cpufreq: Fix the race condition while updating the transition_task of policy | expand |
On 26-08-23, 09:58, Liao Chang wrote: > The field 'transition_task' of policy structure is used to track the > task which is performing the frequency transition. Using this field to > print a warning once detect a case where the same task is calling > _begin() again before completing the preivous frequency transition via > the _end(). > > However, there is a potential race condition in _end() and _begin() APIs > while updating the field 'transition_task' of policy, the scenario is > depicted below: > > Task A Task B > > /* 1st freq transition */ > Invoke _begin() { > ... > ... > } > /* 2nd freq transition */ > Invoke _begin() { > ... //waiting for A to > ... //clear > ... //transition_ongoing > ... //in _end() for > ... //the 1st transition > | > Change the frequency | > | > Invoke _end() { | > ... | > ... | > transition_ongoing = false; V > transition_ongoing = true; > transition_task = current; Task B here won't move ahead until "wake_up(&policy->transition_wait)" is called, isn't it ? Also I think the CPU is free to change the order of the two instructions and so this commit won't make a difference. Also I don't feel there is a race here as wake_up() hasn't happened. > transition_task = NULL; > ... //A overwrites the task > ... //performing the transition > ... //result in error warning. > } > > To fix this race condition, the order of the updates to the > 'transition_task' and 'transition_ongoing' fields has been changed, the > 'transition_task' field is now cleared before the 'transition_ongoing' > field, which ensure that only one task can update the 'transition_task' > field at a time. > > Fixes: ca654dc3a93d ("cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end") > Signed-off-by: Liao Chang <liaochang1@huawei.com> > --- > drivers/cpufreq/cpufreq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index a757f90aa9d6..f8eb6dde57f2 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -455,8 +455,8 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, > policy->cur, > policy->cpuinfo.max_freq); > > - policy->transition_ongoing = false; > policy->transition_task = NULL; > + policy->transition_ongoing = false; > > wake_up(&policy->transition_wait); > } > -- > 2.34.1
Hi Viresh. 在 2023/8/28 15:23, Viresh Kumar 写道: > On 26-08-23, 09:58, Liao Chang wrote: >> The field 'transition_task' of policy structure is used to track the >> task which is performing the frequency transition. Using this field to >> print a warning once detect a case where the same task is calling >> _begin() again before completing the preivous frequency transition via >> the _end(). >> >> However, there is a potential race condition in _end() and _begin() APIs >> while updating the field 'transition_task' of policy, the scenario is >> depicted below: >> >> Task A Task B >> >> /* 1st freq transition */ >> Invoke _begin() { >> ... >> ... >> } >> /* 2nd freq transition */ >> Invoke _begin() { >> ... //waiting for A to >> ... //clear >> ... //transition_ongoing >> ... //in _end() for >> ... //the 1st transition >> | >> Change the frequency | >> | >> Invoke _end() { | >> ... | >> ... | >> transition_ongoing = false; V >> transition_ongoing = true; >> transition_task = current; > > Task B here won't move ahead until "wake_up(&policy->transition_wait)" > is called, isn't it ? Task B does not necessarily go to sleep when it calls wait_event(), it depends on the condition to wait for evaluate false or not. So there is a small race window where Task A already set 'transition_ongoing' to false and Task B can cross wait_event() immediately. wait_event: do { might_sleep(); if (condition) // !transition_ongoing break; __wait_event(); }; I hope I do not miss something important in the code above. > > Also I think the CPU is free to change the order of the two > instructions and so this commit won't make a difference. Also I don't Yes, if the CPU uses weak memroy model, it is possible for the instructions to be reordered. therefore, it is a good idea to insert an smb() between these two lines if there is race here. Thanks. > feel there is a race here as wake_up() hasn't happened. > >> transition_task = NULL; >> ... //A overwrites the task >> ... //performing the transition >> ... //result in error warning. >> } >> >> To fix this race condition, the order of the updates to the >> 'transition_task' and 'transition_ongoing' fields has been changed, the >> 'transition_task' field is now cleared before the 'transition_ongoing' >> field, which ensure that only one task can update the 'transition_task' >> field at a time. >> >> Fixes: ca654dc3a93d ("cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end") >> Signed-off-by: Liao Chang <liaochang1@huawei.com> >> --- >> drivers/cpufreq/cpufreq.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index a757f90aa9d6..f8eb6dde57f2 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -455,8 +455,8 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, >> policy->cur, >> policy->cpuinfo.max_freq); >> >> - policy->transition_ongoing = false; >> policy->transition_task = NULL; >> + policy->transition_ongoing = false; >> >> wake_up(&policy->transition_wait); >> } >> -- >> 2.34.1 >
On 28-08-23, 16:29, Liao, Chang wrote: > Task B does not necessarily go to sleep when it calls wait_event(), it depends on > the condition to wait for evaluate false or not. So there is a small race window > where Task A already set 'transition_ongoing' to false and Task B can cross wait_event() > immediately. > > wait_event: > do { > might_sleep(); > if (condition) // !transition_ongoing > break; > __wait_event(); > }; > > I hope I do not miss something important in the code above. > Yes, if the CPU uses weak memroy model, it is possible for the instructions to be reordered. > therefore, it is a good idea to insert an smb() between these two lines if there is race here. Maybe it would be better to do this instead ? diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 6b52ebe5a890..f11b01b25e8d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -455,8 +455,10 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, policy->cur, policy->cpuinfo.max_freq); + spin_lock(&policy->transition_lock); policy->transition_ongoing = false; policy->transition_task = NULL; + spin_unlock(&policy->transition_lock); wake_up(&policy->transition_wait); }
On Mon, Aug 28, 2023 at 10:52 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 28-08-23, 16:29, Liao, Chang wrote: > > Task B does not necessarily go to sleep when it calls wait_event(), it depends on > > the condition to wait for evaluate false or not. So there is a small race window > > where Task A already set 'transition_ongoing' to false and Task B can cross wait_event() > > immediately. > > > > wait_event: > > do { > > might_sleep(); > > if (condition) // !transition_ongoing > > break; > > __wait_event(); > > }; > > > > I hope I do not miss something important in the code above. > > > Yes, if the CPU uses weak memroy model, it is possible for the instructions to be reordered. > > therefore, it is a good idea to insert an smb() between these two lines if there is race here. > > Maybe it would be better to do this instead ? > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 6b52ebe5a890..f11b01b25e8d 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -455,8 +455,10 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, > policy->cur, > policy->cpuinfo.max_freq); > > + spin_lock(&policy->transition_lock); > policy->transition_ongoing = false; > policy->transition_task = NULL; > + spin_unlock(&policy->transition_lock); > > wake_up(&policy->transition_wait); > } > > -- I was about to suggest the same thing. wake_up() is a full memory barrier only if it actually wakes up a task and if it doesn't do that, without the locking the other task may see a state in which transition_ongoing is false already and transition_task is still NULL regardless of the relative ordering of the statements before the wake_up() call.
在 2023/8/28 16:52, Viresh Kumar 写道: > On 28-08-23, 16:29, Liao, Chang wrote: >> Task B does not necessarily go to sleep when it calls wait_event(), it depends on >> the condition to wait for evaluate false or not. So there is a small race window >> where Task A already set 'transition_ongoing' to false and Task B can cross wait_event() >> immediately. >> >> wait_event: >> do { >> might_sleep(); >> if (condition) // !transition_ongoing >> break; >> __wait_event(); >> }; >> >> I hope I do not miss something important in the code above. > >> Yes, if the CPU uses weak memroy model, it is possible for the instructions to be reordered. >> therefore, it is a good idea to insert an smb() between these two lines if there is race here. > > Maybe it would be better to do this instead ? > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 6b52ebe5a890..f11b01b25e8d 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -455,8 +455,10 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, > policy->cur, > policy->cpuinfo.max_freq); > > + spin_lock(&policy->transition_lock); > policy->transition_ongoing = false; > policy->transition_task = NULL; > + spin_unlock(&policy->transition_lock); I think it is more straightforward, I will use it in next revision. Thanks. > > wake_up(&policy->transition_wait); > } >
Hi, Rafael 在 2023/8/28 16:58, Rafael J. Wysocki 写道: > On Mon, Aug 28, 2023 at 10:52 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: >> >> On 28-08-23, 16:29, Liao, Chang wrote: >>> Task B does not necessarily go to sleep when it calls wait_event(), it depends on >>> the condition to wait for evaluate false or not. So there is a small race window >>> where Task A already set 'transition_ongoing' to false and Task B can cross wait_event() >>> immediately. >>> >>> wait_event: >>> do { >>> might_sleep(); >>> if (condition) // !transition_ongoing >>> break; >>> __wait_event(); >>> }; >>> >>> I hope I do not miss something important in the code above. >> >>> Yes, if the CPU uses weak memroy model, it is possible for the instructions to be reordered. >>> therefore, it is a good idea to insert an smb() between these two lines if there is race here. >> >> Maybe it would be better to do this instead ? >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index 6b52ebe5a890..f11b01b25e8d 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -455,8 +455,10 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, >> policy->cur, >> policy->cpuinfo.max_freq); >> >> + spin_lock(&policy->transition_lock); >> policy->transition_ongoing = false; >> policy->transition_task = NULL; >> + spin_unlock(&policy->transition_lock); >> >> wake_up(&policy->transition_wait); >> } >> >> -- > > I was about to suggest the same thing. > > wake_up() is a full memory barrier only if it actually wakes up a task > and if it doesn't do that, without the locking the other task may see > a state in which transition_ongoing is false already and > transition_task is still NULL regardless of the relative ordering of > the statements before the wake_up() call. I agree, unless the transition_ongoing and transition_task fields are updated atomically, there is always a window where inconsistency can occur in the policy structure.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index a757f90aa9d6..f8eb6dde57f2 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -455,8 +455,8 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy, policy->cur, policy->cpuinfo.max_freq); - policy->transition_ongoing = false; policy->transition_task = NULL; + policy->transition_ongoing = false; wake_up(&policy->transition_wait); }
The field 'transition_task' of policy structure is used to track the task which is performing the frequency transition. Using this field to print a warning once detect a case where the same task is calling _begin() again before completing the preivous frequency transition via the _end(). However, there is a potential race condition in _end() and _begin() APIs while updating the field 'transition_task' of policy, the scenario is depicted below: Task A Task B /* 1st freq transition */ Invoke _begin() { ... ... } /* 2nd freq transition */ Invoke _begin() { ... //waiting for A to ... //clear ... //transition_ongoing ... //in _end() for ... //the 1st transition | Change the frequency | | Invoke _end() { | ... | ... | transition_ongoing = false; V transition_ongoing = true; transition_task = current; transition_task = NULL; ... //A overwrites the task ... //performing the transition ... //result in error warning. } To fix this race condition, the order of the updates to the 'transition_task' and 'transition_ongoing' fields has been changed, the 'transition_task' field is now cleared before the 'transition_ongoing' field, which ensure that only one task can update the 'transition_task' field at a time. Fixes: ca654dc3a93d ("cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end") Signed-off-by: Liao Chang <liaochang1@huawei.com> --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)