Message ID | 20220721065833.26887-1-peter.wang@mediatek.com |
---|---|
State | Superseded |
Headers | show |
Series | [v3] scsi: ufs: correct ufshcd_shutdown flow | expand |
On 7/20/22 23:58, peter.wang@mediatek.com wrote: > Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently. > And it have race condition when ufshcd_wl_shutdown on-going and > clock/power turn off by ufshcd_shutdown. > > The normal case: > ufshcd_wl_shutdown -> ufshcd_shtdown > ufshcd_shtdown should turn off clock/power. > > The abnormal case: > ufshcd_shtdown -> ufshcd_wl_shutdown How can this happen since device_shutdown() iterates over devices in the opposite order in which these have been created? Thanks, Bart.
On Sat, Jul 23, 2022 at 5:15 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 7/20/22 23:58, peter.wang@mediatek.com wrote: > > Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently. > > And it have race condition when ufshcd_wl_shutdown on-going and > > clock/power turn off by ufshcd_shutdown. > > > > The normal case: > > ufshcd_wl_shutdown -> ufshcd_shtdown > > ufshcd_shtdown should turn off clock/power. > > > > The abnormal case: > > ufshcd_shtdown -> ufshcd_wl_shutdown > > How can this happen since device_shutdown() iterates over devices in the > opposite order in which these have been created? > Is it possible for more than one initiator to invoke kernel_power_off() at the same time? In this case, the order of device shutdown is not promised anymore because devices_kset is manipulated simultaneously by multiple initiators in device_shutdown(). > Thanks, > > Bart.
On 7/23/22 5:12 AM, Bart Van Assche wrote: > On 7/20/22 23:58, peter.wang@mediatek.com wrote: >> Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently. >> And it have race condition when ufshcd_wl_shutdown on-going and >> clock/power turn off by ufshcd_shutdown. >> >> The normal case: >> ufshcd_wl_shutdown -> ufshcd_shtdown >> ufshcd_shtdown should turn off clock/power. >> >> The abnormal case: >> ufshcd_shtdown -> ufshcd_wl_shutdown > > How can this happen since device_shutdown() iterates over devices in > the opposite order in which these have been created? > > Thanks, > > Bart. Hi Bart, Because kernel_restart is export, and mediatek may call kernel_restart while shutdown is on going. EXPORT_SYMBOL_GPL(kernel_restart); kernel_restart -> kernel_restart_prepare -> device_shutdown So, there may have two thread execute device_shutdown concurrently. And the list_lock in device_shutdown function is not protect shutdown callback function, caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) could run concurrently. Here is the error log that two thread in device_shutdown. [37684.002227] [T1500641] platform +platform:112b0000.ufshci kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: ufs_device_wlun 0:0:0:49488: [name:core&]shutdown Thanks. BR Peter
On 7/24/22 20:47, Peter Wang wrote: > Because kernel_restart is export, and mediatek may call kernel_restart ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Is this code upstream? > while shutdown is on going. > EXPORT_SYMBOL_GPL(kernel_restart); > kernel_restart -> kernel_restart_prepare -> device_shutdown > > So, there may have two thread execute device_shutdown concurrently. > And the list_lock in device_shutdown function is not protect shutdown > callback function, > caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) could > run concurrently. > > Here is the error log that two thread in device_shutdown. > [37684.002227] [T1500641] platform +platform:112b0000.ufshci > kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown > [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: > ufs_device_wlun 0:0:0:49488: [name:core&]shutdown Hi Peter, I had not yet taken a look at the kernel_restart() function. Now that I have taken a look, it seems to me that kernel_restart() calls must be serialized via the system_transition_mutex. Please make sure that the kernel_restart() calls are serialized. Thanks, Bart.
Hi Bart, On 7/26/22 1:07 AM, Bart Van Assche wrote: > On 7/24/22 20:47, Peter Wang wrote: >> Because kernel_restart is export, and mediatek may call kernel_restart > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Is this code upstream? > No, it not upstream. And sorry that I correct mediatek usage, it is kernel_power_off, not kernel_restart. >> while shutdown is on going. >> EXPORT_SYMBOL_GPL(kernel_restart); >> kernel_restart -> kernel_restart_prepare -> device_shutdown >> >> So, there may have two thread execute device_shutdown concurrently. >> And the list_lock in device_shutdown function is not protect shutdown >> callback function, >> caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) >> could run concurrently. >> >> Here is the error log that two thread in device_shutdown. >> [37684.002227] [T1500641] platform +platform:112b0000.ufshci >> kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown >> [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: >> ufs_device_wlun 0:0:0:49488: [name:core&]shutdown > > Hi Peter, > > I had not yet taken a look at the kernel_restart() function. Now that > I have taken a look, it seems to me that kernel_restart() calls must > be serialized via the system_transition_mutex. Please make sure that > the kernel_restart() calls are serialized. > > Thanks, > > Bart. I think kernel_power_off could use system_transition_mutex to protect shutdown racing. We will try it, Thanks for the suggestion. And if no need wait, it could more simple in this patch. I will upload next version again. Thanks. Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index c7b337480e3e..47b639fd28b9 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -58,6 +58,9 @@ /* Task management command timeout */ #define TM_CMD_TIMEOUT 100 /* msecs */ +/* Shutdown wait devcie into power off timeout */ +#define UFS_SHUTDOWN_TIMEOUT 500 /* msecs */ + /* maximum number of retries for a general UIC command */ #define UFS_UIC_COMMAND_RETRIES 3 @@ -9461,10 +9464,15 @@ EXPORT_SYMBOL(ufshcd_runtime_resume); */ int ufshcd_shutdown(struct ufs_hba *hba) { - if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba)) - goto out; + unsigned long timeout; - pm_runtime_get_sync(hba->dev); + /* Wait ufshcd_wl_shutdown clear ufs state */ + timeout = jiffies + msecs_to_jiffies(UFS_SHUTDOWN_TIMEOUT); + while (!ufshcd_is_ufs_dev_poweroff(hba) || !ufshcd_is_link_off(hba)) { + if (time_after(jiffies, timeout)) + goto out; + msleep(1); + } ufshcd_suspend(hba); out: