Message ID | 20230318081303.792969-1-zyytlz.wz@163.com |
---|---|
State | New |
Headers | show |
Series | [RESEND] scsi: qedi: Fix use after free bug in qedi_remove due to race condition | expand |
> -----Original Message----- > From: Zheng Hacker <hackerzheng666@gmail.com> > Sent: Thursday, March 23, 2023 9:15 AM > To: Mike Christie <michael.christie@oracle.com> > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>; > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage- > Upstream <GR-QLogic-Storage-Upstream@marvell.com>; > jejb@linux.ibm.com; martin.petersen@oracle.com; linux- > scsi@vger.kernel.org; linux-kernel@vger.kernel.org; > 1395428693sheep@gmail.com; alex000young@gmail.com > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in > qedi_remove due to race condition > > External Email > > ---------------------------------------------------------------------- > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写 > 道: > > > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > > In qedi_probe, it calls __qedi_probe, which bound > > > &qedi->recovery_work with qedi_recovery_handler and bound > > > &qedi->board_disable_work with qedi_board_disable_work. > > > > > > When it calls qedi_schedule_recovery_handler, it will finally call > > > schedule_delayed_work to start the work. > > > > > > When we call qedi_remove to remove the driver, there may be a > > > sequence as follows: > > > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > > > CPU0 CPU1 > > > > > > |qedi_recovery_handler > > > qedi_remove | > > > __qedi_remove | > > > iscsi_host_free | > > > scsi_host_put | > > > //free shost | > > > |iscsi_host_for_each_session > > > |//use qedi->shost > > > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > --- > > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/scsi/qedi/qedi_main.c > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 > > > 100644 > > > --- a/drivers/scsi/qedi/qedi_main.c > > > +++ b/drivers/scsi/qedi/qedi_main.c > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev > *pdev, int mode) > > > int rval; > > > u16 retry = 10; > > > > > > + /*cancel work*/ > > > > This comment is not needed. The name of the functions you are calling > > have "cancel" and "work" in them so we know. If you want to add a > > comment explain why the cancel calls are needed here. > > > > Hi, > > Sorry for my late reply and thanks for your advice. Will remove it in the next > version of patch. > > > > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > > > > How do you know after you have called cancel_delayed_work_sync that > > schedule_recovery_handler or schedule_hw_err_handler can't be called? > > I don't know the qed driver well, but it looks like you could have > > operations still running, so after you cancel here one of those ops > > could lead to them scheduling the work again. > > > > Sorry I didn't know how to make sure there's no more schedule. But I do > think this is important. Maybe there're someone else who can give us advice. > > Best regards, > Zheng > > Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver. Thanks, Manish
Manish Rangankar <mrangankar@marvell.com> 于2023年3月23日周四 18:17写道: > > > > > -----Original Message----- > > From: Zheng Hacker <hackerzheng666@gmail.com> > > Sent: Thursday, March 23, 2023 9:15 AM > > To: Mike Christie <michael.christie@oracle.com> > > Cc: Zheng Wang <zyytlz.wz@163.com>; Nilesh Javali <njavali@marvell.com>; > > Manish Rangankar <mrangankar@marvell.com>; GR-QLogic-Storage- > > Upstream <GR-QLogic-Storage-Upstream@marvell.com>; > > jejb@linux.ibm.com; martin.petersen@oracle.com; linux- > > scsi@vger.kernel.org; linux-kernel@vger.kernel.org; > > 1395428693sheep@gmail.com; alex000young@gmail.com > > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in > > qedi_remove due to race condition > > > > External Email > > > > ---------------------------------------------------------------------- > > Mike Christie <michael.christie@oracle.com> 于2023年3月21日周二 00:11写 > > 道: > > > > > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > > > In qedi_probe, it calls __qedi_probe, which bound > > > > &qedi->recovery_work with qedi_recovery_handler and bound > > > > &qedi->board_disable_work with qedi_board_disable_work. > > > > > > > > When it calls qedi_schedule_recovery_handler, it will finally call > > > > schedule_delayed_work to start the work. > > > > > > > > When we call qedi_remove to remove the driver, there may be a > > > > sequence as follows: > > > > > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > > > > > CPU0 CPU1 > > > > > > > > |qedi_recovery_handler > > > > qedi_remove | > > > > __qedi_remove | > > > > iscsi_host_free | > > > > scsi_host_put | > > > > //free shost | > > > > |iscsi_host_for_each_session > > > > |//use qedi->shost > > > > > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > --- > > > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/drivers/scsi/qedi/qedi_main.c > > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 > > > > 100644 > > > > --- a/drivers/scsi/qedi/qedi_main.c > > > > +++ b/drivers/scsi/qedi/qedi_main.c > > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev > > *pdev, int mode) > > > > int rval; > > > > u16 retry = 10; > > > > > > > > + /*cancel work*/ > > > > > > This comment is not needed. The name of the functions you are calling > > > have "cancel" and "work" in them so we know. If you want to add a > > > comment explain why the cancel calls are needed here. > > > > > > > Hi, > > > > Sorry for my late reply and thanks for your advice. Will remove it in the next > > version of patch. > > > > > > > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > > > > > > > How do you know after you have called cancel_delayed_work_sync that > > > schedule_recovery_handler or schedule_hw_err_handler can't be called? > > > I don't know the qed driver well, but it looks like you could have > > > operations still running, so after you cancel here one of those ops > > > could lead to them scheduling the work again. > > > > > > > Sorry I didn't know how to make sure there's no more schedule. But I do > > think this is important. Maybe there're someone else who can give us advice. > > > > Best regards, > > Zheng > > > > > Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and > qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver. > Sorry for my late reply. Will apply that in next version. Best reagrds, Zheng > Thanks, > Manish >
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev *pdev, int mode) int rval; u16 retry = 10; + /*cancel work*/ + cancel_delayed_work_sync(&qedi->recovery_work); + cancel_delayed_work_sync(&qedi->board_disable_work); + if (mode == QEDI_MODE_NORMAL) iscsi_host_remove(qedi->shost, false); else if (mode == QEDI_MODE_SHUTDOWN)
In qedi_probe, it calls __qedi_probe, which bound &qedi->recovery_work with qedi_recovery_handler and bound &qedi->board_disable_work with qedi_board_disable_work. When it calls qedi_schedule_recovery_handler, it will finally call schedule_delayed_work to start the work. When we call qedi_remove to remove the driver, there may be a sequence as follows: Fix it by finishing the work before cleanup in qedi_remove. CPU0 CPU1 |qedi_recovery_handler qedi_remove | __qedi_remove | iscsi_host_free | scsi_host_put | //free shost | |iscsi_host_for_each_session |//use qedi->shost Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> --- drivers/scsi/qedi/qedi_main.c | 4 ++++ 1 file changed, 4 insertions(+)