Message ID | 20221105071725.2313316-1-yangyingliang@huawei.com |
---|---|
State | New |
Headers | show |
Series | scsi: libsas: fix error handling in sas_phy_add() | expand |
Hi, On 2022/11/7 17:35, John Garry wrote: > On 05/11/2022 07:17, Yang Yingliang wrote: > > This is not libsas. > > BTW, before I go further, note this: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n312 > Thanks for replying. Yes, it should be "scsi: scsi_transport_sas: XXX" > >> If transport_add_device() fails in sas_phy_add(), but it's not handled, >> it will lead kernel crash because of trying to delete not added device >> in transport_remove_device() called from sas_remove_host(). >> >> Unable to handle kernel NULL pointer dereference at virtual address >> 0000000000000108 >> CPU: 61 PID: 42829 Comm: rmmod Kdump: loaded Tainted: G W >> 6.1.0-rc1+ #173 >> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : device_del+0x54/0x3d0 >> lr : device_del+0x37c/0x3d0 >> Call trace: >> device_del+0x54/0x3d0 >> attribute_container_class_device_del+0x28/0x38 >> transport_remove_classdev+0x6c/0x80 >> attribute_container_device_trigger+0x108/0x110 >> transport_remove_device+0x28/0x38 >> sas_phy_delete+0x30/0x60 [scsi_transport_sas] >> do_sas_phy_delete+0x6c/0x80 [scsi_transport_sas] >> device_for_each_child+0x68/0xb0 >> sas_remove_children+0x40/0x50 [scsi_transport_sas] >> sas_remove_host+0x20/0x38 [scsi_transport_sas] >> hisi_sas_remove+0x40/0x68 [hisi_sas_main] >> hisi_sas_v2_remove+0x20/0x30 [hisi_sas_v2_hw] >> platform_remove+0x2c/0x60 >> >> Fix this by checking and handling return value of transport_add_device() >> in sas_phy_add(). transport_destroy_device() has been called in >> sas_phy_free() >> in the error path, so it's no need to call it here. > > "there's no need", rather than "it's no need". And I don't know why > you bother even mentioning about transport_destroy_device(). Because the device set up by transport_setup_device(), it should be destroyed by transport_destroy_device(), so I mention this. > >> >> Fixes: c7ebbbce366c ("[SCSI] SAS transport class") >> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> >> --- >> drivers/scsi/scsi_transport_sas.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/scsi/scsi_transport_sas.c >> b/drivers/scsi/scsi_transport_sas.c >> index 2f88c61216ee..cb364a7c6097 100644 >> --- a/drivers/scsi/scsi_transport_sas.c >> +++ b/drivers/scsi/scsi_transport_sas.c >> @@ -723,8 +723,11 @@ int sas_phy_add(struct sas_phy *phy) >> error = device_add(&phy->dev); >> if (!error) { > > personally I think that the following looks better: > > int sas_phy_add(struct sas_phy *phy) > { > int error; > > error = device_add(&phy->dev); > if (error) > return error; > > error = transport_add_device(&phy->dev); > if (error) { > device_del(&phy->dev); > return error; > } > transport_configure_device(&phy->dev); > > return 0; > } > EXPORT_SYMBOL(sas_phy_add); Yes, it's looks better, but I was trying to change least this code to fix this problem. I can send a v2 later to change these. Thanks, Yang > >> - transport_add_device(&phy->dev); >> - transport_configure_device(&phy->dev); >> + error = transport_add_device(&phy->dev); >> + if (!error) >> + transport_configure_device(&phy->dev); >> + else >> + device_del(&phy->dev); >> } >> return error; > > > .
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 2f88c61216ee..cb364a7c6097 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -723,8 +723,11 @@ int sas_phy_add(struct sas_phy *phy) error = device_add(&phy->dev); if (!error) { - transport_add_device(&phy->dev); - transport_configure_device(&phy->dev); + error = transport_add_device(&phy->dev); + if (!error) + transport_configure_device(&phy->dev); + else + device_del(&phy->dev); } return error;
If transport_add_device() fails in sas_phy_add(), but it's not handled, it will lead kernel crash because of trying to delete not added device in transport_remove_device() called from sas_remove_host(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108 CPU: 61 PID: 42829 Comm: rmmod Kdump: loaded Tainted: G W 6.1.0-rc1+ #173 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : device_del+0x54/0x3d0 lr : device_del+0x37c/0x3d0 Call trace: device_del+0x54/0x3d0 attribute_container_class_device_del+0x28/0x38 transport_remove_classdev+0x6c/0x80 attribute_container_device_trigger+0x108/0x110 transport_remove_device+0x28/0x38 sas_phy_delete+0x30/0x60 [scsi_transport_sas] do_sas_phy_delete+0x6c/0x80 [scsi_transport_sas] device_for_each_child+0x68/0xb0 sas_remove_children+0x40/0x50 [scsi_transport_sas] sas_remove_host+0x20/0x38 [scsi_transport_sas] hisi_sas_remove+0x40/0x68 [hisi_sas_main] hisi_sas_v2_remove+0x20/0x30 [hisi_sas_v2_hw] platform_remove+0x2c/0x60 Fix this by checking and handling return value of transport_add_device() in sas_phy_add(). transport_destroy_device() has been called in sas_phy_free() in the error path, so it's no need to call it here. Fixes: c7ebbbce366c ("[SCSI] SAS transport class") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> --- drivers/scsi/scsi_transport_sas.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)