Message ID | 20240320110809.12901-1-Alexander@wetzel-home.de |
---|---|
State | New |
Series | [v2] scsi: sg: Avoid sg device teardown race |
On Wed, Mar 20, 2024 at 12:08:09PM +0100, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
>
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
>
> The resulting NULL pointer exception will then crash the kernel.
>
> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
> ---
> Changes compared to V1:
> Reworked the commit message

What commit id does this fix?

thanks,

greg k-h

On 3/20/24 04:08, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
>
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
>
> The resulting NULL pointer exception will then crash the kernel.
>
> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
> ---
> Changes compared to V1:
> Reworked the commit message
>
> Alexander
> ---
> drivers/scsi/sg.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 86210e4dd0d3..80e0d1981191 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>  			"sg_remove_sfp: sfp=0x%p\n", sfp));
>  	kfree(sfp);
>
> -	scsi_device_put(sdp->device);
>  	kref_put(&sdp->d_ref, sg_device_destroy);
> +	scsi_device_put(sdp->device);
>  	module_put(THIS_MODULE);
>  }
>

Is it guaranteed that the above kref_put() call is the last kref_put()
call on sdp->d_ref? If not, how about inserting code between the
kref_put() call and the scsi_device_put() call that waits until
sg_device_destroy() has finished?

Thanks,

Bart.

On 20.03.24 16:02, Bart Van Assche wrote:
> On 3/20/24 04:08, Alexander Wetzel wrote:
>> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
>> calling scsi_device_put().
>>
>> sg_device_destroy() is accessing the parent scsi device request_queue.
>> Which will already be set to NULL when the preceding call to
>> scsi_device_put() removed the last reference to the parent scsi device.
>>
>> The resulting NULL pointer exception will then crash the kernel.
>>
>> Link:
>> https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
>> ---
>> Changes compared to V1:
>> Reworked the commit message
>>
>> Alexander
>> ---
>> drivers/scsi/sg.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>> index 86210e4dd0d3..80e0d1981191 100644
>> --- a/drivers/scsi/sg.c
>> +++ b/drivers/scsi/sg.c
>> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>>  			"sg_remove_sfp: sfp=0x%p\n", sfp));
>>  	kfree(sfp);
>> -	scsi_device_put(sdp->device);
>>  	kref_put(&sdp->d_ref, sg_device_destroy);
>> +	scsi_device_put(sdp->device);
>>  	module_put(THIS_MODULE);
>>  }
>
> Is it guaranteed that the above kref_put() call is the last kref_put()
> call on sdp->d_ref? If not, how about inserting code between the
> kref_put() call and the scsi_device_put() call that waits until
> sg_device_destroy() has finished?
>

While I'm not familiar with the code, I'm pretty sure kref_put() is
removing the last reference to d_ref here. Anything else would be odd,
based on my - really sketchy - understanding of the flows.

Also waiting for another process looks wrong. I guess we would then have
to delay the call to sg_release().

And at least for me it's always the last d_ref reference.

I changed the section to:

	kref_put(&sdp->d_ref, sg_device_destroy);
	printk("XXXX scsi=%u, dref=%u\n", \
	       kref_read(&sdp->device->sdev_gendev.kobj.kref), \
	       kref_read(&sdp->d_ref));
	scsi_device_put(sdp->device);

And connected/disconnected my test USB device a few times:

XXXX scsi=2, dref=0
XXXX scsi=1, dref=0
XXXX scsi=2, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0
XXXX scsi=1, dref=0

(scsi=1 are the cases which would cause the NULL pointer exceptions with
the unpatched driver.)

Alexander

On 3/20/24 04:08, Alexander Wetzel wrote:
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 86210e4dd0d3..80e0d1981191 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>  			"sg_remove_sfp: sfp=0x%p\n", sfp));
>  	kfree(sfp);
>
> -	scsi_device_put(sdp->device);
>  	kref_put(&sdp->d_ref, sg_device_destroy);
> +	scsi_device_put(sdp->device);
>  	module_put(THIS_MODULE);
>  }

Since sg_device_destroy() frees struct sg_device and since the
scsi_device_put() call reads from struct sg_device, does this patch
introduce a use-after-free? Has it been tested with KASAN enabled?

Thanks,

Bart.

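The use-after-free concern raised above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical and stands in for the kernel object of similar name, plain integers replace `struct kref`, and nothing here is taken from drivers/scsi/sg.c. It shows one common way to keep the new ordering (sg reference dropped first, parent reference second) without reading the sg device after it may have been freed: copy the parent pointer into a local variable before the first put. The thread does not say whether the patch ended up doing this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct scsi_device. */
struct model_scsi_device {
	int refcount;
	bool queue_alive;	/* models request_queue != NULL */
};

/* Hypothetical stand-in for struct sg_device. */
struct model_sg_device {
	int d_ref;
	struct model_scsi_device *device;
};

/* Records what the destroy path observed on the parent device. */
static bool destroy_saw_live_queue;

static void model_scsi_device_put(struct model_scsi_device *sdev)
{
	/* Dropping the last reference tears down the queue. */
	if (--sdev->refcount == 0)
		sdev->queue_alive = false;
}

static void model_sg_device_destroy(struct model_sg_device *sdp)
{
	/* The destroy path needs the parent's queue to still exist. */
	destroy_saw_live_queue = sdp->device->queue_alive;
	free(sdp);
}

static void model_sg_device_put(struct model_sg_device *sdp)
{
	if (--sdp->d_ref == 0)
		model_sg_device_destroy(sdp);
}

/*
 * Create an sg device holding one reference on @parent, then tear it
 * down. Returns true if destroy ran while the parent queue was alive.
 */
static bool model_teardown(struct model_scsi_device *parent)
{
	struct model_sg_device *sdp = malloc(sizeof(*sdp));
	struct model_scsi_device *device;

	if (!sdp)
		return false;
	sdp->d_ref = 1;
	sdp->device = parent;
	parent->refcount++;

	device = sdp->device;		/* cache before sdp can be freed */
	model_sg_device_put(sdp);	/* may free sdp */
	model_scsi_device_put(device);	/* does not touch sdp */
	return destroy_saw_live_queue;
}
```

Calling `model_teardown()` on a parent with `queue_alive` set runs the destroy step while the queue still exists, and the final parent put never dereferences the freed sg object. In the real driver, a KASAN-enabled test run as suggested above would be the way to confirm the same property.
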
On 3/20/24 14:30, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
>
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
>
> The resulting NULL pointer exception will then crash the kernel.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..80e0d1981191 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	scsi_device_put(sdp->device);
 	module_put(THIS_MODULE);
 }

sg_remove_sfp_usercontext() must not use sg_device_destroy() after
calling scsi_device_put().

sg_device_destroy() is accessing the parent scsi device request_queue.
Which will already be set to NULL when the preceding call to
scsi_device_put() removed the last reference to the parent scsi device.

The resulting NULL pointer exception will then crash the kernel.

Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
---
Changes compared to V1:
Reworked the commit message

Alexander
---
 drivers/scsi/sg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)