Message ID | 20230607182249.22623-5-mwilck@suse.com |
---|---|
State | New |
Headers | show |
Series | scsi: fixes for targets with many LUNs, and scsi_target_block rework | expand |
On 6/7/23 22:44, Christoph Hellwig wrote: >>> Thanks. This wasn't obvious to me from the current code. I'll add a >>> comment in the next version. >> >> The crucial question is now, is it sufficient to call >> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to >> blk_mq_wait_quiesce_done() have to be under the mutex, too? >> The latter would actually kill off our attempt to fix the delay >> in fc_remote_port_delete() that was caused by repeated >> synchronize_rcu() calls. >> >> But if I understand you correctly, moving the wait out of the mutex >> should be ok. I'll update the series accordingly. > > I can't think of a reason we'd want to lock over the wait, but Bart > knows this code way better than I do. Unless deep changes would be made in the block layer blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() functions, moving blk_mq_wait_quiesce_done() outside the critical section seems fine to me. Thanks, Bart.
On Thu, 2023-06-08 at 13:54 -0500, Mike Christie wrote: > On 6/8/23 9:12 AM, Bart Van Assche wrote: > > On 6/7/23 22:44, Christoph Hellwig wrote: > > > > > Thanks. This wasn't obvious to me from the current code. I'll > > > > > add a > > > > > comment in the next version. > > > > > > > > The crucial question is now, is it sufficient to call > > > > blk_mq_quiesce_queue_nowait() under the mutex, or does the call > > > > to > > > > blk_mq_wait_quiesce_done() have to be under the mutex, too? > > > > The latter would actually kill off our attempt to fix the delay > > > > in fc_remote_port_delete() that was caused by repeated > > > > synchronize_rcu() calls. > > > > > > > > But if I understand you correctly, moving the wait out of the > > > > mutex > > > > should be ok. I'll update the series accordingly. > > > > > > I can't think of a reason we'd want to lock over the wait, but > > > Bart > > > knows this code way better than I do. > > > > Unless deep changes would be made in the block layer > > blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() > > functions, moving blk_mq_wait_quiesce_done() outside the critical > > section seems fine to me. > > If it helps, I tested with iscsi and the mutex changes discussed > above > and it worked ok for me. I guess the race that Bart was hinting at is hard to trigger. I would like to remark that the fact that we need to hold the SCSI state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a layering issue to me. Not sure if, and how, this could be avoided, though. Regards Martin
On 6/12/23 04:15, Martin Wilck wrote: > I guess the race that Bart was hinting at is hard to trigger. Are you sure about this? I think this scenario can be triggered by writing into the sysfs attribute that changes the SCSI device state while a scsi_target_block() call is in progress. See also store_state_field(). > I would like to remark that the fact that we need to hold the SCSI > state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a > layering issue to me. Not sure if, and how, this could be avoided, > though. I do not agree that this is a layering issue. Is holding a mutex around a call of a function in a lower layer ever a layering issue? What matters is to be very careful with locks while invoking callback functions. See also slide 7 in Ousterhout's presentation "Why Threads Are A Bad Idea (for most purposes)" from 1996 (https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf). Bart.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ce5788643011..26e7ce25fa05 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2795,9 +2795,9 @@ static void scsi_device_block(struct scsi_device *sdev, void *data) mutex_lock(&sdev->state_mutex); err = __scsi_internal_device_block_nowait(sdev); + mutex_unlock(&sdev->state_mutex); if (err == 0) scsi_stop_queue(sdev, false); - mutex_unlock(&sdev->state_mutex); WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n", dev_name(&sdev->sdev_gendev), err);