mbox series

[v3,0/4] SCSI: Fix issues between removing device and error handle

Message ID 20231016020314.1269636-1-haowenchao2@huawei.com
Headers show
Series SCSI: Fix issues between removing device and error handle | expand

Message

Wenchao Hao Oct. 16, 2023, 2:03 a.m. UTC
I am testing SCSI error handle with my previous scsi_debug error
injection patches, and found some issues when removing device and
error handler happened together.

These issues are triggered because devices in removing would be skipped
when calling shost_for_each_device().

Three issues are found:
1. statistic info printed at beginning of scsi_error_handler is wrong
2. device reset is not triggered
3. IO requeued to request_queue would be hang after error handle

V3:
  - Update patch description
  - Update comments of functions added

V2:
  - Fix IO hang by run all devices' queue after error handler
  - Do not modify shost_for_each_device() directly but add a new
    helper to iterate devices but do not skip devices in removing

Wenchao Hao (4):
  scsi: core: Add new helper to iterate all devices of host
  scsi: scsi_error: Fix wrong statistic when print error info
  scsi: scsi_error: Fix device reset is not triggered
  scsi: scsi_core: Fix IO hang when device removing

 drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
 drivers/scsi/scsi_error.c  |  4 ++--
 drivers/scsi/scsi_lib.c    |  2 +-
 include/scsi/scsi_device.h | 25 ++++++++++++++++++---
 4 files changed, 57 insertions(+), 20 deletions(-)

Comments

Wenchao Hao Oct. 17, 2023, 5 p.m. UTC | #1
On Mon, Oct 16, 2023 at 10:04 AM Wenchao Hao <haowenchao2@huawei.com> wrote:
>
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
>
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().

Friendly ping...

>
> Three issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
> 3. IO requeued to request_queue would be hang after error handle
>
> V3:
>   - Update patch description
>   - Update comments of functions added
>
> V2:
>   - Fix IO hang by run all devices' queue after error handler
>   - Do not modify shost_for_each_device() directly but add a new
>     helper to iterate devices but do not skip devices in removing
>
> Wenchao Hao (4):
>   scsi: core: Add new helper to iterate all devices of host
>   scsi: scsi_error: Fix wrong statistic when print error info
>   scsi: scsi_error: Fix device reset is not triggered
>   scsi: scsi_core: Fix IO hang when device removing
>
>  drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>  drivers/scsi/scsi_error.c  |  4 ++--
>  drivers/scsi/scsi_lib.c    |  2 +-
>  include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>  4 files changed, 57 insertions(+), 20 deletions(-)
>
> --
> 2.32.0
>
Bart Van Assche Oct. 17, 2023, 9:41 p.m. UTC | #2
On 10/17/23 10:00, Wenchao Hao wrote:
> On Mon, Oct 16, 2023 at 10:04 AM Wenchao Hao <haowenchao2@huawei.com> wrote:
>>
>> I am testing SCSI error handle with my previous scsi_debug error
>> injection patches, and found some issues when removing device and
>> error handler happened together.
>>
>> These issues are triggered because devices in removing would be skipped
>> when calling shost_for_each_device().
> 
> Friendly ping...

The patch series was posted on October 15, 7 PM PDT and the ping has
been posted on October 17, 10 AM PDT. That's less than two days after
the patch series was posted. Isn't that way too soon to post a "ping"?

Thanks,

Bart.
Wenchao Hao Oct. 18, 2023, 1:37 a.m. UTC | #3
On 2023/10/18 5:41, Bart Van Assche wrote:
> 
> On 10/17/23 10:00, Wenchao Hao wrote:
>> On Mon, Oct 16, 2023 at 10:04 AM Wenchao Hao <haowenchao2@huawei.com> wrote:
>>>
>>> I am testing SCSI error handle with my previous scsi_debug error
>>> injection patches, and found some issues when removing device and
>>> error handler happened together.
>>>
>>> These issues are triggered because devices in removing would be skipped
>>> when calling shost_for_each_device().
>>
>> Friendly ping...
> 
> The patch series was posted on October 15, 7 PM PDT and the ping has
> been posted on October 17, 10 AM PDT. That's less than two days after
> the patch series was posted. Isn't that way too soon to post a "ping"?
> 

The previous version was posted on 2023/9/28 but not reviewed, so I ping
soon after repost.

> Thanks,
> 
> Bart.
> 
>
Bart Van Assche Oct. 18, 2023, 1:51 p.m. UTC | #4
On 10/17/23 18:37, Wenchao Hao wrote:
> The previous version was posted on 2023/9/28 but not reviewed, so I
> ping soon after repost.

Since a repost counts as a ping, I think posting a ping soon after
reposting is considered aggressive.

Bart.
Wenchao Hao Oct. 18, 2023, 2:40 p.m. UTC | #5
On Wed, Oct 18, 2023 at 9:51 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 10/17/23 18:37, Wenchao Hao wrote:
> > The previous version was posted on 2023/9/28 but not reviewed, so I
> > ping soon after repost.
>
> Since a repost counts as a ping, I think posting a ping soon after
> reposting is considered aggressive.
>

I didn't mean that, then how long is appropriate to post a ping?

Thanks.

> Bart.
Bart Van Assche Oct. 18, 2023, 6:03 p.m. UTC | #6
On 10/18/23 07:40, Wenchao Hao wrote:
> On Wed, Oct 18, 2023 at 9:51 PM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> On 10/17/23 18:37, Wenchao Hao wrote:
>>> The previous version was posted on 2023/9/28 but not reviewed, so I
>>> ping soon after repost.
>>
>> Since a repost counts as a ping, I think posting a ping soon after
>> reposting is considered aggressive.
> 
> I didn't mean that, then how long is appropriate to post a ping?

It depends on how busy the reviewers are. I wait at least one week
before sending out a ping or reposting a patch series.

Bart.
Wenchao Hao Nov. 13, 2023, 4 p.m. UTC | #7
On 10/16/23 10:03 AM, Wenchao Hao wrote:
> I am testing SCSI error handle with my previous scsi_debug error
> injection patches, and found some issues when removing device and
> error handler happened together.
> 
> These issues are triggered because devices in removing would be skipped
> when calling shost_for_each_device().
> 

Friendly ping...

> Three issues are found:
> 1. statistic info printed at beginning of scsi_error_handler is wrong
> 2. device reset is not triggered
> 3. IO requeued to request_queue would be hang after error handle
> 
> V3:
>   - Update patch description
>   - Update comments of functions added
> 
> V2:
>   - Fix IO hang by run all devices' queue after error handler
>   - Do not modify shost_for_each_device() directly but add a new
>     helper to iterate devices but do not skip devices in removing
> 
> Wenchao Hao (4):
>   scsi: core: Add new helper to iterate all devices of host
>   scsi: scsi_error: Fix wrong statistic when print error info
>   scsi: scsi_error: Fix device reset is not triggered
>   scsi: scsi_core: Fix IO hang when device removing
> 
>  drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>  drivers/scsi/scsi_error.c  |  4 ++--
>  drivers/scsi/scsi_lib.c    |  2 +-
>  include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>  4 files changed, 57 insertions(+), 20 deletions(-)
>