Message ID | 20230723234422.1629194-1-haowenchao2@huawei.com |
---|---|
Series | scsi: Support LUN/target based error handle |
On 2023/7/24 7:44, Wenchao Hao wrote:
> The original error handler sets the host to the recovery state and
> performs error recovery operations, so none of the LUNs that share the
> host can handle I/O. This is unacceptable for systems that deploy many
> LUNs on one HBA.
>
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN based,
> target based, or both LUN and target based error handling according to
> their own error handling strategy. The first patch defines this framework.
> It abstracts three key operations: add an error command, wake up the
> error handler, and block I/O when an error command is added and while
> recovering. Drivers should implement these three function callbacks and
> register them with the SCSI middle level.

Ping... Is anyone reviewing these changes?

> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based EH, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to further
> recovery, such as target based (if implemented) or host based error
> handling.
>
> Target based EH works the same way: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to further recovery, which is host based error
> handling.
>
> This patchset is tested with scsi_debug, which supports single LUN error
> injection. The scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
> Wenchao Hao (13):
>   scsi: Define basic framework for driver LUN/target based error handle
>   scsi:scsi_error: Move complete variable eh_action from shost to sdevice
>   scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
>   scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
>   scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
>   scsi:scsi_error: Add flags to mark error handle steps has done
>   scsi:scsi_error: Define helper to perform LUN based error handle
>   scsi:scsi_error: Add LUN based error handler based previous helper
>   scsi:core: increase/decrease target_busy without check can_queue
>   scsi:scsi_error: Define helper to perform target based error handle
>   scsi:scsi_error: Add target based error handler based previous helper
>   scsi:scsi_debug: Add param to control if setup LUN based error handle
>   scsi:scsi_debug: Add param to control if setup target based error handle
>
>  drivers/scsi/scsi_debug.c  |  19 +
>  drivers/scsi/scsi_error.c  | 705 ++++++++++++++++++++++++++++++++++---
>  drivers/scsi/scsi_lib.c    |  23 +-
>  drivers/scsi/scsi_priv.h   |  20 ++
>  include/scsi/scsi_device.h |  97 +++++
>  include/scsi/scsi_eh.h     |   4 +
>  include/scsi/scsi_host.h   |   2 -
>  7 files changed, 813 insertions(+), 57 deletions(-)
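
For readers following the thread, the three operations named in the cover letter (add an error command, wake up the error handler, block I/O while recovering) can be pictured with a small user-space model. This is only a sketch: it is not kernel code and not the API defined by the patches, and every name in it (`LunErrorHandler`, `add_error_command`, `blocks_io`, `wake_up`, `recover`) is hypothetical. It is written in Python purely to show the intended effect, namely that one LUN entering recovery does not stop I/O on its siblings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LunErrorHandler:
    """Hypothetical per-LUN error handler; all names are illustrative."""

    name: str
    # Assumed recovery hook: returns True if the command was recovered.
    recover: Callable[[str], bool]
    failed: List[str] = field(default_factory=list)

    def add_error_command(self, cmd: str) -> None:
        """Operation 1: queue a failed command for this LUN only."""
        self.failed.append(cmd)

    def blocks_io(self) -> bool:
        """Operation 3: new I/O is blocked while failed commands are pending."""
        return bool(self.failed)

    def wake_up(self) -> List[str]:
        """Operation 2: run recovery; return commands that are still failed."""
        self.failed = [c for c in self.failed if not self.recover(c)]
        return list(self.failed)


def submit(handler: LunErrorHandler, cmd: str) -> bool:
    """Return True if the command may be dispatched to this LUN."""
    return not handler.blocks_io()


if __name__ == "__main__":
    lun0 = LunErrorHandler("lun0", recover=lambda c: True)
    lun1 = LunErrorHandler("lun1", recover=lambda c: True)
    lun0.add_error_command("READ_10")
    print(submit(lun0, "WRITE_10"), submit(lun1, "WRITE_10"))
    lun0.wake_up()
    print(submit(lun0, "WRITE_10"))
```

In this model a handler whose `wake_up` leaves commands unrecovered stays blocked, which corresponds to the point where the cover letter says recovery falls back to a wider scope.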
On 2023/7/24 7:44, Wenchao Hao wrote:

Ping again...

> The original error handler sets the host to the recovery state and
> performs error recovery operations, so none of the LUNs that share the
> host can handle I/O. This is unacceptable for systems that deploy many
> LUNs on one HBA.
>
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN based,
> target based, or both LUN and target based error handling according to
> their own error handling strategy. The first patch defines this framework.
> It abstracts three key operations: add an error command, wake up the
> error handler, and block I/O when an error command is added and while
> recovering. Drivers should implement these three function callbacks and
> register them with the SCSI middle level.
>
> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based EH, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to further
> recovery, such as target based (if implemented) or host based error
> handling.
>
> Target based EH works the same way: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to further recovery, which is host based error
> handling.
>
> This patchset is tested with scsi_debug, which supports single LUN error
> injection. The scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
> Wenchao Hao (13):
>   scsi: Define basic framework for driver LUN/target based error handle
>   scsi:scsi_error: Move complete variable eh_action from shost to sdevice
>   scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
>   scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
>   scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
>   scsi:scsi_error: Add flags to mark error handle steps has done
>   scsi:scsi_error: Define helper to perform LUN based error handle
>   scsi:scsi_error: Add LUN based error handler based previous helper
>   scsi:core: increase/decrease target_busy without check can_queue
>   scsi:scsi_error: Define helper to perform target based error handle
>   scsi:scsi_error: Add target based error handler based previous helper
>   scsi:scsi_debug: Add param to control if setup LUN based error handle
>   scsi:scsi_debug: Add param to control if setup target based error handle
>
>  drivers/scsi/scsi_debug.c  |  19 +
>  drivers/scsi/scsi_error.c  | 705 ++++++++++++++++++++++++++++++++++---
>  drivers/scsi/scsi_lib.c    |  23 +-
>  drivers/scsi/scsi_priv.h   |  20 ++
>  include/scsi/scsi_device.h |  97 +++++
>  include/scsi/scsi_eh.h     |   4 +
>  include/scsi/scsi_host.h   |   2 -
>  7 files changed, 813 insertions(+), 57 deletions(-)
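
The recovery order described in the cover letter (check sense, start unit, LUN reset, then target reset for target based EH, then fallback to a wider scope) can be sketched as a simple escalation ladder. As before, this is a user-space Python model under stated assumptions, not the code in `scsi_error.c`: the step names, `run_ladder`, and the `actions` mapping are hypothetical and exist only to make the ordering and the fallback condition concrete.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple

# A step receives the still-failed commands and returns those it recovered.
Step = Callable[[FrozenSet[str]], Set[str]]

# Step order taken from the cover letter; the identifiers are illustrative.
LUN_STEPS: Tuple[str, ...] = ("check_sense", "start_unit", "lun_reset")
TARGET_STEPS: Tuple[str, ...] = LUN_STEPS + ("target_reset",)


def run_ladder(
    failed: Iterable[str],
    steps: Sequence[str],
    actions: Dict[str, Step],
) -> Tuple[Set[str], List[str]]:
    """Try each step in order until no failed commands remain.

    Returns the commands still failed and the list of steps attempted.
    A trailing "fallback" entry means wider recovery is still needed.
    """
    remaining = set(failed)
    attempted: List[str] = []
    for step in steps:
        if not remaining:
            break  # everything recovered, no need to escalate further
        attempted.append(step)
        remaining -= actions[step](frozenset(remaining))
    if remaining:
        attempted.append("fallback")
    return remaining, attempted


if __name__ == "__main__":
    # Assumed behaviour for the demo: only start_unit recovers command "a".
    actions: Dict[str, Step] = {
        "check_sense": lambda cmds: set(),
        "start_unit": lambda cmds: {"a"} & cmds,
        "lun_reset": lambda cmds: set(),
        "target_reset": lambda cmds: set(cmds),
    }
    print(run_ladder({"a", "b"}, LUN_STEPS, actions))
    print(run_ladder({"a", "b"}, TARGET_STEPS, actions))
```

With the LUN ladder, command "b" survives every step and the model reports a fallback; with the target ladder, the extra target reset step gets a chance to recover it before any fallback to host based handling.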