[RFC,v2,0/8] block-backend: Introduce I/O hang

Message ID 20200930094606.5323-1-cenjiahui@huawei.com

Message

cenjiahui Sept. 30, 2020, 9:45 a.m. UTC
A VM in a cloud environment may use a virtual disk as its backend storage,
and there are usually filesystems on the virtual block device. When the
backend storage is temporarily down, any I/O issued to the virtual block
device fails with an error. For example, an I/O error seen by an ext4
filesystem can leave the filesystem read-only. However, cloud backend
storage often recovers quickly: an IP-SAN may go down due to a network
failure and come back online soon after the network recovers. The error in
the filesystem, by contrast, may not clear without a device reattach or a
system restart. So an I/O rehandle mechanism is needed to provide
self-healing.

This patch series proposes a feature called I/O hang. It can rehandle AIOs
that fail with EIO without sending the error back to the guest. From the
guest's point of view, it is just as if an I/O is hanging and has not yet
returned. With this feature enabled, the guest can resume running smoothly
once I/O has recovered.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.

Jiahui Cen (8):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O rehandle pause/unpause
  block-backend: enable I/O hang when timeout is set
  virtio-blk: pause I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event

 block/block-backend.c          | 300 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 5 files changed, 350 insertions(+)

Comments

cenjiahui Oct. 10, 2020, 2:27 a.m. UTC | #1
Hi Kevin,

Could you please spend some time reviewing and commenting on this patch series?

Thanks,
Jiahui Cen

Ying Fang Oct. 16, 2020, 1:12 a.m. UTC | #2
On 10/10/2020 10:27 AM, cenjiahui wrote:
> Hi Kevin,
> 
> Could you please spend some time reviewing and commenting on this patch series.
> 
> Thanks,
> Jiahui Cen

This feature has been confirmed to be effective in a cloud storage
environment, since it helps improve availability without pausing the entire
guest. I hope it doesn't get lost in the thread. Any comments or reviews
are welcome.
