Message ID | 20200927130420.1095-1-fangying1@huawei.com |
---|---|
Series | block-backend: Introduce I/O hang |
Patchew URL: https://patchew.org/QEMU/20200927130420.1095-1-fangying1@huawei.com/

Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

Host machine cpu: x86_64
Target machine cpu family: x86
Target machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
Compiling C object libblock.fa.p/block_vdi.c.obj
Compiling C object libblock.fa.p/block_cloop.c.obj
../src/block/block-backend.c: In function 'blk_new':
../src/block/block-backend.c:386:5: error: implicit declaration of function 'atomic_set'; did you mean 'qatomic_set'? [-Werror=implicit-function-declaration]
  386 |     atomic_set(&blk->reinfo.in_flight, 0);
      |     ^~~~~~~~~~
      |     qatomic_set
../src/block/block-backend.c:386:5: error: nested extern declaration of 'atomic_set' [-Werror=nested-externs]
In file included from /usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include/glibconfig.h:9,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0/glib/gtypes.h:32,
                 from /usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0/glib/galloca.h:32,
---
                 from /tmp/qemu-test/src/include/qemu/osdep.h:126,
                 from ../src/block/block-backend.c:13:
../src/block/block-backend.c: In function 'blk_delete':
../src/block/block-backend.c:479:12: error: implicit declaration of function 'atomic_read'; did you mean 'qatomic_read'? [-Werror=implicit-function-declaration]
  479 |     assert(atomic_read(&blk->reinfo.in_flight) == 0);
      |            ^~~~~~~~~~~
/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0/glib/gmacros.h:928:8: note: in definition of macro '_G_BOOLEAN_EXPR'
---
../src/block/block-backend.c:479:5: note: in expansion of macro 'assert'
  479 |     assert(atomic_read(&blk->reinfo.in_flight) == 0);
      |     ^~~~~~
../src/block/block-backend.c:479:12: error: nested extern declaration of 'atomic_read' [-Werror=nested-externs]
  479 |     assert(atomic_read(&blk->reinfo.in_flight) == 0);
      |            ^~~~~~~~~~~
/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0/glib/gmacros.h:928:8: note: in definition of macro '_G_BOOLEAN_EXPR'
---
  479 |     assert(atomic_read(&blk->reinfo.in_flight) == 0);
      |     ^~~~~~
../src/block/block-backend.c: In function 'blk_rehandle_insert_aiocb':
../src/block/block-backend.c:2459:5: error: implicit declaration of function 'atomic_inc'; did you mean 'qatomic_inc'? [-Werror=implicit-function-declaration]
 2459 |     atomic_inc(&blk->reinfo.in_flight);
      |     ^~~~~~~~~~
      |     qatomic_inc
../src/block/block-backend.c:2459:5: error: nested extern declaration of 'atomic_inc' [-Werror=nested-externs]
../src/block/block-backend.c: In function 'blk_rehandle_remove_aiocb':
../src/block/block-backend.c:2468:5: error: implicit declaration of function 'atomic_dec'; did you mean 'qatomic_dec'? [-Werror=implicit-function-declaration]
 2468 |     atomic_dec(&blk->reinfo.in_flight);
      |     ^~~~~~~~~~
      |     qatomic_dec
../src/block/block-backend.c:2468:5: error: nested extern declaration of 'atomic_dec' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [Makefile.ninja:888: libblock.fa.p/block_block-backend.c.obj] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', '--label', 'com.qemu.instance.uuid=4c3aba1eb35b428ca91e79a610e892a6', '-u', '1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-1pm_eno6/src/docker-src.2020-09-27-09.21.55.30331:/var/tmp/qemu:z,ro', 'qemu/fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=4c3aba1eb35b428ca91e79a610e892a6
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-1pm_eno6/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    5m11.016s
user    0m19.775s

The full log is available at
http://patchew.org/logs/20200927130420.1095-1-fangying1@huawei.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Patchew URL: https://patchew.org/QEMU/20200927130420.1095-1-fangying1@huawei.com/

Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
Compiling C object libblock.fa.p/block_commit.c.o
Compiling C object libblock.fa.p/block_vhdx-endian.c.o
../src/block/block-backend.c: In function 'blk_new':
../src/block/block-backend.c:386:5: error: implicit declaration of function 'atomic_set' [-Werror=implicit-function-declaration]
     atomic_set(&blk->reinfo.in_flight, 0);
     ^
../src/block/block-backend.c:386:5: error: nested extern declaration of 'atomic_set' [-Werror=nested-externs]
../src/block/block-backend.c: In function 'blk_delete':
../src/block/block-backend.c:479:5: error: implicit declaration of function 'atomic_read' [-Werror=implicit-function-declaration]
     assert(atomic_read(&blk->reinfo.in_flight) == 0);
     ^
../src/block/block-backend.c:479:5: error: nested extern declaration of 'atomic_read' [-Werror=nested-externs]
../src/block/block-backend.c: In function 'blk_rehandle_insert_aiocb':
../src/block/block-backend.c:2459:5: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
     atomic_inc(&blk->reinfo.in_flight);
     ^
../src/block/block-backend.c:2459:5: error: nested extern declaration of 'atomic_inc' [-Werror=nested-externs]
../src/block/block-backend.c: In function 'blk_rehandle_remove_aiocb':
../src/block/block-backend.c:2468:5: error: implicit declaration of function 'atomic_dec' [-Werror=implicit-function-declaration]
     atomic_dec(&blk->reinfo.in_flight);
     ^
../src/block/block-backend.c:2468:5: error: nested extern declaration of 'atomic_dec' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [libblock.fa.p/block_block-backend.c.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', '--label', 'com.qemu.instance.uuid=39951e04bf3b4809a4afe5755ca771f5', '-u', '1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-5qkiksy1/src/docker-src.2020-09-27-09.28.20.6987:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=39951e04bf3b4809a4afe5755ca771f5
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-5qkiksy1/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    4m6.755s
user    0m23.139s

The full log is available at
http://patchew.org/logs/20200927130420.1095-1-fangying1@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
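Both build reports fail on the same four calls in block-backend.c. GCC's "did you mean" hints in the Fedora log point at the qatomic_* names, which suggests the tree under test provides the atomic helpers under the qatomic_ prefix while the series still uses the older atomic_ names. A minimal sketch of that rename is below. It is not the patch itself: the qatomic_* macros here are stand-ins built on the GCC/Clang __atomic builtins so the example compiles outside the QEMU tree, and the struct and function names are hypothetical, mirroring only the reinfo.in_flight field visible in the log.

```c
#include <assert.h>

/*
 * Stand-ins for the helpers that QEMU provides in "qemu/atomic.h".
 * Assumption: the real macros differ in detail; these exist only so
 * the sketch builds on its own with GCC or Clang.
 */
#define qatomic_set(ptr, val) __atomic_store_n((ptr), (val), __ATOMIC_RELAXED)
#define qatomic_read(ptr)     __atomic_load_n((ptr), __ATOMIC_RELAXED)
#define qatomic_inc(ptr)      ((void)__atomic_fetch_add((ptr), 1, __ATOMIC_SEQ_CST))
#define qatomic_dec(ptr)      ((void)__atomic_fetch_sub((ptr), 1, __ATOMIC_SEQ_CST))

/* Hypothetical, trimmed-down mirror of the field named in the build log. */
typedef struct {
    int in_flight;              /* requests currently queued for rehandle */
} BlkRehandleInfo;

typedef struct {
    BlkRehandleInfo reinfo;
} BlockBackend;

/* Was: atomic_set(&blk->reinfo.in_flight, 0);  (reported at line 386) */
void rehandle_init(BlockBackend *blk)
{
    qatomic_set(&blk->reinfo.in_flight, 0);
}

/* Was: atomic_inc(&blk->reinfo.in_flight);  (reported at line 2459) */
void rehandle_insert(BlockBackend *blk)
{
    qatomic_inc(&blk->reinfo.in_flight);
}

/* Was: atomic_dec(&blk->reinfo.in_flight);  (reported at line 2468) */
void rehandle_remove(BlockBackend *blk)
{
    qatomic_dec(&blk->reinfo.in_flight);
}

/* Was: assert(atomic_read(&blk->reinfo.in_flight) == 0);  (line 479) */
void rehandle_check_idle(BlockBackend *blk)
{
    assert(qatomic_read(&blk->reinfo.in_flight) == 0);
}
```

In the actual series the change would presumably be a pure rename at the four reported locations, with the definitions coming from QEMU's own header rather than from local macros.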
On 27.09.2020 at 15:04, Ying Fang wrote:
> A VM in the cloud environment may use a virtual disk as the backend storage,
> and there are usually filesystems on the virtual block device. When backend
> storage is temporarily down, any I/O issued to the virtual block device will
> cause an error. For example, an error occurred in ext4 filesystem would make
> the filesystem readonly. However a cloud backend storage can be soon recovered.
> For example, an IP-SAN may be down due to network failure and will be online
> soon after network is recovered. The error in the filesystem may not be
> recovered unless a device reattach or system restart. So an I/O rehandle is
> in need to implement a self-healing mechanism.
>
> This patch series propose a feature called I/O hang. It can rehandle AIOs
> with EIO error without sending error back to guest. From guest's perspective
> of view it is just like an IO is hanging and not returned. Guest can get
> back running smoothly when I/O is recovered with this feature enabled.

What is the problem with setting werror=stop and rerror=stop for the
device? Is it that QEMU won't automatically retry, but management tool
interaction is required to resume the guest?

I haven't checked your patches in detail yet, but implementing this
functionality in the backend means that blk_drain() will hang (or if it
doesn't hang, it doesn't do what it's supposed to do), making the whole
QEMU process unresponsive until the I/O succeeds again. Amongst others,
this would make it impossible to migrate away from a host with storage
problems.

Kevin
On 2020/9/28 18:57, Kevin Wolf wrote:
> On 27.09.2020 at 15:04, Ying Fang wrote:
>> A VM in the cloud environment may use a virtual disk as the backend storage,
>> and there are usually filesystems on the virtual block device. When backend
>> storage is temporarily down, any I/O issued to the virtual block device will
>> cause an error. For example, an error occurred in ext4 filesystem would make
>> the filesystem readonly. However a cloud backend storage can be soon recovered.
>> For example, an IP-SAN may be down due to network failure and will be online
>> soon after network is recovered. The error in the filesystem may not be
>> recovered unless a device reattach or system restart. So an I/O rehandle is
>> in need to implement a self-healing mechanism.
>>
>> This patch series propose a feature called I/O hang. It can rehandle AIOs
>> with EIO error without sending error back to guest. From guest's perspective
>> of view it is just like an IO is hanging and not returned. Guest can get
>> back running smoothly when I/O is recovered with this feature enabled.
>
> What is the problem with setting werror=stop and rerror=stop for the

If we simply set werror=stop and rerror=stop, the whole VM is paused and
becomes unavailable as soon as an I/O error occurs. Moreover, the VM does not
recover until the management tool manually resumes it after the backend
storage has recovered.

> device? Is it that QEMU won't automatically retry, but management tool
> interaction is required to resume the guest?

With the I/O hang mechanism, we can temporarily hang the I/Os, while services
unrelated to the hung virtual block device, such as networking, keep working.
Besides, once the backend storage has recovered, our I/O rehandle mechanism
automatically completes the hung I/Os and lets the VM continue its work.

>
> I haven't checked your patches in detail yet, but implementing this
> functionality in the backend means that blk_drain() will hang (or if it
> doesn't hang, it doesn't do what it's supposed to do), making the whole

What if we disable rehandle before blk_drain()?

> QEMU process unresponsive until the I/O succeeds again. Amongst others,
> this would make it impossible to migrate away from a host with storage
> problems.

Exactly. If the storage recovers during the migration iteration phase, the
migration can succeed; but if the storage has still not recovered by the
migration completion phase, the migration should fail and be cancelled.

Thanks,
Jiahui Cen

>
> Kevin
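The drain concern and the "disable rehandle before blk_drain()" idea discussed in this exchange can be illustrated with a toy model. The sketch below is not code from the series and does not use any QEMU API: ToyBackend and the toy_* functions are hypothetical names, the storage outage is simulated with a counter, and the drain loop is given an artificial poll limit so that the example terminates (the concern raised in the thread is precisely that a real drain would have no such limit).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names come from the patch series. */
typedef struct {
    int  failures_left;  /* simulated storage returns -EIO this many more times */
    bool rehandle;       /* "I/O hang" on: retry on EIO instead of reporting it */
    int  in_flight;      /* requests submitted but not yet completed */
    int  last_result;    /* result delivered to the guest on completion */
} ToyBackend;

/* One attempt against the simulated storage. */
static int toy_submit(ToyBackend *b)
{
    if (b->failures_left > 0) {
        b->failures_left--;
        return -EIO;
    }
    return 0;
}

/* Guest issues a request. The toy model tracks a single request at a time. */
void toy_request_start(ToyBackend *b)
{
    assert(b->in_flight == 0);
    b->in_flight++;
}

/* One event-loop iteration. Returns true if the request completed. */
bool toy_poll(ToyBackend *b)
{
    if (b->in_flight == 0) {
        return false;
    }
    int ret = toy_submit(b);
    if (ret == -EIO && b->rehandle) {
        /* Keep the request queued: the guest sees a hang, not an error. */
        return false;
    }
    b->last_result = ret;   /* success, or EIO passed through to the guest */
    b->in_flight--;
    return true;
}

/*
 * Poll until nothing is in flight, giving up after max_polls iterations.
 * Returns true if the backend was drained within the limit.
 */
bool toy_drain(ToyBackend *b, int max_polls)
{
    for (int i = 0; i < max_polls && b->in_flight > 0; i++) {
        toy_poll(b);
    }
    return b->in_flight == 0;
}
```

In this model, a request issued while the storage is down and rehandle is on never leaves the in-flight count, so toy_drain() cannot finish until the outage ends. Turning rehandle off before draining lets the drain complete, but only by delivering -EIO to the guest, which is the outcome the feature was meant to avoid; that trade-off is what the question about migration completion comes down to.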