From patchwork Sun Sep 27 13:04:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ying Fang X-Patchwork-Id: 272659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BAF9C4346E for ; Sun, 27 Sep 2020 13:08:41 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D747E2389F for ; Sun, 27 Sep 2020 13:08:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D747E2389F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36056 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWPw-0004NS-15 for qemu-devel@archiver.kernel.org; Sun, 27 Sep 2020 09:08:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60868) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMS-0007qq-Ny for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5149 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMQ-0003N5-Bk for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400 Received: from 
DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id D389B1639CE2190F26BD; Sun, 27 Sep 2020 21:04:48 +0800 (CST) Received: from localhost (10.174.185.104) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:38 +0800 From: Ying Fang To: Subject: [RFC PATCH 1/7] block-backend: introduce I/O rehandle info Date: Sun, 27 Sep 2020 21:04:14 +0800 Message-ID: <20200927130420.1095-2-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.190; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

The I/O hang feature is built on a rehandle mechanism. Each block backend keeps a list of hanging block AIOs and a timer that periodically resends them. To reissue an AIO, each block AIO also needs to store its coroutine entry.
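Illustration only, not part of the patch: the bookkeeping described above can be sketched in standalone C. Everything below is an assumption made for the example: the `Sketch*` names are hypothetical, a plain function pointer stands in for QEMU's `CoroutineEntry`, and a hand-rolled tail queue stands in for `QTAILQ`. The timer itself is reduced to a stored interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CoroutineEntry: the function that reissues a request. */
typedef void SketchRetryEntry(void *opaque);

typedef struct SketchRequest {
    SketchRetryEntry *entry;      /* saved so the request can be issued again */
    int ret;                      /* last completion status */
    struct SketchRequest *next;   /* link in the list of hanging requests */
} SketchRequest;

/* Hypothetical analogue of the per-backend rehandle info. */
typedef struct SketchRehandleInfo {
    bool enable;
    unsigned timer_interval_ms;   /* how often hanging requests are resent */
    unsigned in_flight;           /* number of requests currently parked */
    SketchRequest *head;
    SketchRequest **tail;         /* points at the last 'next' slot */
} SketchRehandleInfo;

static void sketch_info_init(SketchRehandleInfo *info, unsigned interval_ms)
{
    info->enable = true;
    info->timer_interval_ms = interval_ms;
    info->in_flight = 0;
    info->head = NULL;
    info->tail = &info->head;
}

/* Append a hanging request at the tail, preserving submission order. */
static void sketch_info_park(SketchRehandleInfo *info, SketchRequest *req)
{
    req->next = NULL;
    *info->tail = req;
    info->tail = &req->next;
    info->in_flight++;
}

/* Remove and return the oldest hanging request, or NULL if none is parked. */
static SketchRequest *sketch_info_take(SketchRehandleInfo *info)
{
    SketchRequest *req = info->head;

    if (!req) {
        return NULL;
    }
    info->head = req->next;
    if (!info->head) {
        info->tail = &info->head;
    }
    req->next = NULL;
    info->in_flight--;
    return req;
}
```

Parking at the tail and taking from the head keeps retries in the order the requests first failed, which is one reasonable policy and not something the patch text specifies.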
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 24dd0670d1..bf104a7cf5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -35,6 +35,18 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); +/* block backend rehandle timer interval 5s */ +#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 + +typedef struct BlockBackendRehandleInfo { + bool enable; + QEMUTimer *ts; + unsigned timer_interval_ms; + + unsigned int in_flight; + QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; +} BlockBackendRehandleInfo; + typedef struct BlockBackendAioNotifier { void (*attached_aio_context)(AioContext *new_context, void *opaque); void (*detach_aio_context)(void *opaque); @@ -95,6 +107,8 @@ struct BlockBackend { * Accessed with atomic ops. */ unsigned int in_flight; + + BlockBackendRehandleInfo reinfo; }; typedef struct BlockBackendAIOCB { @@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) qemu_co_queue_init(&blk->queued_requests); notifier_list_init(&blk->remove_bs_notifiers); notifier_list_init(&blk->insert_bs_notifiers); + QLIST_INIT(&blk->aio_notifiers); QTAILQ_INSERT_TAIL(&block_backends, blk, link); @@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB { BlkRwCo rwco; int bytes; bool has_returned; + + /* for rehandle */ + CoroutineEntry *co_entry; + QTAILQ_ENTRY(BlkAioEmAIOCB) list; } BlkAioEmAIOCB; static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_) From patchwork Sun Sep 27 13:04:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ying Fang X-Patchwork-Id: 272658 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, 
INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F890C4346E for ; Sun, 27 Sep 2020 13:08:50 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E95F12389F for ; Sun, 27 Sep 2020 13:08:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E95F12389F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36460 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWQ5-0004XO-2e for qemu-devel@archiver.kernel.org; Sun, 27 Sep 2020 09:08:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60872) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMS-0007qy-U9 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:60246 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMQ-0003N2-C2 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400 Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id C68EDEBA499277E74A44; Sun, 27 Sep 2020 21:04:48 +0800 (CST) Received: from localhost (10.174.185.104) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:39 +0800 From: Ying Fang To: Subject: [RFC PATCH 2/7] block-backend: rehandle block aios when EIO Date: Sun, 27 Sep 2020 
21:04:15 +0800 Message-ID: <20200927130420.1095-3-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

When a backend device temporarily stops responding, for example a network disk that goes down because of a network fault, any I/O to the corresponding virtual block device in the VM returns an I/O error. If the hypervisor passes that error on to the VM, the filesystem on the block device may stop working normally. In many situations the returned error is EIO. To avoid this unavailability, we can store the failed AIOs and resend them later. If the error is temporary, the retries can succeed and the AIOs complete successfully.
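Illustration only, not part of the patch: a minimal standalone sketch of the store-and-resend idea. It assumes a hypothetical device whose `issue` callback fails with `-EIO` a fixed number of times before succeeding, and it models the rehandle timer as an explicit `sketch_timer_tick()` call. None of these names exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_PARKED 8

typedef struct SketchReq {
    int (*issue)(struct SketchReq *req);  /* re-issuable entry point */
    int failures_left;                    /* simulated transient fault */
    int ret;                              /* last result */
    int completed;                        /* set once reported to the caller */
} SketchReq;

typedef struct SketchBackend {
    SketchReq *parked[SKETCH_MAX_PARKED]; /* requests waiting for a retry */
    size_t n_parked;
} SketchBackend;

/* Simulated device: returns -EIO until the transient fault has cleared. */
static int sketch_flaky_issue(SketchReq *req)
{
    if (req->failures_left > 0) {
        req->failures_left--;
        return -EIO;
    }
    return 0;
}

/* Completion path: park EIO failures instead of reporting them. */
static void sketch_complete(SketchBackend *be, SketchReq *req)
{
    if (req->ret == -EIO && be->n_parked < SKETCH_MAX_PARKED) {
        be->parked[be->n_parked++] = req;
        return;
    }
    req->completed = 1;
}

static void sketch_submit(SketchBackend *be, SketchReq *req)
{
    req->ret = req->issue(req);
    sketch_complete(be, req);
}

/* Timer callback: reissue every parked request once. */
static void sketch_timer_tick(SketchBackend *be)
{
    SketchReq *batch[SKETCH_MAX_PARKED];
    size_t n = be->n_parked;
    size_t i;

    /* Snapshot first, because a retry that fails again re-parks itself. */
    for (i = 0; i < n; i++) {
        batch[i] = be->parked[i];
    }
    be->n_parked = 0;
    for (i = 0; i < n; i++) {
        sketch_submit(be, batch[i]);
    }
}
```

With `failures_left = 2`, the request is parked on submission, parked again after the first tick, and completes with status 0 after the second tick. The caller never sees the intermediate errors.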
Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- block/block-backend.c | 89 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index bf104a7cf5..90f1ca5753 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) notifier_list_init(&blk->remove_bs_notifiers); notifier_list_init(&blk->insert_bs_notifiers); + /* for rehandle */ + blk->reinfo.enable = false; + blk->reinfo.ts = NULL; + atomic_set(&blk->reinfo.in_flight, 0); + QTAILQ_INIT(&blk->reinfo.re_aios); + QLIST_INIT(&blk->aio_notifiers); QTAILQ_INSERT_TAIL(&block_backends, blk, link); @@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = { .get_aio_context = blk_aio_em_aiocb_get_aio_context, }; +static void blk_rehandle_timer_cb(void *opaque); +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb); + static void blk_aio_complete(BlkAioEmAIOCB *acb) { + if (acb->rwco.blk->reinfo.enable) { + blk_rehandle_aio_complete(acb); + return; + } + if (acb->has_returned) { acb->common.cb(acb->common.opaque, acb->rwco.ret); blk_dec_in_flight(acb->rwco.blk); @@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes, .ret = NOT_DONE, }; acb->bytes = bytes; + acb->co_entry = co_entry; acb->has_returned = false; co = qemu_coroutine_create(co_entry, acb); @@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context, throttle_group_attach_aio_context(tgm, new_context); bdrv_drained_end(bs); } + + if (blk->reinfo.enable) { + if (blk->reinfo.ts) { + timer_del(blk->reinfo.ts); + timer_free(blk->reinfo.ts); + } + blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, + blk); + if (atomic_read(&blk->reinfo.in_flight)) { + timer_mod(blk->reinfo.ts, + qemu_clock_get_ms(QEMU_CLOCK_REALTIME)); + } + } } 
blk->ctx = new_context; @@ -2405,6 +2434,66 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter) } } +static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ + assert(blk->reinfo.enable); + + atomic_inc(&blk->reinfo.in_flight); + QTAILQ_INSERT_TAIL(&blk->reinfo.re_aios, acb, list); + timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + + blk->reinfo.timer_interval_ms); +} + +static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ + QTAILQ_REMOVE(&blk->reinfo.re_aios, acb, list); + atomic_dec(&blk->reinfo.in_flight); +} + +static void blk_rehandle_timer_cb(void *opaque) +{ + BlockBackend *blk = opaque; + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + BlkAioEmAIOCB *acb, *tmp; + Coroutine *co; + + aio_context_acquire(blk_get_aio_context(blk)); + QTAILQ_FOREACH_SAFE(acb, &reinfo->re_aios, list, tmp) { + if (acb->rwco.ret == NOT_DONE) { + continue; + } + + blk_inc_in_flight(acb->rwco.blk); + acb->rwco.ret = NOT_DONE; + acb->has_returned = false; + + co = qemu_coroutine_create(acb->co_entry, acb); + bdrv_coroutine_enter(blk_bs(blk), co); + + acb->has_returned = true; + if (acb->rwco.ret != NOT_DONE) { + blk_rehandle_remove_aiocb(acb->rwco.blk, acb); + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); + } + } + aio_context_release(blk_get_aio_context(blk)); +} + +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) +{ + if (acb->has_returned) { + blk_dec_in_flight(acb->rwco.blk); + if (acb->rwco.ret == -EIO) { + blk_rehandle_insert_aiocb(acb->rwco.blk, acb); + return; + } + + acb->common.cb(acb->common.opaque, acb->rwco.ret); + qemu_aio_unref(acb); + } +} + void blk_register_buf(BlockBackend *blk, void *host, size_t size) { bdrv_register_buf(blk_bs(blk), host, size); From patchwork Sun Sep 27 13:04:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ying Fang 
X-Patchwork-Id: 272660 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92BA5C4346E for ; Sun, 27 Sep 2020 13:06:47 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 30F3C2399C for ; Sun, 27 Sep 2020 13:06:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 30F3C2399C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:56494 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWO6-0001Ev-8O for qemu-devel@archiver.kernel.org; Sun, 27 Sep 2020 09:06:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60892) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMU-0007tk-PU for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:06 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:60240 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMQ-0003N0-Vy for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:06 -0400 Received: from DGGEMS401-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 2361F5E510B5427236C8; Sun, 27 Sep 2020 21:04:47 +0800 (CST) Received: from localhost 
(10.174.185.104) by DGGEMS401-HUB.china.huawei.com (10.3.19.201) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:40 +0800 From: Ying Fang To: Subject: [RFC PATCH 3/7] block-backend: add I/O hang timeout Date: Sun, 27 Sep 2020 21:04:16 +0800 Message-ID: <20200927130420.1095-4-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Not all errors are temporary, so add a rehandle timeout for I/O hang. Once the timeout has expired, failed AIOs are no longer rehandled and the error is returned to the caller.
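Illustration only, not part of the patch: the timeout decision can be read as a small two-state machine. The sketch below follows the same branches as `blk_iohang_handle()` in the diff, but it is standalone C with hypothetical `Sketch*` names, and it takes the current time as a parameter (an assumption made so the logic can be exercised without a real clock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum SketchHangStatus {
    SKETCH_STATUS_NORMAL = 0,
    SKETCH_STATUS_HANG,
};

typedef struct SketchHangState {
    int64_t timeout_s;      /* how long a hang may last, in seconds */
    int64_t hang_start_s;   /* when the current hang began */
    bool timed_out;         /* set once the hang has outlived timeout_s */
    enum SketchHangStatus status;
} SketchHangState;

/*
 * Record a status change observed at time now_s.
 * Returns true if the failed request should be parked and retried.
 */
static bool sketch_hang_update(SketchHangState *s,
                               enum SketchHangStatus new_status,
                               int64_t now_s)
{
    bool retry = false;

    if (new_status == SKETCH_STATUS_NORMAL) {
        if (s->status == SKETCH_STATUS_HANG) {
            /* Recovered: forget the hang and any timeout. */
            s->timed_out = false;
            s->hang_start_s = 0;
        }
    } else if (s->status != SKETCH_STATUS_HANG) {
        /* First failure: the hang starts now. */
        s->hang_start_s = now_s;
        retry = true;
    } else if (!s->timed_out) {
        if (now_s >= s->hang_start_s + s->timeout_s) {
            /* The hang has lasted too long: stop retrying. */
            s->timed_out = true;
        } else {
            /* Still within the allowed window: keep retrying. */
            retry = true;
        }
    }

    s->status = new_status;
    return retry;
}
```

With a 10 second timeout and a hang that starts at t=100, failures at t=100 and t=105 are retried, a failure at t=110 marks the hang as timed out, and a later success clears the flag again.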
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 99 +++++++++++++++++++++++++++++++++- include/sysemu/block-backend.h | 2 + 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/block/block-backend.c b/block/block-backend.c index 90f1ca5753..d0b2b59f55 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 +enum BlockIOHangStatus { + BLOCK_IO_HANG_STATUS_NORMAL = 0, + BLOCK_IO_HANG_STATUS_HANG, +}; + typedef struct BlockBackendRehandleInfo { bool enable; QEMUTimer *ts; @@ -109,6 +114,11 @@ struct BlockBackend { unsigned int in_flight; BlockBackendRehandleInfo reinfo; + + int64_t iohang_timeout; /* The I/O hang timeout value in sec. */ + int64_t iohang_time; /* The I/O hang start time */ + bool is_iohang_timeout; + int iohang_status; }; typedef struct BlockBackendAIOCB { @@ -2480,20 +2490,107 @@ static void blk_rehandle_timer_cb(void *opaque) aio_context_release(blk_get_aio_context(blk)); } +static bool blk_iohang_handle(BlockBackend *blk, int new_status) +{ + int64_t now; + int old_status = blk->iohang_status; + bool need_rehandle = false; + + switch (new_status) { + case BLOCK_IO_HANG_STATUS_NORMAL: + if (old_status == BLOCK_IO_HANG_STATUS_HANG) { + /* Case when I/O Hang is recovered */ + blk->is_iohang_timeout = false; + blk->iohang_time = 0; + } + break; + case BLOCK_IO_HANG_STATUS_HANG: + if (old_status != BLOCK_IO_HANG_STATUS_HANG) { + /* Case when I/O hang is first triggered */ + blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; + need_rehandle = true; + } else { + if (!blk->is_iohang_timeout) { + now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; + if (now >= (blk->iohang_time + blk->iohang_timeout)) { + /* Case when I/O hang is timeout */ + blk->is_iohang_timeout = true; + } else { + /* Case when I/O hang is 
continued */ + need_rehandle = true; + } + } + } + break; + default: + break; + } + + blk->iohang_status = new_status; + return need_rehandle; +} + +static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout) +{ + bool need_rehandle = false; + + /* Rehandle aio which returns EIO before hang timeout */ + if (acb->rwco.ret == -EIO) { + if (acb->rwco.blk->is_iohang_timeout) { + /* I/O hang has timeout and not recovered */ + *has_timeout = true; + } else { + need_rehandle = blk_iohang_handle(acb->rwco.blk, + BLOCK_IO_HANG_STATUS_HANG); + /* I/O hang timeout first trigger */ + if (acb->rwco.blk->is_iohang_timeout) { + *has_timeout = true; + } + } + } + + return need_rehandle; +} + static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) { + bool has_timeout = false; + bool need_rehandle = false; + if (acb->has_returned) { blk_dec_in_flight(acb->rwco.blk); - if (acb->rwco.ret == -EIO) { + need_rehandle = blk_rehandle_aio(acb, &has_timeout); + if (need_rehandle) { blk_rehandle_insert_aiocb(acb->rwco.blk, acb); return; } acb->common.cb(acb->common.opaque, acb->rwco.ret); + + /* I/O hang return to normal status */ + if (!has_timeout) { + blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL); + } + qemu_aio_unref(acb); } } +void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) +{ + if (!blk) { + return; + } + + blk->is_iohang_timeout = false; + blk->iohang_time = 0; + blk->iohang_timeout = 0; + blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL; + if (iohang_timeout > 0) { + blk->iohang_timeout = iohang_timeout; + } +} + void blk_register_buf(BlockBackend *blk, void *host, size_t size) { bdrv_register_buf(blk_bs(blk), host, size); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 8203d7f6f9..bfebe3a960 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); +void 
blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); + #endif From patchwork Sun Sep 27 13:04:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ying Fang X-Patchwork-Id: 304335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07A14C4741F for ; Sun, 27 Sep 2020 13:06:48 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 82916239A1 for ; Sun, 27 Sep 2020 13:06:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 82916239A1 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:56506 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWO6-0001F8-HF for qemu-devel@archiver.kernel.org; Sun, 27 Sep 2020 09:06:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60858) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMS-0007qe-5d for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400 Received: from szxga07-in.huawei.com ([45.249.212.35]:40272 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMQ-0003Mz-1H for 
qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:03 -0400 Received: from DGGEMS404-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id AFA9BD578B7977E05671; Sun, 27 Sep 2020 21:04:47 +0800 (CST) Received: from localhost (10.174.185.104) by DGGEMS404-HUB.china.huawei.com (10.3.19.204) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:41 +0800 From: Ying Fang To: Subject: [RFC PATCH 4/7] block-backend: add I/O hang drain when disabling Date: Sun, 27 Sep 2020 21:04:17 +0800 Message-ID: <20200927130420.1095-5-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.35; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:48 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

To disable I/O hang, all hanging AIOs need to be drained. A rehandle status field is introduced to tell the rehandle mechanism not to rehandle failed AIOs when I/O hang is disabled.
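Illustration only, not part of the patch: a standalone sketch of how a status field can stop failed requests from being parked again while a drain is in progress. The `Sketch*` names are hypothetical. The drain here assumes, for simplicity, that every parked request is reissued once and fails again with `-EIO`; the real code instead polls until the in-flight counter reaches zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DRAIN_MAX 4

enum {
    SKETCH_REHANDLE_NORMAL = 1,
    SKETCH_REHANDLE_DRAIN_REQUESTED,
    SKETCH_REHANDLE_DRAINED,
};

typedef struct SketchDrainReq {
    int ret;
    bool completed;   /* true once the result was reported to the caller */
} SketchDrainReq;

typedef struct SketchDrainBackend {
    int status;
    SketchDrainReq *parked[SKETCH_DRAIN_MAX];
    size_t n_parked;
} SketchDrainBackend;

/* Rehandling is paused both while draining and after the drain finished. */
static bool sketch_is_paused(const SketchDrainBackend *be)
{
    return be->status == SKETCH_REHANDLE_DRAIN_REQUESTED ||
           be->status == SKETCH_REHANDLE_DRAINED;
}

/* Completion path: park EIO failures only while rehandling is not paused. */
static void sketch_on_complete(SketchDrainBackend *be, SketchDrainReq *req,
                               int ret)
{
    req->ret = ret;
    if (!sketch_is_paused(be) && ret == -EIO &&
        be->n_parked < SKETCH_DRAIN_MAX) {
        be->parked[be->n_parked++] = req;
        return;
    }
    req->completed = true;
}

/* Flush every parked request; failures now go straight to the caller. */
static void sketch_drain(SketchDrainBackend *be)
{
    SketchDrainReq *batch[SKETCH_DRAIN_MAX];
    size_t n = be->n_parked;
    size_t i;

    be->status = SKETCH_REHANDLE_DRAIN_REQUESTED;
    for (i = 0; i < n; i++) {
        batch[i] = be->parked[i];
    }
    be->n_parked = 0;
    for (i = 0; i < n; i++) {
        /* Assumed outcome of the final retry: the device is still failing. */
        sketch_on_complete(be, batch[i], -EIO);
    }
    be->status = SKETCH_REHANDLE_DRAINED;
}
```

Setting the status before reissuing is what guarantees the parked list only shrinks during the drain, so the drain terminates even if the backend never recovers.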
Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- block/block-backend.c | 85 ++++++++++++++++++++++++++++++++-- include/sysemu/block-backend.h | 3 ++ 2 files changed, 84 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index d0b2b59f55..95b2d6a679 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 +#define BLOCK_BACKEND_REHANDLE_NORMAL 1 +#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED 2 +#define BLOCK_BACKEND_REHANDLE_DRAINED 3 enum BlockIOHangStatus { BLOCK_IO_HANG_STATUS_NORMAL = 0, @@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo { unsigned int in_flight; QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; + + int status; } BlockBackendRehandleInfo; typedef struct BlockBackendAioNotifier { @@ -471,6 +476,8 @@ static void blk_delete(BlockBackend *blk) assert(!blk->refcnt); assert(!blk->name); assert(!blk->dev); + assert(atomic_read(&blk->reinfo.in_flight) == 0); + blk_rehandle_disable(blk); if (blk->public.throttle_group_member.throttle_state) { blk_io_limits_disable(blk); } @@ -2460,6 +2467,37 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) atomic_dec(&blk->reinfo.in_flight); } +static void blk_rehandle_drain(BlockBackend *blk) +{ + if (blk_bs(blk)) { + bdrv_drained_begin(blk_bs(blk)); + BDRV_POLL_WHILE(blk_bs(blk), atomic_read(&blk->reinfo.in_flight) > 0); + bdrv_drained_end(blk_bs(blk)); + } +} + +static bool blk_rehandle_is_paused(BlockBackend *blk) +{ + return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED || + blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED; +} + +static void blk_rehandle_pause(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (!reinfo->enable || reinfo->status == 
BLOCK_BACKEND_REHANDLE_DRAINED) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED; + blk_rehandle_drain(blk); + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED; + aio_context_release(blk_get_aio_context(blk)); +} + static void blk_rehandle_timer_cb(void *opaque) { BlockBackend *blk = opaque; @@ -2559,10 +2597,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) if (acb->has_returned) { blk_dec_in_flight(acb->rwco.blk); - need_rehandle = blk_rehandle_aio(acb, &has_timeout); - if (need_rehandle) { - blk_rehandle_insert_aiocb(acb->rwco.blk, acb); - return; + if (!blk_rehandle_is_paused(acb->rwco.blk)) { + need_rehandle = blk_rehandle_aio(acb, &has_timeout); + if (need_rehandle) { + blk_rehandle_insert_aiocb(acb->rwco.blk, acb); + return; + } } acb->common.cb(acb->common.opaque, acb->rwco.ret); @@ -2576,6 +2616,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) } } +void blk_rehandle_enable(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (reinfo->enable) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, blk); + reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL; + reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL; + reinfo->enable = true; + aio_context_release(blk_get_aio_context(blk)); +} + +void blk_rehandle_disable(BlockBackend *blk) +{ + if (!blk->reinfo.enable) { + return; + } + + blk_rehandle_pause(blk); + timer_del(blk->reinfo.ts); + timer_free(blk->reinfo.ts); + blk->reinfo.ts = NULL; + blk->reinfo.enable = false; +} + +bool blk_iohang_is_enabled(BlockBackend *blk) +{ + return blk->iohang_timeout != 0; +} + void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) { if (!blk) { @@ -2588,6 +2664,7 @@ void blk_iohang_init(BlockBackend *blk, 
int64_t iohang_timeout) blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL; if (iohang_timeout > 0) { blk->iohang_timeout = iohang_timeout; + blk_rehandle_enable(blk); } } diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index bfebe3a960..375ae13b0b 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -268,6 +268,9 @@ const BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); +void blk_rehandle_enable(BlockBackend *blk); +void blk_rehandle_disable(BlockBackend *blk); +bool blk_iohang_is_enabled(BlockBackend *blk); void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); #endif From patchwork Sun Sep 27 13:04:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ying Fang X-Patchwork-Id: 304334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3687BC4741F for ; Sun, 27 Sep 2020 13:08:42 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DDD9D2389F for ; Sun, 27 Sep 2020 13:08:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DDD9D2389F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36186 
helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWPx-0004QW-2q for qemu-devel@archiver.kernel.org; Sun, 27 Sep 2020 09:08:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60918) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMX-00081g-EL for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:09 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:60448 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMV-0003Nt-F2 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:09 -0400 Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 03E51313DBA182BE1FC0; Sun, 27 Sep 2020 21:04:50 +0800 (CST) Received: from localhost (10.174.185.104) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:41 +0800 From: Ying Fang To: Subject: [RFC PATCH 5/7] virtio-blk: disable I/O hang when resetting Date: Sun, 27 Sep 2020 21:04:18 +0800 Message-ID: <20200927130420.1095-6-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list 
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

All AIOs, including the hanging AIOs, need to be drained when virtio-blk
is reset. It is therefore necessary to disable I/O hang before the reset
and, if I/O hang is enabled, to enable it again after the reset.

Signed-off-by: Ying Fang
Signed-off-by: Jiahui Cen
---
 hw/block/virtio-blk.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2204ba149e..11837a54f5 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -892,6 +892,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)
     AioContext *ctx;
     VirtIOBlockReq *req;

+    if (blk_iohang_is_enabled(s->blk)) {
+        blk_rehandle_disable(s->blk);
+    }
+
     ctx = blk_get_aio_context(s->blk);
     aio_context_acquire(ctx);
     blk_drain(s->blk);
@@ -909,6 +913,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)

     assert(!s->dataplane_started);
     blk_set_enable_write_cache(s->blk, s->original_wce);
+
+    if (blk_iohang_is_enabled(s->blk)) {
+        blk_rehandle_enable(s->blk);
+    }
 }

 /* coalesce internal state, copy to pci i/o region 0

From patchwork Sun Sep 27 13:04:19 2020
X-Patchwork-Submitter: Ying Fang
X-Patchwork-Id: 272661
From: Ying Fang
Subject: [RFC PATCH 6/7] qemu-option: add I/O hang timeout option
Date: Sun, 27 Sep 2020 21:04:19 +0800
Message-ID: <20200927130420.1095-7-fangying1@huawei.com>
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

The appropriate I/O hang timeout differs from one situation to another,
so provide an option that lets the user set the I/O hang timeout for
each block device.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 blockdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..ff8cdcd497 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,6 +500,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     BlockdevDetectZeroesOptions detect_zeroes =
         BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     const char *throttling_group = NULL;
+    int64_t iohang_timeout = 0;

     /* Check common options by copying from bs_opts to opts, all other options
      * stay in bs_opts for processing by bdrv_open(). */
@@ -622,6 +623,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,

         bs->detect_zeroes = detect_zeroes;

+        /* init timeout value for I/O Hang */
+        iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0);
+        if (iohang_timeout > 0) {
+            blk_iohang_init(blk, iohang_timeout);
+        }
+
         block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);

         if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
@@ -3786,6 +3793,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "whether to account for failed I/O operations "
                     "in the statistics",
+        },{
+            .name = "iohang-timeout",
+            .type = QEMU_OPT_NUMBER,
+            .help = "timeout value for I/O Hang",
         },
         { /* end of list */ }
     },

From patchwork Sun Sep 27 13:04:20 2020
X-Patchwork-Submitter: Ying Fang
X-Patchwork-Id: 304333
From: Ying Fang
Subject: [RFC PATCH 7/7] qapi: add I/O hang and I/O hang timeout qapi event
Date: Sun, 27 Sep 2020 21:04:20 +0800
Message-ID: <20200927130420.1095-8-fangying1@huawei.com>
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

Hypervisor management tools such as libvirt may need to monitor I/O hang
events. Report the I/O hang and I/O hang timeout events via QAPI.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 block/block-backend.c |  3 +++
 qapi/block-core.json  | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 95b2d6a679..5dc5b11bcc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2540,6 +2540,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O Hang is recovered */
             blk->is_iohang_timeout = false;
             blk->iohang_time = 0;
+            qapi_event_send_block_io_hang(false);
         }
         break;
     case BLOCK_IO_HANG_STATUS_HANG:
@@ -2547,12 +2548,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O hang is first triggered */
             blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
             need_rehandle = true;
+            qapi_event_send_block_io_hang(true);
         } else {
             if (!blk->is_iohang_timeout) {
                 now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
                 if (now >= (blk->iohang_time + blk->iohang_timeout)) {
                     /* Case when I/O hang is timeout */
                     blk->is_iohang_timeout = true;
+                    qapi_event_send_block_io_hang_timeout(true);
                 } else {
                     /* Case when I/O hang is continued */
                     need_rehandle = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e11d..7bdf75c6d7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5535,3 +5535,29 @@
 { 'command': 'blockdev-snapshot-delete-internal-sync',
   'data': { 'device': 'str', '*id': 'str', '*name': 'str'},
   'returns': 'SnapshotInfo' }
+
+##
+# @BLOCK_IO_HANG:
+#
+# Emitted when a device I/O hang begins or ends.
+#
+# @set: true if the I/O hang begins; false if it ends.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG',
+  'data': { 'set': 'bool' }}
+
+##
+# @BLOCK_IO_HANG_TIMEOUT:
+#
+# Emitted when a device I/O hang timeout is set or cleared.
+#
+# @set: true if set; false if cleared.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG_TIMEOUT',
+  'data': { 'set': 'bool' }}
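
The event placement in blk_iohang_handle() can be sketched as a small
standalone C model. This is only an illustration under stated assumptions,
not the code from the series: IoHangModel, model_iohang_handle(), the
STATUS_* constants and the two stub senders are hypothetical stand-ins for
BlockBackend, blk_iohang_handle(), BLOCK_IO_HANG_STATUS_* and the generated
qapi_event_send_block_io_hang*() functions. The branch conditions and the
point where the new status is stored are inferred from the comments in the
hunks, and the current time is passed in as seconds rather than read with
qemu_clock_get_ms().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for BLOCK_IO_HANG_STATUS_NORMAL / _HANG. */
enum { STATUS_NORMAL, STATUS_HANG };

/* Hypothetical subset of the I/O hang fields kept in BlockBackend. */
typedef struct {
    int status;          /* last status seen (assumed to be stored here) */
    bool is_timeout;     /* models blk->is_iohang_timeout */
    int64_t hang_start;  /* models blk->iohang_time, in seconds */
    int64_t timeout;     /* models blk->iohang_timeout, in seconds */
} IoHangModel;

/* Counters so a caller can check how many events were sent. */
static int hang_events;
static int timeout_events;

/* Stub for qapi_event_send_block_io_hang(). */
static void send_block_io_hang(bool set)
{
    hang_events++;
    printf("BLOCK_IO_HANG set=%d\n", set);
}

/* Stub for qapi_event_send_block_io_hang_timeout(). */
static void send_block_io_hang_timeout(bool set)
{
    timeout_events++;
    printf("BLOCK_IO_HANG_TIMEOUT set=%d\n", set);
}

/*
 * Returns true when the failed request should be queued for rehandling.
 * 'now' is the current time in seconds, supplied by the caller.
 */
static bool model_iohang_handle(IoHangModel *m, int new_status, int64_t now)
{
    bool need_rehandle = false;

    assert(m != NULL);

    switch (new_status) {
    case STATUS_NORMAL:
        if (m->status == STATUS_HANG) {
            /* I/O hang has recovered */
            m->is_timeout = false;
            m->hang_start = 0;
            send_block_io_hang(false);
        }
        break;
    case STATUS_HANG:
        if (m->status != STATUS_HANG) {
            /* I/O hang is first triggered */
            m->hang_start = now;
            need_rehandle = true;
            send_block_io_hang(true);
        } else if (!m->is_timeout) {
            if (now >= m->hang_start + m->timeout) {
                /* I/O hang has timed out: stop rehandling */
                m->is_timeout = true;
                send_block_io_hang_timeout(true);
            } else {
                /* I/O hang continues: keep rehandling */
                need_rehandle = true;
            }
        }
        break;
    }

    m->status = new_status;  /* assumption: status is updated on every call */
    return need_rehandle;
}
```

With a 5 second timeout, reporting STATUS_HANG at t=100, t=102 and t=105
and then STATUS_NORMAL at t=106 returns true, true, false and false. The
model sends two hang events (set=true, then set=false) and one timeout
event.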