From patchwork Tue May 19 20:02:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Chen" X-Patchwork-Id: 282329 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=FROM_WSP_TRAIL, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66A51C433DF for ; Tue, 19 May 2020 20:15:16 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2F5552070A for ; Tue, 19 May 2020 20:15:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2F5552070A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36020 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jb8dv-0000uG-7v for qemu-devel@archiver.kernel.org; Tue, 19 May 2020 16:15:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57686) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8be-0004Iy-Kg for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:54 -0400 Received: from mga01.intel.com ([192.55.52.88]:59949) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8bd-00089z-2I for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:54 -0400 IronPort-SDR: t6nlP8BzKB4LJLRWI6m6EdqHxC5RcOMf0Vk93CPC9zpYBtVsDBdz12P7VDvAsa0bKh9kQYFCnj o1T2rMOu/+5Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2020 13:12:46 -0700 IronPort-SDR: rj6BgQWTXyz+2WPtuX1HkZASeeQfqZ0ehttwxJ9eLMAuscI4ufAeltzvdTGQgjh8jZ1mw8H1K0 9acIztqNGFTw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,410,1583222400"; d="scan'208";a="268007867" Received: from unknown (HELO localhost.localdomain) ([10.239.13.19]) by orsmga006.jf.intel.com with ESMTP; 19 May 2020 13:12:45 -0700 From: Zhang Chen To: Jason Wang Subject: [PATCH 4/7] net/colo-compare.c: Fix deadlock in compare_chr_send Date: Wed, 20 May 2020 04:02:04 +0800 Message-Id: <20200519200207.17773-5-chen.zhang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200519200207.17773-1-chen.zhang@intel.com> References: <20200519200207.17773-1-chen.zhang@intel.com> Received-SPF: pass client-ip=192.55.52.88; envelope-from=chen.zhang@intel.com; helo=mga01.intel.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/19 16:12:40 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, FROM_ADDR_WS=2.999, FROM_WSP_TRAIL=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhang Chen , Lukas Straub , qemu-dev , Zhang Chen Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Lukas Straub The chr_out chardev is connected to a filter-redirector running in the main loop. qemu_chr_fe_write_all might block here in compare_chr_send if the (socket-)buffer is full. If another filter-redirector in the main loop want's to send data to chr_pri_in it might also block if the buffer is full. This leads to a deadlock because both event loops get blocked. Fix this by converting compare_chr_send to a coroutine and putting the packets in a send queue. Signed-off-by: Lukas Straub Reviewed-by: Zhang Chen Tested-by: Zhang Chen Signed-off-by: Zhang Chen --- net/colo-compare.c | 193 ++++++++++++++++++++++++++++++++++----------- net/colo.c | 7 ++ net/colo.h | 1 + 3 files changed, 156 insertions(+), 45 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 2edfa31f6a..fe557b4693 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -32,6 +32,9 @@ #include "migration/migration.h" #include "util.h" +#include "block/aio-wait.h" +#include "qemu/coroutine.h" + #define TYPE_COLO_COMPARE "colo-compare" #define COLO_COMPARE(obj) \ OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE) @@ -77,6 +80,23 @@ static int event_unhandled_count; * |packet | |packet + |packet | |packet + * +--------+ +--------+ +--------+ +--------+ */ + +typedef struct SendCo { + Coroutine *co; + struct CompareState *s; + CharBackend *chr; + GQueue send_list; + bool notify_remote_frame; + bool done; + int ret; +} SendCo; + +typedef struct SendEntry { + uint32_t size; + uint32_t vnet_hdr_len; + uint8_t *buf; +} SendEntry; + typedef struct CompareState { Object parent; @@ -91,6 +111,8 @@ typedef struct CompareState { SocketReadState pri_rs; SocketReadState sec_rs; SocketReadState notify_rs; + SendCo out_sendco; + SendCo notify_sendco; bool vnet_hdr; uint32_t compare_timeout; uint32_t expired_scan_cycle; @@ -128,10 +150,11 @@ static const char *colo_mode[] = { }; static int compare_chr_send(CompareState *s, - const uint8_t *buf, + uint8_t *buf, uint32_t size, uint32_t vnet_hdr_len, - bool notify_remote_frame); + bool notify_remote_frame, + bool zero_copy); static bool packet_matches_str(const char *str, const uint8_t *buf, @@ -149,7 +172,7 @@ static void notify_remote_frame(CompareState *s) char msg[] = "DO_CHECKPOINT"; int ret = 0; - ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true); + ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true, false); if (ret < 0) { error_report("Notify Xen COLO-frame failed"); } @@ -279,12 +302,13 @@ static void colo_release_primary_pkt(CompareState *s, Packet *pkt) pkt->data, pkt->size, pkt->vnet_hdr_len, - false); + false, + true); if (ret < 0) { error_report("colo send primary packet failed"); } trace_colo_compare_main("packet same and release packet"); - packet_destroy(pkt, NULL); + packet_destroy_partial(pkt, NULL); } /* @@ -706,65 +730,115 @@ static void colo_compare_connection(void *opaque, void *user_data) } } -static int compare_chr_send(CompareState *s, - const uint8_t *buf, - uint32_t size, - uint32_t vnet_hdr_len, - bool notify_remote_frame) +static void coroutine_fn _compare_chr_send(void *opaque) { + SendCo *sendco = opaque; + CompareState *s = sendco->s; int ret = 0; - uint32_t len = htonl(size); - if (!size) { - return 0; - } + while (!g_queue_is_empty(&sendco->send_list)) { + SendEntry *entry = g_queue_pop_tail(&sendco->send_list); + uint32_t len = htonl(entry->size); - if (notify_remote_frame) { - ret = qemu_chr_fe_write_all(&s->chr_notify_dev, - (uint8_t *)&len, - sizeof(len)); - } else { - ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); - } + ret = qemu_chr_fe_write_all(sendco->chr, (uint8_t *)&len, sizeof(len)); - if (ret != sizeof(len)) { - goto err; - } + if (ret != sizeof(len)) { + g_free(entry->buf); + g_slice_free(SendEntry, entry); + goto err; + } - if (s->vnet_hdr) { - /* - * We send vnet header len make other module(like filter-redirector) - * know how to parse net packet correctly. - */ - len = htonl(vnet_hdr_len); + if (!sendco->notify_remote_frame && s->vnet_hdr) { + /* + * We send vnet header len make other module(like filter-redirector) + * know how to parse net packet correctly. + */ + len = htonl(entry->vnet_hdr_len); - if (!notify_remote_frame) { - ret = qemu_chr_fe_write_all(&s->chr_out, + ret = qemu_chr_fe_write_all(sendco->chr, (uint8_t *)&len, sizeof(len)); + + if (ret != sizeof(len)) { + g_free(entry->buf); + g_slice_free(SendEntry, entry); + goto err; + } } - if (ret != sizeof(len)) { + ret = qemu_chr_fe_write_all(sendco->chr, + (uint8_t *)entry->buf, + entry->size); + + if (ret != entry->size) { + g_free(entry->buf); + g_slice_free(SendEntry, entry); goto err; } + + g_free(entry->buf); + g_slice_free(SendEntry, entry); } + sendco->ret = 0; + goto out; + +err: + while (!g_queue_is_empty(&sendco->send_list)) { + SendEntry *entry = g_queue_pop_tail(&sendco->send_list); + g_free(entry->buf); + g_slice_free(SendEntry, entry); + } + sendco->ret = ret < 0 ? ret : -EIO; +out: + sendco->co = NULL; + sendco->done = true; + aio_wait_kick(); +} + +static int compare_chr_send(CompareState *s, + uint8_t *buf, + uint32_t size, + uint32_t vnet_hdr_len, + bool notify_remote_frame, + bool zero_copy) +{ + SendCo *sendco; + SendEntry *entry; + if (notify_remote_frame) { - ret = qemu_chr_fe_write_all(&s->chr_notify_dev, - (uint8_t *)buf, - size); + sendco = &s->notify_sendco; } else { - ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size); + sendco = &s->out_sendco; } - if (ret != size) { - goto err; + if (!size) { + return 0; } - return 0; + entry = g_slice_new(SendEntry); + entry->size = size; + entry->vnet_hdr_len = vnet_hdr_len; + if (zero_copy) { + entry->buf = buf; + } else { + entry->buf = g_malloc(size); + memcpy(entry->buf, buf, size); + } + g_queue_push_head(&sendco->send_list, entry); + + if (sendco->done) { + sendco->co = qemu_coroutine_create(_compare_chr_send, sendco); + sendco->done = false; + qemu_coroutine_enter(sendco->co); + if (sendco->done) { + /* report early errors */ + return sendco->ret; + } + } -err: - return ret < 0 ? ret : -EIO; + /* assume success */ + return 0; } static int compare_chr_can_read(void *opaque) @@ -1070,6 +1144,7 @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs) pri_rs->buf, pri_rs->packet_len, pri_rs->vnet_hdr_len, + false, false); } else { /* compare packet in the specified connection */ @@ -1100,7 +1175,7 @@ static void compare_notify_rs_finalize(SocketReadState *notify_rs) if (packet_matches_str("COLO_USERSPACE_PROXY_INIT", notify_rs->buf, notify_rs->packet_len)) { - ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true); + ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true, false); if (ret < 0) { error_report("Notify Xen COLO-frame INIT failed"); } @@ -1206,6 +1281,20 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) QTAILQ_INSERT_TAIL(&net_compares, s, next); + s->out_sendco.s = s; + s->out_sendco.chr = &s->chr_out; + s->out_sendco.notify_remote_frame = false; + s->out_sendco.done = true; + g_queue_init(&s->out_sendco.send_list); + + if (s->notify_dev) { + s->notify_sendco.s = s; + s->notify_sendco.chr = &s->chr_notify_dev; + s->notify_sendco.notify_remote_frame = true; + s->notify_sendco.done = true; + g_queue_init(&s->notify_sendco.send_list); + } + g_queue_init(&s->conn_list); qemu_mutex_init(&event_mtx); @@ -1232,8 +1321,9 @@ static void colo_flush_packets(void *opaque, void *user_data) pkt->data, pkt->size, pkt->vnet_hdr_len, - false); - packet_destroy(pkt, NULL); + false, + true); + packet_destroy_partial(pkt, NULL); } while (!g_queue_is_empty(&conn->secondary_list)) { pkt = g_queue_pop_head(&conn->secondary_list); @@ -1304,10 +1394,23 @@ static void colo_compare_finalize(Object *obj) } } + AioContext *ctx = iothread_get_aio_context(s->iothread); + aio_context_acquire(ctx); + AIO_WAIT_WHILE(ctx, !s->out_sendco.done); + if (s->notify_dev) { + AIO_WAIT_WHILE(ctx, !s->notify_sendco.done); + } + aio_context_release(ctx); + /* Release all unhandled packets after compare thead exited */ g_queue_foreach(&s->conn_list, colo_flush_packets, s); + AIO_WAIT_WHILE(NULL, !s->out_sendco.done); g_queue_clear(&s->conn_list); + g_queue_clear(&s->out_sendco.send_list); + if (s->notify_dev) { + g_queue_clear(&s->notify_sendco.send_list); + } if (s->connection_track_table) { g_hash_table_destroy(s->connection_track_table); diff --git a/net/colo.c b/net/colo.c index 8196b35837..a6c66d829a 100644 --- a/net/colo.c +++ b/net/colo.c @@ -185,6 +185,13 @@ void packet_destroy(void *opaque, void *user_data) g_slice_free(Packet, pkt); } +void packet_destroy_partial(void *opaque, void *user_data) +{ + Packet *pkt = opaque; + + g_slice_free(Packet, pkt); +} + /* * Clear hashtable, stop this hash growing really huge */ diff --git a/net/colo.h b/net/colo.h index 679314b1ca..573ab91785 100644 --- a/net/colo.h +++ b/net/colo.h @@ -102,5 +102,6 @@ bool connection_has_tracked(GHashTable *connection_track_table, void connection_hashtable_reset(GHashTable *connection_track_table); Packet *packet_new(const void *data, int size, int vnet_hdr_len); void packet_destroy(void *opaque, void *user_data); +void packet_destroy_partial(void *opaque, void *user_data); #endif /* NET_COLO_H */ From patchwork Tue May 19 20:02:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Zhang, Chen" X-Patchwork-Id: 282328 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.0 required=3.0 tests=FROM_WSP_TRAIL, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6D2DC433E0 for ; Tue, 19 May 2020 20:16:20 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B14AD2070A for ; Tue, 19 May 2020 20:16:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B14AD2070A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:40478 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jb8ex-0002oB-Uv for qemu-devel@archiver.kernel.org; Tue, 19 May 2020 16:16:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57688) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8be-0004JA-Py for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:54 -0400 Received: from mga01.intel.com ([192.55.52.88]:59955) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8bd-0008AC-Tl for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:54 -0400 IronPort-SDR: AX7R+Yxf+yBdXik1Co1OV5oHJKKzBtKwdAvYk+v4e/4sarKo0Br2hDKlfqk6FPYQKuYxBSx/OE exxBU13dJ23Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2020 13:12:48 -0700 IronPort-SDR: UpiRzud2ATa82uMMgxuQC/qXqdgCyDdepwiTfrJP0BPFXG0w+nsMgtV52p7a5kdcvU/QZCR9jT iqMWdj1OzduQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,410,1583222400"; d="scan'208";a="268007878" Received: from unknown (HELO localhost.localdomain) ([10.239.13.19]) by orsmga006.jf.intel.com with ESMTP; 19 May 2020 13:12:46 -0700 From: Zhang Chen To: Jason Wang Subject: [PATCH 5/7] net/colo-compare.c: Only hexdump packets if tracing is enabled Date: Wed, 20 May 2020 04:02:05 +0800 Message-Id: <20200519200207.17773-6-chen.zhang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200519200207.17773-1-chen.zhang@intel.com> References: <20200519200207.17773-1-chen.zhang@intel.com> MIME-Version: 1.0 Received-SPF: pass client-ip=192.55.52.88; envelope-from=chen.zhang@intel.com; helo=mga01.intel.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/19 16:12:40 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, FROM_ADDR_WS=2.999, FROM_WSP_TRAIL=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhang Chen , Lukas Straub , qemu-dev , Zhang Chen Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Lukas Straub Else the log will be flooded if there is a lot of network traffic. Signed-off-by: Lukas Straub Reviewed-by: Zhang Chen Reviewed-by: Philippe Mathieu-Daudé Tested-by: Philippe Mathieu-Daudé Signed-off-by: Zhang Chen --- net/colo-compare.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index fe557b4693..db536c9419 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -490,10 +490,12 @@ sec: g_queue_push_head(&conn->primary_list, ppkt); g_queue_push_head(&conn->secondary_list, spkt); - qemu_hexdump((char *)ppkt->data, stderr, - "colo-compare ppkt", ppkt->size); - qemu_hexdump((char *)spkt->data, stderr, - "colo-compare spkt", spkt->size); + if (trace_event_get_state_backends(TRACE_COLO_COMPARE_MISCOMPARE)) { + qemu_hexdump((char *)ppkt->data, stderr, + "colo-compare ppkt", ppkt->size); + qemu_hexdump((char *)spkt->data, stderr, + "colo-compare spkt", spkt->size); + } colo_compare_inconsistency_notify(s); } From patchwork Tue May 19 20:02:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Chen" X-Patchwork-Id: 282330 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=FROM_WSP_TRAIL, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9B24C433DF for ; Tue, 19 May 2020 20:13:43 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 831C6207E8 for ; Tue, 19 May 2020 20:13:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 831C6207E8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:57054 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jb8cQ-0006Ec-M7 for qemu-devel@archiver.kernel.org; Tue, 19 May 2020 16:13:42 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:57692) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8bf-0004Jq-4o for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:55 -0400 Received: from mga01.intel.com ([192.55.52.88]:59953) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jb8be-0008A4-1q for qemu-devel@nongnu.org; Tue, 19 May 2020 16:12:54 -0400 IronPort-SDR: nCN07tMGBimySkUeawXcH3VMuRIVB4342j8O8VFn8WSB3AEJVgha4BwdzVMlf8mEUM4pT4LI1F yc5UPbV68KEA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2020 13:12:50 -0700 IronPort-SDR: gd48eiiwx+sIarx8zolZMyzpCNnpFK8mBej0mMgIdFGf55DV90WpKRjUIGcvqfqVX05YKvYyZo 2PLcgXSKEDXQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,410,1583222400"; d="scan'208";a="268007885" Received: from unknown (HELO localhost.localdomain) ([10.239.13.19]) by orsmga006.jf.intel.com with ESMTP; 19 May 2020 13:12:48 -0700 From: Zhang Chen To: Jason Wang Subject: [PATCH 6/7] net/colo-compare.c, softmmu/vl.c: Check that colo-compare is active Date: Wed, 20 May 2020 04:02:06 +0800 Message-Id: <20200519200207.17773-7-chen.zhang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200519200207.17773-1-chen.zhang@intel.com> References: <20200519200207.17773-1-chen.zhang@intel.com> Received-SPF: pass client-ip=192.55.52.88; envelope-from=chen.zhang@intel.com; helo=mga01.intel.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/19 16:12:40 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, FROM_ADDR_WS=2.999, FROM_WSP_TRAIL=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhang Chen , Lukas Straub , qemu-dev , Zhang Chen Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Lukas Straub If the colo-compare object is removed before failover and a checkpoint happens, qemu crashes because it tries to lock the destroyed event_mtx in colo_notify_compares_event. Fix this by checking if everything is initialized by introducing a new variable colo_compare_active which is protected by a new mutex colo_compare_mutex. The new mutex also protects against concurrent access of the net_compares list and makes sure that colo_notify_compares_event isn't active while we destroy event_mtx and event_complete_cond. With this it also is again possible to use colo without colo-compare (periodic mode) and to use multiple colo-compare for multiple network interfaces. Signed-off-by: Lukas Straub Reviewed-by: Zhang Chen Signed-off-by: Zhang Chen --- net/colo-compare.c | 35 +++++++++++++++++++++++++++++------ net/colo-compare.h | 1 + softmmu/vl.c | 2 ++ 3 files changed, 32 insertions(+), 6 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index db536c9419..1ee8b9dc3c 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -54,6 +54,8 @@ static NotifierList colo_compare_notifiers = #define REGULAR_PACKET_CHECK_MS 3000 #define DEFAULT_TIME_OUT_MS 3000 +static QemuMutex colo_compare_mutex; +static bool colo_compare_active; static QemuMutex event_mtx; static QemuCond event_complete_cond; static int event_unhandled_count; @@ -913,6 +915,12 @@ static void check_old_packet_regular(void *opaque) void colo_notify_compares_event(void *opaque, int event, Error **errp) { CompareState *s; + qemu_mutex_lock(&colo_compare_mutex); + + if (!colo_compare_active) { + qemu_mutex_unlock(&colo_compare_mutex); + return; + } qemu_mutex_lock(&event_mtx); QTAILQ_FOREACH(s, &net_compares, next) { @@ -926,6 +934,7 @@ void colo_notify_compares_event(void *opaque, int event, Error **errp) } qemu_mutex_unlock(&event_mtx); + qemu_mutex_unlock(&colo_compare_mutex); } static void colo_compare_timer_init(CompareState *s) @@ -1281,7 +1290,14 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) s->vnet_hdr); } + qemu_mutex_lock(&colo_compare_mutex); + if (!colo_compare_active) { + qemu_mutex_init(&event_mtx); + qemu_cond_init(&event_complete_cond); + colo_compare_active = true; + } QTAILQ_INSERT_TAIL(&net_compares, s, next); + qemu_mutex_unlock(&colo_compare_mutex); s->out_sendco.s = s; s->out_sendco.chr = &s->chr_out; @@ -1299,9 +1315,6 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) g_queue_init(&s->conn_list); - qemu_mutex_init(&event_mtx); - qemu_cond_init(&event_complete_cond); - s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, g_free, @@ -1389,12 +1402,19 @@ static void colo_compare_finalize(Object *obj) qemu_bh_delete(s->event_bh); + qemu_mutex_lock(&colo_compare_mutex); QTAILQ_FOREACH(tmp, &net_compares, next) { if (tmp == s) { QTAILQ_REMOVE(&net_compares, s, next); break; } } + if (QTAILQ_EMPTY(&net_compares)) { + colo_compare_active = false; + qemu_mutex_destroy(&event_mtx); + qemu_cond_destroy(&event_complete_cond); + } + qemu_mutex_unlock(&colo_compare_mutex); AioContext *ctx = iothread_get_aio_context(s->iothread); aio_context_acquire(ctx); @@ -1422,15 +1442,18 @@ static void colo_compare_finalize(Object *obj) object_unref(OBJECT(s->iothread)); } - qemu_mutex_destroy(&event_mtx); - qemu_cond_destroy(&event_complete_cond); - g_free(s->pri_indev); g_free(s->sec_indev); g_free(s->outdev); g_free(s->notify_dev); } +void colo_compare_init_globals(void) +{ + colo_compare_active = false; + qemu_mutex_init(&colo_compare_mutex); +} + static const TypeInfo colo_compare_info = { .name = TYPE_COLO_COMPARE, .parent = TYPE_OBJECT, diff --git a/net/colo-compare.h b/net/colo-compare.h index 22ddd512e2..eb483ac586 100644 --- a/net/colo-compare.h +++ b/net/colo-compare.h @@ -17,6 +17,7 @@ #ifndef QEMU_COLO_COMPARE_H #define QEMU_COLO_COMPARE_H +void colo_compare_init_globals(void); void colo_notify_compares_event(void *opaque, int event, Error **errp); void colo_compare_register_notifier(Notifier *notify); void colo_compare_unregister_notifier(Notifier *notify); diff --git a/softmmu/vl.c b/softmmu/vl.c index ae5451bc23..81602c12b5 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -112,6 +112,7 @@ #include "qapi/qmp/qerror.h" #include "sysemu/iothread.h" #include "qemu/guest-random.h" +#include "net/colo-compare.h" #define MAX_VIRTIO_CONSOLES 1 @@ -2906,6 +2907,7 @@ void qemu_init(int argc, char **argv, char **envp) precopy_infrastructure_init(); postcopy_infrastructure_init(); monitor_init_globals(); + colo_compare_init_globals(); if (qcrypto_init(&err) < 0) { error_reportf_err(err, "cannot initialize crypto: ");