From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Xiaohui Li, "Dr. David Alan Gilbert", peterx@redhat.com, Juan Quintela
Subject: [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered
Date: Thu, 3 Sep 2020 11:26:41 -0400
Message-Id: <20200903152646.93336-1-peterx@redhat.com>

We've seen intermittent guest hangs on the destination VM after a postcopy recovery. The hangs do resolve themselves after a few minutes, however.
The problem is that after a postcopy recovery, the prioritized postcopy queue on the source VM is lost. Any thread that faulted before the recovery therefore stays halted until its page happens to be copied by the background precopy migration stream.

The solution is to refresh this information after a postcopy recovery as well. To achieve this, we need to maintain a list of faulted addresses on the destination node, so that we can resend the list when necessary. This work is done in patches 1-4.

With that in place, the last step is to send this extra information to the source VM after the recovery. Luckily, this synchronization can be "emulated" by sending a batch of page requests to the source VM, exactly as if we had just taken the page faults (even though requests for these pages were already sent before!). Duplicated page requests have been handled correctly since the very first version of the postcopy code, so this fix does not need a new capability bit, and it works smoothly when migrating from an old QEMU to a new one.

Please review, thanks.

Peter Xu (5):
  migration: Rework migrate_send_rp_req_pages() function
  migration: Introduce migrate_send_rp_message_req_pages()
  migration: Pass incoming state into qemu_ufd_copy_ioctl()
  migration: Maintain postcopy faulted addresses
  migration: Sync requested pages after postcopy recovery

 migration/migration.c    | 71 +++++++++++++++++++++++++++++++++++-----
 migration/migration.h    | 23 +++++++++++--
 migration/postcopy-ram.c | 46 +++++++++++---------------
 migration/savevm.c       | 56 +++++++++++++++++++++++++++++++
 migration/trace-events   |  3 ++
 5 files changed, 163 insertions(+), 36 deletions(-)

--
2.26.2

Reviewed-by: Dr. David Alan Gilbert
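The destination-side bookkeeping described in this cover letter (record each faulted address, forget it once its page has been placed, and re-request everything still pending after the network recovers) can be sketched roughly as below. This is a minimal, self-contained illustration, not the code from this series: `FaultTracker`, `tracker_add()`, `tracker_remove()`, `tracker_resync()` and the `count_req()` stand-in for the return-path send are all hypothetical names. The sketch also omits locking; since the fault thread and the incoming migration thread would both touch such a list, a real implementation would presumably need to protect it with a mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical set of guest addresses whose page fault is still
 * unresolved on the destination (not QEMU's actual data structure). */
typedef struct {
    uint64_t *addrs;   /* page-aligned faulted addresses, unordered */
    size_t count;
    size_t cap;
} FaultTracker;

/* Hypothetical callback that would emit one page request to the source. */
typedef void (*SendReqFn)(uint64_t addr, void *opaque);

static bool tracker_contains(const FaultTracker *t, uint64_t addr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) {
            return true;
        }
    }
    return false;
}

/* Record a fault; returns false only if growing the array fails. */
static bool tracker_add(FaultTracker *t, uint64_t addr)
{
    assert(t != NULL);
    if (tracker_contains(t, addr)) {
        return true;               /* several threads may fault on one page */
    }
    if (t->count == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 8;
        uint64_t *p = realloc(t->addrs, new_cap * sizeof(*p));
        if (!p) {
            return false;
        }
        t->addrs = p;
        t->cap = new_cap;
    }
    t->addrs[t->count++] = addr;
    return true;
}

/* Forget an address once its page has been placed into guest memory. */
static void tracker_remove(FaultTracker *t, uint64_t addr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) {
            t->addrs[i] = t->addrs[--t->count];  /* order does not matter */
            return;
        }
    }
}

/* After recovery, re-request every still-pending page. Duplicate requests
 * are harmless because the source already tolerates them. */
static size_t tracker_resync(const FaultTracker *t, SendReqFn send, void *opaque)
{
    for (size_t i = 0; i < t->count; i++) {
        send(t->addrs[i], opaque);
    }
    return t->count;
}

/* Stand-in for the real return-path send: it only counts requests. */
static void count_req(uint64_t addr, void *opaque)
{
    (void)addr;
    ++*(size_t *)opaque;
}

static void tracker_free(FaultTracker *t)
{
    free(t->addrs);
    t->addrs = NULL;
    t->count = t->cap = 0;
}
```

Because the resync path simply replays ordinary page requests, the source side needs no new message type, which matches the cover letter's point that no new capability bit is required.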