From patchwork Mon Sep 21 16:31:13 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "gregkh@linuxfoundation.org"
X-Patchwork-Id: 263638
From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Dexuan Cui,
 Michael Kelley, Wei Liu, Sasha Levin
Subject: [PATCH 5.4 34/72] Drivers: hv: vmbus: hibernation: do not hang
 forever in vmbus_bus_resume()
Date: Mon, 21 Sep 2020 18:31:13 +0200
Message-Id: <20200921163123.491352784@linuxfoundation.org>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200921163121.870386357@linuxfoundation.org>
References: <20200921163121.870386357@linuxfoundation.org>
User-Agent: quilt/0.66
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Dexuan Cui

[ Upstream commit 19873eec7e13fda140a0ebc75d6664e57c00bfb1 ]

After we Stop and later Start a VM that uses Accelerated Networking (NIC
SR-IOV), currently the VF vmbus device's Instance GUID can change, so after
vmbus_bus_resume() -> vmbus_request_offers(), vmbus_onoffer() can not find
the original vmbus channel of the VF, and hence we can't complete()
vmbus_connection.ready_for_resume_event in check_ready_for_resume_event(),
and the VM hangs in vmbus_bus_resume() forever.

Fix the issue by adding a timeout, so the resuming can still succeed, and
the saved state is not lost, and according to my test, the user can disable
Accelerated Networking and then will be able to SSH into the VM for
further recovery. Also prevent the VM in question from suspending again.

The host will be fixed so in future the Instance GUID will stay the same
across hibernation.

Fixes: d8bd2d442bb2 ("Drivers: hv: vmbus: Resume after fixing up old primary channels")
Signed-off-by: Dexuan Cui
Reviewed-by: Michael Kelley
Link: https://lore.kernel.org/r/20200905025555.45614-1-decui@microsoft.com
Signed-off-by: Wei Liu
Signed-off-by: Sasha Levin
---
 drivers/hv/vmbus_drv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 24c38e44ed3bc..2d2568dac2a66 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2231,7 +2231,10 @@ static int vmbus_bus_suspend(struct device *dev)
 	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
 		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
 
-	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
+	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
+		pr_err("Can not suspend due to a previous failed resuming\n");
+		return -EBUSY;
+	}
 
 	mutex_lock(&vmbus_connection.channel_mutex);
 
@@ -2305,7 +2308,9 @@ static int vmbus_bus_resume(struct device *dev)
 
 	vmbus_request_offers();
 
-	wait_for_completion(&vmbus_connection.ready_for_resume_event);
+	if (wait_for_completion_timeout(
+		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
+		pr_err("Some vmbus device is missing after suspending?\n");
 
 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);