From patchwork Mon Jun 27 13:53:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 585425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AB1BC43334 for ; Mon, 27 Jun 2022 13:54:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236420AbiF0Nyb (ORCPT ); Mon, 27 Jun 2022 09:54:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236376AbiF0Ny3 (ORCPT ); Mon, 27 Jun 2022 09:54:29 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 130C26410; Mon, 27 Jun 2022 06:54:29 -0700 (PDT) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25RDN0Ok003909; Mon, 27 Jun 2022 13:53:55 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=67UygGYGf07pEc+brtQSixcAB93afqPeEVm3JWtgZJ0=; b=clXhOhyrBidlGNqht7ose0Q/ZuQ5uSoYb18IyS8C5dG9psWqmSv9cRskKVDSrpEcQsoU 4qRmA+27bwzQWiYHC/3/O+Z0DwAJxPwHLdO4TROb2MAzykqm2dLmmlozeHiLLFE+DzeT ZP7Y13Woa6SDp4vhaj+0Nx8DXnOBvE3dFTGS9eM1tFNuPvieA1/5ZyFBBIzO1yJE7czk tOau5q6MC/x7Fg59qeMNJmD57NMyW6Z6mg+SWFBVSw3zKGvFzV2JLyiUxqOlNAuwUsx/ efKESalzY9R8W9/EHJoM2q5gjslqv9MeLoZ2YlBHDfVG2jRAcOJvNHJqndFVpS6y0IOb Zg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydcv8vpp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:55 +0000 Received: from m0098404.ppops.net (m0098404.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25RDrJml018788; Mon, 27 Jun 2022 13:53:54 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydcv8vn8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:54 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25RDocMu016916; Mon, 27 Jun 2022 13:53:52 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma02fra.de.ibm.com with ESMTP id 3gwt08tasq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:52 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25RDrmd112583268 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 27 Jun 2022 13:53:49 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E08F9AE051; Mon, 27 Jun 2022 13:53:48 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6772FAE045; Mon, 27 Jun 2022 13:53:48 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 27 Jun 2022 13:53:48 +0000 (GMT) From: Laurent Dufour To: wim@linux-watchdog.org, linux@roeck-us.net, mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, nathanl@linux.ibm.com, haren@linux.vnet.ibm.com, npiggin@gmail.com Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete Date: Mon, 27 Jun 2022 15:53:44 +0200 Message-Id: <20220627135347.32624-2-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220627135347.32624-1-ldufour@linux.ibm.com> References: <20220627135347.32624-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: tKZzxeAgc2eR1r6hcefJLj1ZgdKtJxkm X-Proofpoint-GUID: nVTSqCxsugdSdiw8pPPP8wB9BjKPByIg X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-06-27_06,2022-06-24_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 bulkscore=0 adultscore=0 spamscore=0 mlxscore=0 clxscore=1015 phishscore=0 mlxlogscore=976 suspectscore=0 malwarescore=0 impostorscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206270059 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org In pseries_migration_partition(), loop until the memory transfer is complete. This way the calling drmgr process will not exit earlier, allowing callbacks to be run only once the migration is fully completed. If reading the VASI state is done after the hypervisor has completed the migration, the HCALL is returning H_PARAMETER. We can safely assume that the memory transfer is achieved if this happens. This will also allow to manage the NMI watchdog state in the next commits. Reviewed-by: Nathan Lynch Signed-off-by: Laurent Dufour --- arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 78f3f74c7056..907a779074d6 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle) return ret; } +static void wait_for_vasi_session_completed(u64 handle) +{ + unsigned long state = 0; + int ret; + + pr_info("waiting for memory transfert to complete...\n"); + + /* + * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED. + */ + while (true) { + ret = poll_vasi_state(handle, &state); + + /* + * If the memory transfer is already complete and the migration + * has been cleaned up by the hypervisor, H_PARAMETER is return, + * which is translate in EINVAL by poll_vasi_state(). + */ + if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) { + pr_info("memory transfert completed.\n"); + break; + } + + if (ret) { + pr_err("H_VASI_STATE return error (%d)\n", ret); + break; + } + + if (state != H_VASI_RESUMED) { + pr_err("unexpected H_VASI_STATE result %lu\n", state); + break; + } + + msleep(500); + } +} + static void prod_single(unsigned int target_cpu) { long hvrc; @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle) vas_migration_handler(VAS_SUSPEND); ret = pseries_suspend(handle); - if (ret == 0) + if (ret == 0) { post_mobility_fixup(); - else + wait_for_vasi_session_completed(handle); + } else pseries_cancel_migration(handle, ret); vas_migration_handler(VAS_RESUME); From patchwork Mon Jun 27 13:53:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 585804 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0708C43334 for ; Mon, 27 Jun 2022 13:54:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236426AbiF0Nyf (ORCPT ); Mon, 27 Jun 2022 09:54:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236436AbiF0Nyd (ORCPT ); Mon, 27 Jun 2022 09:54:33 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8DD5EA1AF; Mon, 27 Jun 2022 06:54:30 -0700 (PDT) Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25RDhNtm004842; Mon, 27 Jun 2022 13:53:55 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=aL2N43eAIRSKLPFSUp+oRPJIvcuNlMiwt1RDl0sYTXo=; b=Ggj9KE0jNcmHTa2ZF72h277oOMgpFE3V1OWtz4ltEA5wS0I504Jk/69GG+dVJiFO+U2O W+GnO3QEYZmMMqBnA88mZVnIWz9a9IgLF/9gog1snMWzxEnq4YPScfZBHAdnMzXuT2DF tI8WTgkqhC5C3kk1zoxk+q902CIYVifnSDvz+0gPp1kERRpBsK9rzkW4cLAzFULFndIU 4nKWzi59gXa67drdx0NGieoTxZitBK6itboAFPVn8fwqjnjacV4h5de14UUqZDOJC217 Cv3K31L2+zhQvNTRSkqi6Fn1AUQloB9z6lByI57e4A7tLBqlugH7OA2MvESe2c6M6UJJ KA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydpa08v2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:55 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25RDl5eP022588; Mon, 27 Jun 2022 13:53:54 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydpa08ua-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:54 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25RDocEO004786; Mon, 27 Jun 2022 13:53:52 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma04ams.nl.ibm.com with ESMTP id 3gwt08u13u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:52 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25RDrn2717236380 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 27 Jun 2022 13:53:49 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7BAF7AE04D; Mon, 27 Jun 2022 13:53:49 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EFB13AE045; Mon, 27 Jun 2022 13:53:48 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 27 Jun 2022 13:53:48 +0000 (GMT) From: Laurent Dufour To: wim@linux-watchdog.org, linux@roeck-us.net, mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, nathanl@linux.ibm.com, haren@linux.vnet.ibm.com, npiggin@gmail.com Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org, Christoph Hellwig Subject: [PATCH v3 2/4] watchdog: export lockup_detector_reconfigure Date: Mon, 27 Jun 2022 15:53:45 +0200 Message-Id: <20220627135347.32624-3-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220627135347.32624-1-ldufour@linux.ibm.com> References: <20220627135347.32624-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 51LUuHDzlmcy6sYwoMkXpQqQa62TEkAJ X-Proofpoint-GUID: R2lfIlRFJ8DeQzPd45U8IyAo1jb5DcfF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-06-27_06,2022-06-24_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 malwarescore=0 suspectscore=0 bulkscore=0 priorityscore=1501 spamscore=0 impostorscore=0 lowpriorityscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206270059 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org In some circumstances it may be interesting to reconfigure the watchdog from inside the kernel. On PowerPC, this may helpful before and after a LPAR migration (LPM) is initiated, because it implies some latencies, watchdog, and especially NMI watchdog is expected to be triggered during this operation. Reconfiguring the watchdog with a factor, would prevent it to happen too frequently during LPM. Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and create a new function lockup_detector_reconfigure() calling __lockup_detector_reconfigure() under the protection of watchdog_mutex. Cc: Christoph Hellwig Signed-off-by: Laurent Dufour --- include/linux/nmi.h | 2 ++ kernel/watchdog.c | 21 ++++++++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 750c7f395ca9..f700ff2df074 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -122,6 +122,8 @@ int watchdog_nmi_probe(void); int watchdog_nmi_enable(unsigned int cpu); void watchdog_nmi_disable(unsigned int cpu); +void lockup_detector_reconfigure(void); + /** * touch_nmi_watchdog - restart NMI watchdog timeout. * diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 20a7a55e62b6..90e6c41d5e33 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu) return 0; } -static void lockup_detector_reconfigure(void) +static void __lockup_detector_reconfigure(void) { cpus_read_lock(); watchdog_nmi_stop(); @@ -561,6 +561,13 @@ static void lockup_detector_reconfigure(void) __lockup_detector_cleanup(); } +void lockup_detector_reconfigure(void) +{ + mutex_lock(&watchdog_mutex); + __lockup_detector_reconfigure(); + mutex_unlock(&watchdog_mutex); +} + /* * Create the watchdog infrastructure and configure the detector(s). */ @@ -577,13 +584,13 @@ static __init void lockup_detector_setup(void) return; mutex_lock(&watchdog_mutex); - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); softlockup_initialized = true; mutex_unlock(&watchdog_mutex); } #else /* CONFIG_SOFTLOCKUP_DETECTOR */ -static void lockup_detector_reconfigure(void) +void __lockup_detector_reconfigure(void) { cpus_read_lock(); watchdog_nmi_stop(); @@ -591,9 +598,13 @@ static void lockup_detector_reconfigure(void) watchdog_nmi_start(); cpus_read_unlock(); } +static inline void lockup_detector_reconfigure(void) +{ + __lockup_detector_reconfigure(); +} static inline void lockup_detector_setup(void) { - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); } #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ @@ -633,7 +644,7 @@ static void proc_watchdog_update(void) { /* Remove impossible cpus to keep sysctl output clean. */ cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask); - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); } /* From patchwork Mon Jun 27 13:53:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 585424 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5DD37CCA473 for ; Mon, 27 Jun 2022 13:54:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235393AbiF0Nyq (ORCPT ); Mon, 27 Jun 2022 09:54:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235354AbiF0Nyd (ORCPT ); Mon, 27 Jun 2022 09:54:33 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E67F643B; Mon, 27 Jun 2022 06:54:31 -0700 (PDT) Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25RDDPbX030325; Mon, 27 Jun 2022 13:53:56 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=ZoX2KeeTTIfhJtt8SK3et9fRdjFWHZxAUxNccobhCQg=; b=HmGz0Hub0+afYO6CFlQ/9C7M7pH/thvC9PkBnU6/1gzR3DMjB1MCd47GBjsbJ0imKbex HI4jqeULoRe3jNKBT1IwA7gCehEdOJYPdztUuysJ8uVNSJDo2ePga4k7AAZ7NbiXivOC 9gYdyrH7oAEqIMP0y1WRPzU/TXKjF5+dS8/Sn2IrK8rrddloZYCZOIPfoa5f9EfkqPhk y4LVidvX9KgOPoulljUhxdjkZmHypx3UIEJ9dUZwXFTZiikF3e+MhsDFRCZ4TueFEGeL 3VE8E52LRWSbuuj8e55GZ0x9ciPnYZWBDAQ0+1LeDUM6zwKhUz2Itt5d3LBRBAVpywzl nA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3gyd89h61u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:55 +0000 Received: from m0098420.ppops.net (m0098420.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25RDHAXu015356; Mon, 27 Jun 2022 13:53:55 GMT Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3gyd89h61a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:55 +0000 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25RDqYbd030164; Mon, 27 Jun 2022 13:53:53 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma05fra.de.ibm.com with ESMTP id 3gwt092aem-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:53 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25RDrojS21496136 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 27 Jun 2022 13:53:50 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 103DFAE045; Mon, 27 Jun 2022 13:53:50 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8D6A7AE053; Mon, 27 Jun 2022 13:53:49 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 27 Jun 2022 13:53:49 +0000 (GMT) From: Laurent Dufour To: wim@linux-watchdog.org, linux@roeck-us.net, mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, nathanl@linux.ibm.com, haren@linux.vnet.ibm.com, npiggin@gmail.com Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor Date: Mon, 27 Jun 2022 15:53:46 +0200 Message-Id: <20220627135347.32624-4-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220627135347.32624-1-ldufour@linux.ibm.com> References: <20220627135347.32624-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: sQ5Cqp20Zfj5nFrveBXjq4PbGHHVMD9g X-Proofpoint-GUID: B-FpR0e-5RjdwG9oRQjvsPe2svmvL4qD X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-06-27_06,2022-06-24_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 malwarescore=0 phishscore=0 impostorscore=0 suspectscore=0 priorityscore=1501 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206270059 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org Introduce a factor which would apply to the NMI watchdog timeout. This factor is a percentage added to the watchdog_tresh value. The value is set under the watchdog_mutex protection and lockup_detector_reconfigure() is called to recompute wd_panic_timeout_tb. Once the factor is set, it remains until it is set back to 0, which means no impact. Signed-off-by: Laurent Dufour Reviewed-by: Nicholas Piggin --- arch/powerpc/include/asm/nmi.h | 2 ++ arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h index ea0e487f87b1..7d6a8d9b0543 100644 --- a/arch/powerpc/include/asm/nmi.h +++ b/arch/powerpc/include/asm/nmi.h @@ -5,8 +5,10 @@ #ifdef CONFIG_PPC_WATCHDOG extern void arch_touch_nmi_watchdog(void); long soft_nmi_interrupt(struct pt_regs *regs); +void watchdog_nmi_set_lpm_factor(u64 factor); #else static inline void arch_touch_nmi_watchdog(void) {} +static inline void watchdog_nmi_set_lpm_factor(u64 factor) {} #endif #ifdef CONFIG_NMI_IPI diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 7d28b9553654..80851b228f71 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending; static cpumask_t wd_smp_cpus_stuck; static u64 wd_smp_last_reset_tb; +#ifdef CONFIG_PPC_PSERIES +static u64 wd_factor; +#endif + /* * Try to take the exclusive watchdog action / NMI IPI / printing lock. * wd_smp_lock must be held. If this fails, we should return and wait @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu) static void watchdog_calc_timeouts(void) { - wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq; + u64 threshold = watchdog_thresh; + +#ifdef CONFIG_PPC_PSERIES + threshold += (READ_ONCE(wd_factor) * threshold) / 100; +#endif + + wd_panic_timeout_tb = threshold * ppc_tb_freq; /* Have the SMP detector trigger a bit later */ wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2; @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void) } return 0; } + +#ifdef CONFIG_PPC_PSERIES +void watchdog_nmi_set_lpm_factor(u64 factor) +{ + pr_info("Set the NMI watchdog factor to %llu%%\n", factor); + WRITE_ONCE(wd_factor, factor); + lockup_detector_reconfigure(); +} +#endif From patchwork Mon Jun 27 13:53:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 585805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FE5EC433EF for ; Mon, 27 Jun 2022 13:54:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236390AbiF0Nyb (ORCPT ); Mon, 27 Jun 2022 09:54:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235420AbiF0Ny2 (ORCPT ); Mon, 27 Jun 2022 09:54:28 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0C9CAE54; Mon, 27 Jun 2022 06:54:26 -0700 (PDT) Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25RDhHuN001053; Mon, 27 Jun 2022 13:53:56 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=+cawotrj/1dSxR3/0ou5PM5kym3DFbOzGlpY1H5jZs0=; b=aBtgLnVVPt3T06W/R216skhKYQ7e6PhZNANxjFA6Y8JG10CfCXhnrhmlKW+7hmOrnAhW /7/KFZhLDQcvzF6bKIvcsbDKS+QwZOKFZ3AQEANZzmm0SgbW1dYqNK9nYO+pQoQZtY0N jYLiDtX8ND1q8dBRci5Cf7lBMbbAs2j95Fbh6BMyYDDQS7prn6DVlDEjxC5D2ygpGMWs gPug2iLCSzDhFdznIJzxoxd4eyHhsn1SGmlVEP0iRtKwXY1NGj6nEFF2ppsMGMlOMRyp wBWrTG4j67zGxyzHtlMCixbvwNRyQj4vOIVyAg2CoElbW++xzco5ccAiskqra1GnJ0bm GQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydpa08vr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:56 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25RDkKON015980; Mon, 27 Jun 2022 13:53:56 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3gydpa08us-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:56 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25RDoUlX004750; Mon, 27 Jun 2022 13:53:54 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma04ams.nl.ibm.com with ESMTP id 3gwt08u13w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 27 Jun 2022 13:53:53 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25RDroDO21496142 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 27 Jun 2022 13:53:50 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 97B21AE053; Mon, 27 Jun 2022 13:53:50 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 205FAAE04D; Mon, 27 Jun 2022 13:53:50 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 27 Jun 2022 13:53:50 +0000 (GMT) From: Laurent Dufour To: wim@linux-watchdog.org, linux@roeck-us.net, mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, nathanl@linux.ibm.com, haren@linux.vnet.ibm.com, npiggin@gmail.com Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM Date: Mon, 27 Jun 2022 15:53:47 +0200 Message-Id: <20220627135347.32624-5-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220627135347.32624-1-ldufour@linux.ibm.com> References: <20220627135347.32624-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: L0pY_-q6hY9VfxBmu9shvaCyp9vOKJUH X-Proofpoint-GUID: 9qFYSrQaiwpP9SXFE4jv4aYNbhhYBRYg X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-06-27_06,2022-06-24_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 malwarescore=0 suspectscore=0 bulkscore=0 priorityscore=1501 spamscore=0 impostorscore=0 lowpriorityscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206270059 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org During a LPM, while the memory transfer is in progress on the arrival side, some latencies is generated when accessing not yet transferred pages on the arrival side. Thus, the NMI watchdog may be triggered too frequently, which increases the risk to hit a NMI interrupt in a bad place in the kernel, leading to a kernel panic. Disabling the Hard Lockup Watchdog until the memory transfer could be a too strong work around, some users would want this timeout to be eventually triggered if the system is hanging even during LPM. Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply a factor to the NMI watchdog timeout during a LPM. Just before the CPU are stopped for the switchover sequence, the NMI watchdog timer is set to watchdog_tresh + factor% A value of 0 has no effect. The default value is 200, meaning that the NMI watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value). Once the memory transfer is achieved, the factor is reset to 0. Setting this value to a high number is like disabling the NMI watchdog during a LPM. Signed-off-by: Laurent Dufour --- Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++ arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++ 2 files changed, 55 insertions(+) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index ddccd1077462..0bb0b7f27e96 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -592,6 +592,18 @@ to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). +nmi_watchdog_factor (PPC only) +================================== + +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is +set to 1). This factor represents the percentage added to +``watchdog_thresh`` when calculating the NMI watchdog timeout during a +LPM. The soft lockup timeout is not impacted. + +A value of 0 means no change. The default value is 200 meaning the NMI +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10). + + numa_balancing ============== diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 907a779074d6..649155faafc2 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -48,6 +48,39 @@ struct update_props_workarea { #define MIGRATION_SCOPE (1) #define PRRN_SCOPE -2 +#ifdef CONFIG_PPC_WATCHDOG +static unsigned int nmi_wd_factor = 200; + +#ifdef CONFIG_SYSCTL +static struct ctl_table nmi_wd_factor_ctl_table[] = { + { + .procname = "nmi_watchdog_factor", + .data = &nmi_wd_factor, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + }, + {} +}; +static struct ctl_table nmi_wd_factor_sysctl_root[] = { + { + .procname = "kernel", + .mode = 0555, + .child = nmi_wd_factor_ctl_table, + }, + {} +}; + +static int __init register_nmi_wd_factor_sysctl(void) +{ + register_sysctl_table(nmi_wd_factor_sysctl_root); + + return 0; +} +device_initcall(register_nmi_wd_factor_sysctl); +#endif /* CONFIG_SYSCTL */ +#endif /* CONFIG_PPC_WATCHDOG */ + static int mobility_rtas_call(int token, char *buf, s32 scope) { int rc; @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle) static int pseries_migrate_partition(u64 handle) { int ret; + unsigned int factor = 0; +#ifdef CONFIG_PPC_WATCHDOG + factor = nmi_wd_factor; +#endif ret = wait_for_vasi_session_suspending(handle); if (ret) return ret; vas_migration_handler(VAS_SUSPEND); + if (factor) + watchdog_nmi_set_lpm_factor(factor); + ret = pseries_suspend(handle); if (ret == 0) { post_mobility_fixup(); @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle) } else pseries_cancel_migration(handle, ret); + if (factor) + watchdog_nmi_set_lpm_factor(0); + vas_migration_handler(VAS_RESUME); return ret;