From patchwork Thu Mar 14 07:06:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Liuye
X-Patchwork-Id: 780712
From: Liuye
To: Daniel Thompson
CC: jason.wessel@windriver.com, dianders@chromium.org,
 gregkh@linuxfoundation.org, jirislaby@kernel.org,
 kgdb-bugreport@lists.sourceforge.net, linux-kernel@vger.kernel.org,
 linux-serial@vger.kernel.org
Subject: Re: Re: Re: Re: Re: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Date: Thu, 14 Mar 2024 07:06:22 +0000
Message-ID: <56ed54fd241c462189d2d030ad51eac6@h3c.com>
References: <20240228025602.3087748-1-liu.yeC@h3c.com>
 <20240228120516.GA22898@aspen.lan>
 <8b41d34adaef4ddcacde2dd00d4e3541@h3c.com>
 <20240301105931.GB5795@aspen.lan>
 <2ea381e7407a49aaa0b08fa7d4ff62d3@h3c.com>
 <20240312095756.GB202685@aspen.lan>
 <06cfa3459ed848cf8f228997b983cf53@h3c.com>
 <20240312102419.GC202685@aspen.lan>
 <410a443612e8441cb729c640a0d606c6@h3c.com>
 <20240313141745.GD202685@aspen.lan>
In-Reply-To: <20240313141745.GD202685@aspen.lan>
X-Mailing-List: linux-serial@vger.kernel.org

>On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> I know that you said schedule_work is not NMI safe, which is the
>> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But
>> >> >> even if irq_work_queue is used to implement it, there will still
>> >> >> be a deadlock problem because slave cpu1 still has not released
>> >> >> the running queue lock of master CPU0.
>> >> >
>> >> >This doesn't sound right to me. Why do you think CPU1 won't
>> >> >release the run queue lock?
>> >>
>> >> In this example, CPU1 is waiting for CPU0 to release
>> >> dbg_slave_lock.
>> >
>> >That shouldn't be a problem. CPU0 will have released that lock by the
>> >time the irq work is dispatched.
>>
>> Release dbg_slave_lock in CPU0. Before that, schedule_work needs to be
>> handled, and we are back to the previous issue.
>
>Sorry but I still don't understand what problem you think can happen
>here. What is wrong with calling schedule_work() from the IRQ work
>handler?
>
>Both irq_work_queue() and schedule_work() are calls to queue deferred
>work. It does not matter when the work is queued (providing we are lock
>safe). What matters is when the work is actually executed.
>
>Please can you describe the problem you think exists based on when the
>work is executed.

CPU0 enters KDB while handling a serial port interrupt and sends an IPI
(NMI) to the other CPUs. Once the system has settled, CPU0 is in
interrupt context and the other CPUs are in NMI context. Before another
CPU enters NMI context, it may already have taken CPU0's run queue lock.

If CPU0 then runs kgdboc_restore_input() and calls schedule_work(), the
need_more_worker() check can decide to wake a worker on system_wq. To do
that, CPU0 must acquire its own run queue lock, which the other CPU
still holds. That CPU cannot release the lock, because it is still in
NMI context waiting for CPU0 to release dbg_slave_lock, and CPU0 only
does so after schedule_work() returns.

After thinking about it, the problem is not whether schedule_work() is
NMI safe, but that workers on system_wq should not be woken immediately
when schedule_work() is called. I replaced schedule_work() with
schedule_delayed_work(), and this solved my problem.

The new patch is as follows:

Index: drivers/tty/serial/kgdboc.c
===================================================================
--- drivers/tty/serial/kgdboc.c	(revision 57862)
+++ drivers/tty/serial/kgdboc.c	(working copy)
@@ -92,12 +92,12 @@
 	mutex_unlock(&kgdboc_reset_mutex);
 }
 
-static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static DECLARE_DELAYED_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
 
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		schedule_delayed_work(&kgdboc_restore_input_work, 2 * HZ);
 }
 
 static int kgdboc_register_kbd(char **cptr)
@@ -128,7 +128,7 @@
 			i--;
 		}
 	}
-	flush_work(&kgdboc_restore_input_work);
+	flush_delayed_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */
 #define kgdboc_register_kbd(x) 0
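
To make the ordering argument above easier to follow, here is a minimal
user-space sketch of it. It is a toy model only, not kernel code: the
struct, its fields and the three functions are hypothetical names
invented for illustration, and the "locks" are plain flags. It assumes
the scenario described in this mail, where CPU1 took CPU0's run queue
lock just before being parked in NMI context.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the scenario in this mail. All names are hypothetical. */
struct kdb_model {
	bool cpu0_rq_lock_held_by_cpu1; /* CPU1 took CPU0's run queue lock before the NMI */
	bool cpu1_parked_in_nmi;        /* CPU1 waits in NMI context for dbg_slave_lock */
	bool work_pending;              /* restore-input work queued, no worker woken yet */
};

/*
 * Waking a worker needs CPU0's run queue lock. Returns false in the
 * case where the real CPU0 would spin on that lock forever.
 */
static bool try_wake_worker(struct kdb_model *m)
{
	if (m->cpu0_rq_lock_held_by_cpu1)
		return false;
	m->work_pending = false;
	return true;
}

/*
 * CPU0 drops dbg_slave_lock: CPU1 leaves NMI context and finishes the
 * critical section that was holding CPU0's run queue lock.
 */
static void release_secondary(struct kdb_model *m)
{
	m->cpu1_parked_in_nmi = false;
	m->cpu0_rq_lock_held_by_cpu1 = false;
}

/* Returns true if input restoration completes, false if the model deadlocks. */
static bool restore_input(struct kdb_model *m, bool defer_wakeup)
{
	m->work_pending = true;

	if (!defer_wakeup) {
		/* Immediate wake-up, attempted while CPU1 is still parked. */
		if (!try_wake_worker(m))
			return false;
		release_secondary(m);
		return true;
	}

	/* Deferred wake-up: CPU1 resumes first, then the worker is woken. */
	release_secondary(m);
	return try_wake_worker(m);
}
```

With CPU1 holding the lock and parked, restore_input(&m, false) reports
a deadlock, while restore_input(&m, true) completes. The model only
encodes the ordering; it says nothing about whether a fixed delay is
always long enough in the real kernel.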