From patchwork Tue Nov 11 17:50:32 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Thompson
X-Patchwork-Id: 40607
X-Original-Sender: daniel.thompson@linaro.org
From: Daniel Thompson
To: Jason Wessel
Cc: Daniel Thompson, linux-kernel@vger.kernel.org, patches@linaro.org,
 linaro-kernel@lists.linaro.org, John Stultz, Sumit Semwal, Mike Travis,
 Randy Dunlap, Dimitri Sivanich, Andrew Morton, Borislav Petkov,
 kgdb-bugreport@lists.sourceforge.net, Ingo Molnar
Subject: [PATCH v3] kgdb: Timeout if secondary CPUs ignore the roundup
Date: Tue, 11 Nov 2014 17:50:32 +0000
Message-Id: <1415728232-9954-1-git-send-email-daniel.thompson@linaro.org>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1408374110-19656-1-git-send-email-daniel.thompson@linaro.org>
References: <1408374110-19656-1-git-send-email-daniel.thompson@linaro.org>

Currently if an active CPU fails to respond to a roundup request the CPU
that requested the roundup will become stuck. This needlessly reduces the
robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors. It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson
Cc: Jason Wessel
Cc: Mike Travis
Cc: Randy Dunlap
Cc: Dimitri Sivanich
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Ingo Molnar
---

Notes:
    I had to spin this one again due to an out-by-one in kdb_cpu() so I've
    tackled a couple of other small issues at the same time.

    v3:

    * Fix an out-by-one error in kdb_cpu().
    * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
      really want a static limit (Jason Wessel).
    * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
      (kgdb-next contains a patch which introduced pr_fmt() to this file
      so the tag will now be applied automatically).

    v2:

    * Set CATASTROPHIC if the system contains unresponsive processors
      (Jason Wessel)

 kernel/debug/debug_core.c       | 9 +++++++--
 kernel/debug/kdb/kdb_debugger.c | 4 ++++
 kernel/debug/kdb/kdb_main.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

--
1.9.3

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b39b96..f21580b347cc 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -471,6 +471,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 	int cpu;
 	int trace_on = 0;
 	int online_cpus = num_online_cpus();
+	u64 time_left;
 
 	kgdb_info[ks->cpu].enter_kgdb++;
 	kgdb_info[ks->cpu].exception_state |= exception_state;
@@ -595,9 +596,13 @@ return_normal:
 	/*
 	 * Wait for the other CPUs to be notified and be waiting for us:
 	 */
-	while (kgdb_do_roundup && (atomic_read(&masters_in_kgdb) +
-				atomic_read(&slaves_in_kgdb)) != online_cpus)
+	time_left = loops_per_jiffy * HZ;
+	while (kgdb_do_roundup && --time_left &&
+	       (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
+		   online_cpus)
 		cpu_relax();
+	if (!time_left)
+		pr_crit("Timed out waiting for secondary CPUs.\n");
 
 	/*
 	 * At this point the primary processor is completely
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 8859ca34dcfe..15e1a7af5dd0 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -129,6 +129,10 @@ int kdb_stub(struct kgdb_state *ks)
 		ks->pass_exception = 1;
 		KDB_FLAG_SET(CATASTROPHIC);
 	}
+	/* set CATASTROPHIC if the system contains unresponsive processors */
+	for_each_online_cpu(i)
+		if (!kgdb_info[i].enter_kgdb)
+			KDB_FLAG_SET(CATASTROPHIC);
 	if (KDB_STATE(SSBPT) && reason == KDB_REASON_SSTEP) {
 		KDB_STATE_CLEAR(SSBPT);
 		KDB_STATE_CLEAR(DOING_SS);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 379650b984f8..0c1dc7fa2e58 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2157,6 +2157,8 @@ static void kdb_cpu_status(void)
 	for (start_cpu = -1, i = 0; i < NR_CPUS; i++) {
 		if (!cpu_online(i)) {
 			state = 'F';	/* cpu is offline */
+		} else if (!kgdb_info[i].enter_kgdb) {
+			state = 'D';	/* cpu is online but unresponsive */
 		} else {
 			state = ' ';	/* cpu is responding to kdb */
 			if (kdb_task_state_char(KDB_TSK(i)) == 'I')
@@ -2210,7 +2212,7 @@ static int kdb_cpu(int argc, const char **argv)
 	/*
 	 * Validate cpunum
 	 */
-	if ((cpunum > NR_CPUS) || !cpu_online(cpunum))
+	if ((cpunum >= CONFIG_NR_CPUS) || !kgdb_info[cpunum].enter_kgdb)
 		return KDB_BADCPUNUM;
 
 	dbg_switch_cpu = cpunum;
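
For readers who want to experiment with the idea outside the kernel, the
following is a rough userspace sketch of the two techniques the patch
combines: a busy-wait bounded by an iteration budget, and a CPU validity
check that uses ">=" against a static limit and rejects CPUs that never
checked in. It is not the kernel code. MAX_CPUS, struct cpu_slot, cpus[],
cpus_checked_in(), wait_for_roundup(), cpu_is_switchable() and
cpu_state_char() are all hypothetical names invented for illustration, and
C11 atomics stand in for the kernel's atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical static limit; stands in for CONFIG_NR_CPUS. */
#define MAX_CPUS 4

/* Hypothetical per-CPU record; `entered` mirrors kgdb_info[i].enter_kgdb. */
struct cpu_slot {
	bool online;
	atomic_int entered;
};

static struct cpu_slot cpus[MAX_CPUS];

/* Count the CPUs that have checked in so far. */
static int cpus_checked_in(void)
{
	int n = 0;

	for (int i = 0; i < MAX_CPUS; i++)
		if (atomic_load(&cpus[i].entered))
			n++;
	return n;
}

/*
 * Spin until `expected` CPUs have checked in or the iteration budget is
 * exhausted. Returns true if everyone arrived, false on timeout.
 */
static bool wait_for_roundup(int expected, uint64_t budget)
{
	assert(expected >= 0 && expected <= MAX_CPUS);

	while (budget && cpus_checked_in() != expected)
		budget--;
	return cpus_checked_in() == expected;
}

/* A CPU may be selected only if it is in range and has checked in. */
static bool cpu_is_switchable(long cpunum)
{
	if (cpunum < 0 || cpunum >= MAX_CPUS)	/* '>=' avoids the off-by-one */
		return false;
	return atomic_load(&cpus[cpunum].entered) != 0;
}

/* 'F' = offline, 'D' = online but unresponsive, ' ' = responding. */
static char cpu_state_char(int cpunum)
{
	if (!cpus[cpunum].online)
		return 'F';
	if (!atomic_load(&cpus[cpunum].entered))
		return 'D';
	return ' ';
}
```

In this sketch the budget is a plain iteration count supplied by the
caller; the patch instead derives its budget from loops_per_jiffy * HZ.
The index check matters because an index equal to the array size is already
out of bounds, which is why ">" alone is not enough.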