diff mbox

[RFC,v4] debug: prevent entering debug mode on panic/exception.

Message ID 1419338309-16729-1-git-send-email-kiran.kumar@linaro.org
State New
Headers show

Commit Message

Kiran Kumar Raparthy Dec. 23, 2014, 12:38 p.m. UTC
From: Colin Cross <ccross@android.com>

debug: prevent entering debug mode on panic/exception.

On non-developer devices, kgdb prevents the device from rebooting
after a panic.

Incase of panics and exceptions, to allow the device to reboot, prevent
entering debug mode to avoid getting stuck waiting for the user to interact
with debugger.

To avoid entering the debugger on panic/exception without any extra
configuration, panic_timeout is being used which can be set via
/proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default
value.

Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: Android Kernel Team <kernel-team@android.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Colin Cross <ccross@android.com>
[Kiran: Added context to commit message.
panic_timeout is used instead of break_on_panic and
break_on_exception to honor CONFIG_PANIC_TIMEOUT.
Modified the commit as per community feedback]
Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org>
---
 kernel/debug/debug_core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Kiran Kumar Raparthy Jan. 6, 2015, 5:01 a.m. UTC | #1
Hi Andrew,

On 6 January 2015 at 05:44, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 23 Dec 2014 18:08:29 +0530 Kiran Raparthy <kiran.kumar@linaro.org> wrote:
>
>> From: Colin Cross <ccross@android.com>
>>
>> debug: prevent entering debug mode on panic/exception.
>>
>> On non-developer devices, kgdb prevents the device from rebooting
>> after a panic.
>>
>> Incase of panics and exceptions, to allow the device to reboot, prevent
>> entering debug mode to avoid getting stuck waiting for the user to interact
>> with debugger.
>>
>> To avoid entering the debugger on panic/exception without any extra
>> configuration, panic_timeout is being used which can be set via
>> /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default
>> value.
>>
>> ...
>>
>
> hm.  Why overload the meaning of panic_timeout in this fashion?  If
> someone is using kgdb and has panic_timeout set, they'll get quiet a
> surprise.
We are just using already exported kernel symbol to avoid extra configuration.
So we have chosen panic_timeout to allow kgdb to honor CONFIG_PANIC_TIMEOUT.
I can modify the commit text if you feel the above commit text
overloading the meaning of panic_timeout.

To give you a better idea about why we have used panic_timeout,
sharing Daniel's previous review comment:

"It ought to be possible for kgdb/kdb to honour CONFIG_PANIC_TIMEOUT by
tracking how long it takes for the user to attach a debugger (or to run
the first kdb command after the panic). As it happens the timeout value
is already an exported kernel symbol so all the info is there for us to
use.Doing so would save us imposing further configuration burden on the user"

>
> Would it be cleaner/clearer to add some new standalone control for
> this?  One which is only present if CONFIG_KGDB.

Original Android patch uses separate configuration but since we can
achieve the same with
already existing kernel symbol,we have used panic_timeout.
>
>
> We appear to have forgotten to document panic_timeout.  Sigh.
Regards,
Kiran
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Daniel Thompson Jan. 6, 2015, 3:25 p.m. UTC | #2
On 06/01/15 00:14, Andrew Morton wrote:
> On Tue, 23 Dec 2014 18:08:29 +0530 Kiran Raparthy <kiran.kumar@linaro.org> wrote:
> 
>> From: Colin Cross <ccross@android.com>
>>
>> debug: prevent entering debug mode on panic/exception.
>>
>> On non-developer devices, kgdb prevents the device from rebooting
>> after a panic.
>>
>> Incase of panics and exceptions, to allow the device to reboot, prevent
>> entering debug mode to avoid getting stuck waiting for the user to interact
>> with debugger.
>>
>> To avoid entering the debugger on panic/exception without any extra
>> configuration, panic_timeout is being used which can be set via
>> /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default
>> value.
>>
>> ...
>>
> 
> hm.  Why overload the meaning of panic_timeout in this fashion?  If
> someone is using kgdb and has panic_timeout set, they'll get quiet a
> surprise.

Will they?

Setting panic_timeout means the user explicitly requested that the
machine try to perform an unattended reboot after a panic. Wedging in
kgdb during panic on such a system absolutely contradicts the user's
request to have an automated reboot.

In this case it is not overloading the meaning... we are simply altering
kgdb to properly honour the existing meaning.

Changing the behaviour for exceptions as well as for panics *might*
maybe veer into overloading but even there, I think we are simply better
reflecting user intent. It is hard to imagine a user that wants
automatic-reboot-on-panic being happy for the kernel to get stuck
overnight just because of an oops.


> Would it be cleaner/clearer to add some new standalone control for
> this?  One which is only present if CONFIG_KGDB.

I don't think it is clearer (although given it was my review that lead
us here that's probably to be expected).

Separating them out simply means an additional gotcha a user must be
aware of if they want automatic reboot *and* kgdb (i.e. two kernel
cmdline arguments must be used that then just one).


> We appear to have forgotten to document panic_timeout.  Sigh.

On the subject of documentation it would be useful to update the kgdb
docs to document the effect of panic_timeout on panic/exception behaviour.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Daniel Thompson Jan. 6, 2015, 3:31 p.m. UTC | #3
On 06/01/15 00:14, Andrew Morton wrote:
> We appear to have forgotten to document panic_timeout.  Sigh.

I knew I'd seen it somewhere...

Once its been sysctlised it ends up called "panic":
  https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Kiran Kumar Raparthy Jan. 7, 2015, 3:36 a.m. UTC | #4
On 7 January 2015 at 02:47, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Tue, 06 Jan 2015 15:25:51 +0000 Daniel Thompson <daniel.thompson@linaro.org> wrote:
>
>> >> To avoid entering the debugger on panic/exception without any extra
>> >> configuration, panic_timeout is being used which can be set via
>> >> /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default
>> >> value.
>> >>
>> >> ...
>> >>
>> >
>> > hm.  Why overload the meaning of panic_timeout in this fashion?  If
>> > someone is using kgdb and has panic_timeout set, they'll get quiet a
>> > surprise.
>>
>> Will they?
>>
>> Setting panic_timeout means the user explicitly requested that the
>> machine try to perform an unattended reboot after a panic. Wedging in
>> kgdb during panic on such a system absolutely contradicts the user's
>> request to have an automated reboot.
>>
>> In this case it is not overloading the meaning... we are simply altering
>> kgdb to properly honour the existing meaning.
>>
>> Changing the behaviour for exceptions as well as for panics *might*
>> maybe veer into overloading but even there, I think we are simply better
>> reflecting user intent. It is hard to imagine a user that wants
>> automatic-reboot-on-panic being happy for the kernel to get stuck
>> overnight just because of an oops.
>
> OK, when you put it that way...
>
> Kiran, could we please have an updated patch which includes the above
> observations in the changelog and which documents panic_timeout?
Sure,I'll do that.Thanks for your time.
Regards,
Kiran
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Daniel Thompson Jan. 7, 2015, 8:57 a.m. UTC | #5
On 06/01/15 21:16, Andrew Morton wrote:
> On Tue, 06 Jan 2015 15:31:31 +0000 Daniel Thompson <daniel.thompson@linaro.org> wrote:
> 
>> On 06/01/15 00:14, Andrew Morton wrote:
>>> We appear to have forgotten to document panic_timeout.  Sigh.
>>
>> I knew I'd seen it somewhere...
>>
>> Once its been sysctlised it ends up called "panic":
>>   https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
> 
> Confused.  That file doesn't mention panic_timeout.  But it is the
> place where panic_timeout can be described.

The sysctl is called "panic" (as is the closely related kernel parameter)

The name panic_timeout is used only for the C symbol presumably because
panic was already used for something else...


Daniel.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..0012a1f 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -689,6 +689,14 @@  kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
 
 	if (arch_kgdb_ops.enable_nmi)
 		arch_kgdb_ops.enable_nmi(0);
+	/*
+	 * Avoid entering the debugger if we were triggered due to an oops
+	 * but panic_timeout indicates the system should automatically
+	 * reboot on panic. We don't want to get stuck waiting for input
+	 * on such systems, especially if its "just" an oops.
+	 */
+	if (signo != SIGTRAP && panic_timeout)
+		return 1;
 
 	memset(ks, 0, sizeof(struct kgdb_state));
 	ks->cpu			= raw_smp_processor_id();
@@ -821,6 +829,15 @@  static int kgdb_panic_event(struct notifier_block *self,
 			    unsigned long val,
 			    void *data)
 {
+	/*
+	 * Avoid entering the debugger if we were triggered due to a panic
+	 * We don't want to get stuck waiting for input from user in such case.
+	 * panic_timeout indicates the system should automatically
+	 * reboot on panic.
+	 */
+	if (panic_timeout)
+		return NOTIFY_DONE;
+
 	if (dbg_kdb_mode)
 		kdb_printf("PANIC: %s\n", (char *)data);
 	kgdb_breakpoint();