Message ID | 1395684784-12601-1-git-send-email-sebastian.capella@linaro.org |
---|---|
State | New |
Headers | show |
Let's Cc: LAKML, and To: Russell. Russell, any comments on this? Without this patch we got the heartbeat's reboot_notifier called twice while testing the recent hibernation patches, which was unexpected and produced a kernel panic: https://lkml.org/lkml/2014/3/19/363 Instead of fixing the heartbeat LED trigger, or the hibernation code, it seems better to fix the ARM machine_power_off, as it's not supposed to return. On Mar 24, Sebastian Capella wrote: > Add loop to prevent return from machine_power_off if > pm_power_off is null or does not halt the system. > This caused a panic during hibernation testing on Kirkwood > Openblocks A6 board. > > Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org> > Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Pavel Machek <pavel@ucw.cz> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > --- > arch/arm/kernel/process.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index f58b723..6ffdc2c 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -217,6 +217,8 @@ void machine_power_off(void) > > if (pm_power_off) > pm_power_off(); > + while (1) > + cpu_relax(); > } > > /* > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote: > Let's Cc: LAKML, and To: Russell. > > Russell, any comments on this? > > Without this patch we got the heartbeat's reboot_notifier called twice while > testing the recent hibernation patches, which was unexpected and produced a > kernel panic: https://lkml.org/lkml/2014/3/19/363 I don't see why we should make this change. kernel/reboot.c handles this function returning, so other places should do too. Even on x86, this function can return: void machine_power_off(void) { machine_ops.power_off(); } .power_off = native_machine_power_off, static void native_machine_power_off(void) { if (pm_power_off) { if (!reboot_force) machine_shutdown(); pm_power_off(); } /* A fallback in case there is no PM info available */ tboot_shutdown(TB_SHUTDOWN_HALT); } void tboot_shutdown(u32 shutdown_type) { void (*shutdown)(void); if (!tboot_enabled()) return; Therefore, I'd say... it's a bug in the hibernation code - or we probably have many buggy architectures. I'd suggest fixing the hibernation code rather than stuffing some workaround like an endless loop into every architecture.
Russell, Thanks for the reply! On Mar 26, Russell King - ARM Linux wrote: > On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote: > > > > Without this patch we got the heartbeat's reboot_notifier called twice while > > testing the recent hibernation patches, which was unexpected and produced a > > kernel panic: https://lkml.org/lkml/2014/3/19/363 > > I don't see why we should make this change. kernel/reboot.c handles > this function returning, so other places should do too. > > Even on x86, this function can return: > [..] > > Therefore, I'd say... it's a bug in the hibernation code - or we probably > have many buggy architectures. I'd suggest fixing the hibernation code > rather than stuffing some workaround like an endless loop into every > architecture. > Which is exactly what Sebastian did first: https://lkml.org/lkml/2014/3/20/605 But Pavel asked to fix ARM's machine_power_off instead. Also, looking at the other architectures, it seems this API is not well defined: some of them have an infinite loop, some don't. So it's hard to say the function is supposed to return or not.
On Wed, Mar 26, 2014 at 07:12:27AM -0300, Ezequiel Garcia wrote: > On Mar 26, Russell King - ARM Linux wrote: > > On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote: > > > > > > Without this patch we got the heartbeat's reboot_notifier called twice while > > > testing the recent hibernation patches, which was unexpected and produced a > > > kernel panic: https://lkml.org/lkml/2014/3/19/363 > > > > I don't see why we should make this change. kernel/reboot.c handles > > this function returning, so other places should do too. > > > > Even on x86, this function can return: > > > [..] > > > > Therefore, I'd say... it's a bug in the hibernation code - or we probably > > have many buggy architectures. I'd suggest fixing the hibernation code > > rather than stuffing some workaround like an endless loop into every > > architecture. > > > > Which is exactly what Sebastian did first: > > https://lkml.org/lkml/2014/3/20/605 > > But Pavel asked to fix ARM's machine_power_off instead. > > Also, looking at the other architectures, it seems this API is not well > defined: some of them have an infinite loop, some don't. So it's hard to > say the function is supposed to return or not. I'm going by x86 (which I regard as definitive) and the generic power-off kernel code (which I've looked at all the way back to 2.6.12-rc2). The hibernation code path should really be fixed - the paths in kernel/reboot.c have coped for 9+ with all of these platform hooks returning, and it's only the silly switch() in the hibernation code that doesn't use a "default" case to handle the kernel_halt() case which is the real cause of the problem. As you've found, calling kernel_power_off() followed by kernel_halt() leads to bugs in drivers: this is not an architecture thing, it's partly a hibernation code failure for doing that, and partly a driver bug for trying to unregister something that it's already unregistered.
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index f58b723..6ffdc2c 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -217,6 +217,8 @@ void machine_power_off(void) if (pm_power_off) pm_power_off(); + while (1) + cpu_relax(); } /*
Add loop to prevent return from machine_power_off if pm_power_off is null or does not halt the system. This caused a panic during hibernation testing on Kirkwood Openblocks A6 board. Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org> Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> --- arch/arm/kernel/process.c | 2 ++ 1 file changed, 2 insertions(+)