diff mbox

ARM: machine_power_off should not return

Message ID 1395684784-12601-1-git-send-email-sebastian.capella@linaro.org
State New
Headers show

Commit Message

Sebastian Capella March 24, 2014, 6:13 p.m. UTC
Add loop to prevent return from machine_power_off if
pm_power_off is null or does not halt the system.
This caused a panic during hibernation testing on Kirkwood
Openblocks A6 board.

Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
---
 arch/arm/kernel/process.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Ezequiel Garcia March 25, 2014, 10:45 p.m. UTC | #1
Let's Cc: LAKML, and To: Russell.

Russell, any comments on this?

Without this patch we got the heartbeat's reboot_notifier called twice while
testing the recent hibernation patches, which was unexpected and produced a
kernel panic: https://lkml.org/lkml/2014/3/19/363

Instead of fixing the heartbeat LED trigger, or the hibernation code, it
seems better to fix the ARM machine_power_off, as it's not supposed to return.

On Mar 24, Sebastian Capella wrote:
> Add loop to prevent return from machine_power_off if
> pm_power_off is null or does not halt the system.
> This caused a panic during hibernation testing on Kirkwood
> Openblocks A6 board.
> 
> Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
> Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> ---
>  arch/arm/kernel/process.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index f58b723..6ffdc2c 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -217,6 +217,8 @@ void machine_power_off(void)
>  
>  	if (pm_power_off)
>  		pm_power_off();
> +	while (1)
> +		cpu_relax();
>  }
>  
>  /*
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux March 26, 2014, 12:51 a.m. UTC | #2
On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote:
> Let's Cc: LAKML, and To: Russell.
> 
> Russell, any comments on this?
> 
> Without this patch we got the heartbeat's reboot_notifier called twice while
> testing the recent hibernation patches, which was unexpected and produced a
> kernel panic: https://lkml.org/lkml/2014/3/19/363

I don't see why we should make this change.  kernel/reboot.c handles
this function returning, so other places should do too.

Even on x86, this function can return:

void machine_power_off(void)
{
        machine_ops.power_off();
}

        .power_off = native_machine_power_off,

static void native_machine_power_off(void)
{
        if (pm_power_off) {
                if (!reboot_force)
                        machine_shutdown();
                pm_power_off();
        }
        /* A fallback in case there is no PM info available */
        tboot_shutdown(TB_SHUTDOWN_HALT);
}

void tboot_shutdown(u32 shutdown_type)
{
        void (*shutdown)(void);

        if (!tboot_enabled())
                return;

Therefore, I'd say... it's a bug in the hibernation code - or we probably
have many buggy architectures.  I'd suggest fixing the hibernation code
rather than stuffing some workaround like an endless loop into every
architecture.
Ezequiel Garcia March 26, 2014, 10:12 a.m. UTC | #3
Russell,

Thanks for the reply!

On Mar 26, Russell King - ARM Linux wrote:
> On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote:
> > 
> > Without this patch we got the heartbeat's reboot_notifier called twice while
> > testing the recent hibernation patches, which was unexpected and produced a
> > kernel panic: https://lkml.org/lkml/2014/3/19/363
> 
> I don't see why we should make this change.  kernel/reboot.c handles
> this function returning, so other places should do too.
> 
> Even on x86, this function can return:
> 
[..]
> 
> Therefore, I'd say... it's a bug in the hibernation code - or we probably
> have many buggy architectures.  I'd suggest fixing the hibernation code
> rather than stuffing some workaround like an endless loop into every
> architecture.
> 

Which is exactly what Sebastian did first:

https://lkml.org/lkml/2014/3/20/605

But Pavel asked to fix ARM's machine_power_off instead.

Also, looking at the other architectures, it seems this API is not well
defined: some of them have an infinite loop, some don't. So it's hard to
say the function is supposed to return or not.
Russell King - ARM Linux March 26, 2014, 10:59 a.m. UTC | #4
On Wed, Mar 26, 2014 at 07:12:27AM -0300, Ezequiel Garcia wrote:
> On Mar 26, Russell King - ARM Linux wrote:
> > On Tue, Mar 25, 2014 at 07:45:55PM -0300, Ezequiel Garcia wrote:
> > > 
> > > Without this patch we got the heartbeat's reboot_notifier called twice while
> > > testing the recent hibernation patches, which was unexpected and produced a
> > > kernel panic: https://lkml.org/lkml/2014/3/19/363
> > 
> > I don't see why we should make this change.  kernel/reboot.c handles
> > this function returning, so other places should do too.
> > 
> > Even on x86, this function can return:
> > 
> [..]
> > 
> > Therefore, I'd say... it's a bug in the hibernation code - or we probably
> > have many buggy architectures.  I'd suggest fixing the hibernation code
> > rather than stuffing some workaround like an endless loop into every
> > architecture.
> > 
> 
> Which is exactly what Sebastian did first:
> 
> https://lkml.org/lkml/2014/3/20/605
> 
> But Pavel asked to fix ARM's machine_power_off instead.
> 
> Also, looking at the other architectures, it seems this API is not well
> defined: some of them have an infinite loop, some don't. So it's hard to
> say the function is supposed to return or not.

I'm going by x86 (which I regard as definitive) and the generic power-off
kernel code (which I've looked at all the way back to 2.6.12-rc2).

The hibernation code path should really be fixed - the paths in
kernel/reboot.c have coped for 9+ with all of these platform hooks
returning, and it's only the silly switch() in the hibernation code
that doesn't use a "default" case to handle the kernel_halt() case
which is the real cause of the problem.

As you've found, calling kernel_power_off() followed by kernel_halt()
leads to bugs in drivers: this is not an architecture thing, it's partly
a hibernation code failure for doing that, and partly a driver bug for
trying to unregister something that it's already unregistered.
diff mbox

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f58b723..6ffdc2c 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -217,6 +217,8 @@  void machine_power_off(void)
 
 	if (pm_power_off)
 		pm_power_off();
+	while (1)
+		cpu_relax();
 }
 
 /*