diff mbox series

[v1] kexec_core: Add and update comments regarding the KEXEC_JUMP flow

Message ID 4968818.GXAFRqVoOG@rjwysocki.net
State New
Headers show
Series [v1] kexec_core: Add and update comments regarding the KEXEC_JUMP flow | expand

Commit Message

Rafael J. Wysocki Dec. 16, 2024, 1:39 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The KEXEC_JUMP flow is analogous to hibernation flows occurring before
and after creating an image and before and after jumping from the
restore kernel to the image one, which is why it uses the same device
callbacks as those hibernation flows.

Add comments explaining that to the code in question and update an
existing comment in it which appears a bit out of context.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

The kexec_jump code has a few problems AFAICS.

First off, it should use lock_system_sleep() or "interesting" things may
happen when it is run in parallel to a system-wide PM transition.

Second, it looks like it should use pm_sleep_disable/enable_decondary_cpus()
instead of the "raw" suspend_disable/enable_secondary_cpus() because running
it with unpaused cpuidle is kind of a slippery slope.

Moreover, it wouldn't hurt to somehow call acpi_pm_freeze() somewhere during
it to prevent background platform activity from interfering with the "resume"
part of it.

It also might be useful to unify it with the analogous hibernation flows more
directly, but that would require some rearrangements of the latter.

I'm going to send patches along these lines at one point in the future
unless I'm told that this is a bad idea.

Thanks!

---
 kernel/kexec_core.c |   23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

Comments

Baoquan He Dec. 20, 2024, 7:22 a.m. UTC | #1
On 12/16/24 at 12:21pm, Eric W. Biederman wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
> 
> > It isn't broken. I know of it being used a few million times a week.
> >
> > There are corner cases which have never worked right, like the callee
> > putting a different return address for its next invocation, on the
> > stack *above* the address it 'ret's to. Which since the first kjump
> > patch has been the first word of the page *after* the swap page (and
> > is now fixed in my tree). But fundamentally it *does* work.
> >
> > I only started messing with it because I was working on
> > relocate_kernel() and needed to write a test case for it; the fact
> > that I know of it being used in production is actually just a
> > coincidence.
> 
> Cool.  I had the sense that the original developer never got around
> to using it, so I figured I should check.
> 
> Mind if I ask what you know of it being used for?

I am also very curious about the use case and asked David in other
thread, while David didn't tell. Not sure if it's one company's
confidential information. We may want to know what it's used for to
evaluate if it's a generally useful use case, or an unintentional
testing.

> 
> I had imagined it might be a way to call firmware code preventing the
> need to code of a specific interface for each type of firmware.
> 
> Eric
> 
>
diff mbox series

Patch

--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1001,6 +1001,12 @@ 
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
+		/*
+		 * This flow is analogous to hibernation flows that occur before
+		 * creating an image and before jumping from the restore kernel
+		 * to the image one, so it uses the same device device callbacks
+		 * as those two flows.
+		 */
 		pm_prepare_console();
 		error = freeze_processes();
 		if (error) {
@@ -1011,12 +1017,10 @@ 
 		error = dpm_suspend_start(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		/* At this point, dpm_suspend_start() has been called,
-		 * but *not* dpm_suspend_end(). We *must* call
-		 * dpm_suspend_end() now.  Otherwise, drivers for
-		 * some devices (e.g. interrupt controllers) become
-		 * desynchronized with the actual state of the
-		 * hardware at resume time, and evil weirdness ensues.
+		/*
+		 * dpm_suspend_end() must be called after dpm_suspend_start()
+		 * to complete the transition, like in the hibernation flows
+		 * mentioned above.
 		 */
 		error = dpm_suspend_end(PMSG_FREEZE);
 		if (error)
@@ -1052,6 +1056,13 @@ 
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
+		/*
+		 * This flow is analogous to hibernation flows that occur after
+		 * creating an image and after the image hernel has got control
+		 * back, and in case the devices have been reset or otherwise
+		 * manipulated in the meantime, it uses the device callbacks
+		 * used by the latter.
+		 */
 		syscore_resume();
  Enable_irqs:
 		local_irq_enable();