[v2] kernel/power : add pr_err() for debugging "Error -14 resuming" error

Message ID 20221031142017.26750-1-luoxueqin66@gmail.com
State New

Commit Message

Luo Xueqin Oct. 31, 2022, 2:20 p.m. UTC
From: Xueqin Luo <luoxueqin@kylinos.cn>

The system memory map can change over a hibernation-restore cycle due
to a defect in the platform firmware, and some of the page frames used
by the kernel before hibernation may no longer be available during
the subsequent restore, which leads to the error below.

[  T357] PM: Image loading progress:   0%
[  T357] PM: Read 2681596 kbytes in 0.03 seconds (89386.53 MB/s)
[  T357] PM: Error -14 resuming
[  T357] PM: Failed to load hibernation image, recovering.
[  T357] PM: Basic memory bitmaps freed
[  T357] OOM killer enabled.
[  T357] Restarting tasks ... done.
[  T357] PM: resume from hibernation failed (-14)
[  T357] PM: Hibernation image not present or could not be loaded.

Add an error message to unpack_orig_pfns() so that the invalid page
frame number is reported when an "Error -14 resuming" error occurs
while resuming from hibernation (S4). This lets developers locate the
offending page frame quickly and reduces debugging time.

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
---

v2: Modify the commit message and pr_err() function output

 kernel/power/snapshot.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
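
For readers decoding the log above: kernel functions return negated errno
values, so the -14 in "Error -14 resuming" corresponds to -EFAULT, which is
what unpack_orig_pfns() returns in the hunk being patched. A minimal userspace
sketch, assuming Linux errno numbering; pm_errname() is a hypothetical helper
for illustration, not a kernel function:

```c
#include <errno.h>

/*
 * Hypothetical helper: map the error code seen in the log to its errno name.
 * Kernel code returns negated errno values, hence the comparison with -EFAULT.
 */
const char *pm_errname(int err)
{
	return err == -EFAULT ? "EFAULT" : "unknown";
}
```

On Linux, pm_errname(-14) yields "EFAULT", and strerror(EFAULT) gives the
text "Bad address".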

Comments

Rafael J. Wysocki Nov. 10, 2022, 2:13 p.m. UTC | #1
On Mon, Oct 31, 2022 at 3:20 PM Luo Xueqin <luoxueqin66@gmail.com> wrote:
>
> From: Xueqin Luo <luoxueqin@kylinos.cn>
>
> The system memory map can change over a hibernation-restore cycle due
> to a defect in the platform firmware, and some of the page frames used
> by the kernel before hibernation may not be available any more during
> the subsequent restore which leads to the error below.
>
> [  T357] PM: Image loading progress:   0%
> [  T357] PM: Read 2681596 kbytes in 0.03 seconds (89386.53 MB/s)
> [  T357] PM: Error -14 resuming
> [  T357] PM: Failed to load hibernation image, recovering.
> [  T357] PM: Basic memory bitmaps freed
> [  T357] OOM killer enabled.
> [  T357] Restarting tasks ... done.
> [  T357] PM: resume from hibernation failed (-14)
> [  T357] PM: Hibernation image not present or could not be loaded.
>
> So, by adding an Error message to the unpack () function, you can quickly
> navigate to the Error page number and analyze the cause when an "Error -14
> resuming" error occurs in S4. This can save developers the cost of
> debugging time.
>
> Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
> ---
>
> v2: Modify the commit message and pr_err() function output
>
>  kernel/power/snapshot.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 2a406753af90..2be2e9f5a060 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -2259,10 +2259,13 @@ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
>                 if (unlikely(buf[j] == BM_END_OF_MAP))
>                         break;
>
> -               if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j]))
> +               if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j])) {
>                         memory_bm_set_bit(bm, buf[j]);
> -               else
> +               } else {
> +                       if (!pfn_valid(buf[j]))
> +                               pr_err("The page frame number: %lx is not valid.\n", buf[j]);

What about printing this message instead:

pr_err(FW_BUG "Memory map mismatch at 0x%llx after hibernation\n",
page_address(pfn_to_page(buf[j])));

>                         return -EFAULT;
> +               }
>         }
>
>         return 0;
> --
> 2.25.1
>
Rafael J. Wysocki Nov. 10, 2022, 2:35 p.m. UTC | #2
On Thu, Nov 10, 2022 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Oct 31, 2022 at 3:20 PM Luo Xueqin <luoxueqin66@gmail.com> wrote:
> >
> > From: Xueqin Luo <luoxueqin@kylinos.cn>
> >
> > The system memory map can change over a hibernation-restore cycle due
> > to a defect in the platform firmware, and some of the page frames used
> > by the kernel before hibernation may not be available any more during
> > the subsequent restore which leads to the error below.
> >
> > [  T357] PM: Image loading progress:   0%
> > [  T357] PM: Read 2681596 kbytes in 0.03 seconds (89386.53 MB/s)
> > [  T357] PM: Error -14 resuming
> > [  T357] PM: Failed to load hibernation image, recovering.
> > [  T357] PM: Basic memory bitmaps freed
> > [  T357] OOM killer enabled.
> > [  T357] Restarting tasks ... done.
> > [  T357] PM: resume from hibernation failed (-14)
> > [  T357] PM: Hibernation image not present or could not be loaded.
> >
> > So, by adding an Error message to the unpack () function, you can quickly
> > navigate to the Error page number and analyze the cause when an "Error -14
> > resuming" error occurs in S4. This can save developers the cost of
> > debugging time.
> >
> > Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
> > ---
> >
> > v2: Modify the commit message and pr_err() function output
> >
> >  kernel/power/snapshot.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> > index 2a406753af90..2be2e9f5a060 100644
> > --- a/kernel/power/snapshot.c
> > +++ b/kernel/power/snapshot.c
> > @@ -2259,10 +2259,13 @@ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
> >                 if (unlikely(buf[j] == BM_END_OF_MAP))
> >                         break;
> >
> > -               if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j]))
> > +               if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j])) {
> >                         memory_bm_set_bit(bm, buf[j]);
> > -               else
> > +               } else {
> > +                       if (!pfn_valid(buf[j]))
> > +                               pr_err("The page frame number: %lx is not valid.\n", buf[j]);
>
> What about printing this message instead:
>
> pr_err(FW_BUG "Memory map mismatch at 0x%llx after hibernation\n",
> page_address(pfn_to_page(buf[j])));

Actually, this should be

pr_err(FW_BUG "Memory map mismatch at 0x%llx after hibernation\n",
PFN_PHYS(buf[j]));

>
> >                         return -EFAULT;
> > +               }
> >         }
> >
> >         return 0;
> > --
> > 2.25.1
> >
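
The switch from page_address(pfn_to_page(...)) to PFN_PHYS() in the reply
above can be illustrated in userspace. page_address() returns a kernel
virtual address as a pointer, whereas the message wants the physical address
of the frame, which is the PFN shifted left by the page shift. The sketch
below assumes 4 KiB pages; sketch_pfn_phys() and sketch_format_mismatch() are
hypothetical stand-ins, not kernel APIs. The cast to unsigned long long is
there because the integer being printed has to match %llx, which in the
kernel would matter wherever phys_addr_t is not 64 bits wide.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages (page shift of 12). */
#define SKETCH_PAGE_SHIFT 12

/* Userspace stand-in for PFN_PHYS(): page frame number -> physical address. */
uint64_t sketch_pfn_phys(unsigned long pfn)
{
	return (uint64_t)pfn << SKETCH_PAGE_SHIFT;
}

/* Format the proposed message into buf; returns snprintf()'s result. */
int sketch_format_mismatch(char *buf, size_t len, unsigned long pfn)
{
	return snprintf(buf, len,
			"Memory map mismatch at 0x%llx after hibernation",
			(unsigned long long)sketch_pfn_phys(pfn));
}
```

With these assumptions, PFN 0x1234 is reported as physical address 0x1234000.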
Patch

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 2a406753af90..2be2e9f5a060 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -2259,10 +2259,13 @@  static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
 		if (unlikely(buf[j] == BM_END_OF_MAP))
 			break;
 
-		if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j]))
+		if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j])) {
 			memory_bm_set_bit(bm, buf[j]);
-		else
+		} else {
+			if (!pfn_valid(buf[j]))
+				pr_err("The page frame number: %lx is not valid.\n", buf[j]);
 			return -EFAULT;
+		}
 	}
 
 	return 0;
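
The control flow of the patched hunk can be tried outside the kernel with a
small simulation. This is only a sketch under stated assumptions: every
sketch_* name is hypothetical, the bitmap is a plain bool array, validity is
modelled as "PFN below a fixed limit", and the memory_bm_pfn_present() check
from the real code is left out. The message text follows the wording
suggested in the review rather than the one in this v2 patch.

```c
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_END_OF_MAP (~0UL)  /* stand-in for BM_END_OF_MAP */
#define SKETCH_EFAULT     14      /* value of EFAULT on Linux */
#define SKETCH_MAX_PFN    1024UL  /* hypothetical size of the restored memory map */
#define SKETCH_PAGE_SHIFT 12      /* assumption: 4 KiB pages */

/* Stand-in for pfn_valid(): frames at or above SKETCH_MAX_PFN no longer exist. */
bool sketch_pfn_valid(unsigned long pfn)
{
	return pfn < SKETCH_MAX_PFN;
}

/*
 * Walk one buffer of saved PFNs and mark each valid one in the bitmap.
 * Stops at the end-of-map marker; on the first invalid PFN it reports the
 * physical address and returns -EFAULT, mirroring the patched branch.
 */
int sketch_unpack_pfns(const unsigned long *buf, size_t n, bool *bitmap)
{
	for (size_t j = 0; j < n; j++) {
		if (buf[j] == SKETCH_END_OF_MAP)
			break;

		if (!sketch_pfn_valid(buf[j])) {
			fprintf(stderr,
				"Memory map mismatch at 0x%llx after hibernation\n",
				(unsigned long long)buf[j] << SKETCH_PAGE_SHIFT);
			return -SKETCH_EFAULT;
		}
		bitmap[buf[j]] = true;
	}

	return 0;
}
```

Feeding it a buffer that contains a PFN beyond SKETCH_MAX_PFN produces the
message once and a return value of -14, which is the same value that shows
up as "Error -14 resuming" in the log.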