mbox series

[0/3] usable memory range fixes (arm64/fdt/efi)

Message ID 20220110210809.3528-1-fllinden@amazon.com
Headers show
Series usable memory range fixes (arm64/fdt/efi) | expand

Message

Frank van der Linden Jan. 10, 2022, 9:08 p.m. UTC
b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
moved capping memory ranges using the FDT-specified linux,usable-memory-range
to the FDT code. This property is used to specify the memory range that
a crash kernel runs in.

While this correctly filters any memory ranges that come from the DT,
this breaks crash kernels on arm64 EFI systems. In these cases, DT is used
for the usable-memory-range property, but the actual memory ranges
come from EFI. Since the call to filter them was moved to the FDT
code, which runs before the EFI init code, the EFI ranges are not
filtered anymore, leading to the crash kernel using memory that
it shouldn't.

This set fixes the the issue by having the EFI code cap its
memory ranges too, and defining a common interface for both the
DT and EFI code to use.

These changes stick to the "firmware code should cap its own memory ranges"
idea, using a common memblock interface. An alternative would be to
use an FDT-specific interface as before, called from arm64_memblock_init,
but having things a little more generalized seemed like a good idea.

This is only a functional change on architectures that have both
DT and EFI, and a usable-memory-range property (which is just arm64).
On any other architecture, usable_size will not be set, leading to a
memblock_cap_memory_range call with 0 size, which is a no-op.

Frank van der Linden (3):
  memblock: define functions to set the usable memory range
  of: fdt: use memblock usable range interface
  efi: enforce usable memory range after reserving regions

 drivers/firmware/efi/efi-init.c |  7 +++++++
 drivers/of/fdt.c                |  3 ++-
 include/linux/memblock.h        |  2 ++
 mm/memblock.c                   | 37 +++++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

Comments

Frank van der Linden Jan. 11, 2022, 8:44 p.m. UTC | #1
On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> >  phys_addr_t memblock_start_of_DRAM(void);
> >  phys_addr_t memblock_end_of_DRAM(void);
> >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > +void memblock_enforce_usable_range(void);
> >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> 
> We already have 3 very similar interfaces that deal with memory capping.
> Now you suggest to add fourth that will "generically" solve a single use
> case of DT, EFI and kdump interaction on arm64.
> 
> Looks like a workaround for a fundamental issue of incompatibility between
> DT and EFI wrt memory registration.

Yep, I figured this would be the main argument against this - arm64
already added several other more-or-less special cased interfaces over
time.

I'm more than happy to solve this in a different way.

What would you suggest:

1) Try to merge the similar interfaces in to one.
2) Just deal with it at a lower (arm64) level?
3) Some other way?

Thanks,

- Frank
Mike Rapoport Jan. 13, 2022, 5:33 p.m. UTC | #2
On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > >  phys_addr_t memblock_start_of_DRAM(void);
> > >  phys_addr_t memblock_end_of_DRAM(void);
> > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > +void memblock_enforce_usable_range(void);
> > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > 
> > We already have 3 very similar interfaces that deal with memory capping.
> > Now you suggest to add fourth that will "generically" solve a single use
> > case of DT, EFI and kdump interaction on arm64.
> > 
> > Looks like a workaround for a fundamental issue of incompatibility between
> > DT and EFI wrt memory registration.
> 
> Yep, I figured this would be the main argument against this - arm64
> already added several other more-or-less special cased interfaces over
> time.
> 
> I'm more than happy to solve this in a different way.
> 
> What would you suggest:
> 
> 1) Try to merge the similar interfaces in to one.
> 2) Just deal with it at a lower (arm64) level?
> 3) Some other way?

We've discussed this with Ard on IRC, and our conclusion was that on arm64
kdump kernel should have memblock.memory exactly the same as the normal
kernel. Then, the memory outside usable-memory-range should be reserved so
that kdump kernel won't step over it.

With that, simple (untested) patch below could be what we need:

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284ceb..371418dffaf1 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_reserve(0, cap_mem_addr);
+	memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
 }
 
 bool __init early_init_dt_scan(void *params)

> Thanks,
> 
> - Frank
>
Frank van der Linden Jan. 14, 2022, 12:10 a.m. UTC | #3
On Thu, Jan 13, 2022 at 07:33:11PM +0200, Mike Rapoport wrote:
> On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> > On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > > --- a/include/linux/memblock.h
> > > > +++ b/include/linux/memblock.h
> > > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > > >  phys_addr_t memblock_start_of_DRAM(void);
> > > >  phys_addr_t memblock_end_of_DRAM(void);
> > > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > > +void memblock_enforce_usable_range(void);
> > > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > >
> > > We already have 3 very similar interfaces that deal with memory capping.
> > > Now you suggest to add fourth that will "generically" solve a single use
> > > case of DT, EFI and kdump interaction on arm64.
> > >
> > > Looks like a workaround for a fundamental issue of incompatibility between
> > > DT and EFI wrt memory registration.
> >
> > Yep, I figured this would be the main argument against this - arm64
> > already added several other more-or-less special cased interfaces over
> > time.
> >
> > I'm more than happy to solve this in a different way.
> >
> > What would you suggest:
> >
> > 1) Try to merge the similar interfaces in to one.
> > 2) Just deal with it at a lower (arm64) level?
> > 3) Some other way?
> 
> We've discussed this with Ard on IRC, and our conclusion was that on arm64
> kdump kernel should have memblock.memory exactly the same as the normal
> kernel. Then, the memory outside usable-memory-range should be reserved so
> that kdump kernel won't step over it.
> 
> With that, simple (untested) patch below could be what we need:
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284ceb..371418dffaf1 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
>         of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> 
>         /* Handle linux,usable-memory-range property */
> -       memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> +       memblock_reserve(0, cap_mem_addr);
> +       memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
>  }
> 
>  bool __init early_init_dt_scan(void *params)

Thanks for discussing this further! I tried the above change, although
I wrapped it in an if (cap_mem_size != 0), so that a normal kernel
doesn't get its entire memory marked as reserved.

The crash kernel hung without producing output - I haven't chased that
down. This particular kernel was a bit older (5.15), so I'll try again
with 5.17-rc to make sure.

- Frank
Frank van der Linden Jan. 14, 2022, 12:22 a.m. UTC | #4
On Fri, Jan 14, 2022 at 12:10:46AM +0000, Frank van der Linden wrote:
> Thanks for discussing this further! I tried the above change, although
> I wrapped it in an if (cap_mem_size != 0), so that a normal kernel
> doesn't get its entire memory marked as reserved.

Just to clarify: I had to do that because the code is slightly different
on 5.15, where I tried it. It wasn't a bug in the proposed patch.

Anyway, will try 5.17-rc tomorrow, I'll let you know, thanks again.

- Frank
Frank van der Linden Jan. 14, 2022, 11:27 p.m. UTC | #5
On Thu, Jan 13, 2022 at 07:33:11PM +0200, Mike Rapoport wrote:
> On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> > On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > > --- a/include/linux/memblock.h
> > > > +++ b/include/linux/memblock.h
> > > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > > >  phys_addr_t memblock_start_of_DRAM(void);
> > > >  phys_addr_t memblock_end_of_DRAM(void);
> > > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > > +void memblock_enforce_usable_range(void);
> > > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > >
> > > We already have 3 very similar interfaces that deal with memory capping.
> > > Now you suggest to add fourth that will "generically" solve a single use
> > > case of DT, EFI and kdump interaction on arm64.
> > >
> > > Looks like a workaround for a fundamental issue of incompatibility between
> > > DT and EFI wrt memory registration.
> >
> > Yep, I figured this would be the main argument against this - arm64
> > already added several other more-or-less special cased interfaces over
> > time.
> >
> > I'm more than happy to solve this in a different way.
> >
> > What would you suggest:
> >
> > 1) Try to merge the similar interfaces in to one.
> > 2) Just deal with it at a lower (arm64) level?
> > 3) Some other way?
> 
> We've discussed this with Ard on IRC, and our conclusion was that on arm64
> kdump kernel should have memblock.memory exactly the same as the normal
> kernel. Then, the memory outside usable-memory-range should be reserved so
> that kdump kernel won't step over it.
> 
> With that, simple (untested) patch below could be what we need:
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284ceb..371418dffaf1 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
>         of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> 
>         /* Handle linux,usable-memory-range property */
> -       memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> +       memblock_reserve(0, cap_mem_addr);
> +       memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
>  }
> 
>  bool __init early_init_dt_scan(void *params)

Ok, tested this on 5.17-rc, and it's working OK there. Main kernel has
32G, crash kernel gets 512M:

Main kernel:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000b96ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x00000000786effff]
[    0.000000]   node   0: [mem 0x00000000786f0000-0x000000007872ffff]
[    0.000000]   node   0: [mem 0x0000000078730000-0x000000007bbfffff]
[    0.000000]   node   0: [mem 0x000000007bc00000-0x000000007bfdffff]
[    0.000000]   node   0: [mem 0x000000007bfe0000-0x000000007fffffff]
[    0.000000]   node   0: [mem 0x0000000400000000-0x0000000b96ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x0000000b96ffffff]
[    0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
[    0.000000] cma: Reserved 64 MiB at 0x000000007c000000
[    0.000000] crashkernel reserved: 0x0000000054400000 - 0x0000000074400000 (512 MB)


Crash kernel:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000054400000-0x000000007bfdffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000054400000-0x00000000743fffff]
[    0.000000]   node   0: [mem 0x00000000786f0000-0x000000007872ffff]
[    0.000000]   node   0: [mem 0x000000007bc00000-0x000000007bfdffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000054400000-0x000000007bfdffff]
[    0.000000] On node 0, zone DMA: 17408 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 17136 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 13520 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 16416 pages in unavailable ranges

Not sure why I had trouble with the same on 5.15, I'll have to look
at that again. But this seems fine for 5.16+

Thanks,

- Frank
Frank van der Linden Jan. 24, 2022, 9:05 p.m. UTC | #6
Meanwhile, it seems that this issue was already addressed in:

https://lore.kernel.org/all/20211215021348.8766-1-kernelfans@gmail.com/

..which has now been pulled in, and sent to stable@ for 5.15. I
somehow missed that message, and sent my change in a few weeks
later.

The fix to just reserve the ranges does seem a bit cleaner overall,
but this will do fine.

Thanks!

- Frank