diff mbox series

[1/3] memblock: define functions to set the usable memory range

Message ID 20220110210809.3528-2-fllinden@amazon.com
State New
Headers show
Series usable memory range fixes (arm64/fdt/efi) | expand

Commit Message

Frank van der Linden Jan. 10, 2022, 9:08 p.m. UTC
Some architectures might limit the usable memory range based
on a firmware property, like "linux,usable-memory-range"
for ARM crash kernels. This limit needs to be enforced after
firmware memory map processing has been done, which might be
e.g. FDT or EFI, or both.

Define an interface for it that is firmware type agnostic.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 include/linux/memblock.h |  2 ++
 mm/memblock.c            | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

Comments

Mike Rapoport Jan. 11, 2022, 10:31 a.m. UTC | #1
On Mon, Jan 10, 2022 at 09:08:07PM +0000, Frank van der Linden wrote:
> Some architectures might limit the usable memory range based
> on a firmware property, like "linux,usable-memory-range"
> for ARM crash kernels. This limit needs to be enforced after
> firmware memory map processing has been done, which might be
> e.g. FDT or EFI, or both.
> 
> Define an interface for it that is firmware type agnostic.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  include/linux/memblock.h |  2 ++
>  mm/memblock.c            | 37 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 34de69b3b8ba..6128efa50d33 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
>  phys_addr_t memblock_start_of_DRAM(void);
>  phys_addr_t memblock_end_of_DRAM(void);
>  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> +void memblock_enforce_usable_range(void);
>  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
>  void memblock_mem_limit_remove_map(phys_addr_t limit);

We already have 3 very similar interfaces that deal with memory capping.
Now you suggest to add fourth that will "generically" solve a single use
case of DT, EFI and kdump interaction on arm64.

Looks like a workaround for a fundamental issue of incompatibility between
DT and EFI wrt memory registration.

>  bool memblock_is_memory(phys_addr_t addr);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 5096500b2647..cb961965f3ad 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -101,6 +101,7 @@ unsigned long max_low_pfn;
>  unsigned long min_low_pfn;
>  unsigned long max_pfn;
>  unsigned long long max_possible_pfn;
> +phys_addr_t usable_start, usable_size;
>
>  static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
>  static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
> @@ -1715,6 +1716,42 @@ void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
>  			base + size, PHYS_ADDR_MAX);
>  }
>  
> +/**
> + * memblock_set_usable_range - set usable memory range
> + * @base: physical address that is the start of the range
> + * @size: size of the range.
> + *
> + * Used when a firmware property limits the range of usable
> + * memory, like for the linux,usable-memory-range property
> + * used by ARM crash kernels.
> + */
> +void __init memblock_set_usable_range(phys_addr_t base, phys_addr_t size)
> +{
> +	usable_start = base;
> +	usable_size = size;
> +}
> +
> +/**
> + * memblock_enforce_usable_range - cap memory ranges to usable range
> + *
> + * Some architectures call this during boot after firmware memory ranges
> + * have been scanned, to make sure they fall within the usable range
> + * set by memblock_set_usable_range.
> + *
> + * This may be called more than once if there are multiple firmware sources
> + * for memory ranges.
> + *
> + * Avoid "no memory registered" warning - the warning itself is
> + * useful, but we know this can be called with no registered
> + * memory (e.g. when the synthetic DT for the crash kernel has
> + * been parsed on EFI arm64 systems).
> + */
> +void __init memblock_enforce_usable_range(void)
> +{
> +	if (memblock_memory->total_size)
> +		memblock_cap_memory_range(usable_start, usable_size);
> +}
> +
>  void __init memblock_mem_limit_remove_map(phys_addr_t limit)
>  {
>  	phys_addr_t max_addr;
> -- 
> 2.32.0
>
Frank van der Linden Jan. 11, 2022, 8:44 p.m. UTC | #2
On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> >  phys_addr_t memblock_start_of_DRAM(void);
> >  phys_addr_t memblock_end_of_DRAM(void);
> >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > +void memblock_enforce_usable_range(void);
> >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> 
> We already have 3 very similar interfaces that deal with memory capping.
> Now you suggest to add fourth that will "generically" solve a single use
> case of DT, EFI and kdump interaction on arm64.
> 
> Looks like a workaround for a fundamental issue of incompatibility between
> DT and EFI wrt memory registration.

Yep, I figured this would be the main argument against this - arm64
already added several other more-or-less special cased interfaces over
time.

I'm more than happy to solve this in a different way.

What would you suggest:

1) Try to merge the similar interfaces in to one.
2) Just deal with it at a lower (arm64) level?
3) Some other way?

Thanks,

- Frank
Mike Rapoport Jan. 12, 2022, 6:05 p.m. UTC | #3
On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > >  phys_addr_t memblock_start_of_DRAM(void);
> > >  phys_addr_t memblock_end_of_DRAM(void);
> > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > +void memblock_enforce_usable_range(void);
> > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > 
> > We already have 3 very similar interfaces that deal with memory capping.
> > Now you suggest to add fourth that will "generically" solve a single use
> > case of DT, EFI and kdump interaction on arm64.
> > 
> > Looks like a workaround for a fundamental issue of incompatibility between
> > DT and EFI wrt memory registration.
> 
> Yep, I figured this would be the main argument against this - arm64
> already added several other more-or-less special cased interfaces over
> time.
> 
> I'm more than happy to solve this in a different way.
> 
> What would you suggest:
> 
> 1) Try to merge the similar interfaces in to one.

This could be a nice cleanup regardless of how we handle
"linux,usable-memory-range".

> 2) Just deal with it at a lower (arm64) level?

Probably it will be the simplest solution in the short term.

> 3) Some other way?

I'm not expert enough on DT and EFI to see how they communicate the
linux,usable-memory-range property. 

One thought I have is since we already create a DT for kexec/kdump why
can't we add some data to EFI memory description similar to
linux,usable-memore-range?

Another thing is, if we could presume that DT and EFI are consistent in
their view what is the span of the physical memory, we could drop
memblock_remove(EVERYTHIING) and early_init_dt_add_memory_arch() from
efi_init::reserve_regions() and then the loop over EFI memory descriptors
will only take care of reserved and nomap regions.

> Thanks,
> 
> - Frank
>
Mike Rapoport Jan. 13, 2022, 5:33 p.m. UTC | #4
On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > --- a/include/linux/memblock.h
> > > +++ b/include/linux/memblock.h
> > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > >  phys_addr_t memblock_start_of_DRAM(void);
> > >  phys_addr_t memblock_end_of_DRAM(void);
> > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > +void memblock_enforce_usable_range(void);
> > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > 
> > We already have 3 very similar interfaces that deal with memory capping.
> > Now you suggest to add fourth that will "generically" solve a single use
> > case of DT, EFI and kdump interaction on arm64.
> > 
> > Looks like a workaround for a fundamental issue of incompatibility between
> > DT and EFI wrt memory registration.
> 
> Yep, I figured this would be the main argument against this - arm64
> already added several other more-or-less special cased interfaces over
> time.
> 
> I'm more than happy to solve this in a different way.
> 
> What would you suggest:
> 
> 1) Try to merge the similar interfaces in to one.
> 2) Just deal with it at a lower (arm64) level?
> 3) Some other way?

We've discussed this with Ard on IRC, and our conclusion was that on arm64
kdump kernel should have memblock.memory exactly the same as the normal
kernel. Then, the memory outside usable-memory-range should be reserved so
that kdump kernel won't step over it.

With that, simple (untested) patch below could be what we need:

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index bdca35284ceb..371418dffaf1 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_reserve(0, cap_mem_addr);
+	memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
 }
 
 bool __init early_init_dt_scan(void *params)

> Thanks,
> 
> - Frank
>
Frank van der Linden Jan. 14, 2022, 12:10 a.m. UTC | #5
On Thu, Jan 13, 2022 at 07:33:11PM +0200, Mike Rapoport wrote:
> On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> > On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > > --- a/include/linux/memblock.h
> > > > +++ b/include/linux/memblock.h
> > > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > > >  phys_addr_t memblock_start_of_DRAM(void);
> > > >  phys_addr_t memblock_end_of_DRAM(void);
> > > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > > +void memblock_enforce_usable_range(void);
> > > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > >
> > > We already have 3 very similar interfaces that deal with memory capping.
> > > Now you suggest to add fourth that will "generically" solve a single use
> > > case of DT, EFI and kdump interaction on arm64.
> > >
> > > Looks like a workaround for a fundamental issue of incompatibility between
> > > DT and EFI wrt memory registration.
> >
> > Yep, I figured this would be the main argument against this - arm64
> > already added several other more-or-less special cased interfaces over
> > time.
> >
> > I'm more than happy to solve this in a different way.
> >
> > What would you suggest:
> >
> > 1) Try to merge the similar interfaces in to one.
> > 2) Just deal with it at a lower (arm64) level?
> > 3) Some other way?
> 
> We've discussed this with Ard on IRC, and our conclusion was that on arm64
> kdump kernel should have memblock.memory exactly the same as the normal
> kernel. Then, the memory outside usable-memory-range should be reserved so
> that kdump kernel won't step over it.
> 
> With that, simple (untested) patch below could be what we need:
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284ceb..371418dffaf1 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
>         of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> 
>         /* Handle linux,usable-memory-range property */
> -       memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> +       memblock_reserve(0, cap_mem_addr);
> +       memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
>  }
> 
>  bool __init early_init_dt_scan(void *params)

Thanks for discussing this further! I tried the above change, although
I wrapped it in an if (cap_mem_size != 0), so that a normal kernel
doesn't get its entire memory marked as reserved.

The crash kernel hung without producing output - I haven't chased that
down. This particular kernel was a bit older (5.15), so I'll try again
with 5.17-rc to make sure.

- Frank
Frank van der Linden Jan. 14, 2022, 12:22 a.m. UTC | #6
On Fri, Jan 14, 2022 at 12:10:46AM +0000, Frank van der Linden wrote:
> Thanks for discussing this further! I tried the above change, although
> I wrapped it in an if (cap_mem_size != 0), so that a normal kernel
> doesn't get its entire memory marked as reserved.

Just to clarify: I had to do that because the code is slightly different
on 5.15, where I tried it. It wasn't a bug in the proposed patch.

Anyway, will try 5.17-rc tomorrow, I'll let you know, thanks again.

- Frank
Frank van der Linden Jan. 14, 2022, 11:27 p.m. UTC | #7
On Thu, Jan 13, 2022 at 07:33:11PM +0200, Mike Rapoport wrote:
> On Tue, Jan 11, 2022 at 08:44:41PM +0000, Frank van der Linden wrote:
> > On Tue, Jan 11, 2022 at 12:31:58PM +0200, Mike Rapoport wrote:
> > > > --- a/include/linux/memblock.h
> > > > +++ b/include/linux/memblock.h
> > > > @@ -481,6 +481,8 @@ phys_addr_t memblock_reserved_size(void);
> > > >  phys_addr_t memblock_start_of_DRAM(void);
> > > >  phys_addr_t memblock_end_of_DRAM(void);
> > > >  void memblock_enforce_memory_limit(phys_addr_t memory_limit);
> > > > +void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
> > > > +void memblock_enforce_usable_range(void);
> > > >  void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
> > > >  void memblock_mem_limit_remove_map(phys_addr_t limit);
> > >
> > > We already have 3 very similar interfaces that deal with memory capping.
> > > Now you suggest to add fourth that will "generically" solve a single use
> > > case of DT, EFI and kdump interaction on arm64.
> > >
> > > Looks like a workaround for a fundamental issue of incompatibility between
> > > DT and EFI wrt memory registration.
> >
> > Yep, I figured this would be the main argument against this - arm64
> > already added several other more-or-less special cased interfaces over
> > time.
> >
> > I'm more than happy to solve this in a different way.
> >
> > What would you suggest:
> >
> > 1) Try to merge the similar interfaces in to one.
> > 2) Just deal with it at a lower (arm64) level?
> > 3) Some other way?
> 
> We've discussed this with Ard on IRC, and our conclusion was that on arm64
> kdump kernel should have memblock.memory exactly the same as the normal
> kernel. Then, the memory outside usable-memory-range should be reserved so
> that kdump kernel won't step over it.
> 
> With that, simple (untested) patch below could be what we need:
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index bdca35284ceb..371418dffaf1 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1275,7 +1275,8 @@ void __init early_init_dt_scan_nodes(void)
>         of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> 
>         /* Handle linux,usable-memory-range property */
> -       memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> +       memblock_reserve(0, cap_mem_addr);
> +       memblock_reserve(cap_mem_addr + cap_mem_size, PHYS_ADDR_MAX);
>  }
> 
>  bool __init early_init_dt_scan(void *params)

Ok, tested this on 5.17-rc, and it's working OK there. Main kernel has
32G, crash kernel gets 512M:

Main kernel:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000b96ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x00000000786effff]
[    0.000000]   node   0: [mem 0x00000000786f0000-0x000000007872ffff]
[    0.000000]   node   0: [mem 0x0000000078730000-0x000000007bbfffff]
[    0.000000]   node   0: [mem 0x000000007bc00000-0x000000007bfdffff]
[    0.000000]   node   0: [mem 0x000000007bfe0000-0x000000007fffffff]
[    0.000000]   node   0: [mem 0x0000000400000000-0x0000000b96ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x0000000b96ffffff]
[    0.000000] On node 0, zone Normal: 4096 pages in unavailable ranges
[    0.000000] cma: Reserved 64 MiB at 0x000000007c000000
[    0.000000] crashkernel reserved: 0x0000000054400000 - 0x0000000074400000 (512 MB)


Crash kernel:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000054400000-0x000000007bfdffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000054400000-0x00000000743fffff]
[    0.000000]   node   0: [mem 0x00000000786f0000-0x000000007872ffff]
[    0.000000]   node   0: [mem 0x000000007bc00000-0x000000007bfdffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000054400000-0x000000007bfdffff]
[    0.000000] On node 0, zone DMA: 17408 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 17136 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 13520 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 16416 pages in unavailable ranges

Not sure why I had trouble with the same on 5.15, I'll have to look
at that again. But this seems fine for 5.16+

Thanks,

- Frank
Frank van der Linden Jan. 24, 2022, 9:05 p.m. UTC | #8
Meanwhile, it seems that this issue was already addressed in:

https://lore.kernel.org/all/20211215021348.8766-1-kernelfans@gmail.com/

..which has now been pulled in, and sent to stable@ for 5.15. I
somehow missed that message, and sent my change in a few weeks
later.

The fix to just reserve the ranges does seem a bit cleaner overall,
but this will do fine.

Thanks!

- Frank
Ard Biesheuvel Jan. 29, 2022, 4:19 p.m. UTC | #9
On Mon, 24 Jan 2022 at 22:05, Frank van der Linden <fllinden@amazon.com> wrote:
>
> Meanwhile, it seems that this issue was already addressed in:
>
> https://lore.kernel.org/all/20211215021348.8766-1-kernelfans@gmail.com/
>
> ..which has now been pulled in, and sent to stable@ for 5.15. I
> somehow missed that message, and sent my change in a few weeks
> later.
>
> The fix to just reserve the ranges does seem a bit cleaner overall,
> but this will do fine.
>

Works for me.
diff mbox series

Patch

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 34de69b3b8ba..6128efa50d33 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -481,6 +481,8 @@  phys_addr_t memblock_reserved_size(void);
 phys_addr_t memblock_start_of_DRAM(void);
 phys_addr_t memblock_end_of_DRAM(void);
 void memblock_enforce_memory_limit(phys_addr_t memory_limit);
+void memblock_set_usable_range(phys_addr_t base, phys_addr_t size);
+void memblock_enforce_usable_range(void);
 void memblock_cap_memory_range(phys_addr_t base, phys_addr_t size);
 void memblock_mem_limit_remove_map(phys_addr_t limit);
 bool memblock_is_memory(phys_addr_t addr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 5096500b2647..cb961965f3ad 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -101,6 +101,7 @@  unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
+phys_addr_t usable_start, usable_size;
 
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_RESERVED_REGIONS] __initdata_memblock;
@@ -1715,6 +1716,42 @@  void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
 			base + size, PHYS_ADDR_MAX);
 }
 
+/**
+ * memblock_set_usable_range - set usable memory range
+ * @base: physical address that is the start of the range
+ * @size: size of the range.
+ *
+ * Used when a firmware property limits the range of usable
+ * memory, like for the linux,usable-memory-range property
+ * used by ARM crash kernels.
+ */
+void __init memblock_set_usable_range(phys_addr_t base, phys_addr_t size)
+{
+	usable_start = base;
+	usable_size = size;
+}
+
+/**
+ * memblock_enforce_usable_range - cap memory ranges to usable range
+ *
+ * Some architectures call this during boot after firmware memory ranges
+ * have been scanned, to make sure they fall within the usable range
+ * set by memblock_set_usable_range.
+ *
+ * This may be called more than once if there are multiple firmware sources
+ * for memory ranges.
+ *
+ * Avoid "no memory registered" warning - the warning itself is
+ * useful, but we know this can be called with no registered
+ * memory (e.g. when the synthetic DT for the crash kernel has
+ * been parsed on EFI arm64 systems).
+ */
+void __init memblock_enforce_usable_range(void)
+{
+	if (memblock_memory->total_size)
+		memblock_cap_memory_range(usable_start, usable_size);
+}
+
 void __init memblock_mem_limit_remove_map(phys_addr_t limit)
 {
 	phys_addr_t max_addr;