mbox series

[v3,0/6] introduce mirrored memory support for arm64

Message ID 20220607093805.1354256-1-mawupeng1@huawei.com
Headers show
Series introduce mirrored memory support for arm64 | expand

Message

mawupeng June 7, 2022, 9:37 a.m. UTC
From: Ma Wupeng <mawupeng1@huawei.com>

Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges")
introduced mirrored memory support for x86. This support rely on UEFI to
report mirrored memory address ranges.  See UEFI 2.5 spec pages 157-158:

  http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

Memory mirroring is a technique used to separate memory into two separate
channels, usually on a memory device, like a server. In memory mirroring,
one channel is copied to another to create redundancy. This method makes
input/output (I/O) registers and memory appear with more than one address
range because the same physical byte is accessible at more than one
address. Using memory mirroring, higher memory reliability and a higher
level of memory consolidation are possible.

These EFI memory regions have various attributes, and the "mirrored"
attribute is one of them. The physical memory region whose descriptors
in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
mirrored. The address range mirroring feature of the kernel arranges such
mirrored regions into normal zones and other regions into movable zones.

Arm64 can support this too. So mirrored memory support is added to support
arm64.

The main purpose of this patch set is to introduce mirrored support for
arm64 and we have already fixed the problems we had which is shown in
patch #5 to patch #8 and try to bring total isolation in patch #9 which
will disable mirror feature if kernelcore is not specified.

In order to test this support in arm64:
- patch this patch set
- add kernelcore=mirror in kernel parameter
- start you kernel

Patch #1-#2 introduce mirrored memory support form arm64.
Patch #3-#5 fix some bugs for arm64 if memory reliable is enabled.
Patch #6 disable mirror feature if kernelcore is not specified.

Thanks to Ard Biesheuvel's hard work [1], now kernel will perfer mirrored
memory if kaslr is enabled.

[1] https://lore.kernel.org/linux-arm-kernel/CAMj1kXEPVEzMgOM4+Yj6PxHA-jFuDOAUdDJSiSxy_XaP4P7LSw@mail.gmail.com/T/

Changelog since v2:
- remove efi_fake_mem support
- remove Commit ("remove some redundant code in ia64 efi_init") since
  efi_print_memmap() is not public
- add mirror flag back on initrd memory

Changelog since v1:
- update changelog in cover letter
- use PHYS_PFN in patch #7

Ma Wupeng (6):
  efi: Make efi_find_mirror() public
  arm64/mirror: arm64 enabling - find mirrored memory ranges
  mm: Ratelimited mirrored memory related warning messages
  mm: Demote warning message in vmemmap_verify() to debug level
  mm: Add mirror flag back on initrd memory
  efi: Disable mirror feature if kernelcore is not specified

 .../admin-guide/kernel-parameters.txt         |  2 +-
 arch/arm64/kernel/setup.c                     |  1 +
 arch/arm64/mm/init.c                          |  9 +++++++
 arch/x86/include/asm/efi.h                    |  4 ---
 arch/x86/platform/efi/efi.c                   | 23 ----------------
 drivers/firmware/efi/efi.c                    | 26 +++++++++++++++++++
 include/linux/efi.h                           |  3 +++
 include/linux/memblock.h                      |  1 +
 include/linux/mm.h                            |  2 ++
 mm/memblock.c                                 | 24 +++++++++++++++--
 mm/page_alloc.c                               |  2 +-
 mm/sparse-vmemmap.c                           |  2 +-
 12 files changed, 67 insertions(+), 32 deletions(-)

Comments

David Hildenbrand June 7, 2022, 12:25 p.m. UTC | #1
On 07.06.22 11:38, Wupeng Ma wrote:
> From: Ma Wupeng <mawupeng1@huawei.com>
> 
> For a system only have limited mirrored memory or some numa node without
> mirrored memory, the per node vmemmap page_structs prefer to allocate
> memory from mirrored region, which will lead to vmemmap_verify() in
> vmemmap_populate_basepages() report lots of warning message.
> 
> This patch demote the "potential offnode page_structs" warning messages
> to debug level to avoid a very long print during bootup.
> 
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>  mm/sparse-vmemmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index f4fa61dbbee3..78debdb89eb1 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -528,7 +528,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>  	int actual_node = early_pfn_to_nid(pfn);
>  
>  	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -		pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +		pr_debug("[%lx-%lx] potential offnode page_structs\n",
>  			start, end - 1);
>  }
>  

This will possibly hide it in environments where this might indeed
indicate performance issues.

What about a pr_warn_once()?
mawupeng June 8, 2022, 1:26 a.m. UTC | #2
在 2022/6/7 20:25, David Hildenbrand 写道:
> On 07.06.22 11:38, Wupeng Ma wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> For a system only have limited mirrored memory or some numa node without
>> mirrored memory, the per node vmemmap page_structs prefer to allocate
>> memory from mirrored region, which will lead to vmemmap_verify() in
>> vmemmap_populate_basepages() report lots of warning message.
>>
>> This patch demote the "potential offnode page_structs" warning messages
>> to debug level to avoid a very long print during bootup.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>   mm/sparse-vmemmap.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index f4fa61dbbee3..78debdb89eb1 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -528,7 +528,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>>   	int actual_node = early_pfn_to_nid(pfn);
>>   
>>   	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
>> -		pr_warn("[%lx-%lx] potential offnode page_structs\n",
>> +		pr_debug("[%lx-%lx] potential offnode page_structs\n",
>>   			start, end - 1);
>>   }
>>   
> 
> This will possibly hide it in environments where this might indeed
> indicate performance issues.
> 
> What about a pr_warn_once()?
> 

Sure.

This will works. We can certainly use a pr_warn_once().
Anshuman Khandual June 8, 2022, 10 a.m. UTC | #3
On 6/8/22 06:56, mawupeng wrote:
> 
> 
> 在 2022/6/7 20:25, David Hildenbrand 写道:
>> On 07.06.22 11:38, Wupeng Ma wrote:
>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>
>>> For a system only have limited mirrored memory or some numa node without
>>> mirrored memory, the per node vmemmap page_structs prefer to allocate
>>> memory from mirrored region, which will lead to vmemmap_verify() in
>>> vmemmap_populate_basepages() report lots of warning message.
>>>
>>> This patch demote the "potential offnode page_structs" warning messages
>>> to debug level to avoid a very long print during bootup.
>>>
>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>> ---
>>>   mm/sparse-vmemmap.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>> index f4fa61dbbee3..78debdb89eb1 100644
>>> --- a/mm/sparse-vmemmap.c
>>> +++ b/mm/sparse-vmemmap.c
>>> @@ -528,7 +528,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>>>       int actual_node = early_pfn_to_nid(pfn);
>>>         if (node_distance(actual_node, node) > LOCAL_DISTANCE)
>>> -        pr_warn("[%lx-%lx] potential offnode page_structs\n",
>>> +        pr_debug("[%lx-%lx] potential offnode page_structs\n",
>>>               start, end - 1);
>>>   }
>>>   
>>
>> This will possibly hide it in environments where this might indeed
>> indicate performance issues.
>>
>> What about a pr_warn_once()?
>>
> 
> Sure.
> 
> This will works. We can certainly use a pr_warn_once().

Why not pr_warn_ratelimited() like in the previous patch ?
mawupeng June 9, 2022, 8:13 a.m. UTC | #4
在 2022/6/8 18:00, Anshuman Khandual 写道:
> 
> 
> On 6/8/22 06:56, mawupeng wrote:
>>
>>
>> 在 2022/6/7 20:25, David Hildenbrand 写道:
>>> On 07.06.22 11:38, Wupeng Ma wrote:
>>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>>
>>>> For a system only have limited mirrored memory or some numa node without
>>>> mirrored memory, the per node vmemmap page_structs prefer to allocate
>>>> memory from mirrored region, which will lead to vmemmap_verify() in
>>>> vmemmap_populate_basepages() report lots of warning message.
>>>>
>>>> This patch demote the "potential offnode page_structs" warning messages
>>>> to debug level to avoid a very long print during bootup.
>>>>
>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>>> ---
>>>>    mm/sparse-vmemmap.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>> index f4fa61dbbee3..78debdb89eb1 100644
>>>> --- a/mm/sparse-vmemmap.c
>>>> +++ b/mm/sparse-vmemmap.c
>>>> @@ -528,7 +528,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>>>>        int actual_node = early_pfn_to_nid(pfn);
>>>>          if (node_distance(actual_node, node) > LOCAL_DISTANCE)
>>>> -        pr_warn("[%lx-%lx] potential offnode page_structs\n",
>>>> +        pr_debug("[%lx-%lx] potential offnode page_structs\n",
>>>>                start, end - 1);
>>>>    }
>>>>    
>>>
>>> This will possibly hide it in environments where this might indeed
>>> indicate performance issues.
>>>
>>> What about a pr_warn_once()?
>>>
>>
>> Sure.
>>
>> This will works. We can certainly use a pr_warn_once().
> 
> Why not pr_warn_ratelimited() like in the previous patch ?

Function vmemmap_populate_basepages() is used to populate base pages.
System with huge memory will produce lots lots of warning message
during this populate process even with ratelimited. This may lead to slow
startup.

Thanks for reviewing.

> .
Kefeng Wang June 10, 2022, 9:22 a.m. UTC | #5
On 2022/6/7 17:38, Wupeng Ma wrote:
> From: Ma Wupeng <mawupeng1@huawei.com>
>
> Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory
> ranges") introduce the efi_find_mirror function on x86. In order to reuse
> the API we make it public in preparation for arm64 to support mirrord
> memory.
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>   arch/x86/include/asm/efi.h  |  4 ----
>   arch/x86/platform/efi/efi.c | 23 -----------------------
>   drivers/firmware/efi/efi.c  | 23 +++++++++++++++++++++++
>   include/linux/efi.h         |  3 +++
>   4 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 71943dce691e..eb90206eae80 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -383,7 +383,6 @@ static inline bool efi_is_64bit(void)
>   extern bool efi_reboot_required(void);
>   extern bool efi_is_table_address(unsigned long phys_addr);
>   
> -extern void efi_find_mirror(void);
>   extern void efi_reserve_boot_services(void);
>   #else
>   static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
> @@ -395,9 +394,6 @@ static inline  bool efi_is_table_address(unsigned long phys_addr)
>   {
>   	return false;
>   }
> -static inline void efi_find_mirror(void)
> -{
> -}
>   static inline void efi_reserve_boot_services(void)
>   {
>   }
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 1591d67e0bcd..6e598bd78eef 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -108,29 +108,6 @@ static int __init setup_add_efi_memmap(char *arg)
>   }
>   early_param("add_efi_memmap", setup_add_efi_memmap);
>   
> -void __init efi_find_mirror(void)
> -{
> -	efi_memory_desc_t *md;
> -	u64 mirror_size = 0, total_size = 0;
> -
> -	if (!efi_enabled(EFI_MEMMAP))
> -		return;
> -
> -	for_each_efi_memory_desc(md) {
> -		unsigned long long start = md->phys_addr;
> -		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> -
> -		total_size += size;
> -		if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
> -			memblock_mark_mirror(start, size);
> -			mirror_size += size;
> -		}
> -	}
> -	if (mirror_size)
> -		pr_info("Memory: %lldM/%lldM mirrored memory\n",
> -			mirror_size>>20, total_size>>20);
> -}
> -
>   /*
>    * Tell the kernel about the EFI memory map.  This might include
>    * more than the max 128 entries that can fit in the passed in e820
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 860534bcfdac..79c232e07de7 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -446,6 +446,29 @@ static int __init efisubsys_init(void)
>   
>   subsys_initcall(efisubsys_init);
>   
> +void __init efi_find_mirror(void)
> +{
> +	efi_memory_desc_t *md;
> +	u64 mirror_size = 0, total_size = 0;
> +
> +	if (!efi_enabled(EFI_MEMMAP))
> +		return;
> +
> +	for_each_efi_memory_desc(md) {
> +		unsigned long long start = md->phys_addr;
> +		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> +
> +		total_size += size;
> +		if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
> +			memblock_mark_mirror(start, size);
> +			mirror_size += size;
> +		}
> +	}
> +	if (mirror_size)
> +		pr_info("Memory: %lldM/%lldM mirrored memory\n",
> +			mirror_size>>20, total_size>>20);
> +}
> +
>   /*
>    * Find the efi memory descriptor for a given physical address.  Given a
>    * physical address, determine if it exists within an EFI Memory Map entry,
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 7d9b0bb47eb3..53f64c14a525 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -872,6 +872,7 @@ static inline bool efi_rt_services_supported(unsigned int mask)
>   {
>   	return (efi.runtime_supported_mask & mask) == mask;
>   }
> +extern void efi_find_mirror(void);
>   #else
>   static inline bool efi_enabled(int feature)
>   {
> @@ -889,6 +890,8 @@ static inline bool efi_rt_services_supported(unsigned int mask)
>   {
>   	return false;
>   }
> +
> +static inline void efi_find_mirror(void) {}
>   #endif
>   
>   extern int efi_status_to_err(efi_status_t status);
Kefeng Wang June 10, 2022, 9:35 a.m. UTC | #6
On 2022/6/9 16:13, mawupeng wrote:
>
>
> 在 2022/6/8 18:00, Anshuman Khandual 写道:
>>
>>
>> On 6/8/22 06:56, mawupeng wrote:
>>>
>>>
>>> 在 2022/6/7 20:25, David Hildenbrand 写道:
>>>> On 07.06.22 11:38, Wupeng Ma wrote:
>>>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>>>
>>>>> For a system only have limited mirrored memory or some numa node 
>>>>> without
>>>>> mirrored memory, the per node vmemmap page_structs prefer to allocate
>>>>> memory from mirrored region, which will lead to vmemmap_verify() in
>>>>> vmemmap_populate_basepages() report lots of warning message.
>>>>>
>>>>> This patch demote the "potential offnode page_structs" warning 
>>>>> messages
>>>>> to debug level to avoid a very long print during bootup.
>>>>>
>>>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>>>> ---
>>>>>    mm/sparse-vmemmap.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>>> index f4fa61dbbee3..78debdb89eb1 100644
>>>>> --- a/mm/sparse-vmemmap.c
>>>>> +++ b/mm/sparse-vmemmap.c
>>>>> @@ -528,7 +528,7 @@ void __meminit vmemmap_verify(pte_t *pte, int 
>>>>> node,
>>>>>        int actual_node = early_pfn_to_nid(pfn);
>>>>>          if (node_distance(actual_node, node) > LOCAL_DISTANCE)
>>>>> -        pr_warn("[%lx-%lx] potential offnode page_structs\n",
>>>>> +        pr_debug("[%lx-%lx] potential offnode page_structs\n",
>>>>>                start, end - 1);
>>>>>    }
>>>>
>>>> This will possibly hide it in environments where this might indeed
>>>> indicate performance issues.
>>>>
>>>> What about a pr_warn_once()?
>>>>
>>>
>>> Sure.
>>>
>>> This will works. We can certainly use a pr_warn_once().
>>
>> Why not pr_warn_ratelimited() like in the previous patch ?
>
> Function vmemmap_populate_basepages() is used to populate base pages.
> System with huge memory will produce lots lots of warning message
> during this populate process even with ratelimited. This may lead to slow
> startup.

I think pr_warn_once is better, the memblock_alloc fallback is not frequent,

but vmemmap_verify will verify each memory and print a lot.

>
> Thanks for reviewing.
>
>> .
> .
Ard Biesheuvel June 10, 2022, 11:20 a.m. UTC | #7
On Tue, 7 Jun 2022 at 11:16, Wupeng Ma <mawupeng1@huawei.com> wrote:
>
> From: Ma Wupeng <mawupeng1@huawei.com>
>
> If system have some mirrored memory and mirrored feature is not specified
> in boot parameter, the basic mirrored feature will be enabled and this will
> lead to the following situations:
>
> - memblock memory allocation prefers mirrored region. This may have some
>   unexpected influence on numa affinity.
>
> - contiguous memory will be split into several parts if parts of them
>   is mirrored memory via memblock_mark_mirror().
>
> To fix this, variable mirrored_kernelcore will be checked before calling
> efi_find_mirror() which will enable basic mirrored feature and this
> variable is true if kernelcore=mirror is added in the kernel parameters.
>
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>

This seems like the wrong place to do this. If mirrored memory is
irrelevant to memblock, it should ignore the attribute. So I think
this check belongs in mm/memblock.c instead.


> ---
>  drivers/firmware/efi/efi.c | 3 +++
>  include/linux/mm.h         | 2 ++
>  mm/page_alloc.c            | 2 +-
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 79c232e07de7..8a5edcb0dd82 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -454,6 +454,9 @@ void __init efi_find_mirror(void)
>         if (!efi_enabled(EFI_MEMMAP))
>                 return;
>
> +       if (!mirrored_kernelcore)
> +               return;
> +
>         for_each_efi_memory_desc(md) {
>                 unsigned long long start = md->phys_addr;
>                 unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..741ac7d022c3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2540,6 +2540,8 @@ extern void get_pfn_range_for_nid(unsigned int nid,
>                         unsigned long *start_pfn, unsigned long *end_pfn);
>  extern unsigned long find_min_pfn_with_active_regions(void);
>
> +extern bool mirrored_kernelcore;
> +
>  #ifndef CONFIG_NUMA
>  static inline int early_pfn_to_nid(unsigned long pfn)
>  {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..cf6f70aba787 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -356,7 +356,7 @@ static unsigned long required_kernelcore_percent __initdata;
>  static unsigned long required_movablecore __initdata;
>  static unsigned long required_movablecore_percent __initdata;
>  static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata;
> -static bool mirrored_kernelcore __meminitdata;
> +bool mirrored_kernelcore __meminitdata;
>
>  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>  int movable_zone;
> --
> 2.25.1
>
Ard Biesheuvel June 10, 2022, 11:23 a.m. UTC | #8
On Tue, 7 Jun 2022 at 11:16, Wupeng Ma <mawupeng1@huawei.com> wrote:
>
> From: Ma Wupeng <mawupeng1@huawei.com>
>
> Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges")
> introduced mirrored memory support for x86. This support rely on UEFI to
> report mirrored memory address ranges.  See UEFI 2.5 spec pages 157-158:
>
>   http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf
>
> Memory mirroring is a technique used to separate memory into two separate
> channels, usually on a memory device, like a server. In memory mirroring,
> one channel is copied to another to create redundancy. This method makes
> input/output (I/O) registers and memory appear with more than one address
> range because the same physical byte is accessible at more than one
> address. Using memory mirroring, higher memory reliability and a higher
> level of memory consolidation are possible.
>
> These EFI memory regions have various attributes, and the "mirrored"
> attribute is one of them. The physical memory region whose descriptors
> in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
> mirrored. The address range mirroring feature of the kernel arranges such
> mirrored regions into normal zones and other regions into movable zones.
>
> Arm64 can support this too. So mirrored memory support is added to support
> arm64.
>
> The main purpose of this patch set is to introduce mirrored support for
> arm64 and we have already fixed the problems we had which is shown in
> patch #5 to patch #8 and try to bring total isolation in patch #9 which
> will disable mirror feature if kernelcore is not specified.
>
> In order to test this support in arm64:
> - patch this patch set
> - add kernelcore=mirror in kernel parameter
> - start you kernel
>
> Patch #1-#2 introduce mirrored memory support form arm64.
> Patch #3-#5 fix some bugs for arm64 if memory reliable is enabled.
> Patch #6 disable mirror feature if kernelcore is not specified.
>
> Thanks to Ard Biesheuvel's hard work [1], now kernel will perfer mirrored
> memory if kaslr is enabled.
>
> [1] https://lore.kernel.org/linux-arm-kernel/CAMj1kXEPVEzMgOM4+Yj6PxHA-jFuDOAUdDJSiSxy_XaP4P7LSw@mail.gmail.com/T/
>
> Changelog since v2:
> - remove efi_fake_mem support
> - remove Commit ("remove some redundant code in ia64 efi_init") since
>   efi_print_memmap() is not public
> - add mirror flag back on initrd memory
>
> Changelog since v1:
> - update changelog in cover letter
> - use PHYS_PFN in patch #7
>
> Ma Wupeng (6):
>   efi: Make efi_find_mirror() public
>   arm64/mirror: arm64 enabling - find mirrored memory ranges
>   mm: Ratelimited mirrored memory related warning messages
>   mm: Demote warning message in vmemmap_verify() to debug level
>   mm: Add mirror flag back on initrd memory
>   efi: Disable mirror feature if kernelcore is not specified
>

I have tested these changes on QEMU/arm64 with the patch below, and
things seem to work as expected. We have some minor issues to work out
but the general shape of this code is good.

As for the mm/ changes: does anyone mind if I take those through the
EFI tree as well? I don't think the EFI and -mm changes depend on each
other, so they can go into -mm separately as well.
Ard Biesheuvel June 10, 2022, 11:24 a.m. UTC | #9
On Fri, 10 Jun 2022 at 13:23, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 7 Jun 2022 at 11:16, Wupeng Ma <mawupeng1@huawei.com> wrote:
> >
> > From: Ma Wupeng <mawupeng1@huawei.com>
> >
> > Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges")
> > introduced mirrored memory support for x86. This support rely on UEFI to
> > report mirrored memory address ranges.  See UEFI 2.5 spec pages 157-158:
> >
> >   http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf
> >
> > Memory mirroring is a technique used to separate memory into two separate
> > channels, usually on a memory device, like a server. In memory mirroring,
> > one channel is copied to another to create redundancy. This method makes
> > input/output (I/O) registers and memory appear with more than one address
> > range because the same physical byte is accessible at more than one
> > address. Using memory mirroring, higher memory reliability and a higher
> > level of memory consolidation are possible.
> >
> > These EFI memory regions have various attributes, and the "mirrored"
> > attribute is one of them. The physical memory region whose descriptors
> > in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
> > mirrored. The address range mirroring feature of the kernel arranges such
> > mirrored regions into normal zones and other regions into movable zones.
> >
> > Arm64 can support this too. So mirrored memory support is added to support
> > arm64.
> >
> > The main purpose of this patch set is to introduce mirrored support for
> > arm64 and we have already fixed the problems we had which is shown in
> > patch #5 to patch #8 and try to bring total isolation in patch #9 which
> > will disable mirror feature if kernelcore is not specified.
> >
> > In order to test this support in arm64:
> > - patch this patch set
> > - add kernelcore=mirror in kernel parameter
> > - start you kernel
> >
> > Patch #1-#2 introduce mirrored memory support form arm64.
> > Patch #3-#5 fix some bugs for arm64 if memory reliable is enabled.
> > Patch #6 disable mirror feature if kernelcore is not specified.
> >
> > Thanks to Ard Biesheuvel's hard work [1], now kernel will perfer mirrored
> > memory if kaslr is enabled.
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/CAMj1kXEPVEzMgOM4+Yj6PxHA-jFuDOAUdDJSiSxy_XaP4P7LSw@mail.gmail.com/T/
> >
> > Changelog since v2:
> > - remove efi_fake_mem support
> > - remove Commit ("remove some redundant code in ia64 efi_init") since
> >   efi_print_memmap() is not public
> > - add mirror flag back on initrd memory
> >
> > Changelog since v1:
> > - update changelog in cover letter
> > - use PHYS_PFN in patch #7
> >
> > Ma Wupeng (6):
> >   efi: Make efi_find_mirror() public
> >   arm64/mirror: arm64 enabling - find mirrored memory ranges
> >   mm: Ratelimited mirrored memory related warning messages
> >   mm: Demote warning message in vmemmap_verify() to debug level
> >   mm: Add mirror flag back on initrd memory
> >   efi: Disable mirror feature if kernelcore is not specified
> >
>
> I have tested these changes on QEMU/arm64 with the patch below, and
> things seem to work as expected. We have some minor issues to work out
> but the general shape of this code is good.
>
> As for the mm/ changes: does anyone mind if I take those through the
> EFI tree as well? I don't think the EFI and -mm changes depend on each
> other, so they can go into -mm separately as well.

diff --git a/drivers/firmware/efi/libstub/efi-stub.c
b/drivers/firmware/efi/libstub/efi-stub.c
index f515394cce6e..1d4dd8aca3e6 100644
--- a/drivers/firmware/efi/libstub/efi-stub.c
+++ b/drivers/firmware/efi/libstub/efi-stub.c
@@ -136,6 +136,7 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
        enum efi_secureboot_mode secure_boot;
        struct screen_info *si;
        efi_properties_table_t *prop_tbl;
+       const efi_dxe_services_table_t *efi_dxe_table;

        efi_system_table = sys_table_arg;

@@ -161,6 +162,20 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
                goto fail;
        }

+       efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+       if (efi_dxe_table) {
+               efi_physical_addr_t pa = 0x60000000;
+               efi_gcd_memory_space_desc_t desc;
+
+               status = efi_dxe_call(get_memory_space_descriptor, pa, &desc);
+               if (status != EFI_SUCCESS)
+                       efi_err("Failed to get memory space
descriptor: %lx\n", status);
+               status = efi_dxe_call(set_memory_space_capabilities, pa, SZ_1G,
+                                     desc.capabilities |
EFI_MEMORY_MORE_RELIABLE);
+               if (status != EFI_SUCCESS)
+                       efi_err("Failed to set memory space
capabilities: %lx\n", status);
+       }
+
        /*
         * Get the command line from EFI, using the LOADED_IMAGE
         * protocol. We are going to copy the command line into the
diff --git a/drivers/firmware/efi/libstub/efistub.h
b/drivers/firmware/efi/libstub/efistub.h
index b0ae0a454404..bf11d85bf9b4 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -378,7 +378,8 @@ union efi_dxe_services_table {
                void *schedule;
                void *trust;
                void *process_firmware_volume;
-               void *set_memory_space_capabilities;
+               efi_status_t (__efiapi
*set_memory_space_capabilities)(efi_physical_addr_t,
+
u64, u64);
        };
        struct {
                efi_table_hdr_t hdr;
Kefeng Wang June 10, 2022, 12:15 p.m. UTC | #10
On 2022/6/10 19:20, Ard Biesheuvel wrote:
> On Tue, 7 Jun 2022 at 11:16, Wupeng Ma <mawupeng1@huawei.com> wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> If system have some mirrored memory and mirrored feature is not specified
>> in boot parameter, the basic mirrored feature will be enabled and this will
>> lead to the following situations:
>>
>> - memblock memory allocation prefers mirrored region. This may have some
>>    unexpected influence on numa affinity.
>>
>> - contiguous memory will be split into several parts if parts of them
>>    is mirrored memory via memblock_mark_mirror().
>>
>> To fix this, variable mirrored_kernelcore will be checked before calling
>> efi_find_mirror() which will enable basic mirrored feature and this
>> variable is true if kernelcore=mirror is added in the kernel parameters.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> This seems like the wrong place to do this. If mirrored memory is
> irrelevant to memblock, it should ignore the attribute. So I think
> this check belongs in mm/memblock.c instead.

Sound good,  let's add check into memblock_mark_mirror() and retain

the efi memory mirror information printing.

>
>
>> ---
>>   drivers/firmware/efi/efi.c | 3 +++
>>   include/linux/mm.h         | 2 ++
>>   mm/page_alloc.c            | 2 +-
>>   3 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>> index 79c232e07de7..8a5edcb0dd82 100644
>> --- a/drivers/firmware/efi/efi.c
>> +++ b/drivers/firmware/efi/efi.c
>> @@ -454,6 +454,9 @@ void __init efi_find_mirror(void)
>>          if (!efi_enabled(EFI_MEMMAP))
>>                  return;
>>
>> +       if (!mirrored_kernelcore)
>> +               return;
>> +
>>          for_each_efi_memory_desc(md) {
>>                  unsigned long long start = md->phys_addr;
>>                  unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index bc8f326be0ce..741ac7d022c3 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2540,6 +2540,8 @@ extern void get_pfn_range_for_nid(unsigned int nid,
>>                          unsigned long *start_pfn, unsigned long *end_pfn);
>>   extern unsigned long find_min_pfn_with_active_regions(void);
>>
>> +extern bool mirrored_kernelcore;
>> +
>>   #ifndef CONFIG_NUMA
>>   static inline int early_pfn_to_nid(unsigned long pfn)
>>   {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e008a3df0485..cf6f70aba787 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -356,7 +356,7 @@ static unsigned long required_kernelcore_percent __initdata;
>>   static unsigned long required_movablecore __initdata;
>>   static unsigned long required_movablecore_percent __initdata;
>>   static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata;
>> -static bool mirrored_kernelcore __meminitdata;
>> +bool mirrored_kernelcore __meminitdata;
>>
>>   /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
>>   int movable_zone;
>> --
>> 2.25.1
>>
> .
Mike Rapoport June 11, 2022, 9:56 a.m. UTC | #11
On Fri, Jun 10, 2022 at 01:23:34PM +0200, Ard Biesheuvel wrote:
> On Tue, 7 Jun 2022 at 11:16, Wupeng Ma <mawupeng1@huawei.com> wrote:
> >
> > From: Ma Wupeng <mawupeng1@huawei.com>
> >
> > Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges")
> > introduced mirrored memory support for x86. This support rely on UEFI to
> > report mirrored memory address ranges.  See UEFI 2.5 spec pages 157-158:
> >
> >   http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf
> >
> > Memory mirroring is a technique used to separate memory into two separate
> > channels, usually on a memory device, like a server. In memory mirroring,
> > one channel is copied to another to create redundancy. This method makes
> > input/output (I/O) registers and memory appear with more than one address
> > range because the same physical byte is accessible at more than one
> > address. Using memory mirroring, higher memory reliability and a higher
> > level of memory consolidation are possible.
> >
> > These EFI memory regions have various attributes, and the "mirrored"
> > attribute is one of them. The physical memory region whose descriptors
> > in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
> > mirrored. The address range mirroring feature of the kernel arranges such
> > mirrored regions into normal zones and other regions into movable zones.
> >
> > Arm64 can support this too. So mirrored memory support is added to support
> > arm64.
> >
> > The main purpose of this patch set is to introduce mirrored support for
> > arm64 and we have already fixed the problems we had which is shown in
> > patch #5 to patch #8 and try to bring total isolation in patch #9 which
> > will disable mirror feature if kernelcore is not specified.
> >
> > In order to test this support in arm64:
> > - patch this patch set
> > - add kernelcore=mirror in kernel parameter
> > - start you kernel
> >
> > Patch #1-#2 introduce mirrored memory support form arm64.
> > Patch #3-#5 fix some bugs for arm64 if memory reliable is enabled.
> > Patch #6 disable mirror feature if kernelcore is not specified.
> >
> > Thanks to Ard Biesheuvel's hard work [1], now kernel will perfer mirrored
> > memory if kaslr is enabled.
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/CAMj1kXEPVEzMgOM4+Yj6PxHA-jFuDOAUdDJSiSxy_XaP4P7LSw@mail.gmail.com/T/
> >
> > Changelog since v2:
> > - remove efi_fake_mem support
> > - remove Commit ("remove some redundant code in ia64 efi_init") since
> >   efi_print_memmap() is not public
> > - add mirror flag back on initrd memory
> >
> > Changelog since v1:
> > - update changelog in cover letter
> > - use PHYS_PFN in patch #7
> >
> > Ma Wupeng (6):
> >   efi: Make efi_find_mirror() public
> >   arm64/mirror: arm64 enabling - find mirrored memory ranges
> >   mm: Ratelimited mirrored memory related warning messages
> >   mm: Demote warning message in vmemmap_verify() to debug level
> >   mm: Add mirror flag back on initrd memory
> >   efi: Disable mirror feature if kernelcore is not specified
> >
> 
> I have tested these changes on QEMU/arm64 with the patch below, and
> things seem to work as expected. We have some minor issues to work out
> but the general shape of this code is good.
> 
> As for the mm/ changes: does anyone mind if I take those through the
> EFI tree as well?

No objections from me.

> I don't think the EFI and -mm changes depend on each other, so they
> can go into -mm separately as well.

--
Sincerely yours,
Mike.