diff mbox series

[PATCHv3,2/4] x86/64/mm: Make SPARSEMEM_VMEMMAP the only memory model

Message ID 20250516123306.3812286-3-kirill.shutemov@linux.intel.com
State New
Headers show
Series x86: Make 5-level paging support unconditional for x86-64 | expand

Commit Message

Kirill A. Shutemov May 16, 2025, 12:33 p.m. UTC
5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
being phased out, making 5-level paging support mandatory.

Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
any associated conditional statements.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Kconfig      | 2 +-
 arch/x86/mm/init_64.c | 9 +--------
 2 files changed, 2 insertions(+), 9 deletions(-)

Comments

Dave Hansen May 16, 2025, 1:42 p.m. UTC | #1
On 5/16/25 05:33, Kirill A. Shutemov wrote:
> 5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
> being phased out, making 5-level paging support mandatory.
> 
> Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
> any associated conditional statements.
I think we have ourselves a catch-22 here.

SPARSEMEM_VMEMMAP was selected because the other sparsemem modes
couldn't handle a dynamic MAX_PHYS{MEM,ADDR}_BITS introduced by 5-level
paging. Now you're proposing making it static again, but keeping the
SPARSEMEM_VMEMMAP dependency.

If you remove the dynamic MAX_PHYS{MEM,ADDR}_BITS, you should also
remove the dependency on SPARSEMEM_VMEMMAP. No?
Ingo Molnar May 16, 2025, 1:45 p.m. UTC | #2
* Dave Hansen <dave.hansen@intel.com> wrote:

> On 5/16/25 05:33, Kirill A. Shutemov wrote:
> > 5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
> > being phased out, making 5-level paging support mandatory.
> > 
> > Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
> > any associated conditional statements.
> I think we have ourselves a catch-22 here.
> 
> SPARSEMEM_VMEMMAP was selected because the other sparsemem modes
> couldn't handle a dynamic MAX_PHYS{MEM,ADDR}_BITS introduced by 5-level
> paging. Now you're proposing making it static again, but keeping the
> SPARSEMEM_VMEMMAP dependency.
> 
> If you remove the dynamic MAX_PHYS{MEM,ADDR}_BITS, you should also
> remove the dependency on SPARSEMEM_VMEMMAP. No?

Isn't it the other way around? MAX_PHYS{MEM,ADDR}_BITS are now *always* 
dynamic, their value depending on whether LA57 is available and used.

Thanks,

	Ingo
Kirill A. Shutemov May 16, 2025, 2:01 p.m. UTC | #3
On Fri, May 16, 2025 at 06:42:03AM -0700, Dave Hansen wrote:
> On 5/16/25 05:33, Kirill A. Shutemov wrote:
> > 5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
> > being phased out, making 5-level paging support mandatory.
> > 
> > Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
> > any associated conditional statements.
> I think we have ourselves a catch-22 here.
> 
> SPARSEMEM_VMEMMAP was selected because the other sparsemem modes
> couldn't handle a dynamic MAX_PHYS{MEM,ADDR}_BITS introduced by 5-level
> paging. Now you're proposing making it static again, but keeping the
> SPARSEMEM_VMEMMAP dependency.
> 
> If you remove the dynamic MAX_PHYS{MEM,ADDR}_BITS, you should also
> remove the dependency on SPARSEMEM_VMEMMAP. No?

I guess. But how?

And is there any value to support !SPARSEMEM_VMEMMAP?
Ingo Molnar May 16, 2025, 2:08 p.m. UTC | #4
* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> On Fri, May 16, 2025 at 06:42:03AM -0700, Dave Hansen wrote:
> > On 5/16/25 05:33, Kirill A. Shutemov wrote:
> > > 5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
> > > being phased out, making 5-level paging support mandatory.
> > > 
> > > Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
> > > any associated conditional statements.
> > I think we have ourselves a catch-22 here.
> > 
> > SPARSEMEM_VMEMMAP was selected because the other sparsemem modes
> > couldn't handle a dynamic MAX_PHYS{MEM,ADDR}_BITS introduced by 5-level
> > paging. Now you're proposing making it static again, but keeping the
> > SPARSEMEM_VMEMMAP dependency.
> > 
> > If you remove the dynamic MAX_PHYS{MEM,ADDR}_BITS, you should also
> > remove the dependency on SPARSEMEM_VMEMMAP. No?
> 
> I guess. But how?
> 
> And is there any value to support !SPARSEMEM_VMEMMAP?

Not really IMHO:

  .config.opensuse.default:    CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
  .config.ubuntu.localinstall: CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
  .config.fedora.generic:      CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
  .config.rhel.generic:        CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y

Thanks,

	Ingo
Dave Hansen May 16, 2025, 2:59 p.m. UTC | #5
On 5/16/25 06:45, Ingo Molnar wrote:
> Isn't it the other way around? MAX_PHYS{MEM,ADDR}_BITS are now *always* 
> dynamic, their value depending on whether LA57 is available and used.

MAX_PHYS{MEM,ADDR}_BITS were always intended to be the compile-time
maximums on the architecture. They're static on every architecture
except x86 and some arm64 configs (who probably copied x86).

That's why having them be dynamic broke non-vmemmap sparsemem.

But we also seems to have defined MAXMEM to derive from MAX_PHYSMEM_BITS
and MAXMEM cascades into a bunch of other stuff, including KASAN. So we
can't just make them static again, I guess.

The only option would be to make them static when using non-vmemmap
sparsemem. But that's new-ish, and probably won't get any testing.

I guess there's not much we can do about it.
Dave Hansen May 16, 2025, 3:01 p.m. UTC | #6
On 5/16/25 07:01, Kirill A. Shutemov wrote:
>> If you remove the dynamic MAX_PHYS{MEM,ADDR}_BITS, you should also
>> remove the dependency on SPARSEMEM_VMEMMAP. No?
> I guess. But how?
> 
> And is there any value to support !SPARSEMEM_VMEMMAP?

Not really, other than making sure x86 supports a really wide variety of
arch-independent options. I've used it a time or two to go looking for
bugs but I'm not sure if those originated on x86 or other architectures.
Dave Hansen May 16, 2025, 3:03 p.m. UTC | #7
On 5/16/25 07:08, Ingo Molnar wrote:
>> And is there any value to support !SPARSEMEM_VMEMMAP?
> Not really IMHO:
> 
>   .config.opensuse.default:    CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.ubuntu.localinstall: CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.fedora.generic:      CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.rhel.generic:        CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y

I look at the distro configs all the time as well.

But let's also not forget that none of these have lockdep turned on, and
we don't want to toss out lockdep support on x86, for example. ;)
Dave Hansen May 16, 2025, 3:08 p.m. UTC | #8
On 5/16/25 07:59, Dave Hansen wrote:
> The only option would be to make them static when using non-vmemmap
> sparsemem. But that's new-ish, and probably won't get any testing.

Something like this. But I don't particularly like it.
Ingo Molnar May 16, 2025, 3:35 p.m. UTC | #9
* Dave Hansen <dave.hansen@intel.com> wrote:

> On 5/16/25 07:08, Ingo Molnar wrote:
> >> And is there any value to support !SPARSEMEM_VMEMMAP?
> > Not really IMHO:
> > 
> >   .config.opensuse.default:    CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> >   .config.ubuntu.localinstall: CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> >   .config.fedora.generic:      CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> >   .config.rhel.generic:        CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> 
> I look at the distro configs all the time as well.
> 
> But let's also not forget that none of these have lockdep turned on,

Not actually true, quite a few distro debug kernel packages have 
lockdep on, and it's often enabled in internal QA rigs as well.

Not so SPARSEMEM_VMEMMAP...

> [...] and we don't want to toss out lockdep support on x86, for 
> example. ;)

That's an apples to oranges comparison:

 - SPARSEMEM_VMEMMAP is always enabled, in standard and debug kernels 
   as well.

 - lockdep is disabled in standard kernels, enabled in debug kernels 
   and very much relied on.

Thanks,

	Ingo
Dave Hansen May 16, 2025, 3:46 p.m. UTC | #10
On 5/16/25 08:35, Ingo Molnar wrote:
>   .config.opensuse.default:    CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.ubuntu.localinstall: CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.fedora.generic:      CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>   .config.rhel.generic:        CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y

That reminds me...

Does everybody keep their own local copies of these configs in their
environment? I do, and I refresh them periodically from the distros. I
assume everybody else is doing something similar.

Is there a better way?
H. Peter Anvin May 16, 2025, 6:28 p.m. UTC | #11
On May 16, 2025 8:46:07 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 5/16/25 08:35, Ingo Molnar wrote:
>>   .config.opensuse.default:    CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>>   .config.ubuntu.localinstall: CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>>   .config.fedora.generic:      CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>>   .config.rhel.generic:        CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
>
>That reminds me...
>
>Does everybody keep their own local copies of these configs in their
>environment? I do, and I refresh them periodically from the distros. I
>assume everybody else is doing something similar.
>
>Is there a better way?

What I do is keep a set of minimal configs which are the deltas from the default.
diff mbox series

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d3c2da3b2f0b..45b36a019b5e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1467,7 +1467,6 @@  config X86_PAE
 config X86_5LEVEL
 	bool "Enable 5-level page tables support"
 	default y
-	select SPARSEMEM_VMEMMAP
 	depends on X86_64
 	help
 	  5-level paging enables access to larger address space:
@@ -1579,6 +1578,7 @@  config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
+	select SPARSEMEM_VMEMMAP if X86_64
 
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool X86_64 || (NUMA && X86_32)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bf45c7aed336..66330fe4e18c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -833,7 +833,6 @@  void __init paging_init(void)
 	zone_sizes_init();
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 #define PAGE_UNUSED 0xFD
 
 /*
@@ -932,7 +931,6 @@  static void __meminit vmemmap_use_new_sub_pmd(unsigned long start, unsigned long
 	if (!IS_ALIGNED(end, PMD_SIZE))
 		unused_pmd_start = end;
 }
-#endif
 
 /*
  * Memory hotplug specific functions
@@ -1152,16 +1150,13 @@  remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				pmd_clear(pmd);
 				spin_unlock(&init_mm.page_table_lock);
 				pages++;
-			}
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-			else if (vmemmap_pmd_is_unused(addr, next)) {
+			} else if (vmemmap_pmd_is_unused(addr, next)) {
 					free_hugepage_table(pmd_page(*pmd),
 							    altmap);
 					spin_lock(&init_mm.page_table_lock);
 					pmd_clear(pmd);
 					spin_unlock(&init_mm.page_table_lock);
 			}
-#endif
 			continue;
 		}
 
@@ -1500,7 +1495,6 @@  unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
  */
@@ -1647,4 +1641,3 @@  void __meminit vmemmap_populate_print_last(void)
 		node_start = 0;
 	}
 }
-#endif