Message ID | E1r0JLa-00CTxY-Uv@rmk-PC.armlinux.org.uk |
---|---|
State | New |
Headers | show |
Series | [RFC,01/22] arch_topology: Make register_cpu_capacity_sysctl() tolerant to late CPUs | expand |
On 11/7/23 20:30, Russell King (Oracle) wrote: > Since "drivers: base: Move cpu_dev_init() after node_dev_init()", we > can remove some redundant code. > > node_dev_init() will walk through the nodes calling register_one_node() > on each. This will trickle down to __register_one_node() which walks > all present CPUs, calling register_cpu_under_node() on each. > > register_cpu_under_node() will call get_cpu_device(cpu) for each, which > will return NULL until the CPU is registered using register_cpu(). This > now happens _after_ node_dev_init(). > > Therefore, calling register_cpu_under_node() from __register_one_node() > becomes a no-op, and can be removed. > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > --- > drivers/base/node.c | 7 ------- > 1 file changed, 7 deletions(-) > __register_one_node() can be called in memory hot add path either. In that path, a new NUMA node can be presented and becomes online. Does this become a problem after the logic of associating CPU with newly added NUMA node? Thanks, Gavin > diff --git a/drivers/base/node.c b/drivers/base/node.c > index 493d533f8375..4d5ac7cf8757 100644 > --- a/drivers/base/node.c > +++ b/drivers/base/node.c > @@ -867,7 +867,6 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn, > int __register_one_node(int nid) > { > int error; > - int cpu; > > node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL); > if (!node_devices[nid]) > @@ -875,12 +874,6 @@ int __register_one_node(int nid) > > error = register_node(node_devices[nid], nid); > > - /* link cpu under this node */ > - for_each_present_cpu(cpu) { > - if (cpu_to_node(cpu) == nid) > - register_cpu_under_node(cpu, nid); > - } > - > INIT_LIST_HEAD(&node_devices[nid]->access_list); > node_init_caches(nid); >
On Mon, Nov 13, 2023 at 02:04:32PM +1000, Gavin Shan wrote: > On 11/7/23 20:30, Russell King (Oracle) wrote: > > Since "drivers: base: Move cpu_dev_init() after node_dev_init()", we > > can remove some redundant code. > > > > node_dev_init() will walk through the nodes calling register_one_node() > > on each. This will trickle down to __register_one_node() which walks > > all present CPUs, calling register_cpu_under_node() on each. > > > > register_cpu_under_node() will call get_cpu_device(cpu) for each, which > > will return NULL until the CPU is registered using register_cpu(). This > > now happens _after_ node_dev_init(). > > > > Therefore, calling register_cpu_under_node() from __register_one_node() > > becomes a no-op, and can be removed. > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > > --- > > drivers/base/node.c | 7 ------- > > 1 file changed, 7 deletions(-) > > > > __register_one_node() can be called in memory hot add path either. In that path, > a new NUMA node can be presented and becomes online. Does this become a problem > after the logic of associating CPU with newly added NUMA node? I guess this is where ordering matters. As mentioned in the commit message, register_cpu_under_node() does this: if (!node_online(nid)) return 0; obj = get_cpu_device(cpu); if (!obj) return 0; get_cpu_device() will return NULL if the CPU is not possible or is out of range, or register_cpu() has not yet been called for this CPU, and register_cpu() will call register_cpu_under_node(). I guess it is possible for a CPU it be present, but the node its associated with would not be online, which means we end up with register_cpu_under_node() returning on !node_online(nid) but we've populated the CPU devices (thus get_cpu_device(cpu) would return non-NULL). Then when the numa node comes online, we do still need to call this path, so this change is incorrect. It came about trying to address Jonathan's comment for this patch: https://lore.kernel.org/r/20230913163823.7880-7-james.morse@arm.com I think my response to Jonathan is still correct - but didn't need a code change. I'm dropping this patch.
diff --git a/drivers/base/node.c b/drivers/base/node.c index 493d533f8375..4d5ac7cf8757 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -867,7 +867,6 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn, int __register_one_node(int nid) { int error; - int cpu; node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL); if (!node_devices[nid]) @@ -875,12 +874,6 @@ int __register_one_node(int nid) error = register_node(node_devices[nid], nid); - /* link cpu under this node */ - for_each_present_cpu(cpu) { - if (cpu_to_node(cpu) == nid) - register_cpu_under_node(cpu, nid); - } - INIT_LIST_HEAD(&node_devices[nid]->access_list); node_init_caches(nid);
Since "drivers: base: Move cpu_dev_init() after node_dev_init()", we can remove some redundant code. node_dev_init() will walk through the nodes calling register_one_node() on each. This will trickle down to __register_one_node() which walks all present CPUs, calling register_cpu_under_node() on each. register_cpu_under_node() will call get_cpu_device(cpu) for each, which will return NULL until the CPU is registered using register_cpu(). This now happens _after_ node_dev_init(). Therefore, calling register_cpu_under_node() from __register_one_node() becomes a no-op, and can be removed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- drivers/base/node.c | 7 ------- 1 file changed, 7 deletions(-)