Message ID | 20210629131542.743888-1-clg@kaod.org |
---|---|
State | New |
Headers | show |
Series | powerpc/xive: Do not skip CPU-less nodes when creating the IPIs | expand |
* C?dric Le Goater <clg@kaod.org> [2021-06-29 15:15:42]: > On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at > runtime. Today, the IPI is not created for such nodes, and hot-plugged > CPUs use a bogus IPI, which leads to soft lockups. > > We could create the node IPI on demand but it is a bit complex because > this code would be called under bringup_up() and some IRQ locking is > being done. The simplest solution is to create the IPIs for all nodes > at startup. Thanks for quickly coming up with the fix. > > Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") > Cc: stable@vger.kernel.org # v5.13 > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > --- > > This patch breaks old versions of irqbalance (<= v1.4). Possible nodes > are collected from /sys/devices/system/node/ but CPU-less nodes are > not listed there. When interrupts are scanned, the link representing > the node structure is NULL and segfault occurs. > > Version 1.7 seems immune. > > --- > arch/powerpc/sysdev/xive/common.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c > index f3b16ed48b05..5d2c58dba57e 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) > struct xive_ipi_desc *xid = &xive_ipis[node]; > struct xive_ipi_alloc_info info = { node }; > > - /* Skip nodes without CPUs */ > - if (cpumask_empty(cpumask_of_node(node))) > - continue; > - > /* > * Map one IPI interrupt per node for all cpus of that node. > * Since the HW interrupt number doesn't have any meaning, > -- > 2.31.1 > -- Thanks and Regards Srikar Dronamraju
On 29/06/2021 15:15, Cédric Le Goater wrote: > On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at > runtime. Today, the IPI is not created for such nodes, and hot-plugged > CPUs use a bogus IPI, which leads to soft lockups. > > We could create the node IPI on demand but it is a bit complex because > this code would be called under bringup_up() and some IRQ locking is > being done. The simplest solution is to create the IPIs for all nodes > at startup. > > Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") > Cc: stable@vger.kernel.org # v5.13 > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > This patch breaks old versions of irqbalance (<= v1.4). Possible nodes > are collected from /sys/devices/system/node/ but CPU-less nodes are > not listed there. When interrupts are scanned, the link representing > the node structure is NULL and segfault occurs. > > Version 1.7 seems immune. > > --- > arch/powerpc/sysdev/xive/common.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c > index f3b16ed48b05..5d2c58dba57e 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) > struct xive_ipi_desc *xid = &xive_ipis[node]; > struct xive_ipi_alloc_info info = { node }; > > - /* Skip nodes without CPUs */ > - if (cpumask_empty(cpumask_of_node(node))) > - continue; > - > /* > * Map one IPI interrupt per node for all cpus of that node. > * Since the HW interrupt number doesn't have any meaning, Tested-by: Laurent Vivier <lvivier@redhat.com>
On 29/06/2021 15:15, Cédric Le Goater wrote: > On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at > runtime. Today, the IPI is not created for such nodes, and hot-plugged > CPUs use a bogus IPI, which leads to soft lockups. > > We could create the node IPI on demand but it is a bit complex because > this code would be called under bringup_up() and some IRQ locking is > being done. The simplest solution is to create the IPIs for all nodes > at startup. > > Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") > Cc: stable@vger.kernel.org # v5.13 > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > This patch breaks old versions of irqbalance (<= v1.4). Possible nodes > are collected from /sys/devices/system/node/ but CPU-less nodes are > not listed there. When interrupts are scanned, the link representing > the node structure is NULL and segfault occurs. > > Version 1.7 seems immune. > > --- > arch/powerpc/sysdev/xive/common.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c > index f3b16ed48b05..5d2c58dba57e 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) > struct xive_ipi_desc *xid = &xive_ipis[node]; > struct xive_ipi_alloc_info info = { node }; > > - /* Skip nodes without CPUs */ > - if (cpumask_empty(cpumask_of_node(node))) > - continue; > - > /* > * Map one IPI interrupt per node for all cpus of that node. > * Since the HW interrupt number doesn't have any meaning, > What happened to this fix? Will it be merged? Thanks, Laurent
Cédric Le Goater <clg@kaod.org> writes: > On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at > runtime. Today, the IPI is not created for such nodes, and hot-plugged > CPUs use a bogus IPI, which leads to soft lockups. > > We could create the node IPI on demand but it is a bit complex because > this code would be called under bringup_up() and some IRQ locking is > being done. The simplest solution is to create the IPIs for all nodes > at startup. > > Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") > Cc: stable@vger.kernel.org # v5.13 > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > This patch breaks old versions of irqbalance (<= v1.4). Possible nodes > are collected from /sys/devices/system/node/ but CPU-less nodes are > not listed there. When interrupts are scanned, the link representing > the node structure is NULL and segfault occurs. Breaking userspace is usually frowned upon, even if it is irqbalance. If CPU-less nodes appeared in /sys/devices/system/node would that fix it? Could we do that or is that not possible for other reasons? > Version 1.7 seems immune. Which was released in August 2020. Looks like some distros still ship 1.6, I take it you're not sure if that is broken or not. cheers
On 8/2/21 8:37 AM, Michael Ellerman wrote: > Cédric Le Goater <clg@kaod.org> writes: >> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at >> runtime. Today, the IPI is not created for such nodes, and hot-plugged >> CPUs use a bogus IPI, which leads to soft lockups. >> >> We could create the node IPI on demand but it is a bit complex because >> this code would be called under bringup_up() and some IRQ locking is >> being done. The simplest solution is to create the IPIs for all nodes >> at startup. >> >> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") >> Cc: stable@vger.kernel.org # v5.13 >> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> >> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> >> Signed-off-by: Cédric Le Goater <clg@kaod.org> >> --- >> >> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes >> are collected from /sys/devices/system/node/ but CPU-less nodes are >> not listed there. When interrupts are scanned, the link representing >> the node structure is NULL and segfault occurs. > > Breaking userspace is usually frowned upon, even if it is irqbalance. > > If CPU-less nodes appeared in /sys/devices/system/node would that fix > it? Could we do that or is that not possible for other reasons? > >> Version 1.7 seems immune. > > Which was released in August 2020. > > Looks like some distros still ship 1.6, I take it you're not sure if > that is broken or not. I did a bisect on irqbalance and the "bad" commit was introduced between version 1.7 and version 1.8 : commit 31dea01f3a47 ("Also fetch node info for non-PCI devices") https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94 which was backported on RHEL 8 in RPM irqbalance-1.4.0-6.el8. Any distro using irqbalance <= 1.7 without the patch above is fine. Since irqbalance handled cleanly irqs referencing offline nodes before this patch, I am inclined to think that the irqbalance fix is incomplete. Unfortunately, the commit log lacks some context on the non-PCI devices. Thanks, C.
On 6/29/21 3:15 PM, Cédric Le Goater wrote: > On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at > runtime. Today, the IPI is not created for such nodes, and hot-plugged > CPUs use a bogus IPI, which leads to soft lockups. > > We could create the node IPI on demand but it is a bit complex because > this code would be called under bringup_up() and some IRQ locking is > being done. The simplest solution is to create the IPIs for all nodes > at startup. > > Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") > Cc: stable@vger.kernel.org # v5.13 > Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > This patch breaks old versions of irqbalance (<= v1.4). Possible nodes > are collected from /sys/devices/system/node/ but CPU-less nodes are > not listed there. When interrupts are scanned, the link representing > the node structure is NULL and segfault occurs. This is an irqbalance regression due to : https://github.com/Irqbalance/irqbalance/pull/172 I will report through an issue. Anyhow, there is a better approach which is to allocate IPIs for all nodes at boot time and do the mapping on demand. Removing the mapping on last use seems more complex though. I will send a v2 after some tests. Thanks, C. > Version 1.7 seems immune. > > --- > arch/powerpc/sysdev/xive/common.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c > index f3b16ed48b05..5d2c58dba57e 100644 > --- a/arch/powerpc/sysdev/xive/common.c > +++ b/arch/powerpc/sysdev/xive/common.c > @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) > struct xive_ipi_desc *xid = &xive_ipis[node]; > struct xive_ipi_alloc_info info = { node }; > > - /* Skip nodes without CPUs */ > - if (cpumask_empty(cpumask_of_node(node))) > - continue; > - > /* > * Map one IPI interrupt per node for all cpus of that node. > * Since the HW interrupt number doesn't have any meaning, >
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index f3b16ed48b05..5d2c58dba57e 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node }; - /* Skip nodes without CPUs */ - if (cpumask_empty(cpumask_of_node(node))) - continue; - /* * Map one IPI interrupt per node for all cpus of that node. * Since the HW interrupt number doesn't have any meaning,
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups. We could create the node IPI on demand but it is a bit complex because this code would be called under bringup_up() and some IRQ locking is being done. The simplest solution is to create the IPIs for all nodes at startup. Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node") Cc: stable@vger.kernel.org # v5.13 Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> --- This patch breaks old versions of irqbalance (<= v1.4). Possible nodes are collected from /sys/devices/system/node/ but CPU-less nodes are not listed there. When interrupts are scanned, the link representing the node structure is NULL and segfault occurs. Version 1.7 seems immune. --- arch/powerpc/sysdev/xive/common.c | 4 ---- 1 file changed, 4 deletions(-)