Message ID | 1414377878-23497-2-git-send-email-wangyijing@huawei.com |
---|---|
State | New |
Headers | show |
On 27/10/14 02:44, Yijing Wang wrote: > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > fixed MSI mask bug which may cause kernel crash. But the commit > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > And the commit 0e4ccb1505a9 will be reverted in the later patch. Reviewed-by: David Vrabel <david.vrabel@citrix.com> In the sense that it keeps the odd Xen behaviour. But... Konrad, why was this fixed like this in the first place? IMO, it would have been better to get Xen to trap-and-emulate accesses to the relevant MSI/MSI-X registers. The mask/unmask on setup/teardown isn't performance critical. David
On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote: > On 27/10/14 02:44, Yijing Wang wrote: > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > > fixed MSI mask bug which may cause kernel crash. But the commit > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > > And the commit 0e4ccb1505a9 will be reverted in the later patch. > > Reviewed-by: David Vrabel <david.vrabel@citrix.com> > > In the sense that it keeps the odd Xen behaviour. But... > > Konrad, why was this fixed like this in the first place? IMO, it would As 0e4ccb1505a9 explains: PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() Certain platforms do not allow writes in the MSI-X BARs to setup or tear down vector values. To combat against the generic code trying to write to that and either silently being ignored or crashing due to the pagetables being marked R/O this patch introduces a platform override. Note that we keep two separate, non-weak, functions default_mask_msi_irqs() and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs() and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI code. For Xen, which does not allow the guest to write to MSI-X tables - as the hypervisor is solely responsible for setting the vector values - we implement two nops. This fixes a Xen guest crash when passing a PCI device with MSI-X to the guest. See the bugzilla for more details. [bhelgaas: add bugzilla info] Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581 > have been better to get Xen to trap-and-emulate accesses to the relevant > MSI/MSI-X registers. The mask/unmask on setup/teardown isn't > performance critical. > > David > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
On 27/10/14 19:45, Konrad Rzeszutek Wilk wrote: > On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote: >> On 27/10/14 02:44, Yijing Wang wrote: >>> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") >>> fixed MSI mask bug which may cause kernel crash. But the commit >>> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" >>> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. >>> And the commit 0e4ccb1505a9 will be reverted in the later patch. >> >> Reviewed-by: David Vrabel <david.vrabel@citrix.com> >> >> In the sense that it keeps the odd Xen behaviour. But... >> >> Konrad, why was this fixed like this in the first place? IMO, it would > > As 0e4ccb1505a9 explains: > PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() > > Certain platforms do not allow writes in the MSI-X BARs to setup or tear > down vector values. To combat against the generic code trying to write to > that and either silently being ignored or crashing due to the pagetables > being marked R/O this patch introduces a platform override. > > Note that we keep two separate, non-weak, functions default_mask_msi_irqs() > and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs() > and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI > code. > > For Xen, which does not allow the guest to write to MSI-X tables - as the > hypervisor is solely responsible for setting the vector values - we > implement two nops. My question specifically was: could this be fixed by allowing writes to set/clear the mask bit? If so, why was this not done and could we do this now? David
On Tue, Oct 28, 2014 at 04:44:44PM +0000, David Vrabel wrote: > On 27/10/14 19:45, Konrad Rzeszutek Wilk wrote: > > On Mon, Oct 27, 2014 at 11:09:42AM +0000, David Vrabel wrote: > >> On 27/10/14 02:44, Yijing Wang wrote: > >>> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > >>> fixed MSI mask bug which may cause kernel crash. But the commit > >>> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > >>> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > >>> And the commit 0e4ccb1505a9 will be reverted in the later patch. > >> > >> Reviewed-by: David Vrabel <david.vrabel@citrix.com> > >> > >> In the sense that it keeps the odd Xen behaviour. But... > >> > >> Konrad, why was this fixed like this in the first place? IMO, it would > > > > As 0e4ccb1505a9 explains: > > PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() > > > > Certain platforms do not allow writes in the MSI-X BARs to setup or tear > > down vector values. To combat against the generic code trying to write to > > that and either silently being ignored or crashing due to the pagetables > > being marked R/O this patch introduces a platform override. > > > > Note that we keep two separate, non-weak, functions default_mask_msi_irqs() > > and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs() > > and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI > > code. > > > > For Xen, which does not allow the guest to write to MSI-X tables - as the > > hypervisor is solely responsible for setting the vector values - we > > implement two nops. > > My question specifically was: could this be fixed by allowing writes to > set/clear the mask bit? If so, why was this not done and could we do No. > this now? > > David
On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote: > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > fixed MSI mask bug which may cause kernel crash. But the commit > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > And the commit 0e4ccb1505a9 will be reverted in the later patch. The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs. But it makes mask_irq a no-op for both MSI-X and MSI. The MSI mask bit is in config space, not in memory space. So why does mask_irq need to be a no-op for MSI as well? Are Xen guests prohibited from writing to config space, too? (It's fine if they are; it's just that the changelog specifically mentioned MSI-X memory space tables, and it didn't mention config space at all.) And I *assume* there's some Xen mechanism that accomplishes the mask_irq in a different way, since the actual mask_irq interface does nothing? (This is really a question for 0e4ccb1505a9, since I don't think this particular patch changes anything in that respect.) > Signed-off-by: Yijing Wang <wangyijing@huawei.com> > CC: David Vrabel <david.vrabel@citrix.com> > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > CC: xen-devel@lists.xenproject.org > --- > arch/x86/pci/xen.c | 2 ++ > drivers/pci/msi.c | 7 ++++++- > include/linux/msi.h | 1 + > 3 files changed, 9 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c > index 093f5f4..5ef62ed 100644 > --- a/arch/x86/pci/xen.c > +++ b/arch/x86/pci/xen.c > @@ -427,6 +427,7 @@ int __init pci_xen_init(void) > x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; > x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; > x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; > + pci_msi_ignore_mask = 1; > #endif > return 0; > } > @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void) > x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; > x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; > x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; > + pci_msi_ignore_mask = 1; > #endif > xen_setup_acpi_sci(); > __acpi_register_gsi = acpi_register_gsi_xen; > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index 38511d9..ecb5f54 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -23,6 +23,7 @@ > #include "pci.h" > > static int pci_msi_enable = 1; > +int pci_msi_ignore_mask; > > #define msix_table_size(flags) ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) > > @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) > { > u32 mask_bits = desc->masked; > > - if (!desc->msi_attrib.maskbit) > + if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) > return 0; > > mask_bits &= ~mask; > @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag) > u32 mask_bits = desc->masked; > unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + > PCI_MSIX_ENTRY_VECTOR_CTRL; > + > + if (pci_msi_ignore_mask) > + return 0; > + > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; > if (flag) > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > diff --git a/include/linux/msi.h b/include/linux/msi.h > index 44f4746..86dc501 100644 > --- a/include/linux/msi.h > +++ b/include/linux/msi.h > @@ -10,6 +10,7 @@ struct msi_msg { > u32 data; /* 16 bits of msi message data */ > }; > > +extern int pci_msi_ignore_mask; > /* Helper functions */ > struct irq_data; > struct msi_desc; > -- > 1.7.1 >
On 2014/11/11 8:04, Bjorn Helgaas wrote: > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote: >> Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") >> fixed MSI mask bug which may cause kernel crash. But the commit >> made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" >> to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. >> And the commit 0e4ccb1505a9 will be reverted in the later patch. > > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs. > But it makes mask_irq a no-op for both MSI-X and MSI. The MSI mask bit is > in config space, not in memory space. So why does mask_irq need to be a > no-op for MSI as well? Are Xen guests prohibited from writing to config > space, too? (It's fine if they are; it's just that the changelog > specifically mentioned MSI-X memory space tables, and it didn't mention > config space at all.) > > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in > a different way, since the actual mask_irq interface does nothing? (This > is really a question for 0e4ccb1505a9, since I don't think this particular > patch changes anything in that respect.) Yes, it's another history problem, maybe Konrad know the detail. > >> Signed-off-by: Yijing Wang <wangyijing@huawei.com> >> CC: David Vrabel <david.vrabel@citrix.com> >> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> CC: xen-devel@lists.xenproject.org >> --- >> arch/x86/pci/xen.c | 2 ++ >> drivers/pci/msi.c | 7 ++++++- >> include/linux/msi.h | 1 + >> 3 files changed, 9 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c >> index 093f5f4..5ef62ed 100644 >> --- a/arch/x86/pci/xen.c >> +++ b/arch/x86/pci/xen.c >> @@ -427,6 +427,7 @@ int __init pci_xen_init(void) >> x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; >> x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; >> x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; >> + pci_msi_ignore_mask = 1; >> #endif >> return 0; >> } >> @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void) >> x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; >> x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; >> x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; >> + pci_msi_ignore_mask = 1; >> #endif >> xen_setup_acpi_sci(); >> __acpi_register_gsi = acpi_register_gsi_xen; >> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >> index 38511d9..ecb5f54 100644 >> --- a/drivers/pci/msi.c >> +++ b/drivers/pci/msi.c >> @@ -23,6 +23,7 @@ >> #include "pci.h" >> >> static int pci_msi_enable = 1; >> +int pci_msi_ignore_mask; >> >> #define msix_table_size(flags) ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) >> >> @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) >> { >> u32 mask_bits = desc->masked; >> >> - if (!desc->msi_attrib.maskbit) >> + if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) >> return 0; >> >> mask_bits &= ~mask; >> @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag) >> u32 mask_bits = desc->masked; >> unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + >> PCI_MSIX_ENTRY_VECTOR_CTRL; >> + >> + if (pci_msi_ignore_mask) >> + return 0; >> + >> mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; >> if (flag) >> mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; >> diff --git a/include/linux/msi.h b/include/linux/msi.h >> index 44f4746..86dc501 100644 >> --- a/include/linux/msi.h >> +++ b/include/linux/msi.h >> @@ -10,6 +10,7 @@ struct msi_msg { >> u32 data; /* 16 bits of msi message data */ >> }; >> >> +extern int pci_msi_ignore_mask; >> /* Helper functions */ >> struct irq_data; >> struct msi_desc; >> -- >> 1.7.1 >> > > . >
On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote: > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote: > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > > fixed MSI mask bug which may cause kernel crash. But the commit > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > > And the commit 0e4ccb1505a9 will be reverted in the later patch. > > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs. > But it makes mask_irq a no-op for both MSI-X and MSI. The MSI mask bit is > in config space, not in memory space. So why does mask_irq need to be a > no-op for MSI as well? Are Xen guests prohibited from writing to config The PV guests it refers do not do write to config space. They have an PCI frontend and backend driver which communicates to the backend (the trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c) is the one that sets up the overrides. When an MSI or MSI-X is requested it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'. If you look there you can see: 173 if (type == PCI_CAP_ID_MSIX) 174 ret = xen_pci_frontend_enable_msix(dev, v, nvec); 175 else 176 ret = xen_pci_frontend_enable_msi(dev, v); 177 if (ret) 178 goto error; Which are the calls to the Xen PCI driver to communicate with the backend to setup the MSI. > space, too? (It's fine if they are; it's just that the changelog > specifically mentioned MSI-X memory space tables, and it didn't mention > config space at all.) Correct. The config space is accessible to the guest but if it writes to it - all of those values are ignored by the hypervisor - and that is because the backend is suppose to communicate to the hypervisor whether the guest can indeed setup MSI or MSI-X. > > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in > a different way, since the actual mask_irq interface does nothing? (This > is really a question for 0e4ccb1505a9, since I don't think this particular > patch changes anything in that respect.) Correct. 'request_irq' ends up doing that. Or rather it ends up calling xen_setup_msi_irqs which takes care of that. The Xen PV guests (not to be confused with Xen HVM guests) run without any emulated devices. That means most of the x86 platform things - ioports, VGA, etc - are removed. Instead that functionality is provided via frontend drivers that communicate to the backend via a ring. Hopefully this clarifies it? > > > Signed-off-by: Yijing Wang <wangyijing@huawei.com> > > CC: David Vrabel <david.vrabel@citrix.com> > > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > CC: xen-devel@lists.xenproject.org > > --- > > arch/x86/pci/xen.c | 2 ++ > > drivers/pci/msi.c | 7 ++++++- > > include/linux/msi.h | 1 + > > 3 files changed, 9 insertions(+), 1 deletions(-) > > > > diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c > > index 093f5f4..5ef62ed 100644 > > --- a/arch/x86/pci/xen.c > > +++ b/arch/x86/pci/xen.c > > @@ -427,6 +427,7 @@ int __init pci_xen_init(void) > > x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; > > x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; > > x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; > > + pci_msi_ignore_mask = 1; > > #endif > > return 0; > > } > > @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void) > > x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; > > x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; > > x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; > > + pci_msi_ignore_mask = 1; > > #endif > > xen_setup_acpi_sci(); > > __acpi_register_gsi = acpi_register_gsi_xen; > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index 38511d9..ecb5f54 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -23,6 +23,7 @@ > > #include "pci.h" > > > > static int pci_msi_enable = 1; > > +int pci_msi_ignore_mask; > > > > #define msix_table_size(flags) ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) > > > > @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) > > { > > u32 mask_bits = desc->masked; > > > > - if (!desc->msi_attrib.maskbit) > > + if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) > > return 0; > > > > mask_bits &= ~mask; > > @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag) > > u32 mask_bits = desc->masked; > > unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + > > PCI_MSIX_ENTRY_VECTOR_CTRL; > > + > > + if (pci_msi_ignore_mask) > > + return 0; > > + > > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; > > if (flag) > > mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; > > diff --git a/include/linux/msi.h b/include/linux/msi.h > > index 44f4746..86dc501 100644 > > --- a/include/linux/msi.h > > +++ b/include/linux/msi.h > > @@ -10,6 +10,7 @@ struct msi_msg { > > u32 data; /* 16 bits of msi message data */ > > }; > > > > +extern int pci_msi_ignore_mask; > > /* Helper functions */ > > struct irq_data; > > struct msi_desc; > > -- > > 1.7.1 > >
On Tue, Nov 11, 2014 at 10:45:38AM -0500, Konrad Rzeszutek Wilk wrote: > On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote: > > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote: > > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > > > fixed MSI mask bug which may cause kernel crash. But the commit > > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > > > And the commit 0e4ccb1505a9 will be reverted in the later patch. > > > > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs. > > But it makes mask_irq a no-op for both MSI-X and MSI. The MSI mask bit is > > in config space, not in memory space. So why does mask_irq need to be a > > no-op for MSI as well? Are Xen guests prohibited from writing to config > > The PV guests it refers do not do write to config space. They have > an PCI frontend and backend driver which communicates to the backend (the > trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c) > is the one that sets up the overrides. When an MSI or MSI-X is requested > it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'. > If you look there you can see: > > 173 if (type == PCI_CAP_ID_MSIX) > 174 ret = xen_pci_frontend_enable_msix(dev, v, nvec); > 175 else > 176 ret = xen_pci_frontend_enable_msi(dev, v); > 177 if (ret) > 178 goto error; > > Which are the calls to the Xen PCI driver to communicate with the > backend to setup the MSI. > > > space, too? (It's fine if they are; it's just that the changelog > > specifically mentioned MSI-X memory space tables, and it didn't mention > > config space at all.) > > Correct. The config space is accessible to the guest but if it writes > to it - all of those values are ignored by the hypervisor - and that > is because the backend is suppose to communicate to the hypervisor > whether the guest can indeed setup MSI or MSI-X. > > > > > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in > > a different way, since the actual mask_irq interface does nothing? (This > > is really a question for 0e4ccb1505a9, since I don't think this particular > > patch changes anything in that respect.) > > Correct. 'request_irq' ends up doing that. Or rather it ends up > calling xen_setup_msi_irqs which takes care of that. > > The Xen PV guests (not to be confused with Xen HVM guests) run without > any emulated devices. That means most of the x86 platform things - ioports, > VGA, etc - are removed. Instead that functionality is provided via > frontend drivers that communicate to the backend via a ring. > > Hopefully this clarifies it? I think so. I propose the following changelog. Let me know if it's still inaccurate: PCI/MSI: Add pci_msi_ignore_mask to prevent MSI/MSI-X BAR writes MSI-X vector Mask Bits are in MSI-X Tables in PCI memory space. Xen guests can't write to those tables. MSI vector Mask Bits are in PCI configuration space. Xen guests can write to config space, but those writes are ignored. Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") added a way to override default_mask_msi_irqs() and default_mask_msix_irqs() so they can be no-ops in Xen guests, but this is more complicated than necessary. Add "pci_msi_ignore_mask" in the core PCI MSI code. If set, default_mask_msi_irqs() and default_mask_msix_irqs() return without doing anything. This is less flexible, but much simpler. I guess you mentioned PV and HVM guests, and it sounds like all this only applies to HVM guests. Bjorn
On Tue, Nov 11, 2014 at 01:24:30PM -0700, Bjorn Helgaas wrote: > On Tue, Nov 11, 2014 at 10:45:38AM -0500, Konrad Rzeszutek Wilk wrote: > > On Mon, Nov 10, 2014 at 05:04:56PM -0700, Bjorn Helgaas wrote: > > > On Mon, Oct 27, 2014 at 10:44:36AM +0800, Yijing Wang wrote: > > > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") > > > > fixed MSI mask bug which may cause kernel crash. But the commit > > > > made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" > > > > to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. > > > > And the commit 0e4ccb1505a9 will be reverted in the later patch. > > > > > > The 0e4ccb1505a9 changelog says Xen guests can't write to MSI-X BARs. > > > But it makes mask_irq a no-op for both MSI-X and MSI. The MSI mask bit is > > > in config space, not in memory space. So why does mask_irq need to be a > > > no-op for MSI as well? Are Xen guests prohibited from writing to config > > > > The PV guests it refers do not do write to config space. They have > > an PCI frontend and backend driver which communicates to the backend (the > > trusted domain) to setup the MSI and MSI-X. The 'pci_xen_init' (arch/x86/pci/xen.c) > > is the one that sets up the overrides. When an MSI or MSI-X is requested > > it is done via the request_irq which ends up calling 'xen_setup_msi_irqs'. > > If you look there you can see: > > > > 173 if (type == PCI_CAP_ID_MSIX) > > 174 ret = xen_pci_frontend_enable_msix(dev, v, nvec); > > 175 else > > 176 ret = xen_pci_frontend_enable_msi(dev, v); > > 177 if (ret) > > 178 goto error; > > > > Which are the calls to the Xen PCI driver to communicate with the > > backend to setup the MSI. > > > > > space, too? (It's fine if they are; it's just that the changelog > > > specifically mentioned MSI-X memory space tables, and it didn't mention > > > config space at all.) > > > > Correct. The config space is accessible to the guest but if it writes > > to it - all of those values are ignored by the hypervisor - and that > > is because the backend is suppose to communicate to the hypervisor > > whether the guest can indeed setup MSI or MSI-X. > > > > > > > > And I *assume* there's some Xen mechanism that accomplishes the mask_irq in > > > a different way, since the actual mask_irq interface does nothing? (This > > > is really a question for 0e4ccb1505a9, since I don't think this particular > > > patch changes anything in that respect.) > > > > Correct. 'request_irq' ends up doing that. Or rather it ends up > > calling xen_setup_msi_irqs which takes care of that. > > > > The Xen PV guests (not to be confused with Xen HVM guests) run without > > any emulated devices. That means most of the x86 platform things - ioports, > > VGA, etc - are removed. Instead that functionality is provided via > > frontend drivers that communicate to the backend via a ring. > > > > Hopefully this clarifies it? > > I think so. I propose the following changelog. Let me know if it's still > inaccurate: > > PCI/MSI: Add pci_msi_ignore_mask to prevent MSI/MSI-X BAR writes > > MSI-X vector Mask Bits are in MSI-X Tables in PCI memory space. Xen guests Xen PV > can't write to those tables. MSI vector Mask Bits are in PCI configuration > space. Xen guests can write to config space, but those writes are ignored. > > Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and > msix_mask_irq()") added a way to override default_mask_msi_irqs() and > default_mask_msix_irqs() so they can be no-ops in Xen guests, but this is Xen PV guests > more complicated than necessary. > > Add "pci_msi_ignore_mask" in the core PCI MSI code. If set, > default_mask_msi_irqs() and default_mask_msix_irqs() return without doing > anything. This is less flexible, but much simpler. > > I guess you mentioned PV and HVM guests, and it sounds like all this only > applies to HVM guests. No, PV guests :-) HVM guests will do the normal x86 type machinery. > > Bjorn
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 093f5f4..5ef62ed 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -427,6 +427,7 @@ int __init pci_xen_init(void) x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; + pci_msi_ignore_mask = 1; #endif return 0; } @@ -508,6 +509,7 @@ int __init pci_xen_initial_domain(void) x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; x86_msi.msi_mask_irq = xen_nop_msi_mask_irq; x86_msi.msix_mask_irq = xen_nop_msix_mask_irq; + pci_msi_ignore_mask = 1; #endif xen_setup_acpi_sci(); __acpi_register_gsi = acpi_register_gsi_xen; diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 38511d9..ecb5f54 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -23,6 +23,7 @@ #include "pci.h" static int pci_msi_enable = 1; +int pci_msi_ignore_mask; #define msix_table_size(flags) ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) @@ -166,7 +167,7 @@ u32 default_msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { u32 mask_bits = desc->masked; - if (!desc->msi_attrib.maskbit) + if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) return 0; mask_bits &= ~mask; @@ -198,6 +199,10 @@ u32 default_msix_mask_irq(struct msi_desc *desc, u32 flag) u32 mask_bits = desc->masked; unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL; + + if (pci_msi_ignore_mask) + return 0; + mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; if (flag) mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; diff --git a/include/linux/msi.h b/include/linux/msi.h index 44f4746..86dc501 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -10,6 +10,7 @@ struct msi_msg { u32 data; /* 16 bits of msi message data */ }; +extern int pci_msi_ignore_mask; /* Helper functions */ struct irq_data; struct msi_desc;
Commit 0e4ccb1505a9 ("PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()") fixed MSI mask bug which may cause kernel crash. But the commit made MSI code complex. Introduce a new global flag "pci_msi_ignore_mask" to ignore MSI/MSI-X to fix this issue, it's a cleaner solution. And the commit 0e4ccb1505a9 will be reverted in the later patch. Signed-off-by: Yijing Wang <wangyijing@huawei.com> CC: David Vrabel <david.vrabel@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: xen-devel@lists.xenproject.org --- arch/x86/pci/xen.c | 2 ++ drivers/pci/msi.c | 7 ++++++- include/linux/msi.h | 1 + 3 files changed, 9 insertions(+), 1 deletions(-)