Message ID | 1364293331-8722-8-git-send-email-peter.maydell@linaro.org |
---|---|
State | Superseded |
Headers | show |
On Tuesday 26 March 2013, Peter Maydell wrote: > > Implement the correct IRQ mapping for the Versatile PCI controller; it > differs between realview and versatile boards, but the previous QEMU > implementation was correct only for the first PCI card on a versatile > board, since we weren't swizzling IRQs based on the slot number. > > Since this change would otherwise break any uses of PCI on Linux kernels > which have an equivalent bug (since they have effectively only been > tested against QEMU, not real hardware), we implement a mechanism > for automatically detecting those broken kernels and switching back > to the old mapping. This works by looking at the values the kernel > writes to the PCI_INTERRUPT_LINE register in the config space, which > is effectively the interrupt number the kernel expects the device > to be using. If this doesn't match reality we use the broken mapping. > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Yes, very good. We will probably introduce sparse irq support on versatile in the near future, and then the value we write into the PCI_INTERRUPT_LINE field will become arbitrary from qemu's point of view, but I will make sure that we fix the interrupt mapping in the kernel at the same time so we always fall into the "s->broken_irq_mapping = false;" case. We also need to find a way to make the new kernel work with an old qemu, and I think we can do that by using the versatile-dt board type with a PCI device node that sets all four lines to 27, while using the actual interrupt lines for the default versatile device tree. Arnd
On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: > Yes, very good. We will probably introduce sparse irq support on > versatile in the near future, and then the value we write into the > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point > of view, but I will make sure that we fix the interrupt mapping > in the kernel at the same time so we always fall into the > "s->broken_irq_mapping = false;" case. Yeah, as long as you avoid the number 27 you're ok :-) > We also need to find a way to make the new kernel work with > an old qemu, and I think we can do that by using the versatile-dt > board type with a PCI device node that sets all four lines to > 27, while using the actual interrupt lines for the default > versatile device tree. Personally I'd be happy for you to just say "needs a new QEMU". The broken QEMU is missing so much (including working memory windows) that I think it would be a pain to get the kernel to cope with it. -- PMM
On Tuesday 26 March 2013, Peter Maydell wrote: > > On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: > > Yes, very good. We will probably introduce sparse irq support on > > versatile in the near future, and then the value we write into the > > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point > > of view, but I will make sure that we fix the interrupt mapping > > in the kernel at the same time so we always fall into the > > "s->broken_irq_mapping = false;" case. > > Yeah, as long as you avoid the number 27 you're ok :-) Good point. I guess we'll have to keep using a legacy domain for versatile then. > > We also need to find a way to make the new kernel work with > > an old qemu, and I think we can do that by using the versatile-dt > > board type with a PCI device node that sets all four lines to > > 27, while using the actual interrupt lines for the default > > versatile device tree. > > Personally I'd be happy for you to just say "needs a new QEMU". > The broken QEMU is missing so much (including working memory > windows) that I think it would be a pain to get the kernel to > cope with it. But it was working earlier, so I'd definitely try not to break if at all possible. A lot of people use the verstatile qemu model to run kernels and I would not want to deal with the complaints I'd get if we break those. Using a separate dts file seems easy enough. Arnd
On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 26 March 2013, Peter Maydell wrote: >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: >> > Yes, very good. We will probably introduce sparse irq support on >> > versatile in the near future, and then the value we write into the >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point >> > of view, but I will make sure that we fix the interrupt mapping >> > in the kernel at the same time so we always fall into the >> > "s->broken_irq_mapping = false;" case. >> >> Yeah, as long as you avoid the number 27 you're ok :-) > > Good point. I guess we'll have to keep using a legacy domain for > versatile then. I'm happy to provide some other way for QEMU to detect a new working kernel if you want to implement one in the kernel, if that's cleaner. >> > We also need to find a way to make the new kernel work with >> > an old qemu, and I think we can do that by using the versatile-dt >> > board type with a PCI device node that sets all four lines to >> > 27, while using the actual interrupt lines for the default >> > versatile device tree. >> >> Personally I'd be happy for you to just say "needs a new QEMU". >> The broken QEMU is missing so much (including working memory >> windows) that I think it would be a pain to get the kernel to >> cope with it. > > But it was working earlier, so I'd definitely try not to break > if at all possible. A lot of people use the verstatile qemu > model to run kernels and I would not want to deal with the > complaints I'd get if we break those. Using a separate dts > file seems easy enough. Yeah, but it only worked earlier because the kernel PCI controller support was so broken and limited, I think. If you run a new kernel with the old QEMU it won't work even if you avoid the irq-mapping issues. Also I really don't want to get people into the habit of using a QEMU-specific dts file. The result will be that nobody will ever move on from that dts file to the one that gets used with real hardware. Plus, you guys make kernel changes that break running on QEMU all the time, so I don't see that as a big deal really. (Most recently, vexpress needed a pile of support for the config registers that define the voltage regulators and clocks.) -- PMM
On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote: > On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote: > > On Tuesday 26 March 2013, Peter Maydell wrote: > >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: > >> > Yes, very good. We will probably introduce sparse irq support on > >> > versatile in the near future, and then the value we write into the > >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point > >> > of view, but I will make sure that we fix the interrupt mapping > >> > in the kernel at the same time so we always fall into the > >> > "s->broken_irq_mapping = false;" case. > >> > >> Yeah, as long as you avoid the number 27 you're ok :-) > > > > Good point. I guess we'll have to keep using a legacy domain for > > versatile then. > > I'm happy to provide some other way for QEMU to detect a > new working kernel if you want to implement one in the > kernel, if that's cleaner. That's why I suggested writing some number at offset 0x08 (that's revision ID) in configuration space of device 0 on bus 0. It's guaranteed harmless on real hardware and QEMU could detect this write and say "aha new kernel, even though it uses #27". > >> > We also need to find a way to make the new kernel work with > >> > an old qemu, and I think we can do that by using the versatile-dt > >> > board type with a PCI device node that sets all four lines to > >> > 27, while using the actual interrupt lines for the default > >> > versatile device tree. > >> > >> Personally I'd be happy for you to just say "needs a new QEMU". > >> The broken QEMU is missing so much (including working memory > >> windows) that I think it would be a pain to get the kernel to > >> cope with it. > > > > But it was working earlier, so I'd definitely try not to break > > if at all possible. A lot of people use the verstatile qemu > > model to run kernels and I would not want to deal with the > > complaints I'd get if we break those. Using a separate dts > > file seems easy enough. > > Yeah, but it only worked earlier because the kernel PCI controller > support was so broken and limited, I think. If you run a new > kernel with the old QEMU it won't work even if you avoid the > irq-mapping issues. > > Also I really don't want to get people into the habit of > using a QEMU-specific dts file. The result will be that > nobody will ever move on from that dts file to the one that > gets used with real hardware. > > Plus, you guys make kernel changes that break running on > QEMU all the time, so I don't see that as a big deal really. > (Most recently, vexpress needed a pile of support for the > config registers that define the voltage regulators and clocks.) > > -- PMM I have to agree with that.
On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote: >> On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote: >> > On Tuesday 26 March 2013, Peter Maydell wrote: >> >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: >> >> > Yes, very good. We will probably introduce sparse irq support on >> >> > versatile in the near future, and then the value we write into the >> >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point >> >> > of view, but I will make sure that we fix the interrupt mapping >> >> > in the kernel at the same time so we always fall into the >> >> > "s->broken_irq_mapping = false;" case. >> >> >> >> Yeah, as long as you avoid the number 27 you're ok :-) >> > >> > Good point. I guess we'll have to keep using a legacy domain for >> > versatile then. >> >> I'm happy to provide some other way for QEMU to detect a >> new working kernel if you want to implement one in the >> kernel, if that's cleaner. > > That's why I suggested writing some number at offset 0x08 > (that's revision ID) in configuration space of device 0 on bus 0. > It's guaranteed harmless on real hardware and QEMU could detect this > write and say "aha new kernel, even though it uses #27". OK, that makes sense. Would somebody like to provide a kernel patch (preferably against 2.6.35+Arnd's patchset :-)) which just prods that ID space, so I have something to test the QEMU end against? thanks -- PMM
On Thu, Apr 04, 2013 at 12:05:51PM +0100, Peter Maydell wrote: > On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote: > > On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote: > >>> I'm happy to provide some other way for QEMU to detect a > >>> new working kernel if you want to implement one in the > >>> kernel, if that's cleaner. > >> > >> That's why I suggested writing some number at offset 0x08 > >> (that's revision ID) in configuration space of device 0 on bus 0. > >> It's guaranteed harmless on real hardware and QEMU could detect this > >> write and say "aha new kernel, even though it uses #27". > > > > OK, that makes sense. > > In fact I've just realised that "allow new kernels to use #27 > without getting the back-compat broken irq mapping" conflicts > with "allow new kernels to kexec old broken kernels". Could you clarify why, pls? Also - does this really matter so much? kexec between version is always a risky affair... > So the > easiest approach is just to drop the code which allows us to > switch back to broken mode again. Then the new kernel's init > code can force non-broken mode by writing something other than > #27 to some device's PCI_INTERRUPT_LINE register, and then > is free to use #27 if it wants to. > > -- PMM I'm fine with this too.
On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote: > On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote: >>> I'm happy to provide some other way for QEMU to detect a >>> new working kernel if you want to implement one in the >>> kernel, if that's cleaner. >> >> That's why I suggested writing some number at offset 0x08 >> (that's revision ID) in configuration space of device 0 on bus 0. >> It's guaranteed harmless on real hardware and QEMU could detect this >> write and say "aha new kernel, even though it uses #27". > > OK, that makes sense. In fact I've just realised that "allow new kernels to use #27 without getting the back-compat broken irq mapping" conflicts with "allow new kernels to kexec old broken kernels". So the easiest approach is just to drop the code which allows us to switch back to broken mode again. Then the new kernel's init code can force non-broken mode by writing something other than #27 to some device's PCI_INTERRUPT_LINE register, and then is free to use #27 if it wants to. -- PMM
On 4 April 2013 11:58, Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Apr 04, 2013 at 12:05:51PM +0100, Peter Maydell wrote: >> On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote: >> > On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote: >> >> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote: >> >>> I'm happy to provide some other way for QEMU to detect a >> >>> new working kernel if you want to implement one in the >> >>> kernel, if that's cleaner. >> >> >> >> That's why I suggested writing some number at offset 0x08 >> >> (that's revision ID) in configuration space of device 0 on bus 0. >> >> It's guaranteed harmless on real hardware and QEMU could detect this >> >> write and say "aha new kernel, even though it uses #27". >> > >> > OK, that makes sense. >> >> In fact I've just realised that "allow new kernels to use #27 >> without getting the back-compat broken irq mapping" conflicts >> with "allow new kernels to kexec old broken kernels". > > Could you clarify why, pls? Either we provide a path for QEMU's state to go from "this is a new kernel and we act like hardware" to "this is an old kernel and we do not", or we don't. If we provide a way for that state transition to occur, then new kernels can't use #27; if we don't, then the kexec'd old kernel wouldn't see the old broken mapping it expects. I only stuck the "allow us to go back to the old mapping" code in there because I didn't see why it would hurt at the time. > Also - does this really matter so much? > kexec between version is always a risky affair... I agree, which is why I'm dropping that idea. -- PMM
On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote: > On Tuesday 26 March 2013, Peter Maydell wrote: >> >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote: >> > Yes, very good. We will probably introduce sparse irq support on >> > versatile in the near future, and then the value we write into the >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point >> > of view, but I will make sure that we fix the interrupt mapping >> > in the kernel at the same time so we always fall into the >> > "s->broken_irq_mapping = false;" case. >> >> Yeah, as long as you avoid the number 27 you're ok :-) > > Good point. I guess we'll have to keep using a legacy domain for > versatile then. So there turns out to be a problem with this. Newer kernels (I'm looking at current mainline master) write a number other than 27 (in this case 94) to the PCI_INTERRUPT_LINE register, but they still assume the old broken IRQ mapping. So our "autodetect busted kernels" code doesn't work. I've also discovered that there are some kernels (including current master) which don't like it when the host PCI bridge goes at slot 29 like it does on real hardware; so I think I'll just have to revert that patch. -- PMM
diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c index 576e619..1b56796 100644 --- a/hw/versatile_pci.c +++ b/hw/versatile_pci.c @@ -26,6 +26,9 @@ typedef struct { /* Constant for life of device: */ int realview; + + /* Variable state: */ + bool broken_irq_mapping; } PCIVPBState; #define TYPE_VERSATILE_PCI "versatile_pci" @@ -44,14 +47,37 @@ static inline uint32_t vpb_pci_config_addr(hwaddr addr) static void pci_vpb_config_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) { - pci_data_write(opaque, vpb_pci_config_addr(addr), val, size); + /* Old and buggy versions of QEMU used the wrong mapping from + * PCI IRQs to system interrupt lines. Unfortunately the Linux + * kernel also had the corresponding bug in setting up interrupts + * (so older kernels work on QEMU and not on real hardware). + * We automatically detect these broken kernels and flip back + * to the broken irq mapping by spotting guest writes to the + * PCI_INTERRUPT_LINE register to see where the guest thinks + * interrupts are going to be routed. + * We allow arbitrary flipping between broken and non broken + * to permit corner cases like kexec between two kernels. + */ + PCIVPBState *s = (PCIVPBState *)opaque; + if (!s->realview && (addr & 0xff) == PCI_INTERRUPT_LINE) { + uint8_t devfn = addr >> 8; + if ((PCI_SLOT(devfn) % PCI_NUM_PINS) != 2) { + if (val == 27) { + s->broken_irq_mapping = true; + } else { + s->broken_irq_mapping = false; + } + } + } + pci_data_write(&s->pci_bus, vpb_pci_config_addr(addr), val, size); } static uint64_t pci_vpb_config_read(void *opaque, hwaddr addr, unsigned size) { + PCIVPBState *s = (PCIVPBState *)opaque; uint32_t val; - val = pci_data_read(opaque, vpb_pci_config_addr(addr), size); + val = pci_data_read(&s->pci_bus, vpb_pci_config_addr(addr), size); return val; } @@ -63,7 +89,47 @@ static const MemoryRegionOps pci_vpb_config_ops = { static int pci_vpb_map_irq(PCIDevice *d, int irq_num) { - return irq_num; + PCIVPBState *s = container_of(d->bus, PCIVPBState, pci_bus); + + if (s->broken_irq_mapping) { + /* Legacy broken IRQ mapping for compatibility with old and + * buggy Linux guests + */ + return irq_num; + } + + /* Slot to IRQ mapping for RealView Platform Baseboard 926 backplane + * name slot IntA IntB IntC IntD + * A 31 IRQ28 IRQ29 IRQ30 IRQ27 + * B 30 IRQ27 IRQ28 IRQ29 IRQ30 + * C 29 IRQ30 IRQ27 IRQ28 IRQ29 + * Slot C is for the host bridge; A and B the peripherals. + * Our output irqs 0..3 correspond to the baseboard's 27..30. + * + * This mapping function takes account of an oddity in the PB926 + * board wiring, where the FPGA's P_nINTA input is connected to + * the INTB connection on the board PCI edge connector, P_nINTB + * is connected to INTC, and so on, so everything is one number + * further round from where you might expect. + */ + return pci_swizzle_map_irq_fn(d, irq_num + 2); +} + +static int pci_vpb_rv_map_irq(PCIDevice *d, int irq_num) +{ + /* Slot to IRQ mapping for RealView EB and PB1176 backplane + * name slot IntA IntB IntC IntD + * A 31 IRQ50 IRQ51 IRQ48 IRQ49 + * B 30 IRQ49 IRQ50 IRQ51 IRQ48 + * C 29 IRQ48 IRQ49 IRQ50 IRQ51 + * Slot C is for the host bridge; A and B the peripherals. + * Our output irqs 0..3 correspond to the baseboard's 48..51. + * + * The PB1176 and EB boards don't have the PB926 wiring oddity + * described above; P_nINTA connects to INTA, P_nINTB to INTB + * and so on, which is why this mapping function is different. + */ + return pci_swizzle_map_irq_fn(d, irq_num + 3); } static void pci_vpb_set_irq(void *opaque, int irq_num, int level) @@ -95,13 +161,20 @@ static void pci_vpb_realize(DeviceState *dev, Error **errp) { PCIVPBState *s = PCI_VPB(dev); SysBusDevice *sbd = SYS_BUS_DEVICE(dev); + pci_map_irq_fn mapfn; int i; for (i = 0; i < 4; i++) { sysbus_init_irq(sbd, &s->irq[i]); } - pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, pci_vpb_map_irq, s->irq, 4); + if (s->realview) { + mapfn = pci_vpb_rv_map_irq; + } else { + mapfn = pci_vpb_map_irq; + } + + pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, mapfn, s->irq, 4); /* ??? Register memory space. */ @@ -110,10 +183,10 @@ static void pci_vpb_realize(DeviceState *dev, Error **errp) * 1 : PCI config window * 2 : PCI IO window */ - memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, &s->pci_bus, + memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, s, "pci-vpb-selfconfig", 0x1000000); sysbus_init_mmio(sbd, &s->mem_config); - memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, &s->pci_bus, + memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, s, "pci-vpb-config", 0x1000000); sysbus_init_mmio(sbd, &s->mem_config2);
Implement the correct IRQ mapping for the Versatile PCI controller; it differs between realview and versatile boards, but the previous QEMU implementation was correct only for the first PCI card on a versatile board, since we weren't swizzling IRQs based on the slot number. Since this change would otherwise break any uses of PCI on Linux kernels which have an equivalent bug (since they have effectively only been tested against QEMU, not real hardware), we implement a mechanism for automatically detecting those broken kernels and switching back to the old mapping. This works by looking at the values the kernel writes to the PCI_INTERRUPT_LINE register in the config space, which is effectively the interrupt number the kernel expects the device to be using. If this doesn't match reality we use the broken mapping. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/versatile_pci.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 79 insertions(+), 6 deletions(-)