Message ID | 1414144717-32328-5-git-send-email-ian.campbell@citrix.com |
---|---|
State | New |
Headers | show |
Hi Ian, On 24/10/14 10:58, Ian Campbell wrote: > These properties are defined in ePAPR and the OpenFirmware PCI Bus Binding > Specification (IEEE Std 1275-1994). > > This replaces the xgene specific mapping. Tested on Mustang and on a model with > a PCI virtio controller. I'm wondering why you choose to map everything at Xen boot time rather than implementing PHYSDEVOP_pci_device_add to do the job? This would allow us to re-use most of the interrupts/mmio decoding provided in the device tree library. It would also avoid missing support of cascade ranges/interrupt-map. Regards,
Hi Ian, On 18/02/2015 13:50, Ian Campbell wrote: > On Tue, 2015-02-17 at 17:33 +0000, Julien Grall wrote: >> Hi Ian, >> >> On 24/10/14 10:58, Ian Campbell wrote: >>> These properties are defined in ePAPR and the OpenFirmware PCI Bus Binding >>> Specification (IEEE Std 1275-1994). >>> >>> This replaces the xgene specific mapping. Tested on Mustang and on a model with >>> a PCI virtio controller. >> >> I'm wondering why you choose to map everything at Xen boot time rather >> than implementing PHYSDEVOP_pci_device_add to do the job? > > Does pci_device_add contain sufficient information to do so? Hmmm... for the interrupt the SBDF is enough. Although for the MMIO it looks like there is no difference between PCI bars. > The regions which are being mapped are essentially the PCI host > controllers MMIO, IO and CFG "windows" which are then consumed by the > various bars of the devices on the bus. > > So mapping based on pci_device_add would require us to go from the SBDF > to a set of BARS which need mapping, which is a whole lot more complex > than just mapping all of the resources owned by the root complex through > to the h/w domain. I gave a look to the code which parse the host bridge resource (see of_pci_get_host_bridge_resources). They seem to re-use to the of_translate_* function. Would not it be possible to do the same? > Or perhaps I've misunderstood what you were suggesting? I was suggesting to do it via pci_add_device but it looks like it's only possible for IRQ not MMIO. >> This would allow us to re-use most of the interrupts/mmio decoding >> provided in the device tree library. It would also avoid missing support >> of cascade ranges/interrupt-map. > > I *think* (if I'm remembering right) I decided we don't need to worry > about cascades of these things because the second level resources are > all fully contained within the first (top level) one and so with the > approach I've taken here are all fully mapped already. That's why I made > this patch stop descending into children when such a "bus node" is > found. I don't understand this paragraph, sorry. The address range you decoded via the PCI bus may be an intermediate address which needs to be translated in the physical hardware address. Regards,
On 18/02/2015 14:37, Ian Campbell wrote: > On Wed, 2015-02-18 at 14:19 +0000, Julien Grall wrote: > I think so, and we probably should consider the two cases separately > since the right answer could reasonably differ for different resource > types. > > I am reasonably convinced that for MMIO (+IO+CFG space) we should map > everything as described by the ranges property of the top most node, it > can be considered an analogue to / extension of the reg property of that > node. Agreed. > For IRQ I'm not so sure, it's possible that routing the IRQ at > pci_add_device time might be better, or fit in better with e.g. the ACPI > architecture, but mapping everything described in interrupt-map at start > of day is also an option and a reasonably simple one, probably. I agree that it's simple. Are we sure that we would be able to get a "better" solution later without modifying the kernel? If not, we may need to keep this solution forever. > This isn't to do with IPA->PA translations but to do with translations > between different PA addressing regimes. i.e. the different addressing > schemes of difference busses. I meant bus address. The name "intermediate address" was misused, sorry. > Lets say we have a system with a PCI-ROOT device exposing a PCI bus, > which in turn contains a PCI-BRIDGE which for the sake of argument lets > say is a PCI-FOOBUS bridge. > Lets just consider the MMIO hole for now, but IRQ is basically the same. > > The ranges property on a node describes a mapping from a "parent" > address space into a "child" address space. > > For PCI-ROOT "parent" is the host physical address space and "child" is > the PCI MMIO/IO/CFG address spaces. > > For PCI-BRIDGE "parent" is the PCI-ROOT's child address space (i.e. PCI > MMIO/IO/CFG) and "child" is the FOOBUS address space. > > The inputs ("parents") of the PCI-BRIDGE ranges property must therefore > by definition be valid outputs of the PCI-ROOT ranges property (i.e. be > "child" addresses). > > Therefore if we map all of the input/parent ranges described by > PCI-ROOT's ranges property we do not need to recurse further and > consider PCI-BRIDGE's ranges property -- we've effectively already dealt > with it. > > Does that make more sense? I'm still confused, what prevents the PCI-ROOT device to not be connected to another bus? In device tree format, that would give something like: / { soc { ranges = "..."; pcie { ranges = "..."; } } } The address retrieved from the PCI-ROOT would be a bus address and not a physical address. Regards,
On 18/02/2015 14:37, Ian Campbell wrote: > I am reasonably convinced that for MMIO (+IO+CFG space) we should map > everything as described by the ranges property of the top most node, it > can be considered an analogue to / extension of the reg property of that > node. BTW, the CFG space is part of the "reg" property, which is already mapped. The "ranges" property only covers the IO/MMIO BARs.
On 18/02/2015 15:05, Julien Grall wrote: > > > On 18/02/2015 14:37, Ian Campbell wrote: >> On Wed, 2015-02-18 at 14:19 +0000, Julien Grall wrote: >> I think so, and we probably should consider the two cases separately >> since the right answer could reasonably differ for different resource >> types. >> >> I am reasonably convinced that for MMIO (+IO+CFG space) we should map >> everything as described by the ranges property of the top most node, it >> can be considered an analogue to / extension of the reg property of that >> node. > > Agreed. > >> For IRQ I'm not so sure, it's possible that routing the IRQ at >> pci_add_device time might be better, or fit in better with e.g. the ACPI >> architecture, but mapping everything described in interrupt-map at start >> of day is also an option and a reasonably simple one, probably. > > I agree that it's simple. Are we sure that we would be able to get a > "better" solution later without modifying the kernel? > > If not, we may need to keep this solution forever. > >> This isn't to do with IPA->PA translations but to do with translations >> between different PA addressing regimes. i.e. the different addressing >> schemes of difference busses. > > I meant bus address. The name "intermediate address" was misused, sorry. > >> Lets say we have a system with a PCI-ROOT device exposing a PCI bus, >> which in turn contains a PCI-BRIDGE which for the sake of argument lets >> say is a PCI-FOOBUS bridge. > >> Lets just consider the MMIO hole for now, but IRQ is basically the same. >> >> The ranges property on a node describes a mapping from a "parent" >> address space into a "child" address space. >> >> For PCI-ROOT "parent" is the host physical address space and "child" is >> the PCI MMIO/IO/CFG address spaces. >> >> For PCI-BRIDGE "parent" is the PCI-ROOT's child address space (i.e. PCI >> MMIO/IO/CFG) and "child" is the FOOBUS address space. >> >> The inputs ("parents") of the PCI-BRIDGE ranges property must therefore >> by definition be valid outputs of the PCI-ROOT ranges property (i.e. be >> "child" addresses). >> >> Therefore if we map all of the input/parent ranges described by >> PCI-ROOT's ranges property we do not need to recurse further and >> consider PCI-BRIDGE's ranges property -- we've effectively already dealt >> with it. >> >> Does that make more sense? > > I'm still confused, what prevents the PCI-ROOT device to not be > connected to another bus? > > In device tree format, that would give something like: > > / { > > soc { > ranges = "..."; > > pcie { > ranges = "..."; > } > } > } Actually the device tree of the x-gene board has something similar. / { soc { ranges; pcie { ranges = "..."; } } "ranges;" means there is not translation necessary. But nothing prevent to have a the property "ranges" set. Regards,
On 18/02/2015 15:18, Ian Campbell wrote: > On Wed, 2015-02-18 at 15:05 +0000, Julien Grall wrote: >> >> On 18/02/2015 14:37, Ian Campbell wrote: >>> On Wed, 2015-02-18 at 14:19 +0000, Julien Grall wrote: >>> I think so, and we probably should consider the two cases separately >>> since the right answer could reasonably differ for different resource >>> types. >>> >>> I am reasonably convinced that for MMIO (+IO+CFG space) we should map >>> everything as described by the ranges property of the top most node, it >>> can be considered an analogue to / extension of the reg property of that >>> node. >> >> Agreed. >> >>> For IRQ I'm not so sure, it's possible that routing the IRQ at >>> pci_add_device time might be better, or fit in better with e.g. the ACPI >>> architecture, but mapping everything described in interrupt-map at start >>> of day is also an option and a reasonably simple one, probably. >> >> I agree that it's simple. Are we sure that we would be able to get a >> "better" solution later without modifying the kernel? >> >> If not, we may need to keep this solution forever. > > True. I suppose feature flags would be one way out, but not a very > convenient one.. > >>> This isn't to do with IPA->PA translations but to do with translations >>> between different PA addressing regimes. i.e. the different addressing >>> schemes of difference busses. >> >> I meant bus address. The name "intermediate address" was misused, sorry. >> >>> Lets say we have a system with a PCI-ROOT device exposing a PCI bus, >>> which in turn contains a PCI-BRIDGE which for the sake of argument lets >>> say is a PCI-FOOBUS bridge. >> >>> Lets just consider the MMIO hole for now, but IRQ is basically the same. >>> >>> The ranges property on a node describes a mapping from a "parent" >>> address space into a "child" address space. >>> >>> For PCI-ROOT "parent" is the host physical address space and "child" is >>> the PCI MMIO/IO/CFG address spaces. >>> >>> For PCI-BRIDGE "parent" is the PCI-ROOT's child address space (i.e. PCI >>> MMIO/IO/CFG) and "child" is the FOOBUS address space. >>> >>> The inputs ("parents") of the PCI-BRIDGE ranges property must therefore >>> by definition be valid outputs of the PCI-ROOT ranges property (i.e. be >>> "child" addresses). >>> >>> Therefore if we map all of the input/parent ranges described by >>> PCI-ROOT's ranges property we do not need to recurse further and >>> consider PCI-BRIDGE's ranges property -- we've effectively already dealt >>> with it. >>> >>> Does that make more sense? >> >> I'm still confused, what prevents the PCI-ROOT device to not be >> connected to another bus? >> >> In device tree format, that would give something like: >> >> / { >> >> soc { >> ranges = "..."; >> >> pcie { >> ranges = "..."; >> } >> } >> } >> >> The address retrieved from the PCI-ROOT would be a bus address and not a >> physical address. > > Hrm, nothing, I see what you are getting at now. > > Either soc has a device_type property which we understand, in which case > we would handle it and stop recursing or (more likely for an soc) it > does not, in which case we would handle the pcie ranges property, but it > needs to be translated through the ranges property of soc, which the > patch doesn't do and probably it should. The code to do it is quite complicate and hard to maintain (actually it's a copy of the Linux one). It would be good if you can re-use the functions to translate in common/device_tree.c. I think we may have the same problem for interrupts too. Regards,
Hi Ian, On 05/03/2015 14:43, Ian Campbell wrote: > On Wed, 2015-02-18 at 15:16 +0000, Julien Grall wrote: >> "ranges;" means there is not translation necessary. But nothing prevent >> to have a the property "ranges" set. > > I don't suppose you know of a system where this translation is needed do > you? So I've got something to test with... AFAIR, no. I guess, it should not be to difficult to tweak a device tree for testing. Regards,
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 998b6fd..00b5e07 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -907,6 +907,191 @@ static int make_timer_node(const struct domain *d, void *fdt, return res; } +static int map_pci_device_ranges(struct domain *d, + const struct dt_device_node *dev, + const struct dt_pci_ranges_entry *ranges, + const u32 len) +{ + int i, nr, res; + + if ( len % sizeof(struct dt_pci_ranges_entry) ) + { + printk(XENLOG_ERR + "%s: ranges property length is not a multiple of entry size\n", + dt_node_name(dev)); + return -EINVAL; + } + + nr = len / sizeof(*ranges); + + for ( i = 0; i < nr ; i++ ) + { + const struct dt_pci_ranges_entry *range = &ranges[i]; + u64 addr, len; + + addr = fdt64_to_cpu(range->cpu_addr); + len = fdt64_to_cpu(range->length); + + DPRINT("%s: ranges[%d]: 0x%08x.%08x.%08x 0x%"PRIx64" size 0x%"PRIx64"\n", + dt_node_name(dev), i, + fdt32_to_cpu(range->pci_addr.hi), + fdt32_to_cpu(range->pci_addr.mid), + fdt32_to_cpu(range->pci_addr.lo), + addr, len); + + res = map_mmio_regions(d, paddr_to_pfn(addr & PAGE_MASK), + DIV_ROUND_UP(len, PAGE_SIZE), + paddr_to_pfn(addr & PAGE_MASK)); + if ( res < 0 ) + { + printk(XENLOG_ERR "%s: ranges[%d]: " + "Unable to map 0x%"PRIx64" - 0x%"PRIx64" in domain %d\n", + dt_node_name(dev), i, + addr & PAGE_MASK, PAGE_ALIGN(addr + len) - 1, + d->domain_id); + return res; + } + } + return 0; +} + +static int map_device_ranges(struct domain *d, const struct dt_device_node *dev) +{ + const void *ranges; + u32 len; + + if ( !dev->type ) + return 0; /* No device type, nothing to do */ + + ranges = dt_get_property(dev, "ranges", &len); + if ( !ranges ) + return 0; /* No ranges, nothing to do */ + + if ( dt_device_type_is_equal(dev, "pci") ) + return map_pci_device_ranges(d, dev, ranges, len); + else if ( dt_device_type_is_equal(dev, "<NULL>") ) + return 0; /* "<NULL>" is used for none */ + + DPRINT("Cannot handle ranges property for non-PCI device %s type %s\n", + dt_node_name(dev), dev->type); + + /* Non-critical. */ + return 0; +} + +static int map_pci_device_interrupts(struct domain *d, + const struct dt_device_node *dev, + const __be32 *cells, const u32 len) +{ + const int intc_addr_cells = + dt_n_addr_cells_children(dt_interrupt_controller); + const size_t entry_size = + sizeof(struct dt_interrupt_map_entry) + + (intc_addr_cells + DT_NR_GIC_INTERRUPT_CELLS)*sizeof(__be32); + + int i, nr, res; + u64 parent_addr; + + if ( len % entry_size ) + { + printk(XENLOG_ERR + "%s: interrupts property length is not a multiple of entry size %zd\n", + dt_node_name(dev), entry_size); + return -EINVAL; + } + + nr = len / entry_size; + + for ( i = 0; i < nr ; i++ ) + { + const struct dt_interrupt_map_entry *map = (void *)cells; + struct dt_raw_irq dt_raw_irq; + struct dt_irq dt_irq; + int j; + + if ( fdt32_to_cpu(map->phandle) != dt_interrupt_controller->phandle ) + { + printk("%s: interrupt-map[%d]: Not connected to primary controller.\n", + dt_node_name(dev), i); + continue; + } + + dt_raw_irq.controller = dt_interrupt_controller; + dt_raw_irq.size = DT_NR_GIC_INTERRUPT_CELLS; + + cells = &map->irq_specifier[0]; + + parent_addr = dt_next_cell(intc_addr_cells, &cells); + for (j = 0; j < DT_NR_GIC_INTERRUPT_CELLS; j++) + dt_raw_irq.specifier[j] = dt_next_cell(1, &cells); + + res = dt_irq_translate(&dt_raw_irq, &dt_irq); + if ( res < 0 ) + { + printk(XENLOG_ERR + "%s: interrupt-map[%d]: Unable to translate raw IRQ\n", + dt_node_name(dev), i); + continue; + } + + DPRINT("%s: interrupt-map[%d]: 0x%"PRIx32".%"PRIx32".%"PRIx32" " + "pin=%"PRId32" phandle=%"PRIx32" " + "paddr=%"PRIx64" type=%"PRIx32" irq=%x\n", + dt_node_name(dev), i, + fdt32_to_cpu(map->pci_irq_mask.pci_addr.hi), + fdt32_to_cpu(map->pci_irq_mask.pci_addr.mid), + fdt32_to_cpu(map->pci_irq_mask.pci_addr.lo), + fdt32_to_cpu(map->pci_irq_mask.pci_pin), + fdt32_to_cpu(map->phandle), + parent_addr, dt_irq.type, dt_irq.irq); + + /* Setup the IRQ type */ + res = irq_set_type(dt_irq.irq, dt_irq.type); + if ( res ) + { + printk(XENLOG_ERR + "%s: interrupt-map[%d]: Unable to setup IRQ%"PRId32" to dom%d\n", + dt_node_name(dev), i, dt_irq.irq, d->domain_id); + return res; + } + + res = route_irq_to_guest(d, dt_irq.irq, dt_node_name(dev)); + if ( res < 0 ) + { + printk(XENLOG_ERR + "%s: interrupt-map[%d]: Unable to map IRQ%"PRId32" to dom%d\n", + dt_node_name(dev), i, dt_irq.irq, d->domain_id); + return res; + } + } + return 0; +} + +static int map_device_interrupts(struct domain *d, const struct dt_device_node *dev) +{ + const void *map; + u32 len; + + if ( !dev->type ) + return 0; /* No device type, nothing to do */ + + map = dt_get_property(dev, "interrupt-map", &len); + if ( !map ) /* nothing to do */ + return 0; + + if ( dt_device_type_is_equal(dev, "pci") ) + return map_pci_device_interrupts(d, dev, map, len); + else if ( dt_device_type_is_equal(dev, "<NULL>") ) + return 0; /* "<NULL>" is used for none */ + + printk("Cannot handle interrupt-map for non-PCI device %s type %s\n", + dt_node_name(dev), dev->type); + + /* Non-critical. */ + return 0; +} + + /* Map the device in the domain */ static int map_device(struct domain *d, struct dt_device_node *dev) { @@ -1015,6 +1200,14 @@ static int map_device(struct domain *d, struct dt_device_node *dev) } } + res = map_device_ranges(d, dev); + if ( res ) + return res; + + res = map_device_interrupts(d, dev); + if ( res ) + return res; + return 0; } diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c index 29c4752..f88d306 100644 --- a/xen/arch/arm/platforms/xgene-storm.c +++ b/xen/arch/arm/platforms/xgene-storm.c @@ -40,95 +40,6 @@ static uint32_t xgene_storm_quirks(void) return PLATFORM_QUIRK_GIC_64K_STRIDE|PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI; } -static int map_one_mmio(struct domain *d, const char *what, - unsigned long start, unsigned long end) -{ - int ret; - - printk("Additional MMIO %"PRIpaddr"-%"PRIpaddr" (%s)\n", - start, end, what); - ret = map_mmio_regions(d, start, end - start + 1, start); - if ( ret ) - printk("Failed to map %s @ %"PRIpaddr" to dom%d\n", - what, start, d->domain_id); - return ret; -} - -static int map_one_spi(struct domain *d, const char *what, - unsigned int spi, unsigned int type) -{ - unsigned int irq; - int ret; - - irq = spi + 32; /* SPIs start at IRQ 32 */ - - ret = irq_set_spi_type(irq, type); - if ( ret ) - { - printk("Failed to set the type for IRQ%u\n", irq); - return ret; - } - - printk("Additional IRQ %u (%s)\n", irq, what); - - ret = route_irq_to_guest(d, irq, what); - if ( ret ) - printk("Failed to route %s to dom%d\n", what, d->domain_id); - - return ret; -} - -/* - * Xen does not currently support mapping MMIO regions and interrupt - * for bus child devices (referenced via the "ranges" and - * "interrupt-map" properties to domain 0). Instead for now map the - * necessary resources manually. - */ -static int xgene_storm_specific_mapping(struct domain *d) -{ - int ret; - - /* Map the PCIe bus resources */ - ret = map_one_mmio(d, "PCI MEM REGION", paddr_to_pfn(0xe000000000UL), - paddr_to_pfn(0xe010000000UL)); - if ( ret ) - goto err; - - ret = map_one_mmio(d, "PCI IO REGION", paddr_to_pfn(0xe080000000UL), - paddr_to_pfn(0xe080010000UL)); - if ( ret ) - goto err; - - ret = map_one_mmio(d, "PCI CFG REGION", paddr_to_pfn(0xe0d0000000UL), - paddr_to_pfn(0xe0d0200000UL)); - if ( ret ) - goto err; - ret = map_one_mmio(d, "PCI MSI REGION", paddr_to_pfn(0xe010000000UL), - paddr_to_pfn(0xe010800000UL)); - if ( ret ) - goto err; - - ret = map_one_spi(d, "PCI#INTA", 0xc2, DT_IRQ_TYPE_LEVEL_HIGH); - if ( ret ) - goto err; - - ret = map_one_spi(d, "PCI#INTB", 0xc3, DT_IRQ_TYPE_LEVEL_HIGH); - if ( ret ) - goto err; - - ret = map_one_spi(d, "PCI#INTC", 0xc4, DT_IRQ_TYPE_LEVEL_HIGH); - if ( ret ) - goto err; - - ret = map_one_spi(d, "PCI#INTD", 0xc5, DT_IRQ_TYPE_LEVEL_HIGH); - if ( ret ) - goto err; - - ret = 0; -err: - return ret; -} - static void xgene_storm_reset(void) { void __iomem *addr; @@ -177,7 +88,6 @@ PLATFORM_START(xgene_storm, "APM X-GENE STORM") .init = xgene_storm_init, .reset = xgene_storm_reset, .quirks = xgene_storm_quirks, - .specific_mapping = xgene_storm_specific_mapping, .dom0_evtchn_ppi = 24, .dom0_gnttab_start = 0x1f800000, diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h index ac2e876..d60ca94 100644 --- a/xen/include/xen/device_tree.h +++ b/xen/include/xen/device_tree.h @@ -682,6 +682,38 @@ int dt_parse_phandle_with_args(const struct dt_device_node *np, const char *cells_name, int index, struct dt_phandle_args *out_args); +/* + * Definitions for implementing parts of the OpenFirmware PCI Bus Binding + * Specification (IEEE Std 1275-1994). + */ + +struct dt_pci_unit_address { + __be32 hi, mid, lo; +} __attribute__((packed)); + +struct dt_pci_irq_mask { + struct dt_pci_unit_address pci_addr; + __be32 pci_pin; +} __attribute__((packed)); + +struct dt_pci_ranges_entry { + struct dt_pci_unit_address pci_addr; + __be64 cpu_addr; + __be64 length; +} __attribute__((packed)); + +struct dt_interrupt_map_entry { + struct dt_pci_irq_mask pci_irq_mask; + __be32 phandle; + __be32 irq_specifier[0]; + /* + * irq_specifier is: parent-addr and interrupt-descriptor, both + * sized according to the interrupt-parent's #address-cells and + * #interrupt-cells. + */ +} __attribute__((packed)); + + #endif /* __XEN_DEVICE_TREE_H */ /*
These properties are defined in ePAPR and the OpenFirmware PCI Bus Binding Specification (IEEE Std 1275-1994). This replaces the xgene specific mapping. Tested on Mustang and on a model with a PCI virtio controller. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> --- xen/arch/arm/domain_build.c | 193 ++++++++++++++++++++++++++++++++++ xen/arch/arm/platforms/xgene-storm.c | 90 ---------------- xen/include/xen/device_tree.h | 32 ++++++ 3 files changed, 225 insertions(+), 90 deletions(-)