Message ID | 20201202133813.6917-1-jianjun.wang@mediatek.com |
---|---|
Series | PCI: mediatek: Add new generation controller support |
On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
>
> MediaTek's PCIe host controller has three generation HWs, the new
> generation HW is an individual bridge, it supports Gen3 speed and
> up to 256 MSI interrupt numbers for multi-function devices.
>
> Add support for new Gen3 controller which can be found on MT8192.
>
> Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> Acked-by: Ryder Lee <ryder.lee@mediatek.com>

FWIW, I looked at Rob and Bjorn's comments on v4, and they seem to
have been addressed (with one small nit highlighted below).

> ---
> This patch dependents on "PCI: Export pci_pio_to_address() for module use"[1]
> to build as a kernel module.
>
> This interface will be used by PCI host drivers for PIO translation,
> export it to support compiling those drivers as kernel modules.
>
> [1]http://lists.infradead.org/pipermail/linux-mediatek/2020-December/019504.html
> ---
> drivers/pci/controller/Kconfig | 13 +
> drivers/pci/controller/Makefile | 1 +
> drivers/pci/controller/pcie-mediatek-gen3.c | 1039 +++++++++++++++++++
> 3 files changed, 1053 insertions(+)
> create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c
>
> [snip]
> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> new file mode 100644
> index 000000000000..d30ea734ac0a
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> @@ -0,0 +1,1039 @@
> [snip]
> +static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port,
> +				    resource_size_t cpu_addr,
> +				    resource_size_t pci_addr,
> +				    resource_size_t size,
> +				    unsigned long type, int num)
> +{
> +	void __iomem *table;
> +	u32 val = 0;

You don't need to init val to 0.

> +
> +	if (num >= PCIE_MAX_TRANS_TABLES) {
> +		dev_notice(port->dev, "not enough translate table[%d] for addr: %#llx, limited to [%d]\n",
> +			   num, (unsigned long long) cpu_addr,
> +			   PCIE_MAX_TRANS_TABLES);
> +		return -ENODEV;
> +	}
> +
> +	table = port->base + PCIE_TRANS_TABLE_BASE_REG +
> +		num * PCIE_ATR_TLB_SET_OFFSET;
> +
> +	writel(lower_32_bits(cpu_addr) | PCIE_ATR_SIZE(fls(size) - 1), table);
> +	writel(upper_32_bits(cpu_addr), table + PCIE_ATR_SRC_ADDR_MSB_OFFSET);
> +	writel(lower_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_LSB_OFFSET);
> +	writel(upper_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_MSB_OFFSET);
> +
> +	if (type == IORESOURCE_IO)
> +		val = PCIE_ATR_TYPE_IO | PCIE_ATR_TLP_TYPE_IO;
> +	else
> +		val = PCIE_ATR_TYPE_MEM | PCIE_ATR_TLP_TYPE_MEM;
> +
> +	writel(val, table + PCIE_ATR_TRSL_PARAM_OFFSET);
> +
> +	return 0;
> +}
> +
> +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> +{
> +	struct resource_entry *entry;
> +	struct pci_host_bridge *host = pci_host_bridge_from_priv(port);
> +	unsigned int table_index = 0;
> +	int err;
> +	u32 val;
> +
> +	/* Set as RC mode */
> +	val = readl(port->base + PCIE_SETTING_REG);
> +	val |= PCIE_RC_MODE;
> +	writel(val, port->base + PCIE_SETTING_REG);
> +
> +	/* Set class code */
> +	val = readl(port->base + PCIE_PCI_IDS_1);
> +	val &= ~GENMASK(31, 8);
> +	val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8);
> +	writel(val, port->base + PCIE_PCI_IDS_1);
> +
> +	/* Assert all reset signals */
> +	val = readl(port->base + PCIE_RST_CTRL_REG);
> +	val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB;
> +	writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +	/* De-assert reset signals */
> +	val &= ~(PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB);
> +	writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +	/* Delay 100ms to wait the reference clocks become stable */
> +	usleep_range(100 * 1000, 120 * 1000);

Any reason not to use msleep(100)?

> +
> +	/* De-assert PERST# signal */
> +	val &= ~PCIE_PE_RSTB;
> +	writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +	/* Check if the link is up or not */
> +	err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> +				 !!(val & PCIE_PORT_LINKUP), 20,
> +				 50 * USEC_PER_MSEC);
> +	if (err) {
> +		val = readl(port->base + PCIE_LTSSM_STATUS_REG);
> +		dev_notice(port->dev, "PCIe link down, ltssm reg val: %#x\n",
> +			   val);
> +		return err;
> +	}
> +
> +	/* Set PCIe translation windows */
> +	resource_list_for_each_entry(entry, &host->windows) {
> +		struct resource *res = entry->res;
> +		unsigned long type = resource_type(res);
> +		resource_size_t cpu_addr;
> +		resource_size_t pci_addr;
> +		resource_size_t size;
> +		const char *range_type;
> +
> +		if (type == IORESOURCE_IO) {
> +			cpu_addr = pci_pio_to_address(res->start);
> +			range_type = "IO";
> +		} else if (type == IORESOURCE_MEM) {
> +			cpu_addr = res->start;
> +			range_type = "MEM";
> +		} else {
> +			continue;
> +		}
> +
> +		pci_addr = res->start - entry->offset;
> +		size = resource_size(res);
> +		err = mtk_pcie_set_trans_table(port, cpu_addr, pci_addr, size,
> +					       type, table_index);
> +		if (err)
> +			return err;
> +
> +		dev_dbg(port->dev, "set %s trans window[%d]: cpu_addr = %#llx, pci_addr = %#llx, size = %#llx\n",
> +			range_type, table_index, (unsigned long long) cpu_addr,
> +			(unsigned long long) pci_addr,
> +			(unsigned long long) size);
> +
> +		table_index++;
> +	}
> +
> +	return 0;
> +}
> +
> [snip]
> +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> +					       msi_alloc_info_t *arg)
> +{
> +	struct msi_desc *entry = arg->desc;
> +	struct mtk_pcie_port *port = info->chip_data;
> +	int hwirq;
> +
> +	mutex_lock(&port->lock);
> +
> +	hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> +					order_base_2(entry->nvec_used));
> +	if (hwirq < 0) {
> +		mutex_unlock(&port->lock);
> +		return -ENOSPC;
> +	}
> +
> +	mutex_unlock(&port->lock);
> +
> +	return hwirq;

Code is good, but I had to look twice to make sure the mutex is
unlocked. Is the following marginally better?

hwirq = ...;

mutex_unlock(&port->lock);

if (hwirq < 0)
	return -ENOSPC;

return hwirq;

> +}
> +
> [snip]
> +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> +{
> +	struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> +	struct irq_chip *irqchip = irq_desc_get_chip(desc);
> +	unsigned long msi_enable, msi_status;
> +	unsigned int virq;
> +	irq_hw_number_t bit, hwirq;
> +
> +	chained_irq_enter(irqchip, desc);
> +
> +	msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> +	while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> +		msi_status &= msi_enable;

I don't know much about MSI, but what happens if you have a bit that
is set in the PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?

Sounds like you'll just spin-loop forever without acknowledging the
interrupt.

> +		for_each_set_bit(bit, &msi_status, PCIE_MSI_IRQS_PER_SET) {
> +			hwirq = bit + msi_info->index * PCIE_MSI_IRQS_PER_SET;
> +			virq = irq_find_mapping(msi_info->domain, hwirq);
> +			generic_handle_irq(virq);
> +		}
> +	}
> +
> +	chained_irq_exit(irqchip, desc);
> +}
> +
> [snip]
> +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
> +{
> +	struct mtk_pcie_port *port = dev_get_drvdata(dev);
> +	int err;
> +	u32 val;
> +
> +	/* Trigger link to L2 state */
> +	err = mtk_pcie_turn_off_link(port);
> +	if (err) {
> +		dev_notice(port->dev, "can not enter L2 state\n");

Rob suggested dev_err here.

(and IMHO, a lot of the other dev_notice above should probably become dev_err)

> +		return err;
> +	}
> +
> +	/* Pull down the PERST# pin */
> +	val = readl(port->base + PCIE_RST_CTRL_REG);
> +	val |= PCIE_PE_RSTB;
> +	writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +	dev_dbg(port->dev, "enter L2 state success");
> +
> +	clk_bulk_disable_unprepare(port->num_clks, port->clks);
> +
> +	phy_power_off(port->phy);
> +
> +	return 0;
> +}
> +
> [snip]
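Editorial sketch of the locking nit raised in the review above: a minimal userspace model, not the driver code. The `demo_*` names, the toy lock, and the 8-entry pool are all assumptions made for illustration; the real function uses `struct mutex` and `bitmap_find_free_region()`. It only shows that a single unlock placed before the error check covers both the success and the failure path.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_IRQS 8 /* hypothetical pool size, not PCIE_MSI_IRQS_NUM */

/* Toy stand-in for struct mutex: it only records whether it is held. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct demo_port {
	struct demo_lock lock;
	unsigned int in_use; /* one bit per hwirq */
};

/* Simplified stand-in for bitmap_find_free_region(): claims one free bit. */
static int demo_find_free(unsigned int *map)
{
	for (int i = 0; i < DEMO_IRQS; i++) {
		if (!(*map & (1u << i))) {
			*map |= 1u << i;
			return i;
		}
	}
	return -ENOMEM;
}

/* Reviewer's suggested shape: unlock once, then decide what to return. */
static int demo_get_hwirq(struct demo_port *port)
{
	int hwirq;

	demo_lock_acquire(&port->lock);
	hwirq = demo_find_free(&port->in_use);
	demo_lock_release(&port->lock); /* the only unlock, reached on every path */

	if (hwirq < 0)
		return -ENOSPC;

	return hwirq;
}
```

With an empty pool, successive calls hand out 0, 1, and so on; with a full pool the call returns -ENOSPC, and in both cases the lock is released before returning.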
On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> >
> > MediaTek's PCIe host controller has three generation HWs, the new
> > generation HW is an individual bridge, it supports Gen3 speed and
> > up to 256 MSI interrupt numbers for multi-function devices.
> >
> > Add support for new Gen3 controller which can be found on MT8192.
> >
> > Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> > Acked-by: Ryder Lee <ryder.lee@mediatek.com>
>
> FWIW, I looked at Rob and Bjorn's comments on v4, and they seem to
> have been addressed (with one small nit highlighted below).
>
> > ---
> > This patch dependents on "PCI: Export pci_pio_to_address() for module use"[1]
> > to build as a kernel module.
> >
> > This interface will be used by PCI host drivers for PIO translation,
> > export it to support compiling those drivers as kernel modules.
> >
> > [1]http://lists.infradead.org/pipermail/linux-mediatek/2020-December/019504.html
> > ---
> > drivers/pci/controller/Kconfig | 13 +
> > drivers/pci/controller/Makefile | 1 +
> > drivers/pci/controller/pcie-mediatek-gen3.c | 1039 +++++++++++++++++++
> > 3 files changed, 1053 insertions(+)
> > create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c
> >
> > [snip]
> > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> > new file mode 100644
> > index 000000000000..d30ea734ac0a
> > --- /dev/null
> > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> > @@ -0,0 +1,1039 @@
> > [snip]
> > +static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port,
> > +				    resource_size_t cpu_addr,
> > +				    resource_size_t pci_addr,
> > +				    resource_size_t size,
> > +				    unsigned long type, int num)
> > +{
> > +	void __iomem *table;
> > +	u32 val = 0;
>
> You don't need to init val to 0.
>
> > +
> > +	if (num >= PCIE_MAX_TRANS_TABLES) {
> > +		dev_notice(port->dev, "not enough translate table[%d] for addr: %#llx, limited to [%d]\n",
> > +			   num, (unsigned long long) cpu_addr,
> > +			   PCIE_MAX_TRANS_TABLES);
> > +		return -ENODEV;
> > +	}
> > +
> > +	table = port->base + PCIE_TRANS_TABLE_BASE_REG +
> > +		num * PCIE_ATR_TLB_SET_OFFSET;
> > +
> > +	writel(lower_32_bits(cpu_addr) | PCIE_ATR_SIZE(fls(size) - 1), table);
> > +	writel(upper_32_bits(cpu_addr), table + PCIE_ATR_SRC_ADDR_MSB_OFFSET);
> > +	writel(lower_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_LSB_OFFSET);
> > +	writel(upper_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_MSB_OFFSET);
> > +
> > +	if (type == IORESOURCE_IO)
> > +		val = PCIE_ATR_TYPE_IO | PCIE_ATR_TLP_TYPE_IO;
> > +	else
> > +		val = PCIE_ATR_TYPE_MEM | PCIE_ATR_TLP_TYPE_MEM;
> > +
> > +	writel(val, table + PCIE_ATR_TRSL_PARAM_OFFSET);
> > +
> > +	return 0;
> > +}
> > +
> > +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> > +{
> > +	struct resource_entry *entry;
> > +	struct pci_host_bridge *host = pci_host_bridge_from_priv(port);
> > +	unsigned int table_index = 0;
> > +	int err;
> > +	u32 val;
> > +
> > +	/* Set as RC mode */
> > +	val = readl(port->base + PCIE_SETTING_REG);
> > +	val |= PCIE_RC_MODE;
> > +	writel(val, port->base + PCIE_SETTING_REG);
> > +
> > +	/* Set class code */
> > +	val = readl(port->base + PCIE_PCI_IDS_1);
> > +	val &= ~GENMASK(31, 8);
> > +	val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8);
> > +	writel(val, port->base + PCIE_PCI_IDS_1);
> > +
> > +	/* Assert all reset signals */
> > +	val = readl(port->base + PCIE_RST_CTRL_REG);
> > +	val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB;
> > +	writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +	/* De-assert reset signals */
> > +	val &= ~(PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB);
> > +	writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +	/* Delay 100ms to wait the reference clocks become stable */
> > +	usleep_range(100 * 1000, 120 * 1000);
>
> Any reason not to use msleep(100)?

No special reason, but it seems msleep() should be used when the sleep
time is more than 20ms (based on Documentation/timers/timers-howto.rst).

I will switch to msleep(100) in the next version; thanks for your
review.

>
> > +
> > +	/* De-assert PERST# signal */
> > +	val &= ~PCIE_PE_RSTB;
> > +	writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +	/* Check if the link is up or not */
> > +	err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> > +				 !!(val & PCIE_PORT_LINKUP), 20,
> > +				 50 * USEC_PER_MSEC);
> > +	if (err) {
> > +		val = readl(port->base + PCIE_LTSSM_STATUS_REG);
> > +		dev_notice(port->dev, "PCIe link down, ltssm reg val: %#x\n",
> > +			   val);
> > +		return err;
> > +	}
> > +
> > +	/* Set PCIe translation windows */
> > +	resource_list_for_each_entry(entry, &host->windows) {
> > +		struct resource *res = entry->res;
> > +		unsigned long type = resource_type(res);
> > +		resource_size_t cpu_addr;
> > +		resource_size_t pci_addr;
> > +		resource_size_t size;
> > +		const char *range_type;
> > +
> > +		if (type == IORESOURCE_IO) {
> > +			cpu_addr = pci_pio_to_address(res->start);
> > +			range_type = "IO";
> > +		} else if (type == IORESOURCE_MEM) {
> > +			cpu_addr = res->start;
> > +			range_type = "MEM";
> > +		} else {
> > +			continue;
> > +		}
> > +
> > +		pci_addr = res->start - entry->offset;
> > +		size = resource_size(res);
> > +		err = mtk_pcie_set_trans_table(port, cpu_addr, pci_addr, size,
> > +					       type, table_index);
> > +		if (err)
> > +			return err;
> > +
> > +		dev_dbg(port->dev, "set %s trans window[%d]: cpu_addr = %#llx, pci_addr = %#llx, size = %#llx\n",
> > +			range_type, table_index, (unsigned long long) cpu_addr,
> > +			(unsigned long long) pci_addr,
> > +			(unsigned long long) size);
> > +
> > +		table_index++;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > [snip]
> > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > +					       msi_alloc_info_t *arg)
> > +{
> > +	struct msi_desc *entry = arg->desc;
> > +	struct mtk_pcie_port *port = info->chip_data;
> > +	int hwirq;
> > +
> > +	mutex_lock(&port->lock);
> > +
> > +	hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > +					order_base_2(entry->nvec_used));
> > +	if (hwirq < 0) {
> > +		mutex_unlock(&port->lock);
> > +		return -ENOSPC;
> > +	}
> > +
> > +	mutex_unlock(&port->lock);
> > +
> > +	return hwirq;
>
> Code is good, but I had to look twice to make sure the mutex is
> unlocked. Is the following marginally better?
>
> hwirq = ...;
>
> mutex_unlock(&port->lock);
>
> if (hwirq < 0)
> 	return -ENOSPC;
>
> return hwirq;

Impressive, I will fix it in the next version, and I think the hwirq can
be returned directly since it will be a negative value if
bitmap_find_free_region fails. The code will look like the following:

hwirq = ...;

mutex_unlock(&port->lock);

return hwirq;

>
> > +}
> > +
> > [snip]
> > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > +{
> > +	struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > +	struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > +	unsigned long msi_enable, msi_status;
> > +	unsigned int virq;
> > +	irq_hw_number_t bit, hwirq;
> > +
> > +	chained_irq_enter(irqchip, desc);
> > +
> > +	msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > +	while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > +		msi_status &= msi_enable;
>
> I don't know much about MSI, but what happens if you have a bit that
> is set in the PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?

If a bit is set in the PCIE_MSI_STATUS_OFFSET register but not in
msi_enable, it must be an abnormal usage of MSI or something has gone
wrong; it should be ignored, as we cannot find the corresponding handler.

> Sounds like you'll just spin-loop forever without acknowledging the
> interrupt.

The interrupt will be acknowledged in the irq_ack callback of
mtk_msi_irq_chip, which belongs to the msi_domain.

>
> > +		for_each_set_bit(bit, &msi_status, PCIE_MSI_IRQS_PER_SET) {
> > +			hwirq = bit + msi_info->index * PCIE_MSI_IRQS_PER_SET;
> > +			virq = irq_find_mapping(msi_info->domain, hwirq);
> > +			generic_handle_irq(virq);
> > +		}
> > +	}
> > +
> > +	chained_irq_exit(irqchip, desc);
> > +}
> > +
> > [snip]
> > +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
> > +{
> > +	struct mtk_pcie_port *port = dev_get_drvdata(dev);
> > +	int err;
> > +	u32 val;
> > +
> > +	/* Trigger link to L2 state */
> > +	err = mtk_pcie_turn_off_link(port);
> > +	if (err) {
> > +		dev_notice(port->dev, "can not enter L2 state\n");
>
> Rob suggested dev_err here.
>
> (and IMHO, a lot of the other dev_notice above should probably become dev_err)

I will switch to dev_err, thanks for your review.

>
> > +		return err;
> > +	}
> > +
> > +	/* Pull down the PERST# pin */
> > +	val = readl(port->base + PCIE_RST_CTRL_REG);
> > +	val |= PCIE_PE_RSTB;
> > +	writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +	dev_dbg(port->dev, "enter L2 state success");
> > +
> > +	clk_bulk_disable_unprepare(port->num_clks, port->clks);
> > +
> > +	phy_power_off(port->phy);
> > +
> > +	return 0;
> > +}
> > +
> > [snip]
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
On Tue, Dec 22, 2020 at 11:38 AM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
>
> On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> > On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> > > [snip]
> > > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > > +					       msi_alloc_info_t *arg)
> > > +{
> > > +	struct msi_desc *entry = arg->desc;
> > > +	struct mtk_pcie_port *port = info->chip_data;
> > > +	int hwirq;
> > > +
> > > +	mutex_lock(&port->lock);
> > > +
> > > +	hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > > +					order_base_2(entry->nvec_used));
> > > +	if (hwirq < 0) {
> > > +		mutex_unlock(&port->lock);
> > > +		return -ENOSPC;
> > > +	}
> > > +
> > > +	mutex_unlock(&port->lock);
> > > +
> > > +	return hwirq;
> >
> > Code is good, but I had to look twice to make sure the mutex is
> > unlocked. Is the following marginally better?
> >
> > hwirq = ...;
> >
> > mutex_unlock(&port->lock);
> >
> > if (hwirq < 0)
> > 	return -ENOSPC;
> >
> > return hwirq;
>
> Impressive, I will fix it in the next version, and I think the hwirq can
> be returned directly since it will be a negative value if
> bitmap_find_free_region fails. The code will look like the following:
>
> hwirq = ...;
>
> mutex_unlock(&port->lock);
>
> return hwirq;

SG, as long as you're okay with returning -ENOMEM instead of -ENOSPC.

But now I'm having doubts about whether negative return values are OK,
as irq_hw_number_t is unsigned long.

msi_domain_alloc
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L143)
uses it to call irq_find_mapping
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/irqdomain.c#L882)
without a check, and I'm not convinced irq_find_mapping will error out
gracefully...

> >
> > > +}
> > > +
> > > [snip]
> > > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > > +{
> > > +	struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > > +	struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > > +	unsigned long msi_enable, msi_status;
> > > +	unsigned int virq;
> > > +	irq_hw_number_t bit, hwirq;
> > > +
> > > +	chained_irq_enter(irqchip, desc);
> > > +
> > > +	msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > > +	while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > > +		msi_status &= msi_enable;
> >
> > I don't know much about MSI, but what happens if you have a bit that
> > is set in the PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?
>
> If a bit is set in the PCIE_MSI_STATUS_OFFSET register but not in
> msi_enable, it must be an abnormal usage of MSI or something has gone
> wrong; it should be ignored, as we cannot find the corresponding handler.
>
> > Sounds like you'll just spin-loop forever without acknowledging the
> > interrupt.
>
> The interrupt will be acknowledged in the irq_ack callback of
> mtk_msi_irq_chip, which belongs to the msi_domain.

Let's try to go through it (and please explain to me if I get this wrong).

Say we have:

msi_enable = [PCIE_MSI_ENABLE_OFFSET] = 0x1;

while loop:

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x3;
msi_status &= msi_enable => msi_status = 0x3 & 0x1 = 0x1;
for_each_set_bit(msi_status) {
	do something that presumably will disable the MSI interrupt status?
}
(next loop iteration)

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
msi_status &= msi_enable => msi_status = 0x2 & 0x1 = 0x0;
for_each_set_bit(msi_status) => does nothing.

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
(infinite loop)

Basically, I'm wondering if you should replace the while condition
statement with:

while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET) &
	msi_enable))
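Editorial sketch of the trace above, as a small userspace model rather than driver code. Everything prefixed `demo_` is hypothetical: the two registers are plain struct fields, handling a vector is assumed to clear its status bit (as the ack would), and an iteration cap stands in for "spins forever" so the model can report it instead of hanging.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one MSI set: fields stand in for the two MMIO registers. */
struct demo_msi {
	uint32_t status; /* models the PCIE_MSI_STATUS_OFFSET register */
	uint32_t enable; /* models the PCIE_MSI_ENABLE_OFFSET register */
};

/* Handle every bit in 'pending'; handling is assumed to clear the status bit. */
static void demo_dispatch(struct demo_msi *m, uint32_t pending)
{
	for (unsigned int bit = 0; bit < 32; bit++) {
		if (pending & (UINT32_C(1) << bit))
			m->status &= ~(UINT32_C(1) << bit);
	}
}

/*
 * Loop shape from the patch: the condition tests the raw status, the mask
 * is applied only inside the body. Returns the number of iterations, or -1
 * once 'max_iters' is reached (the model's stand-in for an endless spin).
 */
static int demo_loop_original(struct demo_msi *m, int max_iters)
{
	uint32_t pending;
	int iters = 0;

	while ((pending = m->status)) {
		if (iters++ >= max_iters)
			return -1;
		pending &= m->enable;
		demo_dispatch(m, pending);
	}
	return iters;
}

/* Loop shape proposed in the review: mask inside the loop condition. */
static int demo_loop_fixed(struct demo_msi *m)
{
	uint32_t pending;
	int iters = 0;

	while ((pending = m->status & m->enable)) {
		iters++;
		demo_dispatch(m, pending);
	}
	return iters;
}
```

With status 0x3 and enable 0x1, the original shape hits the cap because bit 1 never clears, while the masked condition exits after one pass and leaves the stray bit 0x2 untouched in the status field.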
On Tue, 2020-12-22 at 11:55 +0800, Nicolas Boichat wrote:
> On Tue, Dec 22, 2020 at 11:38 AM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> >
> > On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> > > On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> > > > [snip]
> > > > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > > > +					       msi_alloc_info_t *arg)
> > > > +{
> > > > +	struct msi_desc *entry = arg->desc;
> > > > +	struct mtk_pcie_port *port = info->chip_data;
> > > > +	int hwirq;
> > > > +
> > > > +	mutex_lock(&port->lock);
> > > > +
> > > > +	hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > > > +					order_base_2(entry->nvec_used));
> > > > +	if (hwirq < 0) {
> > > > +		mutex_unlock(&port->lock);
> > > > +		return -ENOSPC;
> > > > +	}
> > > > +
> > > > +	mutex_unlock(&port->lock);
> > > > +
> > > > +	return hwirq;
> > >
> > > Code is good, but I had to look twice to make sure the mutex is
> > > unlocked. Is the following marginally better?
> > >
> > > hwirq = ...;
> > >
> > > mutex_unlock(&port->lock);
> > >
> > > if (hwirq < 0)
> > > 	return -ENOSPC;
> > >
> > > return hwirq;
> >
> > Impressive, I will fix it in the next version, and I think the hwirq can
> > be returned directly since it will be a negative value if
> > bitmap_find_free_region fails. The code will look like the following:
> >
> > hwirq = ...;
> >
> > mutex_unlock(&port->lock);
> >
> > return hwirq;
>
> SG, as long as you're okay with returning -ENOMEM instead of -ENOSPC.
>
> But now I'm having doubts about whether negative return values are OK,
> as irq_hw_number_t is unsigned long.
>
> msi_domain_alloc
> (https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L143)
> uses it to call irq_find_mapping
> (https://elixir.bootlin.com/linux/latest/source/kernel/irq/irqdomain.c#L882)
> without a check, and I'm not convinced irq_find_mapping will error out
> gracefully...
>

I see, it seems the msi_domain_alloc function assumes the get_hwirq
callback always succeeds. Maybe I should allocate the real hwirq in
msi_prepare
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L304)
and set it to arg->hwirq, and override set_desc
(https://elixir.bootlin.com/linux/latest/source/drivers/pci/msi.c#L1405)
to prevent arg->hwirq from being modified.

> > >
> > > > +}
> > > > +
> > > > [snip]
> > > > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > > > +{
> > > > +	struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > > > +	struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > > > +	unsigned long msi_enable, msi_status;
> > > > +	unsigned int virq;
> > > > +	irq_hw_number_t bit, hwirq;
> > > > +
> > > > +	chained_irq_enter(irqchip, desc);
> > > > +
> > > > +	msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > > > +	while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > > > +		msi_status &= msi_enable;
> > >
> > > I don't know much about MSI, but what happens if you have a bit that
> > > is set in the PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?
> >
> > If a bit is set in the PCIE_MSI_STATUS_OFFSET register but not in
> > msi_enable, it must be an abnormal usage of MSI or something has gone
> > wrong; it should be ignored, as we cannot find the corresponding handler.
> >
> > > Sounds like you'll just spin-loop forever without acknowledging the
> > > interrupt.
> >
> > The interrupt will be acknowledged in the irq_ack callback of
> > mtk_msi_irq_chip, which belongs to the msi_domain.
>
> Let's try to go through it (and please explain to me if I get this wrong).
>
> Say we have:
>
> msi_enable = [PCIE_MSI_ENABLE_OFFSET] = 0x1;
>
> while loop:
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x3;
> msi_status &= msi_enable => msi_status = 0x3 & 0x1 = 0x1;
> for_each_set_bit(msi_status) {
> 	do something that presumably will disable the MSI interrupt status?

Yes, the corresponding interrupt status will be cleared.

> }
> (next loop iteration)
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
> msi_status &= msi_enable => msi_status = 0x2 & 0x1 = 0x0;
> for_each_set_bit(msi_status) => does nothing.
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
> (infinite loop)
>
> Basically, I'm wondering if you should replace the while condition
> statement with:
>
> while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET) &
> 	msi_enable))
>

Yes, it will be a dead loop if we receive an abnormal interrupt status.
I will fix it in the next version; thanks for your kind review.
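Editorial sketch of the return-type concern discussed in this thread, as a userspace illustration under stated assumptions. `demo_hwirq_t` mirrors the thread's statement that irq_hw_number_t is an unsigned long; the 256-entry limit is taken from the commit message's "up to 256 MSI interrupt numbers"; the `demo_*` functions are hypothetical and are not kernel APIs. It shows that a negative errno returned through an unsigned type arrives as a very large number, so a caller that does not range-check it would treat it as a hardware IRQ number.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Mirrors the thread's point that irq_hw_number_t is an unsigned long. */
typedef unsigned long demo_hwirq_t;

#define DEMO_MSI_IRQS 256UL /* assumed pool size, from "up to 256 MSI" */

/* Shape of the questioned callback: an int error leaves via an unsigned type. */
static demo_hwirq_t demo_get_hwirq_failing(void)
{
	int hwirq = -ENOMEM; /* what a failed bitmap search would hand back */

	return hwirq; /* implicit conversion wraps to a huge positive value */
}

/* Hypothetical range check a caller would need in order to notice the failure. */
static int demo_hwirq_is_valid(demo_hwirq_t hwirq)
{
	return hwirq < DEMO_MSI_IRQS;
}
```

The wrapped value equals ULONG_MAX - ENOMEM + 1, which is far outside any plausible MSI range; nothing flags it as an error unless the caller performs an explicit check like the one sketched here.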