[v5,0/3] PCI: mediatek: Add new generation controller support

Message ID 20201202133813.6917-1-jianjun.wang@mediatek.com

Message

Jianjun Wang (王建军) Dec. 2, 2020, 1:38 p.m. UTC
This series adds pcie-mediatek-gen3.c and a dt-bindings file to
support the new generation PCIe controller.

Changes in v5:
1. Remove unused macros
2. Modify the config read/write callbacks to set the config byte field
   in the TLP header and use pci_generic_config_read32/write32
   to access the config space
3. Fix the translation window settings; both MEM and IO regions
   now work properly
4. Fix typos

Changes in v4:
1. Fix PCIe power up/down flow
2. Use "mac" and "phy" for reset names
3. Add clock names
4. Fix variable types

Changes in v3:
1. Remove standard property in binding document
2. Return an error code when a get_optional* API returns an error
3. Use the bulk clk APIs

Changes in v2:
1. Fix a typo in the dt-bindings patch
2. Remove unnecessary properties from the binding document
3. Dispose of the IRQ mappings of the MSI top domain on IRQ teardown

Jianjun Wang (3):
  dt-bindings: PCI: mediatek: Add YAML schema
  PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192
  MAINTAINERS: update entry for MediaTek PCIe controller

 .../bindings/pci/mediatek-pcie-gen3.yaml      |  135 +++
 MAINTAINERS                                   |    1 +
 drivers/pci/controller/Kconfig                |   13 +
 drivers/pci/controller/Makefile               |    1 +
 drivers/pci/controller/pcie-mediatek-gen3.c   | 1039 +++++++++++++++++
 5 files changed, 1189 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml
 create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c

Comments

Nicolas Boichat Dec. 21, 2020, 2:18 a.m. UTC | #1
On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
>
> MediaTek's PCIe host controller has three generation HWs, the new
> generation HW is an individual bridge, it supports Gen3 speed and
> up to 256 MSI interrupt numbers for multi-function devices.
>
> Add support for new Gen3 controller which can be found on MT8192.
>
> Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> Acked-by: Ryder Lee <ryder.lee@mediatek.com>

FWIW, I looked at Rob and Bjorn's comments on v4, and they seem to
have been addressed (with one small nit highlighted below).

> ---
> This patch depends on "PCI: Export pci_pio_to_address() for module use"[1]
> to build as a kernel module.
>
> This interface will be used by PCI host drivers for PIO translation,
> export it to support compiling those drivers as kernel modules.
>
> [1]http://lists.infradead.org/pipermail/linux-mediatek/2020-December/019504.html
> ---
>  drivers/pci/controller/Kconfig              |   13 +
>  drivers/pci/controller/Makefile             |    1 +
>  drivers/pci/controller/pcie-mediatek-gen3.c | 1039 +++++++++++++++++++
>  3 files changed, 1053 insertions(+)
>  create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c
>
> [snip]
> diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> new file mode 100644
> index 000000000000..d30ea734ac0a
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> @@ -0,0 +1,1039 @@
> [snip]
> +static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port,
> +                                   resource_size_t cpu_addr,
> +                                   resource_size_t pci_addr,
> +                                   resource_size_t size,
> +                                   unsigned long type, int num)
> +{
> +       void __iomem *table;
> +       u32 val = 0;

You don't need to init val to 0.

> +
> +       if (num >= PCIE_MAX_TRANS_TABLES) {
> +               dev_notice(port->dev, "not enough translate table[%d] for addr: %#llx, limited to [%d]\n",
> +                          num, (unsigned long long) cpu_addr,
> +                          PCIE_MAX_TRANS_TABLES);
> +               return -ENODEV;
> +       }
> +
> +       table = port->base + PCIE_TRANS_TABLE_BASE_REG +
> +               num * PCIE_ATR_TLB_SET_OFFSET;
> +
> +       writel(lower_32_bits(cpu_addr) | PCIE_ATR_SIZE(fls(size) - 1), table);
> +       writel(upper_32_bits(cpu_addr), table + PCIE_ATR_SRC_ADDR_MSB_OFFSET);
> +       writel(lower_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_LSB_OFFSET);
> +       writel(upper_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_MSB_OFFSET);
> +
> +       if (type == IORESOURCE_IO)
> +               val = PCIE_ATR_TYPE_IO | PCIE_ATR_TLP_TYPE_IO;
> +       else
> +               val = PCIE_ATR_TYPE_MEM | PCIE_ATR_TLP_TYPE_MEM;
> +
> +       writel(val, table + PCIE_ATR_TRSL_PARAM_OFFSET);
> +
> +       return 0;
> +}
> +
> +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> +{
> +       struct resource_entry *entry;
> +       struct pci_host_bridge *host = pci_host_bridge_from_priv(port);
> +       unsigned int table_index = 0;
> +       int err;
> +       u32 val;
> +
> +       /* Set as RC mode */
> +       val = readl(port->base + PCIE_SETTING_REG);
> +       val |= PCIE_RC_MODE;
> +       writel(val, port->base + PCIE_SETTING_REG);
> +
> +       /* Set class code */
> +       val = readl(port->base + PCIE_PCI_IDS_1);
> +       val &= ~GENMASK(31, 8);
> +       val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8);
> +       writel(val, port->base + PCIE_PCI_IDS_1);
> +
> +       /* Assert all reset signals */
> +       val = readl(port->base + PCIE_RST_CTRL_REG);
> +       val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB;
> +       writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +       /* De-assert reset signals */
> +       val &= ~(PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB);
> +       writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +       /* Delay 100ms to wait the reference clocks become stable */
> +       usleep_range(100 * 1000, 120 * 1000);

Any reason not to use msleep(100)?

> +
> +       /* De-assert PERST# signal */
> +       val &= ~PCIE_PE_RSTB;
> +       writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +       /* Check if the link is up or not */
> +       err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> +                       !!(val & PCIE_PORT_LINKUP), 20,
> +                       50 * USEC_PER_MSEC);
> +       if (err) {
> +               val = readl(port->base + PCIE_LTSSM_STATUS_REG);
> +               dev_notice(port->dev, "PCIe link down, ltssm reg val: %#x\n",
> +                          val);
> +               return err;
> +       }
> +
> +       /* Set PCIe translation windows */
> +       resource_list_for_each_entry(entry, &host->windows) {
> +               struct resource *res = entry->res;
> +               unsigned long type = resource_type(res);
> +               resource_size_t cpu_addr;
> +               resource_size_t pci_addr;
> +               resource_size_t size;
> +               const char *range_type;
> +
> +               if (type == IORESOURCE_IO) {
> +                       cpu_addr = pci_pio_to_address(res->start);
> +                       range_type = "IO";
> +               } else if (type == IORESOURCE_MEM) {
> +                       cpu_addr = res->start;
> +                       range_type = "MEM";
> +               } else {
> +                       continue;
> +               }
> +
> +               pci_addr = res->start - entry->offset;
> +               size = resource_size(res);
> +               err = mtk_pcie_set_trans_table(port, cpu_addr, pci_addr, size,
> +                                              type, table_index);
> +               if (err)
> +                       return err;
> +
> +               dev_dbg(port->dev, "set %s trans window[%d]: cpu_addr = %#llx, pci_addr = %#llx, size = %#llx\n",
> +                       range_type, table_index, (unsigned long long) cpu_addr,
> +                       (unsigned long long) pci_addr,
> +                       (unsigned long long) size);
> +
> +               table_index++;
> +       }
> +
> +       return 0;
> +}
> +
> [snip]
> +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> +                                             msi_alloc_info_t *arg)
> +{
> +       struct msi_desc *entry = arg->desc;
> +       struct mtk_pcie_port *port = info->chip_data;
> +       int hwirq;
> +
> +       mutex_lock(&port->lock);
> +
> +       hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> +                                       order_base_2(entry->nvec_used));
> +       if (hwirq < 0) {
> +               mutex_unlock(&port->lock);
> +               return -ENOSPC;
> +       }
> +
> +       mutex_unlock(&port->lock);
> +
> +       return hwirq;

Code is good, but I had to look twice to make sure the mutex is
unlocked. Is the following marginally better?

hwirq = ...;

mutex_unlock(&port->lock);

if (hwirq < 0)
   return -ENOSPC;

return hwirq;
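The shape suggested here (compute the result under the lock, unlock exactly once, then decide what to return) can be sketched in userspace C. This is a minimal model under stated assumptions, not the driver's code: a pthread mutex stands in for the kernel mutex, `find_free_region()` is a hypothetical, simplified stand-in for `bitmap_find_free_region()` (first free, naturally aligned block of 2^order slots, `-ENOMEM` when none is free), and the 256-slot map size comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MSI_IRQS_NUM 256 /* "up to 256 MSI" per the commit message */

/* Hypothetical model of the port: a lock plus an allocation map. */
struct port_model {
	pthread_mutex_t lock;
	bool in_use[MSI_IRQS_NUM];
};

/*
 * Simplified stand-in for bitmap_find_free_region(): find the first free,
 * naturally aligned block of (1 << order) slots, mark it used and return
 * its first index, or -ENOMEM if no such block exists.
 */
static int find_free_region(bool *map, int bits, int order)
{
	assert(order >= 0 && order < 31);
	int len = 1 << order;

	for (int start = 0; start + len <= bits; start += len) {
		int i = 0;

		while (i < len && !map[start + i])
			i++;
		if (i < len)
			continue; /* block partly used, try the next aligned one */
		for (i = 0; i < len; i++)
			map[start + i] = true;
		return start;
	}
	return -ENOMEM;
}

/* Suggested shape: one lock, one unlock, error mapping after the unlock. */
static int alloc_hwirq(struct port_model *port, int order)
{
	int hwirq;

	pthread_mutex_lock(&port->lock);
	hwirq = find_free_region(port->in_use, MSI_IRQS_NUM, order);
	pthread_mutex_unlock(&port->lock);

	if (hwirq < 0)
		return -ENOSPC;

	return hwirq;
}
```

On an empty map, `alloc_hwirq(&port, 0)` returns 0, a following `alloc_hwirq(&port, 2)` returns 4 (the next free 4-aligned block), and an order-8 request then fails with -ENOSPC. Because every path goes through the single unlock, there is no per-branch unlock to forget.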

> +}
> +
> [snip]
> +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> +{
> +       struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> +       struct irq_chip *irqchip = irq_desc_get_chip(desc);
> +       unsigned long msi_enable, msi_status;
> +       unsigned int virq;
> +       irq_hw_number_t bit, hwirq;
> +
> +       chained_irq_enter(irqchip, desc);
> +
> +       msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> +       while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> +               msi_status &= msi_enable;

I don't know much about MSI, but what happens if you have a bit that
is set in PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?
Sounds like you'll just spin-loop forever without acknowledging the
interrupt.

> +               for_each_set_bit(bit, &msi_status, PCIE_MSI_IRQS_PER_SET) {
> +                       hwirq = bit + msi_info->index * PCIE_MSI_IRQS_PER_SET;
> +                       virq = irq_find_mapping(msi_info->domain, hwirq);
> +                       generic_handle_irq(virq);
> +               }
> +       }
> +
> +       chained_irq_exit(irqchip, desc);
> +}
> +
> [snip]
> +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
> +{
> +       struct mtk_pcie_port *port = dev_get_drvdata(dev);
> +       int err;
> +       u32 val;
> +
> +       /* Trigger link to L2 state */
> +       err = mtk_pcie_turn_off_link(port);
> +       if (err) {
> +               dev_notice(port->dev, "can not enter L2 state\n");

Rob suggested dev_err here.

(And IMHO, a lot of the other dev_notice calls above should probably be dev_err too.)

> +               return err;
> +       }
> +
> +       /* Pull down the PERST# pin */
> +       val = readl(port->base + PCIE_RST_CTRL_REG);
> +       val |= PCIE_PE_RSTB;
> +       writel(val, port->base + PCIE_RST_CTRL_REG);
> +
> +       dev_dbg(port->dev, "enter L2 state success");
> +
> +       clk_bulk_disable_unprepare(port->num_clks, port->clks);
> +
> +       phy_power_off(port->phy);
> +
> +       return 0;
> +}
> +
> [snip]
Jianjun Wang (王建军) Dec. 22, 2020, 3:38 a.m. UTC | #2
On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> >
> > MediaTek's PCIe host controller has three generation HWs, the new
> > generation HW is an individual bridge, it supports Gen3 speed and
> > up to 256 MSI interrupt numbers for multi-function devices.
> >
> > Add support for new Gen3 controller which can be found on MT8192.
> >
> > Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
> > Acked-by: Ryder Lee <ryder.lee@mediatek.com>
>
> FWIW, I looked at Rob and Bjorn's comments on v4, and they seem to
> have been addressed (with one small nit highlighted below).
>
> > ---
> > This patch depends on "PCI: Export pci_pio_to_address() for module use"[1]
> > to build as a kernel module.
> >
> > This interface will be used by PCI host drivers for PIO translation,
> > export it to support compiling those drivers as kernel modules.
> >
> > [1]http://lists.infradead.org/pipermail/linux-mediatek/2020-December/019504.html
> > ---
> >  drivers/pci/controller/Kconfig              |   13 +
> >  drivers/pci/controller/Makefile             |    1 +
> >  drivers/pci/controller/pcie-mediatek-gen3.c | 1039 +++++++++++++++++++
> >  3 files changed, 1053 insertions(+)
> >  create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c
> >
> > [snip]
> > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
> > new file mode 100644
> > index 000000000000..d30ea734ac0a
> > --- /dev/null
> > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c
> > @@ -0,0 +1,1039 @@
> > [snip]
> > +static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port,
> > +                                   resource_size_t cpu_addr,
> > +                                   resource_size_t pci_addr,
> > +                                   resource_size_t size,
> > +                                   unsigned long type, int num)
> > +{
> > +       void __iomem *table;
> > +       u32 val = 0;
>
> You don't need to init val to 0.
>
> > +
> > +       if (num >= PCIE_MAX_TRANS_TABLES) {
> > +               dev_notice(port->dev, "not enough translate table[%d] for addr: %#llx, limited to [%d]\n",
> > +                          num, (unsigned long long) cpu_addr,
> > +                          PCIE_MAX_TRANS_TABLES);
> > +               return -ENODEV;
> > +       }
> > +
> > +       table = port->base + PCIE_TRANS_TABLE_BASE_REG +
> > +               num * PCIE_ATR_TLB_SET_OFFSET;
> > +
> > +       writel(lower_32_bits(cpu_addr) | PCIE_ATR_SIZE(fls(size) - 1), table);
> > +       writel(upper_32_bits(cpu_addr), table + PCIE_ATR_SRC_ADDR_MSB_OFFSET);
> > +       writel(lower_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_LSB_OFFSET);
> > +       writel(upper_32_bits(pci_addr), table + PCIE_ATR_TRSL_ADDR_MSB_OFFSET);
> > +
> > +       if (type == IORESOURCE_IO)
> > +               val = PCIE_ATR_TYPE_IO | PCIE_ATR_TLP_TYPE_IO;
> > +       else
> > +               val = PCIE_ATR_TYPE_MEM | PCIE_ATR_TLP_TYPE_MEM;
> > +
> > +       writel(val, table + PCIE_ATR_TRSL_PARAM_OFFSET);
> > +
> > +       return 0;
> > +}
> > +
> > +static int mtk_pcie_startup_port(struct mtk_pcie_port *port)
> > +{
> > +       struct resource_entry *entry;
> > +       struct pci_host_bridge *host = pci_host_bridge_from_priv(port);
> > +       unsigned int table_index = 0;
> > +       int err;
> > +       u32 val;
> > +
> > +       /* Set as RC mode */
> > +       val = readl(port->base + PCIE_SETTING_REG);
> > +       val |= PCIE_RC_MODE;
> > +       writel(val, port->base + PCIE_SETTING_REG);
> > +
> > +       /* Set class code */
> > +       val = readl(port->base + PCIE_PCI_IDS_1);
> > +       val &= ~GENMASK(31, 8);
> > +       val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8);
> > +       writel(val, port->base + PCIE_PCI_IDS_1);
> > +
> > +       /* Assert all reset signals */
> > +       val = readl(port->base + PCIE_RST_CTRL_REG);
> > +       val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB;
> > +       writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +       /* De-assert reset signals */
> > +       val &= ~(PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB);
> > +       writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +       /* Delay 100ms to wait the reference clocks become stable */
> > +       usleep_range(100 * 1000, 120 * 1000);
>
> Any reason not to use msleep(100)?

No special reason, but it seems msleep() should be used when the
sleep time is more than 20ms (based on
Documentation/timers/timers-howto.rst).

I will replace it with msleep(100) in the next version. Thanks for your
review.
>
> > +
> > +       /* De-assert PERST# signal */
> > +       val &= ~PCIE_PE_RSTB;
> > +       writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +       /* Check if the link is up or not */
> > +       err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_REG, val,
> > +                       !!(val & PCIE_PORT_LINKUP), 20,
> > +                       50 * USEC_PER_MSEC);
> > +       if (err) {
> > +               val = readl(port->base + PCIE_LTSSM_STATUS_REG);
> > +               dev_notice(port->dev, "PCIe link down, ltssm reg val: %#x\n",
> > +                          val);
> > +               return err;
> > +       }
> > +
> > +       /* Set PCIe translation windows */
> > +       resource_list_for_each_entry(entry, &host->windows) {
> > +               struct resource *res = entry->res;
> > +               unsigned long type = resource_type(res);
> > +               resource_size_t cpu_addr;
> > +               resource_size_t pci_addr;
> > +               resource_size_t size;
> > +               const char *range_type;
> > +
> > +               if (type == IORESOURCE_IO) {
> > +                       cpu_addr = pci_pio_to_address(res->start);
> > +                       range_type = "IO";
> > +               } else if (type == IORESOURCE_MEM) {
> > +                       cpu_addr = res->start;
> > +                       range_type = "MEM";
> > +               } else {
> > +                       continue;
> > +               }
> > +
> > +               pci_addr = res->start - entry->offset;
> > +               size = resource_size(res);
> > +               err = mtk_pcie_set_trans_table(port, cpu_addr, pci_addr, size,
> > +                                              type, table_index);
> > +               if (err)
> > +                       return err;
> > +
> > +               dev_dbg(port->dev, "set %s trans window[%d]: cpu_addr = %#llx, pci_addr = %#llx, size = %#llx\n",
> > +                       range_type, table_index, (unsigned long long) cpu_addr,
> > +                       (unsigned long long) pci_addr,
> > +                       (unsigned long long) size);
> > +
> > +               table_index++;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > [snip]
> > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > +                                             msi_alloc_info_t *arg)
> > +{
> > +       struct msi_desc *entry = arg->desc;
> > +       struct mtk_pcie_port *port = info->chip_data;
> > +       int hwirq;
> > +
> > +       mutex_lock(&port->lock);
> > +
> > +       hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > +                                       order_base_2(entry->nvec_used));
> > +       if (hwirq < 0) {
> > +               mutex_unlock(&port->lock);
> > +               return -ENOSPC;
> > +       }
> > +
> > +       mutex_unlock(&port->lock);
> > +
> > +       return hwirq;
>
> Code is good, but I had to look twice to make sure the mutex is
> unlocked. Is the following marginally better?
>
> hwirq = ...;
>
> mutex_unlock(&port->lock);
>
> if (hwirq < 0)
>    return -ENOSPC;
>
> return hwirq;

Impressive, I will fix it in the next version. I think hwirq can
be returned directly, since it will be a negative value if
bitmap_find_free_region fails. The code will look like the following:

hwirq = ...;

mutex_unlock(&port->lock);

return hwirq;

>
> > +}
> > +
> > [snip]
> > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > +{
> > +       struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > +       struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > +       unsigned long msi_enable, msi_status;
> > +       unsigned int virq;
> > +       irq_hw_number_t bit, hwirq;
> > +
> > +       chained_irq_enter(irqchip, desc);
> > +
> > +       msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > +       while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > +               msi_status &= msi_enable;
>
> I don't know much about MSI, but what happens if you have a bit that
> is set in PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?

If a bit in the PCIE_MSI_STATUS_OFFSET register is set but not in
msi_enable, it must be an abnormal use of MSI or something has gone wrong;
it should be ignored, since we cannot find the corresponding handler.

> Sounds like you'll just spin-loop forever without acknowledging the
> interrupt.

The interrupt will be acknowledged in the irq_ack callback of
mtk_msi_irq_chip, which belongs to the msi_domain.

>
> > +               for_each_set_bit(bit, &msi_status, PCIE_MSI_IRQS_PER_SET) {
> > +                       hwirq = bit + msi_info->index * PCIE_MSI_IRQS_PER_SET;
> > +                       virq = irq_find_mapping(msi_info->domain, hwirq);
> > +                       generic_handle_irq(virq);
> > +               }
> > +       }
> > +
> > +       chained_irq_exit(irqchip, desc);
> > +}
> > +
> > [snip]
> > +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
> > +{
> > +       struct mtk_pcie_port *port = dev_get_drvdata(dev);
> > +       int err;
> > +       u32 val;
> > +
> > +       /* Trigger link to L2 state */
> > +       err = mtk_pcie_turn_off_link(port);
> > +       if (err) {
> > +               dev_notice(port->dev, "can not enter L2 state\n");
>
> Rob suggested dev_err here.
>
> (And IMHO, a lot of the other dev_notice calls above should probably be dev_err too.)

I will replace it with dev_err. Thanks for your review.
>
> > +               return err;
> > +       }
> > +
> > +       /* Pull down the PERST# pin */
> > +       val = readl(port->base + PCIE_RST_CTRL_REG);
> > +       val |= PCIE_PE_RSTB;
> > +       writel(val, port->base + PCIE_RST_CTRL_REG);
> > +
> > +       dev_dbg(port->dev, "enter L2 state success");
> > +
> > +       clk_bulk_disable_unprepare(port->num_clks, port->clks);
> > +
> > +       phy_power_off(port->phy);
> > +
> > +       return 0;
> > +}
> > +
> > [snip]
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
Nicolas Boichat Dec. 22, 2020, 3:55 a.m. UTC | #3
On Tue, Dec 22, 2020 at 11:38 AM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
>
> On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> > On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> > > [snip]
> > > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > > +                                             msi_alloc_info_t *arg)
> > > +{
> > > +       struct msi_desc *entry = arg->desc;
> > > +       struct mtk_pcie_port *port = info->chip_data;
> > > +       int hwirq;
> > > +
> > > +       mutex_lock(&port->lock);
> > > +
> > > +       hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > > +                                       order_base_2(entry->nvec_used));
> > > +       if (hwirq < 0) {
> > > +               mutex_unlock(&port->lock);
> > > +               return -ENOSPC;
> > > +       }
> > > +
> > > +       mutex_unlock(&port->lock);
> > > +
> > > +       return hwirq;
> >
> > Code is good, but I had to look twice to make sure the mutex is
> > unlocked. Is the following marginally better?
> >
> > hwirq = ...;
> >
> > mutex_unlock(&port->lock);
> >
> > if (hwirq < 0)
> >    return -ENOSPC;
> >
> > return hwirq;
>
> Impressive, I will fix it in the next version. I think hwirq can
> be returned directly, since it will be a negative value if
> bitmap_find_free_region fails. The code will look like the following:
>
> hwirq = ...;
>
> mutex_unlock(&port->lock);
>
> return hwirq;

SG, as long as you're okay with returning -ENOMEM instead of -ENOSPC.

But now I'm having doubts about whether negative return values are OK, as
irq_hw_number_t is unsigned long.

msi_domain_alloc
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L143)
uses it to call irq_find_mapping
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/irqdomain.c#L882)
without checking it, and I'm not convinced irq_find_mapping will error out
gracefully...
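The concern follows from plain C conversion rules: converting a negative `int` to an unsigned type wraps modulo 2^N, so a `-ENOMEM` returned through an `irq_hw_number_t` (an `unsigned long` in the kernel) arrives as a very large positive hardware IRQ number rather than something a caller can test with `< 0`. The sketch below is a userspace model only; `get_hwirq_model()` and `hwirq_in_range()` are hypothetical names, and only the typedef mirrors the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace copy of the kernel typedef (include/linux/types.h). */
typedef unsigned long irq_hw_number_t;

#define MSI_IRQS_NUM 256 /* "up to 256 MSI" per the commit message */

/*
 * Hypothetical callback with get_hwirq's return type that forwards an
 * int allocator result unchanged. A negative value is converted modulo
 * ULONG_MAX + 1, so -ENOMEM becomes ULONG_MAX + 1 - ENOMEM.
 */
static irq_hw_number_t get_hwirq_model(int alloc_result)
{
	return (irq_hw_number_t)alloc_result;
}

/*
 * The only check a caller can make on an unsigned hwirq is a range check;
 * "hwirq < 0" is always false for an unsigned type.
 */
static bool hwirq_in_range(irq_hw_number_t hwirq)
{
	return hwirq < MSI_IRQS_NUM;
}
```

On Linux, where `ENOMEM` is 12, `get_hwirq_model(-ENOMEM)` equals `ULONG_MAX - 11`. This model rejects it only because it range-checks explicitly; a caller that passes the value straight into a mapping lookup has no sign bit to test.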

> >
> > > +}
> > > +
> > > [snip]
> > > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > > +{
> > > +       struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > > +       struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > > +       unsigned long msi_enable, msi_status;
> > > +       unsigned int virq;
> > > +       irq_hw_number_t bit, hwirq;
> > > +
> > > +       chained_irq_enter(irqchip, desc);
> > > +
> > > +       msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > > +       while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > > +               msi_status &= msi_enable;
> >
> > I don't know much about MSI, but what happens if you have a bit that
> > is set in PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?
>
> If a bit in the PCIE_MSI_STATUS_OFFSET register is set but not in
> msi_enable, it must be an abnormal use of MSI or something has gone wrong;
> it should be ignored, since we cannot find the corresponding handler.
>
> > Sounds like you'll just spin-loop forever without acknowledging the
> > interrupt.
>
> The interrupt will be acknowledged in the irq_ack callback of
> mtk_msi_irq_chip, which belongs to the msi_domain.

Let's try to go through it (and please explain to me if I get this wrong).

Say we have:

msi_enable = [PCIE_MSI_ENABLE_OFFSET] = 0x1;

while loop:

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x3;
msi_status &= msi_enable => msi_status = 0x3 & 0x1 = 0x1;
for_each_set_bit(msi_status) {
   do something that presumably will disable the MSI interrupt status?
}
(next loop iteration)

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
msi_status &= msi_enable => msi_status = 0x2 & 0x1 = 0x0;
for_each_set_bit(msi_status) => does nothing.

msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
(infinite loop)

Basically, I'm wondering if you should replace the while condition
statement with:

while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET) &
msi_enable))
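The walk-through above can be reproduced with a small userspace model of one MSI set. Everything here is hypothetical: `struct msi_set_model` stands in for the status and enable registers, `handle_and_ack()` stands in for the handler plus the `irq_ack` callback and is assumed to clear the status bit (as Jianjun confirms in the reply), 32 bits per set is an assumption, and an iteration cap makes a stuck loop observable instead of hanging. `run_original()` keeps the loop shape from the patch; `run_fixed()` applies the suggested condition.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_IRQS_PER_SET 32 /* assumption: one 32-bit status word per set */

/* Hypothetical model of one MSI set's registers. */
struct msi_set_model {
	uint32_t status;      /* stands in for PCIE_MSI_STATUS_OFFSET */
	uint32_t enable;      /* stands in for PCIE_MSI_ENABLE_OFFSET */
	unsigned int handled; /* number of handler invocations */
};

/* Stand-in for generic_handle_irq() plus irq_ack: clears the status bit. */
static void handle_and_ack(struct msi_set_model *m, unsigned int bit)
{
	m->status &= ~(UINT32_C(1) << bit);
	m->handled++;
}

/* Equivalent of the for_each_set_bit() body. */
static void dispatch(struct msi_set_model *m, uint32_t pending)
{
	for (unsigned int bit = 0; bit < MSI_IRQS_PER_SET; bit++)
		if (pending & (UINT32_C(1) << bit))
			handle_and_ack(m, bit);
}

/* Loop shape from the patch: the condition tests the unmasked status. */
static unsigned int run_original(struct msi_set_model *m, unsigned int cap)
{
	uint32_t enable = m->enable;
	uint32_t status;
	unsigned int iter = 0;

	assert(cap > 0);
	while ((status = m->status) && iter < cap) {
		iter++;
		status &= enable;
		dispatch(m, status);
	}
	return iter;
}

/* Suggested shape: mask in the condition, so set-but-disabled bits
 * cannot keep the loop running. */
static unsigned int run_fixed(struct msi_set_model *m, unsigned int cap)
{
	uint32_t enable = m->enable;
	uint32_t status;
	unsigned int iter = 0;

	assert(cap > 0);
	while ((status = m->status & enable) && iter < cap) {
		iter++;
		dispatch(m, status);
	}
	return iter;
}
```

Starting from status 0x3 and enable 0x1, both versions handle bit 0 once and leave bit 1 pending, but `run_original()` only stops when it reaches the cap, while `run_fixed()` exits after one iteration.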
Jianjun Wang (王建军) Dec. 22, 2020, 8:38 a.m. UTC | #4
On Tue, 2020-12-22 at 11:55 +0800, Nicolas Boichat wrote:
> On Tue, Dec 22, 2020 at 11:38 AM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> >
> > On Mon, 2020-12-21 at 10:18 +0800, Nicolas Boichat wrote:
> > > On Wed, Dec 2, 2020 at 9:39 PM Jianjun Wang <jianjun.wang@mediatek.com> wrote:
> > > > [snip]
> > > > +static irq_hw_number_t mtk_pcie_msi_get_hwirq(struct msi_domain_info *info,
> > > > +                                             msi_alloc_info_t *arg)
> > > > +{
> > > > +       struct msi_desc *entry = arg->desc;
> > > > +       struct mtk_pcie_port *port = info->chip_data;
> > > > +       int hwirq;
> > > > +
> > > > +       mutex_lock(&port->lock);
> > > > +
> > > > +       hwirq = bitmap_find_free_region(port->msi_irq_in_use, PCIE_MSI_IRQS_NUM,
> > > > +                                       order_base_2(entry->nvec_used));
> > > > +       if (hwirq < 0) {
> > > > +               mutex_unlock(&port->lock);
> > > > +               return -ENOSPC;
> > > > +       }
> > > > +
> > > > +       mutex_unlock(&port->lock);
> > > > +
> > > > +       return hwirq;
> > >
> > > Code is good, but I had to look twice to make sure the mutex is
> > > unlocked. Is the following marginally better?
> > >
> > > hwirq = ...;
> > >
> > > mutex_unlock(&port->lock);
> > >
> > > if (hwirq < 0)
> > >    return -ENOSPC;
> > >
> > > return hwirq;
> >
> > Impressive, I will fix it in the next version. I think hwirq can
> > be returned directly, since it will be a negative value if
> > bitmap_find_free_region fails. The code will look like the following:
> >
> > hwirq = ...;
> >
> > mutex_unlock(&port->lock);
> >
> > return hwirq;
>
> SG, as long as you're okay with returning -ENOMEM instead of -ENOSPC.
>
> But now I'm having doubts about whether negative return values are OK, as
> irq_hw_number_t is unsigned long.
>
> msi_domain_alloc
> (https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L143)
> uses it to call irq_find_mapping
> (https://elixir.bootlin.com/linux/latest/source/kernel/irq/irqdomain.c#L882)
> without checking it, and I'm not convinced irq_find_mapping will error out
> gracefully...
>
I see; it seems msi_domain_alloc assumes the get_hwirq callback always
succeeds. Maybe I should allocate the real hwirq in msi_prepare
(https://elixir.bootlin.com/linux/latest/source/kernel/irq/msi.c#L304),
store it in arg->hwirq, and override set_desc
(https://elixir.bootlin.com/linux/latest/source/drivers/pci/msi.c#L1405)
to prevent arg->hwirq from being modified.
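The idea is to split allocation into a step that can fail and report an `int` error (the msi_prepare role) and a step that only hands back the stored number (the get_hwirq role). A minimal userspace model of that split, assuming a 256-entry allocation map and a simplified aligned-block allocator; `struct alloc_info_model`, `prepare_model()` and `get_hwirq_model()` are hypothetical stand-ins, not the kernel's `msi_alloc_info_t` or its callback signatures:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MSI_IRQS_NUM 256 /* "up to 256 MSI" per the commit message */

/* Hypothetical stand-in for msi_alloc_info_t; only hwirq is modeled. */
struct alloc_info_model {
	unsigned long hwirq;
};

/* Round nvec up to a power of two (1 << order_base_2(nvec)). */
static unsigned int block_len(unsigned int nvec)
{
	unsigned int len = 1;

	while (len < nvec)
		len <<= 1;
	return len;
}

/*
 * Fallible step (msi_prepare role): returns int, so -ENOSPC is reported
 * before any unsigned hwirq is produced. On success the first index of a
 * free, naturally aligned block is stored in arg->hwirq.
 */
static int prepare_model(bool *in_use, unsigned int nvec,
			 struct alloc_info_model *arg)
{
	assert(nvec > 0 && nvec <= MSI_IRQS_NUM);
	unsigned int len = block_len(nvec);

	for (unsigned int start = 0; start + len <= MSI_IRQS_NUM; start += len) {
		unsigned int i = 0;

		while (i < len && !in_use[start + i])
			i++;
		if (i < len)
			continue; /* block partly used, try the next one */
		for (i = 0; i < len; i++)
			in_use[start + i] = true;
		arg->hwirq = start;
		return 0;
	}
	return -ENOSPC; /* arg->hwirq is left untouched */
}

/* Infallible step (get_hwirq role): only reports what prepare stored. */
static unsigned long get_hwirq_model(const struct alloc_info_model *arg)
{
	return arg->hwirq;
}
```

If a 256-vector request fails in `prepare_model()`, the error is returned as a normal negative `int` and nothing reaches the unsigned getter. Whether overriding `set_desc` is sufficient to keep `arg->hwirq` intact depends on the real MSI core flow, which this model does not capture.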

> > >
> > > > +}
> > > > +
> > > > [snip]
> > > > +static void mtk_pcie_msi_handler(struct irq_desc *desc)
> > > > +{
> > > > +       struct mtk_pcie_msi *msi_info = irq_desc_get_handler_data(desc);
> > > > +       struct irq_chip *irqchip = irq_desc_get_chip(desc);
> > > > +       unsigned long msi_enable, msi_status;
> > > > +       unsigned int virq;
> > > > +       irq_hw_number_t bit, hwirq;
> > > > +
> > > > +       chained_irq_enter(irqchip, desc);
> > > > +
> > > > +       msi_enable = readl(msi_info->base + PCIE_MSI_ENABLE_OFFSET);
> > > > +       while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET))) {
> > > > +               msi_status &= msi_enable;
> > >
> > > I don't know much about MSI, but what happens if you have a bit that
> > > is set in PCIE_MSI_STATUS_OFFSET register, but not in msi_enable?
> >
> > If a bit in the PCIE_MSI_STATUS_OFFSET register is set but not in
> > msi_enable, it must be an abnormal use of MSI or something has gone wrong;
> > it should be ignored, since we cannot find the corresponding handler.
> >
> > > Sounds like you'll just spin-loop forever without acknowledging the
> > > interrupt.
> >
> > The interrupt will be acknowledged in the irq_ack callback of
> > mtk_msi_irq_chip, which belongs to the msi_domain.
>
> Let's try to go through it (and please explain to me if I get this wrong).
>
> Say we have:
>
> msi_enable = [PCIE_MSI_ENABLE_OFFSET] = 0x1;
>
> while loop:
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x3;
> msi_status &= msi_enable => msi_status = 0x3 & 0x1 = 0x1;
> for_each_set_bit(msi_status) {
>    do something that presumably will disable the MSI interrupt status?

Yes, the corresponding interrupt status will be cleared.

> }
> (next loop iteration)
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
> msi_status &= msi_enable => msi_status = 0x2 & 0x1 = 0x0;
> for_each_set_bit(msi_status) => does nothing.
>
> msi_status = [PCIE_MSI_STATUS_OFFSET] = 0x2;
> (infinite loop)
>
> Basically, I'm wondering if you should replace the while condition
> statement with:
>
> while ((msi_status = readl(msi_info->base + PCIE_MSI_STATUS_OFFSET) &
> msi_enable))
>

Yes, it will be an infinite loop if we receive an abnormal interrupt status.
I will fix it in the next version. Thanks for your kind review.
