
[v3,00/11] Linux RISC-V AIA Support

Message ID 20230508142842.854564-1-apatel@ventanamicro.com

Message

Anup Patel May 8, 2023, 2:28 p.m. UTC
The RISC-V AIA specification is now frozen as per the RISC-V International
process. The latest frozen specification can be found at:
https://github.com/riscv/riscv-aia/releases/download/1.0-RC1/riscv-interrupts-1.0-RC1.pdf

At a high level, the AIA specification adds three things:
1) AIA CSRs
   - Improved local interrupt support
2) Incoming Message Signaled Interrupt Controller (IMSIC)
   - Per-HART MSI controller
   - Support MSI virtualization
   - Support IPI along with virtualization
3) Advanced Platform-Level Interrupt Controller (APLIC)
   - Wired interrupt controller
   - In MSI-mode, converts wired interrupts into MSIs (i.e. MSI generator)
   - In Direct-mode, injects external interrupts directly into HARTs

For an overview of the AIA specification, refer to the recent AIA virtualization
talk at KVM Forum 2022:
https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
https://www.youtube.com/watch?v=r071dL8Z0yo

PATCH3 of this series conflicts with the "irqchip/riscv-intc: Add ACPI
support" patch of the "Add basic ACPI support for RISC-V" series.
(Refer to https://lore.kernel.org/linux-riscv/20230508115237.216337-1-sunilvl@ventanamicro.com/)

To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).

These patches can also be found in the riscv_aia_v3 branch at:
https://github.com/avpatel/linux.git

Changes since v2:
 - Rebased on Linux-6.4-rc1
 - Addressed Rob's comments on DT bindings PATCH4 and PATCH8.
 - Addressed Marc's comments on IMSIC driver PATCH5.
 - Replaced use of OF APIs in the APLIC and IMSIC drivers with FWNODE APIs.
   This makes both drivers easily portable for ACPI support and also
   removes unnecessary indirection from the APLIC and IMSIC drivers.
 - PATCH1 is a new patch for portability with ACPI support.
 - PATCH2 is a new patch to fix probing in the APLIC driver on APLIC-only systems.
 - PATCH7 is a new patch which addresses the IOMMU DMA domain issues pointed
   out by SiFive

Changes since v1:
 - Rebased on Linux-6.2-rc2
 - Addressed comments on IMSIC DT bindings for PATCH4
 - Use raw_spin_lock_irqsave() on ids_lock for PATCH5
 - Improved MMIO alignment checks in PATCH5 to allow MMIO regions
   with holes.
 - Addressed comments on APLIC DT bindings for PATCH6
 - Fixed warning splat in aplic_msi_write_msg() caused by
   zeroed MSI message in PATCH7
 - Dropped DT property riscv,slow-ipi; a module parameter will be
   added for this in the future instead.

Anup Patel (11):
  RISC-V: Add riscv_fw_parent_hartid() function
  of/irq: Set FWNODE_FLAG_BEST_EFFORT for the interrupt controller DT
    nodes
  irqchip/riscv-intc: Add support for RISC-V AIA
  dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
  irqchip: Add RISC-V incoming MSI controller driver
  irqchip/riscv-imsic: Add support for PCI MSI irqdomain
  irqchip/riscv-imsic: Improve IOMMU DMA support
  dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
  irqchip: Add RISC-V advanced PLIC driver
  RISC-V: Select APLIC and IMSIC drivers
  MAINTAINERS: Add entry for RISC-V AIA drivers

 .../interrupt-controller/riscv,aplic.yaml     |  162 +++
 .../interrupt-controller/riscv,imsics.yaml    |  172 +++
 MAINTAINERS                                   |   12 +
 arch/riscv/Kconfig                            |    2 +
 arch/riscv/include/asm/processor.h            |    3 +
 arch/riscv/kernel/cpu.c                       |   12 +
 drivers/iommu/dma-iommu.c                     |   38 +
 drivers/irqchip/Kconfig                       |   20 +-
 drivers/irqchip/Makefile                      |    2 +
 drivers/irqchip/irq-riscv-aplic.c             |  750 ++++++++++++
 drivers/irqchip/irq-riscv-imsic.c             | 1080 +++++++++++++++++
 drivers/irqchip/irq-riscv-intc.c              |   36 +-
 drivers/of/irq.c                              |   10 +
 include/linux/iommu.h                         |    6 +
 include/linux/irqchip/riscv-aplic.h           |  119 ++
 include/linux/irqchip/riscv-imsic.h           |   86 ++
 16 files changed, 2503 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
 create mode 100644 drivers/irqchip/irq-riscv-aplic.c
 create mode 100644 drivers/irqchip/irq-riscv-imsic.c
 create mode 100644 include/linux/irqchip/riscv-aplic.h
 create mode 100644 include/linux/irqchip/riscv-imsic.h

Comments

Robin Murphy May 10, 2023, 10:48 a.m. UTC | #1
On 2023-05-08 15:28, Anup Patel wrote:
> We have a separate RISC-V IMSIC MSI address for each CPU so changing
> MSI (or IRQ) affinity results in re-programming of MSI address in
> the PCIe (or platform) device.
> 
> Currently, the iommu_dma_prepare_msi() is called only once at the
> time of IRQ allocation so IOMMU DMA domain will only have mapping
> for one MSI page. This means iommu_dma_compose_msi_msg() called
> by imsic_irq_compose_msi_msg() will always use the same MSI page
> irrespective to target CPU MSI address. In other words, changing
> MSI (or IRQ) affinity for device using IOMMU DMA domain will not
> work.
> 
> To address above issue, we do the following:
> 1) Map MSI pages for all CPUs in imsic_irq_domain_alloc()
>     using iommu_dma_prepare_msi().
> 2) Add a new iommu_dma_select_msi() API to select a specific
>     MSI page from a set of already mapped MSI pages.
> 3) Use iommu_dma_select_msi() to select a specific MSI page
>     before calling iommu_dma_compose_msi_msg() in
>     imsic_irq_compose_msi_msg().

The high-level design is that prepare ensures any necessary page 
mappings exist, then compose retrieves the appropriate page for the 
given message. I think it generalises well enough without needing a new 
op, it just means that caching a single page in the msi_desc up-front no 
longer fits, so that wants tweaking to allow compose to do a more 
general lookup.

Thanks,
Robin.

> Reported-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> ---
>   drivers/iommu/dma-iommu.c         | 38 +++++++++++++++++++++++++++++++
>   drivers/irqchip/irq-riscv-imsic.c | 27 ++++++++++++----------
>   include/linux/iommu.h             |  6 +++++
>   3 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..07782c77a6eb 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1677,6 +1677,44 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>   	return 0;
>   }
>   
> +/**
> + * iommu_dma_select_msi() - Select a MSI page from a set of
> + * already mapped MSI pages in the IOMMU domain.
> + *
> + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
> + * @msi_addr: physical address of the MSI page to be selected
> + *
> + * Return: 0 on success or negative error code if the select failed.
> + */
> +int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +{
> +	struct device *dev = msi_desc_to_dev(desc);
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	const struct iommu_dma_msi_page *msi_page;
> +	struct iommu_dma_cookie *cookie;
> +
> +	if (!domain || !domain->iova_cookie) {
> +		desc->iommu_cookie = NULL;
> +		return 0;
> +	}
> +
> +	cookie = domain->iova_cookie;
> +	msi_addr &= ~(phys_addr_t)(cookie_msi_granule(cookie) - 1);
> +
> +	msi_page = msi_desc_get_iommu_cookie(desc);
> +	if (msi_page && msi_page->phys == msi_addr)
> +		return 0;
> +
> +	list_for_each_entry(msi_page, &cookie->msi_page_list, list) {
> +		if (msi_page->phys == msi_addr) {
> +			msi_desc_set_iommu_cookie(desc, msi_page);
> +			return 0;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> +
>   /**
>    * iommu_dma_compose_msi_msg() - Apply translation to an MSI message
>    * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
> diff --git a/drivers/irqchip/irq-riscv-imsic.c b/drivers/irqchip/irq-riscv-imsic.c
> index 30247c84a6b0..ec61c599e0c5 100644
> --- a/drivers/irqchip/irq-riscv-imsic.c
> +++ b/drivers/irqchip/irq-riscv-imsic.c
> @@ -446,6 +446,10 @@ static void imsic_irq_compose_msi_msg(struct irq_data *d,
>   	if (WARN_ON(err))
>   		return;
>   
> +	err = iommu_dma_select_msi(desc, msi_addr);
> +	if (WARN_ON(err))
> +		return;
> +
>   	msg->address_hi = upper_32_bits(msi_addr);
>   	msg->address_lo = lower_32_bits(msi_addr);
>   	msg->data = d->hwirq;
> @@ -493,11 +497,18 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>   	int i, hwirq, err = 0;
>   	unsigned int cpu;
>   
> -	err = imsic_get_cpu(&imsic->lmask, false, &cpu);
> -	if (err)
> -		return err;
> +	/* Map MSI address of all CPUs */
> +	for_each_cpu(cpu, &imsic->lmask) {
> +		err = imsic_cpu_page_phys(cpu, 0, &msi_addr);
> +		if (err)
> +			return err;
>   
> -	err = imsic_cpu_page_phys(cpu, 0, &msi_addr);
> +		err = iommu_dma_prepare_msi(info->desc, msi_addr);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = imsic_get_cpu(&imsic->lmask, false, &cpu);
>   	if (err)
>   		return err;
>   
> @@ -505,10 +516,6 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>   	if (hwirq < 0)
>   		return hwirq;
>   
> -	err = iommu_dma_prepare_msi(info->desc, msi_addr);
> -	if (err)
> -		goto fail;
> -
>   	for (i = 0; i < nr_irqs; i++) {
>   		imsic_id_set_target(hwirq + i, cpu);
>   		irq_domain_set_info(domain, virq + i, hwirq + i,
> @@ -528,10 +535,6 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>   	}
>   
>   	return 0;
> -
> -fail:
> -	imsic_ids_free(hwirq, get_count_order(nr_irqs));
> -	return err;
>   }
>   
>   static void imsic_irq_domain_free(struct irq_domain *domain,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e8c9a7da1060..41e8613832ab 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1117,6 +1117,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
>   int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
>   
>   int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
> +int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr);
>   void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg);
>   
>   #else /* CONFIG_IOMMU_DMA */
> @@ -1138,6 +1139,11 @@ static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_a
>   	return 0;
>   }
>   
> +static inline int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +{
> +	return 0;
> +}
> +
>   static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>   {
>   }
Conor Dooley May 10, 2023, 12:45 p.m. UTC | #2
On Mon, May 08, 2023 at 07:58:32PM +0530, Anup Patel wrote:
> We add common riscv_fw_parent_hartid() which help device drivers
> to get parent hartid of the INTC (i.e. local interrupt controller)
> fwnode. Currently, this new function only supports device tree
> but it can be extended to support ACPI as well.
> 
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> ---
>  arch/riscv/include/asm/processor.h |  3 +++
>  arch/riscv/kernel/cpu.c            | 12 ++++++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 94a0590c6971..6fb8bbec8459 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -77,6 +77,9 @@ struct device_node;
>  int riscv_of_processor_hartid(struct device_node *node, unsigned long *hartid);
>  int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid);
>  
> +struct fwnode_handle;
> +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid);
> +
>  extern void riscv_fill_hwcap(void);
>  extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
>  
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 5de6fb703cc2..1adbe48b2b58 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -73,6 +73,18 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
>  	return -1;
>  }
>  
> +/* Find hart ID of the CPU fwnode under which given fwnode falls. */
> +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid)
> +{
> +	/*
> +	 * Currently, this function only supports DT but it can be
> +	 * extended to support ACPI as well.
> +	 */

Statement of the obvious here, no?
Although, it seems a little odd to read this comment & the corresponding
statement in the commit message, when the series appears to have been
based on the ACPI?

Perhaps by the time v4 comes around, ACPI support will have been merged
& that'll be moot.

> +	if (!is_of_node(node))
> +		return -EINVAL;
> +	return riscv_of_parent_hartid(to_of_node(node), hartid);

nit: blank line before the return here please.

Thanks,
Conor.
Jason Gunthorpe May 15, 2023, 12:53 p.m. UTC | #3
On Mon, May 08, 2023 at 07:58:38PM +0530, Anup Patel wrote:
> We have a separate RISC-V IMSIC MSI address for each CPU so changing
> MSI (or IRQ) affinity results in re-programming of MSI address in
> the PCIe (or platform) device.
> 
> Currently, the iommu_dma_prepare_msi() is called only once at the
> time of IRQ allocation so IOMMU DMA domain will only have mapping
> for one MSI page. This means iommu_dma_compose_msi_msg() called
> by imsic_irq_compose_msi_msg() will always use the same MSI page
> irrespective to target CPU MSI address. In other words, changing
> MSI (or IRQ) affinity for device using IOMMU DMA domain will not
> work.
> 
> To address above issue, we do the following:
> 1) Map MSI pages for all CPUs in imsic_irq_domain_alloc()
>    using iommu_dma_prepare_msi().
> 2) Add a new iommu_dma_select_msi() API to select a specific
>    MSI page from a set of already mapped MSI pages.
> 3) Use iommu_dma_select_msi() to select a specific MSI page
>    before calling iommu_dma_compose_msi_msg() in
>    imsic_irq_compose_msi_msg().

Is there an iommu driver somewhere in all this? I don't obviously see
one?

There should be no reason to use the dma-iommu.c stuff just to make
interrupts work, that is only necessary if there is an iommu, and the
platform architecture requires the iommu to have the MSI region
programmed into IOPTEs.

And I'd be much happier if we could clean this design up before risc-v
starts using it too :\

Jason
Anup Patel June 13, 2023, 8:05 a.m. UTC | #4
On Wed, May 10, 2023 at 6:15 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> On Mon, May 08, 2023 at 07:58:32PM +0530, Anup Patel wrote:
> > We add common riscv_fw_parent_hartid() which help device drivers
> > to get parent hartid of the INTC (i.e. local interrupt controller)
> > fwnode. Currently, this new function only supports device tree
> > but it can be extended to support ACPI as well.
> >
> > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > ---
> >  arch/riscv/include/asm/processor.h |  3 +++
> >  arch/riscv/kernel/cpu.c            | 12 ++++++++++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index 94a0590c6971..6fb8bbec8459 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -77,6 +77,9 @@ struct device_node;
> >  int riscv_of_processor_hartid(struct device_node *node, unsigned long *hartid);
> >  int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid);
> >
> > +struct fwnode_handle;
> > +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid);
> > +
> >  extern void riscv_fill_hwcap(void);
> >  extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
> >
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index 5de6fb703cc2..1adbe48b2b58 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -73,6 +73,18 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
> >       return -1;
> >  }
> >
> > +/* Find hart ID of the CPU fwnode under which given fwnode falls. */
> > +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid)
> > +{
> > +     /*
> > +      * Currently, this function only supports DT but it can be
> > +      * extended to support ACPI as well.
> > +      */
>
> Statement of the obvious here, no?
> Although, it seems a little odd to read this comment & the corresponding
> statement in the commit message, when the series appears to have been
> based on the ACPI?
>
> Perhaps by the time v4 comes around, ACPI support will have been merged
> & that'll be moot.

Yes, I was anyway going to update this in v4 to support both DT and ACPI.

>
> > +     if (!is_of_node(node))
> > +             return -EINVAL;
> > +     return riscv_of_parent_hartid(to_of_node(node), hartid);
>
> nit: blank line before the return here please.

Okay, I will update.

>
> Thanks,
> Conor.

Regards,
Anup