mbox series

[RFC,0/2] Add PCI driver for the Apple M1

Message ID 20210815042525.36878-1-alyssa@rosenzweig.io
Headers show
Series Add PCI driver for the Apple M1 | expand

Message

Alyssa Rosenzweig Aug. 15, 2021, 4:25 a.m. UTC
This adds a PCIe driver for the internal bus on the Apple M1 (and
presumably other Apple system-on-chips). It's based on the work of Marc
Zyngier, Mark Kettenis, and Stan Skowronek (Corellium). In conjunction
with a proper device tree and a GPIO driver, this enables the USB type-A
ports and the Ethernet port. It also paves the way for Wi-Fi and
Bluetooth, but that requires further work. The device tree patch is not
included here, as it depends on the GPIO driver which is still
work-in-progress. Nevertheless, I feel comfortable submitting "PCI: apple:
Add driver for the Apple M1".

I expect review feedback will be concentrated on the device tree
bindings. These bindings are a mish-mash of what's in the Marc's initial
driver, Corellium driver, and Mark's U-Boot downstream. They differ from
Maz's bindings by including nodes necessary for hardware bring-up, but
are simplified from Corellium's bindings by omitting tunables which are
handled by the Asahi Linux bootloader (m1n1). I am new to device tree
and expect "dt-bindings: PCI: Add Apple PCI controller" to need changes.
This was my first time working with YAML bindings; please be gentle :-)

I've collected the patches required to test this branch on this branch:

	https://github.com/mu-one/linux/commits/pcie

This branch is based on linux-next and contains a GPIO (pinctrl) driver,
this series, and updates to the M1 (T8103) device tree. The type-A ports
and Ethernet should work out-of-the-box on that tree, provided the
kernel is booted through m1n1. This is a noticeable improvement on Maz's
initial PCIe driver, which required U-Boot to function.

I've started using Linux on M1 as my workstation for Panfrost
development, so this should have 40 hours of testing by this time next
week.

Alyssa Rosenzweig (2):
  dt-bindings: PCI: Add Apple PCI controller
  PCI: apple: Add driver for the Apple M1

 .../devicetree/bindings/pci/apple,pcie.yaml   | 153 ++++++
 MAINTAINERS                                   |   7 +
 drivers/pci/controller/Kconfig                |  13 +
 drivers/pci/controller/Makefile               |   1 +
 drivers/pci/controller/pcie-apple.c           | 466 ++++++++++++++++++
 5 files changed, 640 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/apple,pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-apple.c

Comments

Marc Zyngier Aug. 15, 2021, 7:42 a.m. UTC | #1
On Sun, 15 Aug 2021 05:25:25 +0100,
Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
> 
> Add a driver for the PCIe controller found in Apple system-on-chips,
> particularly the Apple M1. This driver exposes the internal bus used for
> the USB type-A ports, Ethernet, Wi-Fi, and Bluetooth. This patch brings
> up the USB type-A ports and Ethernet. Bringing up the radios requires
> interfacing with the System Management Coprocessor via Apple's
> mailboxes, so that's left to a future patch.
> 
> In this state, the driver consists of two major parts: hardware
> initialization and MSI handling. The hardware initialization is derived
> from Mark Kettenis's U-Boot patches which in turn is derived from
> Corellium's patches for the hardware. The rest of the driver is derived
> from Marc Zyngier's driver for the hardware.

This really needs to be split into multiple patches:

- PCI probing
- Clock management
- MSI implementation

A couple of comments below:

> 
> Co-developed-by: Stan Skowronek <stan@corellium.com>
> Signed-off-by: Stan Skowronek <stan@corellium.com>
> Co-developed-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> ---
>  MAINTAINERS                         |   1 +
>  drivers/pci/controller/Kconfig      |  13 +
>  drivers/pci/controller/Makefile     |   1 +
>  drivers/pci/controller/pcie-apple.c | 466 ++++++++++++++++++++++++++++
>  4 files changed, 481 insertions(+)
>  create mode 100644 drivers/pci/controller/pcie-apple.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d7d2e1d1e2f2..f13f65a844f7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1274,6 +1274,7 @@ M:	Alyssa Rosenzweig <alyssa@rosenzweig.io>
>  L:	linux-pci@vger.kernel.org
>  S:	Maintained
>  F:	Documentation/devicetree/bindings/pci/apple,pcie.yaml
> +F:	drivers/pci/controller/pcie-apple.c
>  
>  APPLE SMC DRIVER
>  M:	Henrik Rydberg <rydberg@bitmath.org>
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index 326f7d13024f..881a6a81f3e2 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -312,6 +312,19 @@ config PCIE_HISI_ERR
>  	  Say Y here if you want error handling support
>  	  for the PCIe controller's errors on HiSilicon HIP SoCs
>  
> +config PCIE_APPLE
> +	tristate "Apple PCIe controller"
> +	depends on ARCH_APPLE || COMPILE_TEST
> +	depends on OF
> +	depends on PCI_MSI_IRQ_DOMAIN
> +	depends on GPIOLIB
> +	help
> +	  Say Y here if you want to enable PCIe controller support on Apple
> +	  system-on-chips, like the Apple M1. This is required for the USB
> +	  type-A ports, Ethernet, Wi-Fi, and Bluetooth.
> +
> +	  If unsure, say Y if you have an Apple Silicon system.
> +
>  source "drivers/pci/controller/dwc/Kconfig"
>  source "drivers/pci/controller/mobiveil/Kconfig"
>  source "drivers/pci/controller/cadence/Kconfig"
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index aaf30b3dcc14..f9d40bad932c 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -37,6 +37,7 @@ obj-$(CONFIG_VMD) += vmd.o
>  obj-$(CONFIG_PCIE_BRCMSTB) += pcie-brcmstb.o
>  obj-$(CONFIG_PCI_LOONGSON) += pci-loongson.o
>  obj-$(CONFIG_PCIE_HISI_ERR) += pcie-hisi-error.o
> +obj-$(CONFIG_PCIE_APPLE) += pcie-apple.o
>  # pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
>  obj-y				+= dwc/
>  obj-y				+= mobiveil/
> diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
> new file mode 100644
> index 000000000000..08088e06460f
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-apple.c
> @@ -0,0 +1,466 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe host bridge driver for Apple system-on-chips.
> + *
> + * The HW is ECAM compliant, so once the controller is initialized, the driver
> + * mostly only needs MSI handling. Initialization requires enabling power and
> + * clocks, along with a number of register pokes.
> + *
> + * Copyright (C) 2021 Google LLC
> + * Copyright (C) 2021 Corellium LLC
> + * Copyright (C) 2021 Mark Kettenis <kettenis@openbsd.org>
> + * Copyright (C) 2021 Alyssa Rosenzweig <alyssa@rosenzweig.io>
> + * Author: Marc Zyngier <maz@kernel.org>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/msi.h>
> +#include <linux/of_irq.h>
> +#include <linux/pci-ecam.h>
> +#include <linux/iopoll.h>
> +#include <linux/gpio/consumer.h>
> +
> +#define CORE_RC_PHYIF_CTL		0x00024
> +#define   CORE_RC_PHYIF_CTL_RUN		BIT(0)
> +#define CORE_RC_PHYIF_STAT		0x00028
> +#define   CORE_RC_PHYIF_STAT_REFCLK	BIT(4)
> +#define CORE_RC_CTL			0x00050
> +#define   CORE_RC_CTL_RUN		BIT(0)
> +#define CORE_RC_STAT			0x00058
> +#define   CORE_RC_STAT_READY		BIT(0)
> +#define CORE_FABRIC_STAT		0x04000
> +#define   CORE_FABRIC_STAT_MASK		0x001F001F
> +#define CORE_PHY_CTL			0x80000
> +#define   CORE_PHY_CTL_CLK0REQ		BIT(0)
> +#define   CORE_PHY_CTL_CLK1REQ		BIT(1)
> +#define   CORE_PHY_CTL_CLK0ACK		BIT(2)
> +#define   CORE_PHY_CTL_CLK1ACK		BIT(3)
> +#define   CORE_PHY_CTL_RESET		BIT(7)
> +#define CORE_LANE_CFG(port)		(0x84000 + 0x4000 * (port))
> +#define   CORE_LANE_CFG_REFCLK0REQ	BIT(0)
> +#define   CORE_LANE_CFG_REFCLK1		BIT(1)
> +#define   CORE_LANE_CFG_REFCLK0ACK	BIT(2)
> +#define   CORE_LANE_CFG_REFCLKEN	(BIT(9) | BIT(10))
> +#define CORE_LANE_CTL(port)		(0x84004 + 0x4000 * (port))
> +#define   CORE_LANE_CTL_CFGACC		BIT(15)
> +
> +#define PORT_LTSSMCTL			0x00080
> +#define   PORT_LTSSMCTL_START		BIT(0)
> +#define PORT_INTSTAT			0x00100
> +#define   PORT_INT_TUNNEL_ERR		BIT(31)
> +#define   PORT_INT_CPL_TIMEOUT		BIT(23)
> +#define   PORT_INT_RID2SID_MAPERR	BIT(22)
> +#define   PORT_INT_CPL_ABORT		BIT(21)
> +#define   PORT_INT_MSI_BAD_DATA		BIT(19)
> +#define   PORT_INT_MSI_ERR		BIT(18)
> +#define   PORT_INT_REQADDR_GT32		BIT(17)
> +#define   PORT_INT_AF_TIMEOUT		BIT(15)
> +#define   PORT_INT_LINK_DOWN		BIT(14)
> +#define   PORT_INT_LINK_UP		BIT(12)
> +#define   PORT_INT_LINK_BWMGMT		BIT(11)
> +#define   PORT_INT_AER_MASK		(15 << 4)
> +#define   PORT_INT_PORT_ERR		BIT(4)
> +#define   PORT_INT_INTx(i)		BIT(i)
> +#define   PORT_INT_INTxALL		15
> +#define PORT_INTMSK			0x00104
> +#define PORT_INTMSKSET			0x00108
> +#define PORT_INTMSKCLR			0x0010c
> +#define PORT_MSICFG			0x00124
> +#define   PORT_MSICFG_EN		BIT(0)
> +#define   PORT_MSICFG_L2MSINUM_SHIFT	4
> +#define PORT_MSIBASE			0x00128
> +#define   PORT_MSIBASE_1_SHIFT		16
> +#define PORT_MSIADDR			0x00168
> +#define PORT_LINKSTS			0x00208
> +#define   PORT_LINKSTS_UP		BIT(0)
> +#define   PORT_LINKSTS_BUSY		BIT(2)
> +#define PORT_LINKCMDSTS			0x00210
> +#define PORT_OUTS_NPREQS		0x00284
> +#define   PORT_OUTS_NPREQS_REQ		BIT(24)
> +#define   PORT_OUTS_NPREQS_CPL		BIT(16)
> +#define PORT_RXWR_FIFO			0x00288
> +#define   PORT_RXWR_FIFO_HDR		GENMASK(15, 10)
> +#define   PORT_RXWR_FIFO_DATA		GENMASK(9, 0)
> +#define PORT_RXRD_FIFO			0x0028C
> +#define   PORT_RXRD_FIFO_REQ		GENMASK(6, 0)
> +#define PORT_OUTS_CPLS			0x00290
> +#define   PORT_OUTS_CPLS_SHRD		GENMASK(14, 8)
> +#define   PORT_OUTS_CPLS_WAIT		GENMASK(6, 0)
> +#define PORT_APPCLK			0x00800
> +#define   PORT_APPCLK_EN		BIT(0)
> +#define   PORT_APPCLK_CGDIS		BIT(8)
> +#define PORT_STATUS			0x00804
> +#define   PORT_STATUS_READY		BIT(0)
> +#define PORT_REFCLK			0x00810
> +#define   PORT_REFCLK_EN		BIT(0)
> +#define   PORT_REFCLK_CGDIS		BIT(8)
> +#define PORT_PERST			0x00814
> +#define   PORT_PERST_OFF		BIT(0)
> +#define PORT_RID2SID(i16)		(0x00828 + 4 * (i16))
> +#define   PORT_RID2SID_VALID		BIT(31)
> +#define   PORT_RID2SID_SID_SHIFT	16
> +#define   PORT_RID2SID_BUS_SHIFT	8
> +#define   PORT_RID2SID_DEV_SHIFT	3
> +#define   PORT_RID2SID_FUNC_SHIFT	0
> +#define PORT_OUTS_PREQS_HDR		0x00980
> +#define   PORT_OUTS_PREQS_HDR_MASK	GENMASK(9, 0)
> +#define PORT_OUTS_PREQS_DATA		0x00984
> +#define   PORT_OUTS_PREQS_DATA_MASK	GENMASK(15, 0)
> +#define PORT_TUNCTRL			0x00988
> +#define   PORT_TUNCTRL_PERST_ON		BIT(0)
> +#define   PORT_TUNCTRL_PERST_ACK_REQ	BIT(1)
> +#define PORT_TUNSTAT			0x0098c
> +#define   PORT_TUNSTAT_PERST_ON		BIT(0)
> +#define   PORT_TUNSTAT_PERST_ACK_PEND	BIT(1)
> +#define PORT_PREFMEM_ENABLE		0x00994
> +
> +/* The doorbell address is "well known" */
> +#define DOORBELL_ADDR			0xfffff000
> +
> +/* The hardware exposes 3 ports. Port 0 (WiFi and Bluetooth) is special, as it
> + * is power-gated by SMC to facilitate rfkill.
> + */
> +enum apple_pcie_port {
> +	APPLE_PCIE_PORT_RADIO    = 0,
> +	APPLE_PCIE_PORT_USB      = 1,
> +	APPLE_PCIE_PORT_ETHERNET = 2,

This really doesn't belong in a low-level PCIe controller driver. The
ports should be purely generic.

> +	APPLE_PCIE_NUM_PORTS
> +};
> +
> +struct apple_pcie {
> +	u32			msi_base;
> +	u32			nvecs;
> +	struct mutex		lock;
> +	struct device		*dev;
> +	struct irq_domain	*domain;
> +	unsigned long		*bitmap;
> +	void __iomem            *rc;
> +};
> +
> +static inline void rmwl(u32 clr, u32 set, void __iomem *addr)
> +{
> +	writel((readl(addr) & ~clr) | set, addr);

Please use relaxed accessors. If the barriers are actually needed,
please document what you are ordering against. This applies throughout
the patch.

This also begs the question: can this be called concurrently?

> +}
> +
> +static void apple_msi_top_irq_mask(struct irq_data *d)
> +{
> +	pci_msi_mask_irq(d);
> +	irq_chip_mask_parent(d);
> +}
> +
> +static void apple_msi_top_irq_unmask(struct irq_data *d)
> +{
> +	pci_msi_unmask_irq(d);
> +	irq_chip_unmask_parent(d);
> +}
> +
> +static void apple_msi_top_irq_eoi(struct irq_data *d)
> +{
> +	irq_chip_eoi_parent(d);
> +}

Drop this and use the irq_chip_eoi_parent() directly in the
irqchip. This was only here for debug.

> +
> +static struct irq_chip apple_msi_top_chip = {
> +	.name			= "PCIe MSI",
> +	.irq_mask		= apple_msi_top_irq_mask,
> +	.irq_unmask		= apple_msi_top_irq_unmask,
> +	.irq_eoi		= apple_msi_top_irq_eoi,
> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
> +	.irq_set_type		= irq_chip_set_type_parent,
> +};
> +
> +static void apple_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +	msg->address_hi = 0;
> +	msg->address_lo = DOORBELL_ADDR;

What was never clear is whether this doorbell address really is always
at this location, or whether it is actually programmed by *something*.

In any case, please use the lower_32bit/upper_32bit helpers, just in
case we can move the address somewhere else.

> +	msg->data = data->hwirq;
> +}
> +
> +static struct irq_chip apple_msi_bottom_chip = {
> +	.name			= "MSI",
> +	.irq_mask		= irq_chip_mask_parent,
> +	.irq_unmask		= irq_chip_unmask_parent,
> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
> +	.irq_eoi		= irq_chip_eoi_parent,
> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
> +	.irq_set_type		= irq_chip_set_type_parent,
> +	.irq_compose_msi_msg	= apple_msi_compose_msg,
> +};
> +
> +static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> +				  unsigned int nr_irqs, void *args)
> +{
> +	struct apple_pcie *pcie = domain->host_data;
> +	struct irq_fwspec fwspec;
> +	unsigned int i;
> +	int ret, hwirq;
> +
> +	mutex_lock(&pcie->lock);
> +
> +	hwirq = bitmap_find_free_region(pcie->bitmap, pcie->nvecs,
> +					order_base_2(nr_irqs));
> +
> +	mutex_unlock(&pcie->lock);
> +
> +	if (hwirq < 0)
> +		return -ENOSPC;
> +
> +	fwspec.fwnode = domain->parent->fwnode;
> +	fwspec.param_count = 3;
> +	fwspec.param[0] = 0;
> +	fwspec.param[1] = hwirq + pcie->msi_base;
> +	fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
> +					      &apple_msi_bottom_chip,
> +					      domain->host_data);
> +	}
> +
> +	return 0;
> +}
> +
> +static void apple_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> +				  unsigned int nr_irqs)
> +{
> +	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
> +	struct apple_pcie *pcie = domain->host_data;
> +
> +	mutex_lock(&pcie->lock);
> +
> +	bitmap_release_region(pcie->bitmap, d->hwirq, order_base_2(nr_irqs));
> +
> +	mutex_unlock(&pcie->lock);
> +}
> +
> +static const struct irq_domain_ops apple_msi_domain_ops = {
> +	.alloc	= apple_msi_domain_alloc,
> +	.free	= apple_msi_domain_free,
> +};
> +
> +static struct msi_domain_info apple_msi_info = {
> +	.flags	= (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> +		   MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
> +	.chip	= &apple_msi_top_chip,
> +};
> +
> +static int apple_pcie_setup_refclk(void __iomem *rc,
> +				   void __iomem *port,
> +				   unsigned int idx)
> +{
> +	u32 stat;
> +	int res;
> +
> +	res = readl_poll_timeout(rc + CORE_RC_PHYIF_STAT, stat,
> +				 stat & CORE_RC_PHYIF_STAT_REFCLK, 100, 50000);
> +	if (res < 0)
> +		return res;
> +
> +	rmwl(0, CORE_LANE_CTL_CFGACC, rc + CORE_LANE_CTL(idx));
> +	rmwl(0, CORE_LANE_CFG_REFCLK0REQ, rc + CORE_LANE_CFG(idx));
> +
> +	res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,
> +				 stat & CORE_LANE_CFG_REFCLK0ACK, 100, 50000);
> +	if (res < 0)
> +		return res;
> +
> +	rmwl(0, CORE_LANE_CFG_REFCLK1, rc + CORE_LANE_CFG(idx));
> +	res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,
> +				 stat & CORE_LANE_CFG_REFCLK1, 100, 50000);
> +
> +	if (res < 0)
> +		return res;
> +
> +	rmwl(CORE_LANE_CTL_CFGACC, 0, rc + CORE_LANE_CTL(idx));
> +	udelay(1);
> +	rmwl(0, CORE_LANE_CFG_REFCLKEN, rc + CORE_LANE_CFG(idx));
> +
> +	rmwl(0, PORT_REFCLK_EN, port + PORT_REFCLK);
> +
> +	return 0;
> +}
> +
> +static int apple_pcie_setup_port(struct apple_pcie *pcie, unsigned int i)
> +{
> +	struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);
> +	void __iomem *port;
> +	struct gpio_desc *reset;
> +	uint32_t stat;
> +	int ret;
> +
> +	port = devm_of_iomap(pcie->dev, to_of_node(fwnode), i + 3, NULL);
> +
> +	if (IS_ERR(port))
> +		return -ENODEV;
> +
> +	reset = devm_gpiod_get_index(pcie->dev, "reset", i, 0);
> +	if (IS_ERR(reset))
> +		return PTR_ERR(reset);
> +
> +	gpiod_direction_output(reset, 0);
> +
> +	rmwl(0, PORT_APPCLK_EN, port + PORT_APPCLK);
> +
> +	ret = apple_pcie_setup_refclk(pcie->rc, port, i);
> +	if (ret < 0)
> +		return ret;
> +
> +	rmwl(0, PORT_PERST_OFF, port + PORT_PERST);
> +	gpiod_set_value(reset, 1);
> +
> +	ret = readl_poll_timeout(port + PORT_STATUS, stat,
> +				 stat & PORT_STATUS_READY, 100, 250000);
> +	if (ret < 0) {
> +		dev_err(pcie->dev, "port %u ready wait timeout\n", i);
> +		return ret;
> +	}
> +
> +	rmwl(PORT_REFCLK_CGDIS, 0, port + PORT_REFCLK);
> +	rmwl(PORT_APPCLK_CGDIS, 0, port + PORT_APPCLK);
> +
> +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,
> +				 !(stat & PORT_LINKSTS_BUSY), 100, 250000);
> +	if (ret < 0) {
> +		dev_err(pcie->dev, "port %u link not busy timeout\n", i);
> +		return ret;
> +	}
> +
> +	writel(0xfb512fff, port + PORT_INTMSKSET);

Magic. What is this for?

> +
> +	writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |
> +	       PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |
> +	       PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |
> +	       PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);
> +
> +	usleep_range(5000, 10000);
> +
> +	rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);
> +
> +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,
> +				 stat & PORT_LINKSTS_UP, 100, 500000);
> +	if (ret < 0) {
> +		dev_err(pcie->dev, "port %u link up wait timeout\n", i);
> +		return ret;
> +	}

I have the strong feeling that a lot of things in the above is to get
an interrupt when the port reports an event. Why the polling then?

> +
> +	writel(DOORBELL_ADDR, port + PORT_MSIADDR);
> +	writel(0, port + PORT_MSIBASE);

So here you go, the MSI doorbell *is* configurable. Should it be
placed somewhere else? Shouldn't it be configured before the link is
up?

> +	writel((5 << PORT_MSICFG_L2MSINUM_SHIFT) | PORT_MSICFG_EN,
> +	       port + PORT_MSICFG);

Ah, that one actually makes sense (enables 32 MSIs for the port).

> +
> +	return 0;
> +}
> +
> +static int apple_msi_init(struct apple_pcie *pcie)
> +{
> +	struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);
> +	struct device_node *parent_intc;
> +	struct irq_domain *parent;
> +	int ret, i;
> +
> +	pcie->rc = devm_of_iomap(pcie->dev, to_of_node(fwnode), 1, NULL);
> +
> +	if (IS_ERR(pcie->rc))
> +		return -ENODEV;
> +
> +	for (i = 0; i < APPLE_PCIE_NUM_PORTS; ++i) {
> +		/* Bringing up the radios requires SMC. Skip for now. */
> +		if (i == APPLE_PCIE_PORT_RADIO)
> +			continue;

Why? Shouldn't this be moved into the driver for the endpoint, rather
than hardcoding something here which is likely to change from one
system to another? If establishing the link actually requires talking
to another part of the system, then it should be described in DT.

> +
> +		ret = apple_pcie_setup_port(pcie, i);
> +
> +		if (ret) {
> +			dev_err(pcie->dev, "Port %u setup fail: %d\n", i, ret);
> +			return ret;
> +		}
> +	}
> +
> +	ret = of_property_read_u32_index(to_of_node(fwnode), "msi-interrupts",
> +					 0, &pcie->msi_base);
> +	if (ret)
> +		return ret;
> +
> +	ret = of_property_read_u32_index(to_of_node(fwnode), "msi-interrupts",
> +					 1, &pcie->nvecs);
> +	if (ret)
> +		return ret;
> +
> +	pcie->bitmap = devm_kcalloc(pcie->dev, BITS_TO_LONGS(pcie->nvecs),
> +				    sizeof(long), GFP_KERNEL);
> +	if (!pcie->bitmap)
> +		return -ENOMEM;
> +
> +	parent_intc = of_irq_find_parent(to_of_node(fwnode));
> +	parent = irq_find_host(parent_intc);
> +	if (!parent_intc || !parent) {
> +		dev_err(pcie->dev, "failed to find parent domain\n");
> +		return -ENXIO;
> +	}
> +
> +	parent = irq_domain_create_hierarchy(parent, 0, pcie->nvecs, fwnode,
> +					     &apple_msi_domain_ops, pcie);
> +	if (!parent) {
> +		dev_err(pcie->dev, "failed to create IRQ domain\n");
> +		return -ENOMEM;
> +	}
> +	irq_domain_update_bus_token(parent, DOMAIN_BUS_NEXUS);
> +
> +	pcie->domain = pci_msi_create_irq_domain(fwnode, &apple_msi_info,
> +						 parent);
> +	if (!pcie->domain) {
> +		dev_err(pcie->dev, "failed to create MSI domain\n");
> +		irq_domain_remove(parent);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static int apple_m1_pci_init(struct pci_config_window *cfg)
> +{
> +	struct device *dev = cfg->parent;
> +	struct apple_pcie *pcie;
> +
> +	pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
> +	if (!pcie)
> +		return -ENOMEM;
> +
> +	pcie->dev = dev;
> +
> +	mutex_init(&pcie->lock);
> +
> +	return apple_msi_init(pcie);
> +}
> +
> +static const struct pci_ecam_ops apple_m1_cfg_ecam_ops = {
> +	.init		= apple_m1_pci_init,
> +	.pci_ops	= {
> +		.map_bus	= pci_ecam_map_bus,
> +		.read		= pci_generic_config_read,
> +		.write		= pci_generic_config_write,
> +	}
> +};
> +
> +static const struct of_device_id apple_pci_of_match[] = {
> +	{ .compatible = "apple,pcie", .data = &apple_m1_cfg_ecam_ops },
> +	{ },
> +};
> +MODULE_DEVICE_TABLE(of, gen_pci_of_match);
> +
> +static struct platform_driver apple_pci_driver = {
> +	.driver = {
> +		.name = "pcie-apple",
> +		.of_match_table = apple_pci_of_match,
> +	},
> +	.probe = pci_host_common_probe,
> +	.remove = pci_host_common_remove,
> +};
> +module_platform_driver(apple_pci_driver);
> +
> +MODULE_LICENSE("GPL v2");

Thanks,

	M.
Rob Herring Aug. 15, 2021, 8:57 p.m. UTC | #2
On Sat, Aug 14, 2021 at 11:34 PM Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
>
> Add a driver for the PCIe controller found in Apple system-on-chips,
> particularly the Apple M1. This driver exposes the internal bus used for
> the USB type-A ports, Ethernet, Wi-Fi, and Bluetooth. This patch brings
> up the USB type-A ports and Ethernet. Bringing up the radios requires
> interfacing with the System Management Coprocessor via Apple's
> mailboxes, so that's left to a future patch.
>
> In this state, the driver consists of two major parts: hardware
> initialization and MSI handling. The hardware initialization is derived
> from Mark Kettenis's U-Boot patches which in turn is derived from
> Corellium's patches for the hardware. The rest of the driver is derived
> from Marc Zyngier's driver for the hardware.
>
> Co-developed-by: Stan Skowronek <stan@corellium.com>
> Signed-off-by: Stan Skowronek <stan@corellium.com>
> Co-developed-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> ---
>  MAINTAINERS                         |   1 +
>  drivers/pci/controller/Kconfig      |  13 +
>  drivers/pci/controller/Makefile     |   1 +
>  drivers/pci/controller/pcie-apple.c | 466 ++++++++++++++++++++++++++++
>  4 files changed, 481 insertions(+)
>  create mode 100644 drivers/pci/controller/pcie-apple.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d7d2e1d1e2f2..f13f65a844f7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1274,6 +1274,7 @@ M:        Alyssa Rosenzweig <alyssa@rosenzweig.io>
>  L:     linux-pci@vger.kernel.org
>  S:     Maintained
>  F:     Documentation/devicetree/bindings/pci/apple,pcie.yaml
> +F:     drivers/pci/controller/pcie-apple.c
>
>  APPLE SMC DRIVER
>  M:     Henrik Rydberg <rydberg@bitmath.org>
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index 326f7d13024f..881a6a81f3e2 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -312,6 +312,19 @@ config PCIE_HISI_ERR
>           Say Y here if you want error handling support
>           for the PCIe controller's errors on HiSilicon HIP SoCs
>
> +config PCIE_APPLE
> +       tristate "Apple PCIe controller"
> +       depends on ARCH_APPLE || COMPILE_TEST
> +       depends on OF
> +       depends on PCI_MSI_IRQ_DOMAIN
> +       depends on GPIOLIB
> +       help
> +         Say Y here if you want to enable PCIe controller support on Apple
> +         system-on-chips, like the Apple M1. This is required for the USB
> +         type-A ports, Ethernet, Wi-Fi, and Bluetooth.
> +
> +         If unsure, say Y if you have an Apple Silicon system.
> +
>  source "drivers/pci/controller/dwc/Kconfig"
>  source "drivers/pci/controller/mobiveil/Kconfig"
>  source "drivers/pci/controller/cadence/Kconfig"
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index aaf30b3dcc14..f9d40bad932c 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -37,6 +37,7 @@ obj-$(CONFIG_VMD) += vmd.o
>  obj-$(CONFIG_PCIE_BRCMSTB) += pcie-brcmstb.o
>  obj-$(CONFIG_PCI_LOONGSON) += pci-loongson.o
>  obj-$(CONFIG_PCIE_HISI_ERR) += pcie-hisi-error.o
> +obj-$(CONFIG_PCIE_APPLE) += pcie-apple.o
>  # pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
>  obj-y                          += dwc/
>  obj-y                          += mobiveil/
> diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
> new file mode 100644
> index 000000000000..08088e06460f
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-apple.c
> @@ -0,0 +1,466 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe host bridge driver for Apple system-on-chips.
> + *
> + * The HW is ECAM compliant, so once the controller is initialized, the driver
> + * mostly only needs MSI handling. Initialization requires enabling power and
> + * clocks, along with a number of register pokes.
> + *
> + * Copyright (C) 2021 Google LLC
> + * Copyright (C) 2021 Corellium LLC
> + * Copyright (C) 2021 Mark Kettenis <kettenis@openbsd.org>
> + * Copyright (C) 2021 Alyssa Rosenzweig <alyssa@rosenzweig.io>
> + * Author: Marc Zyngier <maz@kernel.org>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/msi.h>
> +#include <linux/of_irq.h>
> +#include <linux/pci-ecam.h>
> +#include <linux/iopoll.h>
> +#include <linux/gpio/consumer.h>
> +
> +#define CORE_RC_PHYIF_CTL              0x00024
> +#define   CORE_RC_PHYIF_CTL_RUN                BIT(0)
> +#define CORE_RC_PHYIF_STAT             0x00028
> +#define   CORE_RC_PHYIF_STAT_REFCLK    BIT(4)
> +#define CORE_RC_CTL                    0x00050
> +#define   CORE_RC_CTL_RUN              BIT(0)
> +#define CORE_RC_STAT                   0x00058
> +#define   CORE_RC_STAT_READY           BIT(0)
> +#define CORE_FABRIC_STAT               0x04000
> +#define   CORE_FABRIC_STAT_MASK                0x001F001F
> +#define CORE_PHY_CTL                   0x80000
> +#define   CORE_PHY_CTL_CLK0REQ         BIT(0)
> +#define   CORE_PHY_CTL_CLK1REQ         BIT(1)
> +#define   CORE_PHY_CTL_CLK0ACK         BIT(2)
> +#define   CORE_PHY_CTL_CLK1ACK         BIT(3)
> +#define   CORE_PHY_CTL_RESET           BIT(7)

I was going to say these should be a phy driver perhaps, but they are
unused. So for now, just drop them.

> +#define CORE_LANE_CFG(port)            (0x84000 + 0x4000 * (port))
> +#define   CORE_LANE_CFG_REFCLK0REQ     BIT(0)
> +#define   CORE_LANE_CFG_REFCLK1                BIT(1)
> +#define   CORE_LANE_CFG_REFCLK0ACK     BIT(2)
> +#define   CORE_LANE_CFG_REFCLKEN       (BIT(9) | BIT(10))
> +#define CORE_LANE_CTL(port)            (0x84004 + 0x4000 * (port))
> +#define   CORE_LANE_CTL_CFGACC         BIT(15)
> +
> +#define PORT_LTSSMCTL                  0x00080
> +#define   PORT_LTSSMCTL_START          BIT(0)
> +#define PORT_INTSTAT                   0x00100
> +#define   PORT_INT_TUNNEL_ERR          BIT(31)
> +#define   PORT_INT_CPL_TIMEOUT         BIT(23)
> +#define   PORT_INT_RID2SID_MAPERR      BIT(22)
> +#define   PORT_INT_CPL_ABORT           BIT(21)
> +#define   PORT_INT_MSI_BAD_DATA                BIT(19)
> +#define   PORT_INT_MSI_ERR             BIT(18)
> +#define   PORT_INT_REQADDR_GT32                BIT(17)
> +#define   PORT_INT_AF_TIMEOUT          BIT(15)
> +#define   PORT_INT_LINK_DOWN           BIT(14)
> +#define   PORT_INT_LINK_UP             BIT(12)
> +#define   PORT_INT_LINK_BWMGMT         BIT(11)
> +#define   PORT_INT_AER_MASK            (15 << 4)
> +#define   PORT_INT_PORT_ERR            BIT(4)
> +#define   PORT_INT_INTx(i)             BIT(i)
> +#define   PORT_INT_INTxALL             15
> +#define PORT_INTMSK                    0x00104
> +#define PORT_INTMSKSET                 0x00108
> +#define PORT_INTMSKCLR                 0x0010c
> +#define PORT_MSICFG                    0x00124
> +#define   PORT_MSICFG_EN               BIT(0)
> +#define   PORT_MSICFG_L2MSINUM_SHIFT   4
> +#define PORT_MSIBASE                   0x00128
> +#define   PORT_MSIBASE_1_SHIFT         16
> +#define PORT_MSIADDR                   0x00168
> +#define PORT_LINKSTS                   0x00208
> +#define   PORT_LINKSTS_UP              BIT(0)
> +#define   PORT_LINKSTS_BUSY            BIT(2)
> +#define PORT_LINKCMDSTS                        0x00210
> +#define PORT_OUTS_NPREQS               0x00284
> +#define   PORT_OUTS_NPREQS_REQ         BIT(24)
> +#define   PORT_OUTS_NPREQS_CPL         BIT(16)
> +#define PORT_RXWR_FIFO                 0x00288
> +#define   PORT_RXWR_FIFO_HDR           GENMASK(15, 10)
> +#define   PORT_RXWR_FIFO_DATA          GENMASK(9, 0)
> +#define PORT_RXRD_FIFO                 0x0028C
> +#define   PORT_RXRD_FIFO_REQ           GENMASK(6, 0)
> +#define PORT_OUTS_CPLS                 0x00290
> +#define   PORT_OUTS_CPLS_SHRD          GENMASK(14, 8)
> +#define   PORT_OUTS_CPLS_WAIT          GENMASK(6, 0)
> +#define PORT_APPCLK                    0x00800
> +#define   PORT_APPCLK_EN               BIT(0)
> +#define   PORT_APPCLK_CGDIS            BIT(8)
> +#define PORT_STATUS                    0x00804
> +#define   PORT_STATUS_READY            BIT(0)
> +#define PORT_REFCLK                    0x00810
> +#define   PORT_REFCLK_EN               BIT(0)
> +#define   PORT_REFCLK_CGDIS            BIT(8)
> +#define PORT_PERST                     0x00814
> +#define   PORT_PERST_OFF               BIT(0)
> +#define PORT_RID2SID(i16)              (0x00828 + 4 * (i16))
> +#define   PORT_RID2SID_VALID           BIT(31)
> +#define   PORT_RID2SID_SID_SHIFT       16
> +#define   PORT_RID2SID_BUS_SHIFT       8
> +#define   PORT_RID2SID_DEV_SHIFT       3
> +#define   PORT_RID2SID_FUNC_SHIFT      0
> +#define PORT_OUTS_PREQS_HDR            0x00980
> +#define   PORT_OUTS_PREQS_HDR_MASK     GENMASK(9, 0)
> +#define PORT_OUTS_PREQS_DATA           0x00984
> +#define   PORT_OUTS_PREQS_DATA_MASK    GENMASK(15, 0)
> +#define PORT_TUNCTRL                   0x00988
> +#define   PORT_TUNCTRL_PERST_ON                BIT(0)
> +#define   PORT_TUNCTRL_PERST_ACK_REQ   BIT(1)
> +#define PORT_TUNSTAT                   0x0098c
> +#define   PORT_TUNSTAT_PERST_ON                BIT(0)
> +#define   PORT_TUNSTAT_PERST_ACK_PEND  BIT(1)
> +#define PORT_PREFMEM_ENABLE            0x00994
> +
> +/* The doorbell address is "well known" */
> +#define DOORBELL_ADDR                  0xfffff000
> +
> +/* The hardware exposes 3 ports. Port 0 (WiFi and Bluetooth) is special, as it
> + * is power-gated by SMC to facilitate rfkill.
> + */
> +enum apple_pcie_port {
> +       APPLE_PCIE_PORT_RADIO    = 0,
> +       APPLE_PCIE_PORT_USB      = 1,
> +       APPLE_PCIE_PORT_ETHERNET = 2,
> +       APPLE_PCIE_NUM_PORTS
> +};
> +
> +struct apple_pcie {
> +       u32                     msi_base;
> +       u32                     nvecs;
> +       struct mutex            lock;
> +       struct device           *dev;
> +       struct irq_domain       *domain;
> +       unsigned long           *bitmap;
> +       void __iomem            *rc;
> +};
> +
> +static inline void rmwl(u32 clr, u32 set, void __iomem *addr)
> +{
> +       writel((readl(addr) & ~clr) | set, addr);
> +}
> +
> +static void apple_msi_top_irq_mask(struct irq_data *d)
> +{
> +       pci_msi_mask_irq(d);
> +       irq_chip_mask_parent(d);
> +}
> +
> +static void apple_msi_top_irq_unmask(struct irq_data *d)
> +{
> +       pci_msi_unmask_irq(d);
> +       irq_chip_unmask_parent(d);
> +}
> +
> +static void apple_msi_top_irq_eoi(struct irq_data *d)
> +{
> +       irq_chip_eoi_parent(d);
> +}
> +
> +static struct irq_chip apple_msi_top_chip = {
> +       .name                   = "PCIe MSI",
> +       .irq_mask               = apple_msi_top_irq_mask,
> +       .irq_unmask             = apple_msi_top_irq_unmask,
> +       .irq_eoi                = apple_msi_top_irq_eoi,
> +       .irq_set_affinity       = irq_chip_set_affinity_parent,
> +       .irq_set_type           = irq_chip_set_type_parent,
> +};
> +
> +static void apple_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> +       msg->address_hi = 0;
> +       msg->address_lo = DOORBELL_ADDR;
> +       msg->data = data->hwirq;
> +}
> +
> +static struct irq_chip apple_msi_bottom_chip = {
> +       .name                   = "MSI",
> +       .irq_mask               = irq_chip_mask_parent,
> +       .irq_unmask             = irq_chip_unmask_parent,
> +       .irq_set_affinity       = irq_chip_set_affinity_parent,
> +       .irq_eoi                = irq_chip_eoi_parent,
> +       .irq_set_affinity       = irq_chip_set_affinity_parent,
> +       .irq_set_type           = irq_chip_set_type_parent,
> +       .irq_compose_msi_msg    = apple_msi_compose_msg,
> +};
> +
> +static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> +                                 unsigned int nr_irqs, void *args)
> +{
> +       struct apple_pcie *pcie = domain->host_data;
> +       struct irq_fwspec fwspec;
> +       unsigned int i;
> +       int ret, hwirq;
> +
> +       mutex_lock(&pcie->lock);
> +
> +       hwirq = bitmap_find_free_region(pcie->bitmap, pcie->nvecs,
> +                                       order_base_2(nr_irqs));
> +
> +       mutex_unlock(&pcie->lock);
> +
> +       if (hwirq < 0)
> +               return -ENOSPC;
> +
> +       fwspec.fwnode = domain->parent->fwnode;
> +       fwspec.param_count = 3;
> +       fwspec.param[0] = 0;
> +       fwspec.param[1] = hwirq + pcie->msi_base;
> +       fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
> +
> +       ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < nr_irqs; i++) {
> +               irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
> +                                             &apple_msi_bottom_chip,
> +                                             domain->host_data);
> +       }
> +
> +       return 0;
> +}
> +
> +static void apple_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> +                                 unsigned int nr_irqs)
> +{
> +       struct irq_data *d = irq_domain_get_irq_data(domain, virq);
> +       struct apple_pcie *pcie = domain->host_data;
> +
> +       mutex_lock(&pcie->lock);
> +
> +       bitmap_release_region(pcie->bitmap, d->hwirq, order_base_2(nr_irqs));
> +
> +       mutex_unlock(&pcie->lock);
> +}
> +
> +static const struct irq_domain_ops apple_msi_domain_ops = {
> +       .alloc  = apple_msi_domain_alloc,
> +       .free   = apple_msi_domain_free,
> +};
> +
> +static struct msi_domain_info apple_msi_info = {
> +       .flags  = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> +                  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
> +       .chip   = &apple_msi_top_chip,
> +};
> +
> +static int apple_pcie_setup_refclk(void __iomem *rc,
> +                                  void __iomem *port,
> +                                  unsigned int idx)
> +{
> +       u32 stat;
> +       int res;
> +
> +       res = readl_poll_timeout(rc + CORE_RC_PHYIF_STAT, stat,
> +                                stat & CORE_RC_PHYIF_STAT_REFCLK, 100, 50000);
> +       if (res < 0)
> +               return res;
> +
> +       rmwl(0, CORE_LANE_CTL_CFGACC, rc + CORE_LANE_CTL(idx));
> +       rmwl(0, CORE_LANE_CFG_REFCLK0REQ, rc + CORE_LANE_CFG(idx));
> +
> +       res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,
> +                                stat & CORE_LANE_CFG_REFCLK0ACK, 100, 50000);
> +       if (res < 0)
> +               return res;
> +
> +       rmwl(0, CORE_LANE_CFG_REFCLK1, rc + CORE_LANE_CFG(idx));
> +       res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,
> +                                stat & CORE_LANE_CFG_REFCLK1, 100, 50000);
> +
> +       if (res < 0)
> +               return res;
> +
> +       rmwl(CORE_LANE_CTL_CFGACC, 0, rc + CORE_LANE_CTL(idx));
> +       udelay(1);
> +       rmwl(0, CORE_LANE_CFG_REFCLKEN, rc + CORE_LANE_CFG(idx));
> +
> +       rmwl(0, PORT_REFCLK_EN, port + PORT_REFCLK);
> +
> +       return 0;
> +}
> +
> +static int apple_pcie_setup_port(struct apple_pcie *pcie, unsigned int i)
> +{
> +       struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);

Doesn't look like you ever use the fwnode, just get the DT node
pointer. Unless this driver is going to use ACPI someday (and ACPI
changes how PCI is done), there's no point in using fwnode.

> +       void __iomem *port;
> +       struct gpio_desc *reset;
> +       uint32_t stat;
> +       int ret;
> +
> +       port = devm_of_iomap(pcie->dev, to_of_node(fwnode), i + 3, NULL);

It's preferred to use platform resource api and ioremap over DT functions.

> +
> +       if (IS_ERR(port))
> +               return -ENODEV;
> +
> +       reset = devm_gpiod_get_index(pcie->dev, "reset", i, 0);
> +       if (IS_ERR(reset))
> +               return PTR_ERR(reset);
> +
> +       gpiod_direction_output(reset, 0);
> +
> +       rmwl(0, PORT_APPCLK_EN, port + PORT_APPCLK);
> +
> +       ret = apple_pcie_setup_refclk(pcie->rc, port, i);
> +       if (ret < 0)
> +               return ret;
> +
> +       rmwl(0, PORT_PERST_OFF, port + PORT_PERST);
> +       gpiod_set_value(reset, 1);
> +
> +       ret = readl_poll_timeout(port + PORT_STATUS, stat,
> +                                stat & PORT_STATUS_READY, 100, 250000);
> +       if (ret < 0) {
> +               dev_err(pcie->dev, "port %u ready wait timeout\n", i);
> +               return ret;
> +       }
> +
> +       rmwl(PORT_REFCLK_CGDIS, 0, port + PORT_REFCLK);
> +       rmwl(PORT_APPCLK_CGDIS, 0, port + PORT_APPCLK);
> +
> +       ret = readl_poll_timeout(port + PORT_LINKSTS, stat,
> +                                !(stat & PORT_LINKSTS_BUSY), 100, 250000);
> +       if (ret < 0) {
> +               dev_err(pcie->dev, "port %u link not busy timeout\n", i);
> +               return ret;
> +       }
> +
> +       writel(0xfb512fff, port + PORT_INTMSKSET);
> +
> +       writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |
> +              PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |
> +              PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |
> +              PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);
> +
> +       usleep_range(5000, 10000);
> +
> +       rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);
> +
> +       ret = readl_poll_timeout(port + PORT_LINKSTS, stat,
> +                                stat & PORT_LINKSTS_UP, 100, 500000);
> +       if (ret < 0) {
> +               dev_err(pcie->dev, "port %u link up wait timeout\n", i);
> +               return ret;
> +       }
> +
> +       writel(DOORBELL_ADDR, port + PORT_MSIADDR);
> +       writel(0, port + PORT_MSIBASE);
> +       writel((5 << PORT_MSICFG_L2MSINUM_SHIFT) | PORT_MSICFG_EN,
> +              port + PORT_MSICFG);
> +
> +       return 0;
> +}
> +
> +static int apple_msi_init(struct apple_pcie *pcie)
> +{
> +       struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);
> +       struct device_node *parent_intc;
> +       struct irq_domain *parent;
> +       int ret, i;
> +
> +       pcie->rc = devm_of_iomap(pcie->dev, to_of_node(fwnode), 1, NULL);

Use devm_platform_ioremap_resource instead.

Rob
Alyssa Rosenzweig Aug. 15, 2021, 9:33 p.m. UTC | #3
Hi Rob,

Thanks for the review.

> > +#define CORE_RC_PHYIF_CTL              0x00024

> > +#define   CORE_RC_PHYIF_CTL_RUN                BIT(0)

> > +#define CORE_RC_PHYIF_STAT             0x00028

> > +#define   CORE_RC_PHYIF_STAT_REFCLK    BIT(4)

> > +#define CORE_RC_CTL                    0x00050

> > +#define   CORE_RC_CTL_RUN              BIT(0)

> > +#define CORE_RC_STAT                   0x00058

> > +#define   CORE_RC_STAT_READY           BIT(0)

> > +#define CORE_FABRIC_STAT               0x04000

> > +#define   CORE_FABRIC_STAT_MASK                0x001F001F

> > +#define CORE_PHY_CTL                   0x80000

> > +#define   CORE_PHY_CTL_CLK0REQ         BIT(0)

> > +#define   CORE_PHY_CTL_CLK1REQ         BIT(1)

> > +#define   CORE_PHY_CTL_CLK0ACK         BIT(2)

> > +#define   CORE_PHY_CTL_CLK1ACK         BIT(3)

> > +#define   CORE_PHY_CTL_RESET           BIT(7)

> 

> I was going to say these should be a phy driver perhaps, but they are

> unused. So for now, just drop them.


Removed in v2.

CORE_PHY_CTRL is used in the asahi linux bootloader (m1n1, shared between
linux+uboot+bsd) to do early pcie bringup. They are indeed not used
here, nor are they used in the uboot/bsd drivers.

> > +static int apple_pcie_setup_port(struct apple_pcie *pcie, unsigned int i)

> > +{

> > +       struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);

> 

> Doesn't look like you ever use the fwnode, just get the DT node

> pointer. Unless this driver is going to use ACPI someday (and ACPI

> changes how PCI is done), there's no point in using fwnode.


Dropped in v2.

That was a copypaste fail splitting off apple_pcie_setup_port from
apple_msi_init in an early revision.

> It's preferred to use platform resource api and ioremap over DT functions.

> ...

> Use devm_platform_ioremap_resource instead.


Done in v2.

Thanks,

Alyssa
Alyssa Rosenzweig Aug. 15, 2021, 9:40 p.m. UTC | #4
> > +/* The hardware exposes 3 ports. Port 0 (WiFi and Bluetooth) is 

> > special, as it

> > + * is power-gated by SMC to facilitate rfkill.

> > + */

> > +enum apple_pcie_port {

> > +	APPLE_PCIE_PORT_RADIO    = 0,

> > +	APPLE_PCIE_PORT_USB      = 1,

> > +	APPLE_PCIE_PORT_ETHERNET = 2,

> > +	APPLE_PCIE_NUM_PORTS

> > +};

> 

> This will also be used for the Thunderbolt ports, where this enum

> won't apply at all. I could also see Apple changing the individual port

> assignments in the future. I'd just remove it here and have this file be

> a generic PCIe driver for Apple SoCs.


Removed in v2.

> As above, I don't think it makes sense to special-case anything for the

> devices on the bus here. These controllers also have hot plug support,

> so the radios don't have to be on by the time it's initialized.

> We could also just turn them on in the bootloader for now.


This should be fixed in v2 with Mark's device tree bindings.
Alyssa Rosenzweig Aug. 16, 2021, 1:31 a.m. UTC | #5
Hi Marc,

Thank you for the review.

> This really needs to be split into multiple patches:

> 

> - PCI probing

> - Clock management

> - MSI implementation


Split up in v2.

> > +enum apple_pcie_port {

> > +	APPLE_PCIE_PORT_RADIO    = 0,

> > +	APPLE_PCIE_PORT_USB      = 1,

> > +	APPLE_PCIE_PORT_ETHERNET = 2,

> 

> This really doesn't belong in a low-level PCIe controller driver. The

> ports should be purely generic.


Fixed in v2.

> Please use relaxed accessors. If the barriers are actually needed,

> please document what you are ordering against. This applies throughout

> the patch.


Relaxed accessors are used throughout in v2... it Works For Me™ but no
guarantees I didn't introduce a race...

> This also begs the question: can this be called concurrently?


I'm not sure. Sven, any idea how Apple devices are usually structured here?

> > +static void apple_msi_top_irq_eoi(struct irq_data *d)

> > +{

> > +	irq_chip_eoi_parent(d);

> > +}

> 

> Drop this and use the irq_chip_eoi_parent() directly in the

> irqchip. This was only here for debug.


Done in v2.

> In any case, please use the lower_32bit/upper_32bit helpers, just in

> case we can move the address somewhere else.


Done in v2.

> > +	writel(0xfb512fff, port + PORT_INTMSKSET);

> 

> Magic. What is this for?


Sven's explanation is the most likely. This magic value comes from
Corellium via Mark; I assume it's written by macOS.

> > +	writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |

> > +	       PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |

> > +	       PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |

> > +	       PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);

> > +

> > +	usleep_range(5000, 10000);

> > +

> > +	rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);

> > +

> > +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,

> > +				 stat & PORT_LINKSTS_UP, 100, 500000);

> > +	if (ret < 0) {

> > +		dev_err(pcie->dev, "port %u link up wait timeout\n", i);

> > +		return ret;

> > +	}

> 

> I have the strong feeling that a lot of things in the above is to get

> an interrupt when the port reports an event. Why the polling then?


I reordered the code to have the configuration after this happen before the START command as suggested (this works), and then removed the poll entirely (this also works?). It's possible the poll here was just a debug leftover in the original code. It's possible it's needed in the original but not needed in the interrupt-driven common code (if the link doesn't come up yet, nothing happens, so we don't have to block on it ourselves..)

It's also possible I've introduced a race that we happen to win every time.

Without specs, it's exceedingly hard to know which it is...

The poll isn't what we want at any rate, so I've removed the poll in v2. But we
may want extra interrupt handling code for v3.

> > +

> > +	writel(DOORBELL_ADDR, port + PORT_MSIADDR);

> > +	writel(0, port + PORT_MSIBASE);

> 

> So here you go, the MSI doorbell *is* configurable. Should it be

> placed somewhere else? Shouldn't it be configured before the link is

> up?


Fixed in v2.

> > +		/* Bringing up the radios requires SMC. Skip for now. */

> > +		if (i == APPLE_PCIE_PORT_RADIO)

> > +			continue;

> 

> Why? Shouldn't this be moved into the driver for the endpoint, rather

> than hardcoding something here which is likely to change from one

> system to another? If establishing the link actually requires talking

> to another part of the system, then it should be described in DT.


Fixed in v2.

Thanks,

Alyssa
Alyssa Rosenzweig Aug. 16, 2021, 1:45 a.m. UTC | #6
> > > +static int apple_pcie_setup_refclk(void __iomem *rc,

> > > +				   void __iomem *port,

> > > +				   unsigned int idx)

> > > +{

> > > +	u32 stat;

> > > +	int res;

> > > +

> > > +	res = readl_poll_timeout(rc + CORE_RC_PHYIF_STAT, stat,

> > > +				 stat & CORE_RC_PHYIF_STAT_REFCLK, 100, 50000);

> > > +	if (res < 0)

> > > +		return res;

> > > +

> > > +	rmwl(0, CORE_LANE_CTL_CFGACC, rc + CORE_LANE_CTL(idx));

> > > +	rmwl(0, CORE_LANE_CFG_REFCLK0REQ, rc + CORE_LANE_CFG(idx));

> > > +

> > > +	res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,

> > > +				 stat & CORE_LANE_CFG_REFCLK0ACK, 100, 50000);

> > > +	if (res < 0)

> > > +		return res;

> > > +

> > > +	rmwl(0, CORE_LANE_CFG_REFCLK1, rc + CORE_LANE_CFG(idx));

> > > +	res = readl_poll_timeout(rc + CORE_LANE_CFG(idx), stat,

> > > +				 stat & CORE_LANE_CFG_REFCLK1, 100, 50000);

> > > +

> > > +	if (res < 0)

> > > +		return res;

> > > +

> > > +	rmwl(CORE_LANE_CTL_CFGACC, 0, rc + CORE_LANE_CTL(idx));

> > > +	udelay(1);

> > > +	rmwl(0, CORE_LANE_CFG_REFCLKEN, rc + CORE_LANE_CFG(idx));

> > > +

> > > +	rmwl(0, PORT_REFCLK_EN, port + PORT_REFCLK);

> > > +

> > > +	return 0;

> > > +}

> 

> This really wants to be moved to its own clock driver, unless there is

> a strong guarantee that the clock tree isn't shared with anything

> else. I expect that parts of that clock tree will need to be

> refcounted, and that's where having a real clock driver will help.

> 

> I also have the strong feeling that the clock hierarchy will need to

> be described in DT one way or another, if only to be able to cope with

> future revisions of this block.


Could be, but this is the most magical part of the driver (no docs...)
and that means it's prohibitively difficult to design useful DT
bindings...

For whatever it's worth the Corellium code doesn't expose any DT
bindings here, so maybe this hasn't changed since older Apple SoCs.

Orthogonal to this magic code, we /do/ need DT bindings for the
clock/power gates. At the moment, m1n1 enables the PCIe clock and leaves
it on, so it's not urgent for this series. But in the future when we
want fine grained power management, we'll need it modeled. Sven has a
WIP clock gate driver and proposed bindings, which can be added to the
PCIe DT later nonintrusively. That shouldn't block this series, however.
Alyssa Rosenzweig Aug. 16, 2021, 3:20 a.m. UTC | #7
Hi Andy,

Thanks for the review.

>      +       depends on GPIOLIB

> 

>     It doesn’t seem to provide a GPIO. 


I thought that was needed to consume GPIOs, but it looks like other
PCI drivers don't do it. Removed.

>      +       if (IS_ERR(port))

>      +               return -ENODEV;

>      +

>      +       reset = devm_gpiod_get_index(pcie->dev, "reset", i, 0);

> 

>    Use appropriate flag.

>     

> 

>      +       if (IS_ERR(reset))

>      +               return PTR_ERR(reset);

>      +

>      +       gpiod_direction_output(reset, 0);

> 

>    Ditto and remove this line.


Fixed in v2, thank you.

>      +       usleep_range(5000, 10000);

> 

>    Sleep of such length should be explained.


Removed in v2.

>      +

> 

>    Redundant blank line 


Presumably fixed in v2.

>      +       pcie->bitmap = devm_kcalloc(pcie->dev,

>      BITS_TO_LONGS(pcie->nvecs),

>      +                                   sizeof(long), GFP_KERNEL);

> 

>    devm_bitmap_zalloc()


Done in v2.

>      +static const struct of_device_id apple_pci_of_match[] = {

>      +       { .compatible = "apple,pcie", .data = &apple_m1_cfg_ecam_ops },

> 

>     

> 

>      +       { },

> 

>    No comma for termination entry 


Fixed in v2.

BR,

Alyssa
Sven Peter Aug. 16, 2021, 6:37 a.m. UTC | #8
On Sun, Aug 15, 2021, at 18:49, Marc Zyngier wrote:
> On Sun, 15 Aug 2021 13:33:14 +0100,

> "Sven Peter" <sven@svenpeter.dev> wrote:

> > 

> > 

> > 

> > On Sun, Aug 15, 2021, at 09:42, Marc Zyngier wrote:

> > > On Sun, 15 Aug 2021 05:25:25 +0100,

> > > Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

> > [...]

> > > > +

> > > > +static int apple_pcie_setup_port(struct apple_pcie *pcie, unsigned int i)

> > > > +{

> > > > +	struct fwnode_handle *fwnode = dev_fwnode(pcie->dev);

> > > > +	void __iomem *port;

> > > > +	struct gpio_desc *reset;

> > > > +	uint32_t stat;

> > > > +	int ret;

> > > > +

> > > > +	port = devm_of_iomap(pcie->dev, to_of_node(fwnode), i + 3, NULL);

> > > > +

> > > > +	if (IS_ERR(port))

> > > > +		return -ENODEV;

> > > > +

> > > > +	reset = devm_gpiod_get_index(pcie->dev, "reset", i, 0);

> > > > +	if (IS_ERR(reset))

> > > > +		return PTR_ERR(reset);

> > > > +

> > > > +	gpiod_direction_output(reset, 0);

> > > > +

> > > > +	rmwl(0, PORT_APPCLK_EN, port + PORT_APPCLK);

> > > > +

> > > > +	ret = apple_pcie_setup_refclk(pcie->rc, port, i);

> > > > +	if (ret < 0)

> > > > +		return ret;

> > > > +

> > > > +	rmwl(0, PORT_PERST_OFF, port + PORT_PERST);

> > > > +	gpiod_set_value(reset, 1);

> > > > +

> > > > +	ret = readl_poll_timeout(port + PORT_STATUS, stat,

> > > > +				 stat & PORT_STATUS_READY, 100, 250000);

> > > > +	if (ret < 0) {

> > > > +		dev_err(pcie->dev, "port %u ready wait timeout\n", i);

> > > > +		return ret;

> > > > +	}

> > > > +

> > > > +	rmwl(PORT_REFCLK_CGDIS, 0, port + PORT_REFCLK);

> > > > +	rmwl(PORT_APPCLK_CGDIS, 0, port + PORT_APPCLK);

> > > > +

> > > > +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,

> > > > +				 !(stat & PORT_LINKSTS_BUSY), 100, 250000);

> > > > +	if (ret < 0) {

> > > > +		dev_err(pcie->dev, "port %u link not busy timeout\n", i);

> > > > +		return ret;

> > > > +	}

> > > > +

> > > > +	writel(0xfb512fff, port + PORT_INTMSKSET);

> > > 

> > > Magic. What is this for?

> > 

> > The magic comes from the original Corellium driver. It first masks everything

> > except for the interrupts in the next line, then acks the interrupts it keeps

> > enabled and then probably wants to wait for PORT_INT_LINK_UP (or any of the

> > other interrupts which seem to indicate various error conditions) to fire but

> > instead polls for PORT_LINKSTS_UP.

> > 

> > > 

> > > > +

> > > > +	writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |

> > > > +	       PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |

> > > > +	       PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |

> > > > +	       PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);

> > > > +

> > > > +	usleep_range(5000, 10000);

> > > > +

> > > > +	rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);

> > > > +

> > > > +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,

> > > > +				 stat & PORT_LINKSTS_UP, 100, 500000);

> > > > +	if (ret < 0) {

> > > > +		dev_err(pcie->dev, "port %u link up wait timeout\n", i);

> > > > +		return ret;

> > > > +	}

> > > 

> > > I have the strong feeling that a lot of things in the above is to get

> > > an interrupt when the port reports an event. Why the polling then?

> > 

> > 

> > I'm pretty sure this is true. The same registers are also used to setup

> > and handle legacy interrupts.

> >

> > My current understanding is that PORT_INTSTAT is used to retrieve the fired

> > interrupts and ack them, and PORT_INTMSK are the masked interrupts.

> > And then PORT_INTMSKSET and PORT_INTMSKCLR can be used to manipulate

> > individual bits of PORT_INTMSK with a single store.

> 

> So this really should be modelled as an interrupt controller. If

> someone comes up with a bit of a spec (though the bit assignment is

> relatively clear), I can update the interrupt code to handle

> that. After all, I moan enough at people writing horrible PCI driver

> code, I might as well write an example myself and point them to it.


Thanks for the offer!
Unfortunately, what I've written above is almost everything I know about this
hardware. If you tell me what additional details you need to know I can see
what I'm able to figure out though and write a small summary.

Thanks,


Sven
Marc Zyngier Aug. 16, 2021, 9:56 p.m. UTC | #9
On Mon, 16 Aug 2021 02:31:40 +0100,
Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
> 

> Hi Marc,

> 

> Thank you for the review.


I wouldn't call that a review. Only a cursory look and a quick mention
of what I found really odd. Without specs, this thing is impossible to
properly review.

[...]

> > Please use relaxed accessors. If the barriers are actually needed,

> > please document what you are ordering against. This applies throughout

> > the patch.

> 

> Relaxed accessors are used throughout in v2... it Works For Me™ but no

> guarantees I didn't introduce a race...


That's not exactly what I wanted to read... You really need to make an
informed decision on the need of barriers. If the MMIO write needs to
be ordered after a main memory write (i.e. a memory write that is
consumed by the device you are subsequently writing to), you then need
a barrier. If, as I suspect, the device isn't DMA capable and doesn't
require ordering with the rest of the memory accesses, then no
barriers are required.

> 

> > This also begs the question: can this be called concurrently?

> 

> I'm not sure. Sven, any idea how Apple devices are usually

> structured here?


Nothing here is Apple specific. If you can get two CPUs to issue a RMW
on the same register, this piece of code is broken. This code has an
undocumented assumption of being single threaded, and it is pretty
unclear whether this assumption holds or not.

[...]

> > > +	writel(0xfb512fff, port + PORT_INTMSKSET);

> > 

> > Magic. What is this for?

> 

> Sven's explanation is the most likely. This magic value comes from

> Corellium via Mark; I assume it's written by macOS.


I really wish there was no magic values whatsoever, and I've resisted
posting the PCIe support patch myself for this very reason. Frankly,
this stuff doesn't give me the warm feeling that we know what we're
doing.

> 

> > > +	writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |

> > > +	       PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |

> > > +	       PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |

> > > +	       PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);

> > > +

> > > +	usleep_range(5000, 10000);

> > > +

> > > +	rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);

> > > +

> > > +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,

> > > +				 stat & PORT_LINKSTS_UP, 100, 500000);

> > > +	if (ret < 0) {

> > > +		dev_err(pcie->dev, "port %u link up wait timeout\n", i);

> > > +		return ret;

> > > +	}

> > 

> > I have the strong feeling that a lot of things in the above is to get

> > an interrupt when the port reports an event. Why the polling then?

> 

> I reordered the code to have the configuration after this happen

> before the START command as suggested (this works), and then removed

> the poll entirely (this also works?). It's possible the poll here

> was just a debug leftover in the original code.


What happens if the core PCI code probes the ports without the link
being up yet?

> It's possible it's needed in the original but not needed in the

> interrupt-driven common code (if the link doesn't come up yet,

> nothing happens, so we don't have to block on it ourselves..)

> 

> It's also possible I've introduced a race that we happen to win every time.

> 

> Without specs, it's exceedingly hard to know which it is...


Indeed, and I hate this "finger in the air" approach. Specially when
you need to trust your data to it.

> The poll isn't what we want at any rate, so I've removed the poll in

> v2. But we may want extra interrupt handling code for v3.


Indeed. I need to rework the MSI patch anyway after the discussion
with Rob, and I'll see what I can do for the rest of the event stuff.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
Arnd Bergmann Aug. 17, 2021, 7:34 a.m. UTC | #10
On Mon, Aug 16, 2021 at 11:57 PM Marc Zyngier <maz@kernel.org> wrote:
> On Mon, 16 Aug 2021 02:31:40 +0100, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

>

> > > Please use relaxed accessors. If the barriers are actually needed,

> > > please document what you are ordering against. This applies throughout

> > > the patch.

> >

> > Relaxed accessors are used throughout in v2... it Works For Me™ but no

> > guarantees I didn't introduce a race...

>

> That's not exactly what I wanted to read... You really need to make an

> informed decision on the need of barriers. If the MMIO write needs to

> be ordered after a main memory write (i.e. a memory write that is

> consumed by the device you are subsequently writing to), you then need

> a barrier. If, as I suspect, the device isn't DMA capable and doesn't

> require ordering with the rest of the memory accesses, then no

> barriers are required.


My normal rule is to always use the normal accessors, and only use
any special variants if this is either required for correct operation
(e.g. heavy barriers on arm32 may call code that must not recursively
use heavy barriers) or that you have proven to /both/ be correct and
relevant for performance. IOW, don't use the relaxed accessors just
because it isn't wrong in your driver, other developers may copy the
code into a driver that can't do it.

      Arnd
Sven Peter Aug. 17, 2021, 7:35 a.m. UTC | #11
On Mon, Aug 16, 2021, at 23:56, Marc Zyngier wrote:
> [...]

> 

> > > > +	writel(0xfb512fff, port + PORT_INTMSKSET);

> > > 

> > > Magic. What is this for?

> > 

> > Sven's explanation is the most likely. This magic value comes from

> > Corellium via Mark; I assume it's written by macOS.

> 

> I really wish there was no magic values whatsoever, and I've resisted

> posting the PCIe support patch myself for this very reason. Frankly,

> this stuff doesn't give me the warm feeling that we know what we're

> doing.


I'm pretty sure we can get rid of most magic since we have register names for
almost everything we need and since m1n1 does the really obscure black magic
involving the PHY layer and those tunables thanks to Mark.

As I mentioned earlier, all bits missing in 0xfb512fff are those used in
the writel one line below. This line only keeps a set of interrupts unmasked
and the next one acks exactly this set (which isn't correct, but that's what
this code does).
There only unknown interrupt here is BIT(26) but this whole sequence is like a
cargo cult anyway right now since nothing checks for these interrupts.

> 

> > 

> > > > +	writel(PORT_INT_LINK_UP | PORT_INT_LINK_DOWN | PORT_INT_AF_TIMEOUT |

> > > > +	       PORT_INT_REQADDR_GT32 | PORT_INT_MSI_ERR |

> > > > +	       PORT_INT_MSI_BAD_DATA | PORT_INT_CPL_ABORT |

> > > > +	       PORT_INT_CPL_TIMEOUT | (1 << 26), port + PORT_INTSTAT);

> > > > +

> > > > +	usleep_range(5000, 10000);

> > > > +

> > > > +	rmwl(0, PORT_LTSSMCTL_START, port + PORT_LTSSMCTL);

> > > > +

> > > > +	ret = readl_poll_timeout(port + PORT_LINKSTS, stat,

> > > > +				 stat & PORT_LINKSTS_UP, 100, 500000);

> > > > +	if (ret < 0) {

> > > > +		dev_err(pcie->dev, "port %u link up wait timeout\n", i);

> > > > +		return ret;

> > > > +	}

> > > 

> > > I have the strong feeling that a lot of things in the above is to get

> > > an interrupt when the port reports an event. Why the polling then?

> > 

> > I reordered the code to have the configuration after this happen

> > before the START command as suggested (this works), and then removed

> > the poll entirely (this also works?). It's possible the poll here

> > was just a debug leftover in the original code.

> 

> What happens if the core PCI code probes the ports without the link

> being up yet?


We pretty much rely on everything being slow enough that the ports aren't probed
if there's no readl_poll_timeout and no waiting for the link up interrupt.
Now it doesn't take long for the link to be up after the LTSSM has been
started, but it's still a few cycles and this is not a good idea.
This needs to wait for the interrupt.

I don't know yet what the first usleep_range is used for, but I'm willing to
bet it's either waiting for another interrupt to fire or sometimes the link doesn't
come up the first time and you just have to try again and the usleep prevents that.
(I'm less inclined to bet on this one, but: this might be required for the first
port with the WiFi/bluetooth radios which will never come up unless power has been
enabled by talking to another co-processor first. That usleep_range might be a hack
so that this code always runs after power has been enabled.)

That same LTSSM retry flow is used for Thunderbolt hotplugging fwiw:
Wait for the NHI layer to notify you, start the link training, wait for the
link up interrupt (or the link down or error interrupt and just try link training
again a few times), rescan the port.

> 

> > It's possible it's needed in the original but not needed in the

> > interrupt-driven common code (if the link doesn't come up yet,

> > nothing happens, so we don't have to block on it ourselves..)

> > 

> > It's also possible I've introduced a race that we happen to win every time.

> > 

> > Without specs, it's exceedingly hard to know which it is...

> 

> Indeed, and I hate this "finger in the air" approach. Specially when

> you need to trust your data to it.

> 

> > The poll isn't what we want at any rate, so I've removed the poll in

> > v2. But we may want extra interrupt handling code for v3.

> 

> Indeed. I need to rework the MSI patch anyway after the discussion

> with Rob, and I'll see what I can do for the rest of the event stuff.


Again, thanks a lot for this! As I said in the other mail, if you need
any specific information about this hardware just let me know.
I won't be able to give you an accurate spec but I can try to figure out
most details you need.

Thanks,


Sven
Marc Zyngier Aug. 17, 2021, 8:12 a.m. UTC | #12
On Tue, 17 Aug 2021 08:34:35 +0100,
Arnd Bergmann <arnd@kernel.org> wrote:
> 

> On Mon, Aug 16, 2021 at 11:57 PM Marc Zyngier <maz@kernel.org> wrote:

> > On Mon, 16 Aug 2021 02:31:40 +0100, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

> >

> > > > Please use relaxed accessors. If the barriers are actually needed,

> > > > please document what you are ordering against. This applies throughout

> > > > the patch.

> > >

> > > Relaxed accessors are used throughout in v2... it Works For Me™ but no

> > > guarantees I didn't introduce a race...

> >

> > That's not exactly what I wanted to read... You really need to make an

> > informed decision on the need of barriers. If the MMIO write needs to

> > be ordered after a main memory write (i.e. a memory write that is

> > consumed by the device you are subsequently writing to), you then need

> > a barrier. If, as I suspect, the device isn't DMA capable and doesn't

> > require ordering with the rest of the memory accesses, then no

> > barriers are required.

> 

> My normal rule is to always use the normal accessors, and only use

> any special variants if this is either required for correct operation

> (e.g. heavy barriers on arm32 may call code that must not recursively

> use heavy barriers) or that you have proven to /both/ be correct and

> relevant for performance. IOW, don't use the relaxed accessors just

> because it isn't wrong in your driver, other developers may copy the

> code into a driver that can't do it.


And that exactly the reason why I think we should *not* use heavy
accessors if they are not required. I have little sympathy for blindly
copied code, and spreading unnecessary barriers means that we cannot
further reason about the actual ordering requirements.

In other words, blanket use of heavy MMIO accessors to guarantee
memory ordering is not dissimilar to reintroducing the BKL because we
don't want people to worry about concurrency.

	M.

-- 
Without deviation from the norm, progress is not possible.
Hector Martin Aug. 18, 2021, 11:43 a.m. UTC | #13
On 15/08/2021 21.33, Sven Peter wrote:
> The magic comes from the original Corellium driver. It first masks everything

> except for the interrupts in the next line, then acks the interrupts it keeps

> enabled and then probably wants to wait for PORT_INT_LINK_UP (or any of the

> other interrupts which seem to indicate various error conditions) to fire but

> instead polls for PORT_LINKSTS_UP.


Let's not take any magic numbers from their drivers (or what macOS does, 
for that matter) without making an attempt to understand what they do, 
unless it becomes clear it's incomprehensible. This has already bit us 
in the past (the SError disable thing).

-- 
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub
Mark Kettenis Aug. 18, 2021, 2:22 p.m. UTC | #14
> From: Hector Martin <marcan@marcan.st>

> Date: Wed, 18 Aug 2021 20:43:48 +0900

> 

> On 15/08/2021 21.33, Sven Peter wrote:

> > The magic comes from the original Corellium driver. It first masks

> > everything except for the interrupts in the next line, then acks

> > the interrupts it keeps enabled and then probably wants to wait

> > for PORT_INT_LINK_UP (or any of the other interrupts which seem to

> > indicate various error conditions) to fire but instead polls for

> > PORT_LINKSTS_UP.

> 

> Let's not take any magic numbers from their drivers (or what macOS does, 

> for that matter) without making an attempt to understand what they do, 

> unless it becomes clear it's incomprehensible. This has already bit us 

> in the past (the SError disable thing).


The driver should really only unmask the interrupts it handles in its
interrupt handler.  We should know the meaning of those bits so using
the appropriate symbolic names shouldn't be too difficult.

Didn't delve into this yet since U-Boot doesn't do interrupts (so I
don't touch the port interrupt registers there) and on OpenBSD I only
implemented MSIs for now as all the integrated devices support MSIs
just fine.  I'll need to revisit this at some point to support the
Thunderbolt ports.