PCI: qcom-ep: Treat unknown irq events as an error

Message ID 20230726152931.18134-1-manivannan.sadhasivam@linaro.org
State Accepted
Commit 823de40c94d680ec8d57a660ac96a653acadb0a5
Series PCI: qcom-ep: Treat unknown irq events as an error

Commit Message

Manivannan Sadhasivam July 26, 2023, 3:29 p.m. UTC
Sometimes, the Qcom PCIe EP controller can receive interrupts that are not
known to the driver, such as safety interrupts in newer SoCs. In those
cases, if the driver doesn't clear the interrupts, it will end up in an
interrupt storm. But users won't have any idea about it because the event
is logged only as a debug message.

So let's log the unknown event as an error, so that it at least makes the
user aware and the issue eventually gets fixed.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/pci/controller/dwc/pcie-qcom-ep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bjorn Helgaas July 31, 2023, 4:57 p.m. UTC | #1
On Wed, Jul 26, 2023 at 08:59:31PM +0530, Manivannan Sadhasivam wrote:
> Sometimes, the Qcom PCIe EP controller can receive interrupts that are not
> known to the driver, such as safety interrupts in newer SoCs. In those
> cases, if the driver doesn't clear the interrupts, it will end up in an
> interrupt storm. But users won't have any idea about it because the event
> is logged only as a debug message.
> 
> So let's log the unknown event as an error, so that it at least makes the
> user aware and the issue eventually gets fixed.

Would it be practical to log the error message, then clear the
interrupt to avoid the interrupt storm?

> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>  drivers/pci/controller/dwc/pcie-qcom-ep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> index 267e1247d548..802dedcc929c 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> @@ -593,7 +593,7 @@ static irqreturn_t qcom_pcie_ep_global_irq_thread(int irq, void *data)
>  		dw_pcie_ep_linkup(&pci->ep);
>  		pcie_ep->link_status = QCOM_PCIE_EP_LINK_UP;
>  	} else {
> -		dev_dbg(dev, "Received unknown event: %d\n", status);
> +		dev_err(dev, "Received unknown event: %d\n", status);
>  	}
>  
>  	return IRQ_HANDLED;
> -- 
> 2.25.1
>
Manivannan Sadhasivam Aug. 2, 2023, 5:49 a.m. UTC | #2
On Mon, Jul 31, 2023 at 11:57:38AM -0500, Bjorn Helgaas wrote:
> On Wed, Jul 26, 2023 at 08:59:31PM +0530, Manivannan Sadhasivam wrote:
> > Sometimes, the Qcom PCIe EP controller can receive interrupts that are not
> > known to the driver, such as safety interrupts in newer SoCs. In those
> > cases, if the driver doesn't clear the interrupts, it will end up in an
> > interrupt storm. But users won't have any idea about it because the event
> > is logged only as a debug message.
> > 
> > So let's log the unknown event as an error, so that it at least makes the
> > user aware and the issue eventually gets fixed.
> 
> Would it be practical to log the error message, then clear the
> interrupt to avoid the interrupt storm?

Just to be clear, we are already clearing the IRQs in PARF_INT_ALL_CLEAR. The
issue is that newer platforms have a couple more of these registers that the
driver should mask and clear.
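
The situation described above can be sketched in user space. This is only an
illustration under stated assumptions, not the driver's code: struct fake_regs,
the EV_* bits, irq_line_asserted() and handle_global_irq() are hypothetical
names, and the "extra" register stands in for whatever additional status
registers a newer platform might add.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event bits; the real driver defines its own masks. */
#define EV_LINK_DOWN (1u << 0)
#define EV_LINK_UP   (1u << 1)

/* Simulated registers (names are illustrative, not the real PARF layout). */
struct fake_regs {
	uint32_t known_status; /* the handler reads and clears this one */
	uint32_t extra_status; /* assumed newer register the handler ignores */
};

enum ev_result { EV_KNOWN, EV_UNKNOWN };

/* The interrupt line stays asserted while any status bit is pending. */
static int irq_line_asserted(const struct fake_regs *regs)
{
	return regs->known_status != 0 || regs->extra_status != 0;
}

/* One pass of the handler: read status, clear it, then dispatch. */
static enum ev_result handle_global_irq(struct fake_regs *regs)
{
	uint32_t status;

	assert(regs != NULL);
	status = regs->known_status;
	regs->known_status &= ~status; /* models writing status to a clear register */

	if (status & EV_LINK_DOWN) {
		puts("link down");
	} else if (status & EV_LINK_UP) {
		puts("link up");
	} else {
		/* The branch the patch promotes from debug to error level. */
		fprintf(stderr, "Received unknown event: %u\n", (unsigned int)status);
		return EV_UNKNOWN;
	}
	return EV_KNOWN;
}
```

With only extra_status set, handle_global_irq() takes the unknown-event branch
and irq_line_asserted() still returns nonzero afterwards, so the handler would
be invoked again immediately. That is the storm in miniature.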

Qcom faced an interrupt storm issue while bringing up PCIe EP on a new
platform. On that platform, several interrupts were added for some specific
use cases. Since we do not mask/clear those interrupts, it resulted in an
interrupt storm. Moreover, the absence of an error message made the issue
difficult to debug, as there were no logs from the driver (unfortunately, they
didn't enable DEBUG during bringup, as I would have expected).

So to catch these kinds of issues in the future, I just want to promote the log
to dev_err().

- Mani

> 
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> >  drivers/pci/controller/dwc/pcie-qcom-ep.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > index 267e1247d548..802dedcc929c 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > @@ -593,7 +593,7 @@ static irqreturn_t qcom_pcie_ep_global_irq_thread(int irq, void *data)
> >  		dw_pcie_ep_linkup(&pci->ep);
> >  		pcie_ep->link_status = QCOM_PCIE_EP_LINK_UP;
> >  	} else {
> > -		dev_dbg(dev, "Received unknown event: %d\n", status);
> > +		dev_err(dev, "Received unknown event: %d\n", status);
> >  	}
> >  
> >  	return IRQ_HANDLED;
> > -- 
> > 2.25.1
> >
Krzysztof Wilczyński Aug. 25, 2023, 4:36 p.m. UTC | #3
Hello,

> Sometimes, the Qcom PCIe EP controller can receive interrupts that are not
> known to the driver, such as safety interrupts in newer SoCs. In those
> cases, if the driver doesn't clear the interrupts, it will end up in an
> interrupt storm. But users won't have any idea about it because the event
> is logged only as a debug message.
> 
> So let's log the unknown event as an error, so that it at least makes the
> user aware and the issue eventually gets fixed.

Applied to controller/qcom-ep, thank you!

[1/1] PCI: qcom-ep: Treat unknown IRQ events as an error
      https://git.kernel.org/pci/pci/c/4f4371b9617b

	Krzysztof

Patch

diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 267e1247d548..802dedcc929c 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -593,7 +593,7 @@  static irqreturn_t qcom_pcie_ep_global_irq_thread(int irq, void *data)
 		dw_pcie_ep_linkup(&pci->ep);
 		pcie_ep->link_status = QCOM_PCIE_EP_LINK_UP;
 	} else {
-		dev_dbg(dev, "Received unknown event: %d\n", status);
+		dev_err(dev, "Received unknown event: %d\n", status);
 	}
 
 	return IRQ_HANDLED;
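
To illustrate why the one-line change above matters: in the kernel, dev_dbg()
output is normally compiled out unless DEBUG is defined (or dynamic debug is
enabled for the call site), while dev_err() is always emitted. A minimal
user-space sketch of that difference follows. log_emit(), log_dbg(), log_err()
and report_unknown_event() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int lines_emitted; /* counts messages that actually reach the log */

/* Print one message with a level prefix and count it. */
static void log_emit(const char *level, const char *fmt, ...)
{
	va_list ap;

	assert(level != NULL && fmt != NULL);
	fprintf(stderr, "%s: ", level);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	lines_emitted++;
}

/* Error messages are always emitted. */
#define log_err(...) log_emit("error", __VA_ARGS__)

/* Debug messages vanish at compile time unless DEBUG is defined. */
#ifdef DEBUG
#define log_dbg(...) log_emit("debug", __VA_ARGS__)
#else
#define log_dbg(...) do { } while (0)
#endif

/* Returns how many messages the unknown-event report produced. */
static int report_unknown_event(unsigned int status, int as_error)
{
	int before = lines_emitted;

	if (as_error)
		log_err("Received unknown event: %u\n", status);
	else
		log_dbg("Received unknown event: %u\n", status);

	return lines_emitted - before;
}
```

Built without -DDEBUG, report_unknown_event(status, 0) produces no output at
all, which mirrors the bringup situation described in the thread, while
report_unknown_event(status, 1) always logs one line.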