Message ID | 20250416-pcie-reset-slot-v2-3-efe76b278c10@linaro.org |
---|---|
State | New |
Headers | show |
Series | PCI: Add support for resetting the slots in a platform specific way | expand |
On Thu, Apr 17, 2025 at 02:41:55PM +0530, Krishna Chaitanya Chundru wrote: > > > On 4/17/2025 1:24 PM, Manivannan Sadhasivam wrote: > > On Wed, Apr 16, 2025 at 11:21:49PM +0530, Krishna Chaitanya Chundru wrote: > > > > > > > > > On 4/16/2025 9:59 PM, Manivannan Sadhasivam via B4 Relay wrote: > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > > > > > > > The PCI link, when down, needs to be recovered to bring it back. But that > > > > cannot be done in a generic way as link recovery procedure is specific to > > > > host bridges. So add a new API pci_host_handle_link_down() that could be > > > > called by the host bridge drivers when the link goes down. > > > > > > > > The API will iterate through all the slots and calls the pcie_do_recovery() > > > > function with 'pci_channel_io_frozen' as the state. This will result in the > > > > execution of the AER Fatal error handling code. Since the link down > > > > recovery is pretty much the same as AER Fatal error handling, > > > > pcie_do_recovery() helper is reused here. First the AER error_detected > > > > callback will be triggered for the bridge and the downstream devices. Then, > > > > pcie_do_slot_reset() will be called for each slots, which will reset the > > > > slots using 'reset_slot' callback to recover the link. Once that's done, > > > > resume message will be broadcasted to the bridge and the downstream devices > > > > indicating successful link recovery. > > > > > > > > In case if the AER support is not enabled in the kernel, only > > > > pci_bus_error_reset() will be called for each slots as there is no way we > > > > could inform the drivers about link recovery. > > > > > > > The PCIe endpoint drivers are registering with err_handlers and they > > > will be invoked only from pcie_do_recovery, but there are getting built > > > by default irrespective of AER is enabled or not. > > > > > > > AER is *one* of the functionalities of an endpoint. And the endpoint could > > mostly work without AER reporting (except for AER fatal/non-fatal where recovery > > need to be performed by the host). So it wouldn't make sense to add AER > > dependency for them. > > > > > Does it make sense to built err.c irrespective of AER is enabled or not > > > to use common logic without the need of having dependency on AER. > > > > > > > Well, yes and no. Right now, only DPC reuses the err handlers except AER. But > > DPC driver itself is functional dependent on AER. So I don't think it is really > > required to build err.c independent of AER. But I will try to rework the code in > > the future for fixing things like 'AER' prefix added to logs and such. > > > Right now we have DPC & AER to use this pcie_do_recovery(), now we are > adding supporting for controller reported error (Link down) not sure if > there will be newer ways to report errors in future. > > May be not in this series, in future better to de-couple err.c from > AER as err.c. As the sources of error reporting is not limited to AER > or DPC alone now. > Yes, that's part of my plan. > > > Also since err.c is tied with AER, DPC also had a hard requirement > > > to enable AER which is not needed technically. > > > > > > > DPC driver is functional dependent on AER. > I got a impression by seeing below statement that DPC can work > independently. > As per spec 6 sec 6.2.11.2, DPC error signaling "A DPC-capable > Downstream Port must support ERR_COR signaling, independent of whether > it supports Advanced Error Reporting (AER) or not". > That's why I intentionally said 'DPC driver' not 'DPC'. The driver has the dependency, not the feature. > In fact it can work if AER is not enabled also, but will not have full > functionality of DPC. > Right. That's why I said functionally dependent. - Mani
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index b81e99cd4b62a3022c8b07a09f212f6888674487..6c1d4c5a82d68e5842636ff296a8d3a06487cb11 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -966,6 +966,7 @@ int pci_aer_clear_status(struct pci_dev *dev); int pci_aer_raw_clear_status(struct pci_dev *dev); void pci_save_aer_state(struct pci_dev *dev); void pci_restore_aer_state(struct pci_dev *dev); +void pcie_do_recover_slots(struct pci_host_bridge *host); #else static inline void pci_no_aer(void) { } static inline void pci_aer_init(struct pci_dev *d) { } @@ -975,6 +976,26 @@ static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; } static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL; } static inline void pci_save_aer_state(struct pci_dev *dev) { } static inline void pci_restore_aer_state(struct pci_dev *dev) { } +static inline void pcie_do_recover_slots(struct pci_host_bridge *host) +{ + struct pci_bus *bus = host->bus; + struct pci_dev *dev; + int ret; + + if (!host->reset_slot) + dev_warn(&host->dev, "Missing reset_slot() callback\n"); + + for_each_pci_bridge(dev, bus) { + if (!pci_is_root_bus(bus)) + continue; + + ret = pci_bus_error_reset(dev); + if (ret) + pci_err(dev, "Failed to reset slot: %d\n", ret); + else + pci_info(dev, "Slot has been reset\n"); + } +} #endif #ifdef CONFIG_ACPI diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index b834fc0d705938540d3d7d3d8739770c09fe7cf1..70d8cd37255c5638fddf38e13ea87cb8ebe8553f 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -270,3 +270,30 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, return status; } + +static pci_ers_result_t pcie_do_slot_reset(struct pci_dev *dev) +{ + int ret; + + ret = pci_bus_error_reset(dev); + if (ret) { + pci_err(dev, "Failed to reset slot: %d\n", ret); + return PCI_ERS_RESULT_DISCONNECT; + } + + pci_info(dev, "Slot has been reset\n"); + + return PCI_ERS_RESULT_RECOVERED; +} + +void pcie_do_recover_slots(struct pci_host_bridge *host) +{ + struct pci_bus *bus = host->bus; + struct pci_dev *dev; + + for_each_pci_bridge(dev, bus) { + if (pci_is_root_bus(bus)) + pcie_do_recovery(dev, pci_channel_io_frozen, + pcie_do_slot_reset); + } +} diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 364fa2a514f8a68fb18bded3259c6847d3932f8b..60ad20eea0259797e68afa7979bb1fc24b6f213b 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -3249,6 +3249,13 @@ int pci_host_probe(struct pci_host_bridge *bridge) } EXPORT_SYMBOL_GPL(pci_host_probe); +void pci_host_handle_link_down(struct pci_host_bridge *bridge) +{ + dev_info(&bridge->dev, "Recovering slots due to Link Down\n"); + pcie_do_recover_slots(bridge); +} +EXPORT_SYMBOL_GPL(pci_host_handle_link_down); + int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int bus_max) { struct resource *res = &b->busn_res; diff --git a/include/linux/pci.h b/include/linux/pci.h index 8d7d2a49b76cf64b4218b179cec495e0d69ddf6f..76e977af2d524200b67f39a6d0417ee565cf5116 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1157,6 +1157,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus, struct pci_ops *ops, void *sysdata, struct list_head *resources); int pci_host_probe(struct pci_host_bridge *bridge); +void pci_host_handle_link_down(struct pci_host_bridge *bridge); int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int busmax); int pci_bus_update_busn_res_end(struct pci_bus *b, int busmax); void pci_bus_release_busn_res(struct pci_bus *b);