[5.4,055/168] PCI/PM: Add missing link delays required by the PCIe spec

From: Mika Westerberg <mika.westerberg@linux.intel.com>

[ Upstream commit ad9001f2f41198784b0423646450ba2cb24793a3 ]

Currently Linux does not follow the PCIe spec regarding the required
delays after reset. A concrete example is a Thunderbolt add-in card that
consists of a PCIe switch and two PCIe endpoints:

  +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                  +-01.0-[04-36]-- DS hotplug port
                                  +-02.0-[37]----00.0 xHCI controller
                                  \-04.0-[38-6b]-- DS hotplug port

The root port (1b.0) and the PCIe switch downstream ports are all PCIe Gen3
so they support 8GT/s link speeds.

We wait for the PCIe hierarchy to enter D3cold (runtime):

  pcieport 0000:00:1b.0: power state changed by ACPI to D3cold

When it wakes up from D3cold, according to PCIe 5.0 section 5.8 the PCIe
switch is put into reset and its power is re-applied. This means that we
must follow the rules in PCIe 5.0 section 6.6.1.

For the PCIe Gen3 ports we are dealing with here, the following applies:

  With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
  software must wait a minimum of 100 ms after Link training completes
  before sending a Configuration Request to the device immediately below
  that Port. Software can determine when Link training completes by polling
  the Data Link Layer Link Active bit or by setting up an associated
  interrupt (see Section 6.7.3.3).
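
As a rough illustration of the rule quoted above, the following user-space
sketch maps a port's maximum link speed to what software has to do before
the first Configuration Request. The struct, the function name and the
speed encoding are hypothetical and are not part of this patch. The branch
for ports at or below 5.0 GT/s (a plain 100 ms wait) is an assumption about
the companion rule in the same section, which the quoted text does not
cover.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of what software must do after reset. */
struct reset_wait_plan {
	bool wait_for_link_active;  /* poll DLLLA (or use its interrupt) first */
	unsigned int delay_ms;      /* minimum delay before the first config request */
};

/*
 * max_speed_tenths_gt: highest supported link speed in units of 0.1 GT/s
 * (25 = 2.5 GT/s, 50 = 5.0 GT/s, 80 = 8.0 GT/s). Illustrative encoding only.
 */
struct reset_wait_plan plan_reset_wait(unsigned int max_speed_tenths_gt)
{
	struct reset_wait_plan plan = {
		.wait_for_link_active = false,
		.delay_ms = 100,
	};

	assert(max_speed_tenths_gt > 0);

	/* Above 5.0 GT/s the 100 ms only starts once link training completes. */
	if (max_speed_tenths_gt > 50)
		plan.wait_for_link_active = true;

	return plan;
}
```

For the Gen3 (8 GT/s) ports discussed here, plan_reset_wait(80) asks for
both the DLLLA wait and the 100 ms delay.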

Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):

  0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
  0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
  0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
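
The list above can be derived mechanically from the topology: every
downstream-facing port that has a device directly below it needs one wait.
A minimal sketch, with the example topology hard-coded in a hypothetical
table (the struct and function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One downstream-facing port and the device directly below it (NULL if none). */
struct ds_port {
	const char *port;
	const char *child;
};

/* Topology from the example; the two hotplug ports have nothing attached. */
const struct ds_port example_ports[] = {
	{ "0000:00:1b.0", "0000:01:00.0" },
	{ "0000:02:00.0", "0000:03:00.0" },
	{ "0000:02:01.0", NULL },
	{ "0000:02:02.0", "0000:37:00.0" },
	{ "0000:02:04.0", NULL },
};

/* Print one line per required wait and return how many waits are needed. */
size_t list_required_waits(const struct ds_port *ports, size_t n)
{
	size_t waits = 0;

	assert(ports != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!ports[i].child)
			continue; /* nothing below this port, so nothing to access */
		printf("%s: wait for 100 ms after DLLLA is set before access to %s\n",
		       ports[i].port, ports[i].child);
		waits++;
	}
	return waits;
}
```

Called on example_ports, this prints the three lines listed above and
skips the two empty hotplug ports.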

I've instrumented the kernel with some additional logging so we can see the
actual delays performed:

  pcieport 0000:00:1b.0: power state changed by ACPI to D0
  pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
  pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms

For the switch upstream port (01:00.0, reachable through the 00:1b.0 root
port) we wait for 100 ms but do not take the DLLLA requirement into
account. We then wait 10 ms for the D3hot -> D0 transition of the root port
and the two downstream hotplug ports. This means that we deviate from what
the spec requires.

Performing the same check for system sleep (s2idle) transitions, it turns
out to be even worse: none of the mandatory delays are performed. If this
were S3 instead of s2idle, then according to PCI FW spec 3.2 section 4.6.8
there is a specific _DSM that allows the OS to skip the delays, but this
platform does not provide the _DSM and does not go to S3 anyway, so no
firmware is involved that could already handle these delays.

On this particular platform these delays are not actually needed because
there is an additional delay as part of the ACPI power resource that is
used to turn on power to the hierarchy. However, since that additional
delay is not required by any of the standards (PCIe, ACPI), it is not
present on Intel Ice Lake, for example, where missing the mandatory delays
causes pciehp to start tearing down the stack too early (links are not yet
trained). Below is an example of what it looks like when this happens:

  pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
  pcieport 0000:87:04.0: PME# disabled
  pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
  pcieport 0000:86:00.0: Refused to change power state, currently in D3
  pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
  pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

There is also one reported case (see the bugzilla link below) where the
missing delay causes the xHCI on a Titan Ridge controller to fail to
runtime resume when a USB-C dock is plugged in. This does not involve
pciehp; instead the PCI core fails to runtime resume the xHCI device:

  pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
  xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

Add a new function pci_bridge_wait_for_secondary_bus() that is called on
PCI core resume and runtime resume paths accordingly if the bridge entered
D3cold (and thus went through reset).

This is the second attempt to add the missing delays. The previous solution in
c2bf1fc212f7 ("PCI: Add missing link delays required by the PCIe spec") was
reverted because of two issues it caused:

  1. One system became unresponsive after S3 resume due to the PME service
     spinning in pcie_pme_work_fn(). The root port in question reports that
     the xHCI sent PME but the xHCI device itself does not have PME status
     set. The PME status bit is never cleared in the root port, resulting
     in an indefinite loop in pcie_pme_work_fn().

  2. Resume slows down if the root/downstream port does not support Data
     Link Layer Active Reporting because pcie_wait_for_link_delay() waits
     1100 ms in that case.

This version should avoid the above issues because we restrict the delay to
happen only if the port went into D3cold.
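
To put the 1100 ms from issue 2 into perspective, here is a rough model of
the wait with and without Data Link Layer Active Reporting. The split into
a 1000 ms link timeout plus the 100 ms delay is an assumption (the text
above only gives the total), and the function and macro names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define LINK_TIMEOUT_MS    1000u /* assumed upper bound for link training */
#define POST_LINK_DELAY_MS  100u /* delay required after the link is active */

/*
 * Total time spent waiting, given whether the port can report DLLLA and
 * how long the link actually took to come up.
 */
unsigned int link_wait_ms(bool link_active_reporting,
			  unsigned int link_up_after_ms)
{
	assert(link_up_after_ms <= LINK_TIMEOUT_MS);

	/* Without reporting, DLLLA cannot be observed: wait the full timeout. */
	if (!link_active_reporting)
		return LINK_TIMEOUT_MS + POST_LINK_DELAY_MS;

	return link_up_after_ms + POST_LINK_DELAY_MS;
}
```

Under these assumptions a port without reporting always costs 1100 ms,
while a port with reporting whose link trains in 20 ms costs 120 ms.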

Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
Link: https://lore.kernel.org/r/20191112091617.70282-3-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pci/pci-driver.c |  11 +++-
 drivers/pci/pci.c        | 121 ++++++++++++++++++++++++++++++++++++++-
 drivers/pci/pci.h        |   1 +
 3 files changed, 130 insertions(+), 3 deletions(-)

Message ID	20200428182238.853075240@linuxfoundation.org
State	New
Headers
  From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  To: linux-kernel@vger.kernel.org
  Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
      stable@vger.kernel.org,
      Kai-Heng Feng <kai.heng.feng@canonical.com>,
      Mika Westerberg <mika.westerberg@linux.intel.com>,
      Bjorn Helgaas <bhelgaas@google.com>,
      Sasha Levin <sashal@kernel.org>
  Subject: [PATCH 5.4 055/168] PCI/PM: Add missing link delays required by the PCIe spec
  Date: Tue, 28 Apr 2020 20:23:49 +0200
  Message-Id: <20200428182238.853075240@linuxfoundation.org>
  In-Reply-To: <20200428182231.704304409@linuxfoundation.org>
  References: <20200428182231.704304409@linuxfoundation.org>