Message ID | 20220517100219.15146-1-abhsahu@nvidia.com |
---|---|
Headers | show |
Series | vfio/pci: power management changes | expand |
On 5/17/2022 11:57 PM, Alex Williamson wrote: > On Tue, 17 May 2022 15:32:17 +0530 > Abhishek Sahu <abhsahu@nvidia.com> wrote: > >> According to [PCIe v5 9.6.2] for PF Device Power Management States >> >> "The PF's power management state (D-state) has global impact on its >> associated VFs. If a VF does not implement the Power Management >> Capability, then it behaves as if it is in an equivalent >> power state of its associated PF. >> >> If a VF implements the Power Management Capability, the Device behavior >> is undefined if the PF is placed in a lower power state than the VF. >> Software should avoid this situation by placing all VFs in lower power >> state before lowering their associated PF's power state." >> >> From the vfio driver side, user can enable SR-IOV when the PF is in D3hot >> state. If VF does not implement the Power Management Capability, then >> the VF will be actually in D3hot state and then the VF BAR access will >> fail. If VF implements the Power Management Capability, then VF will >> assume that its current power state is D0 when the PF is D3hot and >> in this case, the behavior is undefined. >> >> To support PF power management, we need to create power management >> dependency between PF and its VF's. The runtime power management support >> may help with this where power management dependencies are supported >> through device links. But till we have such support in place, we can >> disallow the PF to go into low power state, if PF has VF enabled. >> There can be a case, where user first enables the VF's and then >> disables the VF's. If there is no user of PF, then the PF can put into >> D3hot state again. But with this patch, the PF will still be in D0 >> state after disabling VF's since detecting this case inside >> vfio_pci_core_sriov_configure() requires access to >> struct vfio_device::open_count along with its locks. But the subsequent >> patches related to runtime PM will handle this case since runtime PM >> maintains its own usage count. >> >> Also, vfio_pci_core_sriov_configure() can be called at any time >> (with and without vfio pci device user), so the power state change >> needs to be protected with the required locks. >> >> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> >> --- >> drivers/vfio/pci/vfio_pci_core.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c >> index b9f222ca48cf..4fe9a4efc751 100644 >> --- a/drivers/vfio/pci/vfio_pci_core.c >> +++ b/drivers/vfio/pci/vfio_pci_core.c >> @@ -217,6 +217,10 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat >> bool needs_restore = false, needs_save = false; >> int ret; >> >> + /* Prevent changing power state for PFs with VFs enabled */ >> + if (pci_num_vf(pdev) && state > PCI_D0) >> + return -EBUSY; >> + >> if (vdev->needs_pm_restore) { >> if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) { >> pci_save_state(pdev); >> @@ -1960,6 +1964,13 @@ int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev, >> } >> list_add_tail(&vdev->sriov_pfs_item, &vfio_pci_sriov_pfs); >> mutex_unlock(&vfio_pci_sriov_pfs_mutex); >> + >> + /* >> + * The PF power state should always be higher than the VF power >> + * state. If PF is in the low power state, then change the >> + * power state to D0 first before enabling SR-IOV. >> + */ >> + vfio_pci_lock_and_set_power_state(vdev, PCI_D0); > > But we need to hold memory_lock across the next function or else > userspace could race a write to the PM register to set D3 before > pci_num_vf() can protect us. Thanks, > > Alex > Thanks Alex. Yes. We need to bring pci_enable_sriov() also to protect this race condition. I will update this in my next version. Regards, Abhishek >> ret = pci_enable_sriov(pdev, nr_virtfn); >> if (ret) >> goto out_del; >