Message ID | 20220829114850.4341-1-abhsahu@nvidia.com |
---|---|
Series | vfio/pci: power management changes |
On Mon, 29 Aug 2022 17:18:45 +0530
Abhishek Sahu <abhsahu@nvidia.com> wrote:

> This is part 2 for the vfio-pci driver power management support.
> Part 1 of this patch series was related to adding D3cold support
> when there is no user of the VFIO device and has already merged in the
> mainline kernel. If we enable the runtime power management for
> vfio-pci device in the guest OS, then the device is being runtime
> suspended (for linux guest OS) and the PCI device will be put into
> D3hot state (in function vfio_pm_config_write()). If the D3cold
> state can be used instead of D3hot, then it will help in saving
> maximum power. The D3cold state can't be possible with native
> PCI PM. It requires interaction with platform firmware which is
> system-specific. To go into low power states (Including D3cold),
> the runtime PM framework can be used which internally interacts
> with PCI and platform firmware and puts the device into the
> lowest possible D-States.
>
> This patch series adds the support to engage runtime power management
> initiated by the user. Since D3cold state can't be achieved by writing
> PCI standard PM config registers, so new device features have been
> added in DEVICE_FEATURE IOCTL for low power entry and exit related
> handling. For the PCI device, this low power state will be D3cold
> (if the platform supports the D3cold state). The hypervisors can implement
> virtual ACPI methods to make the integration with guest OS.
> For example, in guest Linux OS if PCI device ACPI node has
> _PR3 and _PR0 power resources with _ON/_OFF method, then guest
> Linux OS makes the _OFF call during D3cold transition and
> then _ON during D0 transition. The hypervisor can tap these virtual
> ACPI calls and then do the low power related IOCTL.
>
> The entry device feature has two variants. These two variants are mainly
> to support the different behaviour for the low power entry.
> If there is any access for the VFIO device on the host side, then the
> device will be moved out of the low power state without the user's
> guest driver involvement. Some devices (for example NVIDIA VGA or
> 3D controller) require the user's guest driver involvement for
> each low-power entry. In the first variant, the host can move the
> device into low power without any guest driver involvement while
> in the second variant, the host will send a notification to user
> through eventfd and then user guest driver needs to move the device
> into low power. The hypervisor can implement the virtual PME
> support to notify the guest OS. Please refer
> https://lore.kernel.org/lkml/20220701110814.7310-7-abhsahu@nvidia.com/
> where initially this virtual PME was implemented in the vfio-pci driver
> itself, but later-on, it has been decided that hypervisor can implement
> this.
>
> * Changes in v7

Applied to vfio next branch for v6.1. Thanks,

Alex
On 9/3/2022 12:12 AM, Alex Williamson wrote:
> On Mon, 29 Aug 2022 17:18:45 +0530
> Abhishek Sahu <abhsahu@nvidia.com> wrote:
>
>> This is part 2 for the vfio-pci driver power management support.
>> [...]
>
> Applied to vfio next branch for v6.1. Thanks,
>
> Alex

Thanks Alex for your guidance and support.

Regards,
Abhishek
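
For readers who want to see how a hypervisor might drive the low power entry and exit features described in the cover letter, here is a minimal userspace sketch in C. It assumes `device_fd` is an already opened VFIO device file descriptor and that the three feature indices match the v6.1 `linux/vfio.h` uapi header (the `#ifndef` fallbacks below exist only so the sketch builds against older headers). The helper names `lp_build_request`, `lp_enter`, `lp_enter_with_wakeup`, `lp_exit` and the struct `lp_wakeup_payload` are hypothetical and not part of the series; the payload struct is a local mirror of an eventfd plus a reserved word, so check it against the header on your system before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fallbacks for uapi headers older than v6.1 (assumed to match linux/vfio.h). */
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3
#endif
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4
#endif
#ifndef VFIO_DEVICE_FEATURE_LOW_POWER_EXIT
#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5
#endif

/* Hypothetical local mirror of the "with wakeup" payload. */
struct lp_wakeup_payload {
    int32_t  wakeup_eventfd; /* signalled by the host, per the cover letter */
    uint32_t reserved;       /* kept zero in this sketch */
};

/*
 * Fill buf with a VFIO_DEVICE_FEATURE "set" request for the given feature.
 * Returns the request size (also stored in argsz), or 0 if buf is too small.
 */
size_t lp_build_request(void *buf, size_t buf_len, uint32_t feature,
                        const void *payload, size_t payload_len)
{
    struct vfio_device_feature hdr;
    size_t total = sizeof(hdr) + payload_len;

    if (buf_len < total)
        return 0;

    memset(&hdr, 0, sizeof(hdr));
    hdr.argsz = (uint32_t)total;
    hdr.flags = VFIO_DEVICE_FEATURE_SET | feature;

    memcpy(buf, &hdr, sizeof(hdr));
    if (payload_len)
        memcpy((char *)buf + sizeof(hdr), payload, payload_len);
    return total;
}

/* First variant: low power entry without guest driver involvement. */
int lp_enter(int device_fd)
{
    uint64_t buf[1]; /* 8 bytes, enough for the header alone */

    if (!lp_build_request(buf, sizeof(buf),
                          VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, NULL, 0))
        return -1;
    return ioctl(device_fd, VFIO_DEVICE_FEATURE, buf);
}

/* Second variant: low power entry with an eventfd for host notifications. */
int lp_enter_with_wakeup(int device_fd, int wakeup_eventfd)
{
    uint64_t buf[2]; /* header plus payload */
    struct lp_wakeup_payload p = { .wakeup_eventfd = wakeup_eventfd };

    if (!lp_build_request(buf, sizeof(buf),
                          VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP,
                          &p, sizeof(p)))
        return -1;
    return ioctl(device_fd, VFIO_DEVICE_FEATURE, buf);
}

/* Low power exit, e.g. when the guest runs _ON for the D0 transition. */
int lp_exit(int device_fd)
{
    uint64_t buf[1];

    if (!lp_build_request(buf, sizeof(buf),
                          VFIO_DEVICE_FEATURE_LOW_POWER_EXIT, NULL, 0))
        return -1;
    return ioctl(device_fd, VFIO_DEVICE_FEATURE, buf);
}
```

In the flow the cover letter describes, a hypervisor would call something like `lp_enter()` or `lp_enter_with_wakeup()` when it traps the guest's virtual ACPI `_OFF` method and `lp_exit()` when it traps `_ON`. With the wakeup variant it would also watch the eventfd and notify the guest (for example through virtual PME) so the guest driver can move the device back into low power.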