Message ID | 20201022114026.31968-1-marcel.apfelbaum@gmail.com
---|---
State | New
Series | pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
>
> During a PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking", expressing a "power transition" mode.
>
> Any hotplug operation during the "power transition" mode is not permitted,
> or at least not expected by the Guest OS, leading to strange failures.
>
> Detect and refuse hotplug operations in such a case.
>
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }

Probably the only way to handle this for existing machine types.
For new ones, can't we queue it in host memory somewhere?

> --
> 2.17.2
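[For context: the check in the patch keys off the Power Indicator Control field of the PCIe Slot Control Register. Below is a minimal standalone sketch of that decode; the mask and state values match the PCI Express spec (and Linux's pci_regs.h), while the helper itself is just an illustration, not QEMU source.]

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_SLTCTL_PIC           0x0300  /* Power Indicator Control field */
#define PCI_EXP_SLTCTL_PWR_IND_ON    0x0100  /* indicator on: slot powered    */
#define PCI_EXP_SLTCTL_PWR_IND_BLINK 0x0200  /* blinking: power transition    */
#define PCI_EXP_SLTCTL_PWR_IND_OFF   0x0300  /* indicator off: slot off       */

/* True while the guest has the slot mid power transition. */
static bool slot_in_power_transition(uint16_t sltctl)
{
    return (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK;
}
```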
On Thu, 22 Oct 2020 08:06:55 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> [...]
>
> Probably the only way to handle this for existing machine types.
> For new ones, can't we queue it in host memory somewhere?

I'm not actually convinced we can't do that even for existing machine
types. So I'm a bit hesitant to suggest going ahead with this without
looking a bit closer at whether we can implement a wait-for-ready in
qemu, rather than forcing every user of qemu (human or machine) to do
so.

--
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
On Thu, Oct 22, 2020 at 11:56:32PM +1100, David Gibson wrote:
> On Thu, 22 Oct 2020 08:06:55 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> [...]
> > Probably the only way to handle this for existing machine types.
> > For new ones, can't we queue it in host memory somewhere?
>
> I'm not actually convinced we can't do that even for existing machine
> types.

The difficulty would be in migrating the extra "requested but deferred"
state.

> So I'm a bit hesitant to suggest going ahead with this without
> looking a bit closer at whether we can implement a wait-for-ready in
> qemu, rather than forcing every user of qemu (human or machine) to do
> so.
>
> --
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat
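[For context: the "requested but deferred" state would have to survive live migration. QEMU normally carries optional device state in a vmstate subsection that is only sent when needed; the rough sketch below shows the shape of that, assuming a hypothetical `deferred_plug` flag on `PCIESlot` — the flag, subsection name, and `.needed` helper are invented for illustration, while the VMState macros are QEMU's real API. The catch is the one raised here: a destination QEMU that does not know the subsection will reject the stream, which is why this is hard to retrofit onto existing machine types.]

```c
/* Sketch only: migrating a hypothetical "plug requested but deferred" flag. */
static bool pcie_slot_deferred_plug_needed(void *opaque)
{
    PCIESlot *s = opaque;
    return s->deferred_plug;    /* hypothetical field, not in QEMU today */
}

static const VMStateDescription vmstate_pcie_slot_deferred_plug = {
    .name = "pcie-slot/deferred-plug",          /* invented subsection name */
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = pcie_slot_deferred_plug_needed,   /* sent only when flag set */
    .fields = (VMStateField[]) {
        VMSTATE_BOOL(deferred_plug, PCIESlot),
        VMSTATE_END_OF_LIST()
    }
};
```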
Hi David, Michael,

On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:

> On Thu, 22 Oct 2020 08:06:55 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> [...]
> > Probably the only way to handle this for existing machine types.

I agree.

> > For new ones, can't we queue it in host memory somewhere?

I am not sure I understand what the flow would be:
 - The user asks for a hotplug operation.
 - QEMU defers the operation.
After that the operation may still fail, so how would the user know
whether the operation succeeded or not?

> I'm not actually convinced we can't do that even for existing machine
> types.

It is a guest-visible change; I don't think we can do it.

> So I'm a bit hesitant to suggest going ahead with this without
> looking a bit closer at whether we can implement a wait-for-ready in
> qemu, rather than forcing every user of qemu (human or machine) to do
> so.

While I agree it is a pain from the usability point of view, hotplug
operations are allowed to fail. This is no more than a corner case;
ensuring the right response (gracefully erroring out) may be enough.

Thanks,
Marcel

> --
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat
On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> Hi David, Michael,
> [...]
> I am not sure I understand what the flow would be:
>  - The user asks for a hotplug operation.
>  - QEMU defers the operation.
> After that the operation may still fail, so how would the user know
> whether the operation succeeded or not?

How can it fail? It's just a button press ...

> While I agree it is a pain from the usability point of view, hotplug
> operations are allowed to fail. This is no more than a corner case;
> ensuring the right response (gracefully erroring out) may be enough.
>
> Thanks,
> Marcel

I don't think they ever failed in the past, so management is unlikely
to handle the failure by retrying ...
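[For context: the "button press" refers to how QEMU models hotplug on a PCIe root port — it emulates an operator pressing the slot's attention button, and the guest then drives the power-up itself. The condensed sketch below shows that signalling; the event flags and `pcie_cap_slot_event()` mirror QEMU's hw/pci/pcie.c, but this stripped-down form is an illustration rather than the exact source.]

```c
/*
 * Condensed illustration of QEMU's hotplug signalling: latch
 * "Presence Detect Changed" and "Attention Button Pressed" in the
 * Slot Status register and interrupt the guest, which then runs the
 * (typically 5-second) power-on sequence with the indicator blinking.
 */
static void slot_signal_hotplug(PCIDevice *hotplug_pdev)
{
    pcie_cap_slot_event(hotplug_pdev,
                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
}
```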
On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> [...]
> > I am not sure I understand what the flow would be:
> >  - The user asks for a hotplug operation.
> >  - QEMU defers the operation.
> > After that the operation may still fail, so how would the user know
> > whether the operation succeeded or not?
>
> How can it fail? It's just a button press ...

Currently we have "Hotplug unsupported."
With this change we have "Guest/System not ready".

> I don't think they ever failed in the past, so management is unlikely
> to handle the failure by retrying ...

That would require some management handling, yes.
But even without a "retry", failing is better than strange OS behavior.

Trying a better alternative like deferring the operation for new machines
would make sense; however, that is out of the scope of this patch, which
simply detects the error, leaving us in a slightly better state than today.

Thanks,
Marcel
On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> [...]
> Currently we have "Hotplug unsupported."
> With this change we have "Guest/System not ready".

Hotplug unsupported is not an error that can trigger with a well-behaved
management layer such as libvirt.

> That would require some management handling, yes.
> But even without a "retry", failing is better than strange OS behavior.
>
> Trying a better alternative like deferring the operation for new machines
> would make sense; however, that is out of the scope of this patch

Expand the scope, please. The scope should be "solve problem xx", not
"solve problem xx by doing abc".

> which simply detects the error, leaving us in a slightly better state
> than today.
>
> Thanks,
> Marcel

Not applying a patch is the only tool we maintainers have to influence
people to solve a problem fully. That's why I'm generally not inclined
to apply "slightly better" patches.
On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> [...]
> > Trying a better alternative like deferring the operation for new machines
> > would make sense; however, that is out of the scope of this patch
>
> Expand the scope, please. The scope should be "solve problem xx", not
> "solve problem xx by doing abc".

The scope is detecting a hotplug error early instead of passing to the
Guest OS a hotplug operation that we know will fail.

> Not applying a patch is the only tool we maintainers have to influence
> people to solve a problem fully. That's why I'm generally not inclined
> to apply "slightly better" patches.

The patch is a proposal following some offline discussions on this matter.
I personally see the value of it versus what we have today.

Thanks,
Marcel
On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> [...]
> The scope is detecting a hotplug error early instead of passing to the
> Guest OS a hotplug operation that we know will fail.

Right. But after detecting it, just failing unconditionally is a bit too
simplistic IMHO.

> The patch is a proposal following some offline discussions on this matter.
> I personally see the value of it versus what we have today.
>
> Thanks,
> Marcel
On Thu, 22 Oct 2020 09:15:28 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 11:56:32PM +1100, David Gibson wrote:
> [...]
> > > Probably the only way to handle this for existing machine types.
> > > For new ones, can't we queue it in host memory somewhere?
> >
> > I'm not actually convinced we can't do that even for existing machine
> > types.
>
> The difficulty would be in migrating the extra "requested but deferred"
> state.

Ah, true. Although we could block migration for the duration instead.

--
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
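[For context: "block migration for the duration" maps onto an existing QEMU mechanism — a migration blocker held while the slot is mid transition, so no "requested but deferred" state ever needs to be migrated. The sketch below shows the idea; `migrate_add_blocker()`/`migrate_del_blocker()` are the real API from include/migration/blocker.h, while the per-slot variable and the begin/end hooks are assumptions for illustration.]

```c
#include "qemu/osdep.h"
#include "migration/blocker.h"

static Error *power_transition_blocker;   /* hypothetical per-slot state */

/* Called when the power indicator enters the Blinking state (assumed hook). */
static void slot_power_transition_begin(Error **errp)
{
    error_setg(&power_transition_blocker,
               "PCIe slot is in power transition (indicator blinking)");
    if (migrate_add_blocker(power_transition_blocker, errp) < 0) {
        error_free(power_transition_blocker);
        power_transition_blocker = NULL;
    }
}

/* Called when the indicator leaves the Blinking state (assumed hook). */
static void slot_power_transition_end(void)
{
    if (power_transition_blocker) {
        migrate_del_blocker(power_transition_blocker);
        error_free(power_transition_blocker);
        power_transition_blocker = NULL;
    }
}
```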
On Thu, 22 Oct 2020 16:55:10 +0300
Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:

> Hi David, Michael,
> [...]
> > I'm not actually convinced we can't do that even for existing machine
> > types.
>
> It is a guest-visible change; I don't think we can do it.

How is it a guest-visible change?

> While I agree it is a pain from the usability point of view, hotplug
> operations are allowed to fail. This is no more than a corner case;
> ensuring the right response (gracefully erroring out) may be enough.
>
> Thanks,
> Marcel

--
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
On Thu, 22 Oct 2020 11:01:04 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> [...]
>
> Right. But after detecting it, just failing unconditionally is a bit too
> simplistic IMHO.

There's also another factor here, which I thought I'd mentioned already,
but it looks like I didn't: I think we're still missing some details in
what's going on.

The premise for this patch is that plugging while the indicator is in
transition state is allowed to fail in any way on the guest side. I don't
think that's a reasonable interpretation, because it's unworkable for
physical hotplug. If the indicator starts blinking while you're in the
middle of shoving a card in, you'd be in trouble.

So, what I'm assuming here is that while "don't plug while blinking" is
the instruction for the operator to obey as best they can, on the guest
side the rule has to be "start blinking, wait a while, and by the time
you leave blinking state again, you can be confident any plugs or unplugs
have completed". Obviously still racy in the strict computer science
sense, but about the best you can do with slow humans in the mix.

So, qemu should of course endeavour to follow that rule as though it were
a human operator on a physical machine, and not plug when the indicator
is blinking. *But* the qemu plug will in practice be fast enough that if
we're hitting real problems here, it suggests the guest is still doing
something wrong.

--
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
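[For context: if QEMU were to follow that rule itself, the plug request would be parked until the indicator leaves the Blinking state rather than rejected outright. A sketch of such a wait-for-ready follows, reusing the register decode from the patch; the timer API is QEMU's real one, but the `plug_timer` variable, the 100 ms recheck interval, and the `pcie_plug_complete()` hook are all invented for illustration — and, as Marcel notes next, a real version would need a bound on how long it waits.]

```c
/* Hypothetical per-slot timer, created with timer_new_ms(). */
static QEMUTimer *plug_timer;

/* Sketch: poll the Power Indicator, finish the plug once it stops blinking. */
static void plug_when_ready(void *opaque)
{
    PCIDevice *hotplug_pdev = opaque;
    uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
    uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);

    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
        /* Still in power transition: check again in 100 ms (arbitrary). */
        timer_mod(plug_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
        return;
    }
    pcie_plug_complete(hotplug_pdev);   /* hypothetical completion hook */
}
```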
Hi Michael,

On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> [...]
> > The scope is detecting a hotplug error early instead of passing to the
> > Guest OS a hotplug operation that we know will fail.
>
> Right. But after detecting it, just failing unconditionally is a bit too
> simplistic IMHO.

Simplistic does not mean wrong or incorrect.
I fail to see why it is not enough.

What can QEMU do better? Wait an unbounded time for the blinking to finish?
What if we have a buggy guest with a kernel stuck in blinking?
Is it QEMU's responsibility to emulate the operator itself? Because the
operator is the one who is supposed to wait.

Thanks,
Marcel

[...]
Tsirkin" <<a href="mailto:mst@redhat.com" target="_blank">mst@redhat.com</a>> wrote:<br> > > ><br> > > > > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum<br> > wrote:<br> > > > > > From: Marcel Apfelbaum <<a href="mailto:marcel@redhat.com" target="_blank">marcel@redhat.com</a>><br> > > > > ><br> > > > > > During PCIe Root Port's transition from Power-Off to<br> > Power-ON (or<br> > > > vice-versa)<br> > > > > > the "Slot Control Register" has the "Power Indicator<br> > Control"<br> > > > > > set to "Blinking" expressing a "power transition" mode.<br> > > > > ><br> > > > > > Any hotplug operation during the "power transition" mode is<br> > not<br> > > > permitted<br> > > > > > or at least not expected by the Guest OS leading to strange<br> > > failures.<br> > > > > ><br> > > > > > Detect and refuse hotplug operations in such case.<br> > > > > ><br> > > > > > Signed-off-by: Marcel Apfelbaum <<a href="mailto:marcel.apfelbaum@gmail.com" target="_blank">marcel.apfelbaum@gmail.com</a><br> > ><br> > > > > > ---<br> > > > > > hw/pci/pcie.c | 7 +++++++<br> > > > > > 1 file changed, 7 insertions(+)<br> > > > > ><br> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c<br> > > > > > index 5b48bae0f6..2fe5c1473f 100644<br> > > > > > --- a/hw/pci/pcie.c<br> > > > > > +++ b/hw/pci/pcie.c<br> > > > > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb<br> > (HotplugHandler<br> > > > *hotplug_dev, DeviceState *dev,<br> > > > > > PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);<br> > > > > > uint8_t *exp_cap = hotplug_pdev->config +<br> > hotplug_pdev-><br> > > > exp.exp_cap;<br> > > > > > uint32_t sltcap = pci_get_word(exp_cap +<br> > PCI_EXP_SLTCAP);<br> > > > > > + uint32_t sltctl = pci_get_word(exp_cap +<br> > PCI_EXP_SLTCTL);<br> > > > > > <br> > > > > > /* Check if hot-plug is disabled on the slot */<br> > > > > > if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) =<br> > = 0) {<br> > > > > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb<br> > > (HotplugHandler<br> > > > *hotplug_dev, DeviceState *dev,<br> > > > > > return;<br> > > > > > }<br> > > > > > <br> > > > > > + if ((sltctl & PCI_EXP_SLTCTL_PIC) ==<br> > > PCI_EXP_SLTCTL_PWR_IND_BLINK)<br> > > > {<br> > > > > > + error_setg(errp, "Hot-plug failed: %s is in Power<br> > > Transition",<br> > > > > > + DEVICE(hotplug_pdev)->id);<br> > > > > > + return;<br> > > > > > + }<br> > > > > > +<br> > > > > > pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev),<br> > dev,<br> > > errp);<br> > > > > > } <br> > > > ><br> > > > > Probably the only way to handle for existing machine types.<br> > > ><br> > > ><br> > > > I agree<br> > > > <br> > > ><br> > > > > For new ones, can't we queue it in host memory somewhere?<br> > > ><br> > > ><br> > > ><br> > > > I am not sure I understand what will be the flow.<br> > > > - The user asks for a hotplug operation.<br> > > > - QEMU deferred operation.<br> > > > After that the operation may still fail, how would the user know if<br> > the<br> > > > operation<br> > > > succeeded or not?<br> > ><br> > ><br> > > How can it fail? It's just a button press ...<br> > ><br> > ><br> > ><br> > > Currently we have "Hotplug unsupported."<br> > > With this change we have "Guest/System not ready"<br> > <br> > <br> > Hotplug unsupported is not an error that can trigger with<br> > a well behaved management such as libvirt.<br> > <br> > <br> > > <br> > ><br> > > > <br> > > ><br> > > > I'm not actually convinced we can't do that even for existing<br> > machine<br> > > > types. 
<br> > > ><br> > > ><br> > > > Is a Guest visible change, I don't think we can do it.<br> > > > <br> > > ><br> > > > So I'm a bit hesitant to suggest going ahead with this without<br> > > > looking a bit closer at whether we can implement a<br> > wait-for-ready in<br> > > > qemu, rather than forcing every user of qemu (human or machine)<br> > to do<br> > > > so.<br> > > ><br> > > ><br> > > > While I agree it is a pain from the usability point of view,<br> > hotplug<br> > > operations<br> > > > are allowed to fail. This is not more than a corner case, ensuring<br> > the<br> > > right<br> > > > response (gracefully erroring out) may be enough.<br> > > ><br> > > > Thanks,<br> > > > Marcel<br> > > ><br> > ><br> > ><br> > > I don't think they ever failed in the past so management is unlikely<br> > > to handle the failure by retrying ...<br> > ><br> > ><br> > > That would require some management handling, yes.<br> > > But even without a "retry", failing is better than strange OS behavior.<br> > ><br> > > Trying a better alternative like deferring the operation for new machines<br> > > would make sense, however is out of the scope of this patch<br> > <br> > Expand the scope please. The scope should be "solve a problem xx" not<br> > "solve a problem xx by doing abc".<br> > <br> > <br> > <br> > The scope is detecting a hotplug error early instead<br> > passing to the Guest OS a hotplug operation that we know it will fail.<br> > <br> <br> Right. After detecting just failing unconditionally it a bit too<br> simplistic IMHO.<br><br></blockquote><div><br></div><div>Simplistic does not mean wrong or incorrect.</div><div>I fail to see why it is not enough.</div><div><br></div><div>What QEMU can do better? Wait an unbounded time for the blinking to finish?</div><div>What if we have a buggy guest with a kernel stuck in blinking?</div><div>Is QEMU's responsibility to emulate the operator itself? Because the operator<br></div><div>is the one who is supposed to wait.</div><div><br></div><div><br></div><div>Thanks,</div><div>Marcel</div><div><br></div><div>[...] </div></div></div>
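For concreteness, this is roughly what the refused hot-plug would look like over QMP; the driver, bus and id values here are made up for the example, and the error class follows from the patch's use of error_setg() (GenericError):

-> { "execute": "device_add",
     "arguments": { "driver": "e1000e", "bus": "rp1", "id": "net1" } }
<- { "error": { "class": "GenericError",
       "desc": "Hot-plug failed: rp1 is in Power Transition" } }

A management layer such as libvirt would see this as an ordinary device_add failure; whether anyone retries it is exactly the policy question debated below.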
Hi David, On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote: > On Thu, 22 Oct 2020 11:01:04 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote: > > [...] > > > > Right. After detecting just failing unconditionally it a bit too > > simplistic IMHO. > > There's also another factor here, which I thought I'd mentioned > already, but looks like I didn't: I think we're still missing some > details in what's going on. > > The premise for this patch is that plugging while the indicator is in > transition state is allowed to fail in any way on the guest side. I > don't think that's a reasonable interpretation, because it's unworkable > for physical hotplug. If the indicator starts blinking while you're in > the middle of shoving a card in, you'd be in trouble. > > So, what I'm assuming here is that while "don't plug while blinking" is > the instruction for the operator to obey as best they can, on the guest > side the rule has to be "start blinking, wait a while and by the time > you leave blinking state again, you can be confident any plugs or > unplugs have completed". Obviously still racy in the strict computer > science sense, but about the best you can do with slow humans in the > mix. > > So, qemu should of course endeavour to follow that rule as though it > was a human operator on a physical machine and not plug when the > indicator is blinking. *But* the qemu plug will in practice be fast > enough that if we're hitting real problems here, it suggests the guest > is still doing something wrong. > I personally think there is a little bit of over-engineering here. Let's start with the spec: Power Indicator Blinking A blinking Power Indicator indicates that the slot is powering up or powering down and that insertion or removal of the adapter is not permitted. What exactly is an interpretation here? As you stated, the races are theoretical, the whole point of the indicator is to let the operator know he can't plug the device just yet. I understand it would be more user friendly if QEMU would wait internally for the blinking to end, but the whole point of the indicator is to let the operator (human or machine) know they can't plug the device at a specific time. Should QEMU take the responsibility of the operator? Is it even correct? Even if we would want such a feature, how is it related to this patch? The patch simply refuses to start a hotplug operation when it knows it will not succeed. Another way that would make sense to me would be a new QEMU interface other than "add_device", let's say "adding_device_allowed", that would return true if the hotplug is allowed at this point of time. (I am aware of the theoretical races) The above will at least mimic the mechanics of the phys world. The operator looks at the indicator, the management software checks if adding the device is allowed. Since it is a corner case I would prefer the device_add to fail rather than introducing a new interface, but that's just me.
Thanks, Marcel > -- > David Gibson <dgibson@redhat.com> > Principal Software Engineer, Virtualization, Red Hat >
On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote: > Hi David, > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote: > > [...] > > Another way that would make sense to me would be a new QEMU interface other > than > "add_device", let's say "adding_device_allowed", that would return true if the > hotplug is allowed > at this point of time. (I am aware of the theoretical races) Rather than adding_device_allowed, something like "query slot" might be helpful for debugging. That would help user figure out e.g. why isn't device visible without any races. > The above will at least mimic the mechanics of the phys world. The operator > looks at the indicator, > the management software checks if adding the device is allowed. > Since it is a corner case I would prefer the device_add to fail rather than > introducing a new interface, > but that's just me. > > Thanks, > Marcel > I think we want QEMU management interface to be reasonably abstract and agnostic if possible.
Pushing knowledge of hardware detail to management will just lead to pain IMHO. We supported device_add which practically never fails for years, at this point it's easier to keep supporting it than change all users ... > > -- > David Gibson <dgibson@redhat.com> > Principal Software Engineer, Virtualization, Red Hat >
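As a sketch of what Michael's "query slot" could report, the needed state is already at hand in the port's config space. The helper below is hypothetical (no such command or function exists); only pci_get_word(), the exp.exp_cap field and the PCI_EXP_* macros are existing QEMU / pci_regs.h names, the same ones the patch itself uses:

/* Hypothetical backend for a "query slot" command: classify the slot's
 * Power Indicator state from the same Slot Control bits the patch checks.
 */
static const char *pcie_slot_power_indicator_str(PCIDevice *port)
{
    uint8_t *exp_cap = port->config + port->exp.exp_cap;
    uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);

    switch (sltctl & PCI_EXP_SLTCTL_PIC) {
    case PCI_EXP_SLTCTL_PWR_IND_ON:
        return "on";               /* slot powered, device present */
    case PCI_EXP_SLTCTL_PWR_IND_BLINK:
        return "power-transition"; /* the state the patch refuses in */
    case PCI_EXP_SLTCTL_PWR_IND_OFF:
        return "off";              /* safe to plug */
    default:
        return "unknown";
    }
}

Whether management would actually poll such a query, rather than just handle a device_add error, is the point Igor raises next.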
On Fri, 23 Oct 2020 11:54:40 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote: > > [...] > > Rather than adding_device_allowed, something like "query slot" > might be helpful for debugging. That would help user figure out > e.g. why isn't device visible without any races. Would the new command be useful though? What we end up with is a broken guest (if I read commit message right) and a user who has no idea if device_add was successful or not. So what should the user do in this case - wait till it explodes?
- can the user remove it, or would it be stuck there forever? - poll the slot before hotplug, manually? (if this is the case then failing device_add cleanly doesn't sound bad, it looks similar to another error we have "/* Check if hot-plug is disabled on the slot */" in pcie_cap_slot_pre_plug_cb) CCing libvirt, as it concerns not only QEMU. > > The above will at least mimic the mechanics of the phys world. The operator > > looks at the indicator, > > the management software checks if adding the device is allowed. > > Since it is a corner case I would prefer the device_add to fail rather than > > introducing a new interface, > > but that's just me. > > > > Thanks, > > Marcel > > I think we want QEMU management interface to be reasonably > abstract and agnostic if possible. Pushing knowledge of hardware > detail to management will just lead to pain IMHO. > We supported device_add which practically never fails for years, For CPUs and RAM, device_add can fail so maybe management is also prepared to handle errors on PCI hotplug path. > at this point it's easier to keep supporting it than > change all users ... > > -- > David Gibson <dgibson@redhat.com> > Principal Software Engineer, Virtualization, Red Hat
On Fri, 23 Oct 2020 09:47:14 +0300 Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote: > Hi David, > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote: > > [...] > > I personally think there is a little bit of over-engineering here. > Let's start with the spec: > > Power Indicator Blinking > A blinking Power Indicator indicates that the slot is powering up or > powering down and that > insertion or removal of the adapter is not permitted. Right, but the weird bit here is that IIUC, the kernel during general probe is switching the indicator from off -> blinking -> off without it ever going to "on" state. And it seems to do the power on and presence check with the indicator still in blinking state. This is different from the normal sequence on a hotplug: press button; indicator goes to blinking; ...wait...; indicator goes to full on; slot powers on; presence detect. Or am I missing something? > What exactly is an interpretation here? > As you stated, the races are theoretical, the whole point of the indicator > is to let the operator know he can't plug the device just yet. > > I understand it would be more user friendly if QEMU would wait > internally for the > blinking to end, but the whole point of the indicator is to let the > operator (human or machine) > know they can't plug the device at a specific time. > Should QEMU take the responsibility of the operator? Is it even correct? I don't think there's an unambiguous correct answer here. But I think it's reasonable to interpret a general "device_add" as "instruct the virtual operator to plug in the card ASAP" as easily as "shove the virtual card in right now". device_add already covers a bunch of different pluggable interfaces, so I don't think precisely aligning to pci-e semantics is really a priority. > Even if we would want such a feature, how is it related to this patch?
> The patch simply refuses to start a hotplug operation when it knows it will > not succeed. I don't think I was clear. There are two separate and unrelated things I'm bringing up here. The first is that having the device_add fail for transitory reasons that the management layer doesn't really care about is really bad UX. But the second point I'm raising here is that "don't plug when blinking" is not, strictly speaking, an implementable strategy for a human operator. That makes me conclude that the idea here is plugs started at basically the same time as the blinking starts (which could be a little before or a little after, humans being fuzzy) should actually be acceptable, even if they finish after the indicator has started blinking. It's not clear to me that the kernel's current behaviour actually permits that. > Another way that would make sense to me would be a new QEMU interface > other than > "add_device", let's say "adding_device_allowed", that would return true if > the hotplug is allowed > at this point of time. (I am aware of the theoretical races) > > The above will at least mimic the mechanics of the phys world. The > operator looks at the indicator, > the management software checks if adding the device is allowed. > Since it is a corner case I would prefer the device_add to fail rather than > introducing a new interface, > but that's just me. Oh, no, that's not what I'm suggesting. Adding an "is allowed" command is even worse. It makes the management's task *more* complex, making any possible race here even wider. -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat
On Fri, 23 Oct 2020 19:27:55 +0200 Igor Mammedov <imammedo@redhat.com> wrote: > On Fri, 23 Oct 2020 11:54:40 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > [...] > [...] > [...] > [...] > > > > Rather than adding_device_allowed, something like "query slot" > > might be helpful for debugging. That would help user figure out > > e.g. why isn't device visible without any races. > > Would be new command useful tough? What we end up is broken guest > (if I read commit message right) and a user who has no idea if > device_add was successful or not. > So what user should do in this case > - wait till it explodes? > - can user remove it or it would be stuck there forever? > - poll slot before hotplug, manually? > > (if this is the case then failing device_add cleanly doesn't sound bad, > it looks similar to another error we have "/* Check if hot-plug is disabled on the slot */" > in pcie_cap_slot_pre_plug_cb) > > CCing libvirt, as it concerns not only QEMU. > > [...] > [...] > > > > I think we want QEMU management interface to be reasonably > > abstract and agnostic if possible. Pushing knowledge of hardware > > detail to management will just lead to pain IMHO. > > We supported device_add which practically never fails for years, > > For CPUs and RAM, device_add can fail so maybe management is also > prepared to handle errors on PCI hotplug path. There can be unarguable reasons for PCI hotplug to fail as well (attempting to plug to a bus that can't support it for one). The difference here is that it's a failure that we expect to be transitory. -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat
On Fri, 23 Oct 2020 09:26:48 +0300 Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote: > Hi Michael, > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > [...] > > Simplistic does not mean wrong or incorrect. > I fail to see why it is not enough. > > What QEMU can do better? Wait an unbounded time for the blinking to finish? It certainly shouldn't wait an unbounded time. But a wait with timeout seems worth investigating to me. > What if we have a buggy guest with a kernel stuck in blinking? > Is QEMU's responsibility to emulate the operator itself? Because the > operator > is the one who is supposed to wait. > > Thanks, > Marcel > > [...] -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat
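To make the wait-with-timeout idea concrete, here is one possible shape using QEMU's timer API. QEMU_CLOCK_VIRTUAL, timer_mod() and qemu_clock_get_ms() are real APIs; the PendingPlug structure, the poll constants and the complete/abort helpers are all invented for the sketch:

/* Hypothetical deferral: instead of failing in pre_plug, remember the
 * request and poll the Power Indicator, giving up after a deadline.
 * poll_timer is assumed created with timer_new_ms(QEMU_CLOCK_VIRTUAL,
 * pending_plug_poll, pp).
 */
#define PLUG_POLL_MS     100
#define PLUG_DEADLINE_MS 5000

typedef struct PendingPlug {
    PCIDevice *port;        /* the root port being plugged into */
    DeviceState *dev;       /* the device waiting to be plugged */
    QEMUTimer *poll_timer;
    int64_t deadline;       /* ms, QEMU_CLOCK_VIRTUAL */
} PendingPlug;

static void pending_plug_poll(void *opaque)
{
    PendingPlug *pp = opaque;
    uint8_t *exp_cap = pp->port->config + pp->port->exp.exp_cap;
    uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);

    if ((sltctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_BLINK) {
        complete_deferred_plug(pp);   /* hypothetical helper */
    } else if (now >= pp->deadline) {
        abort_deferred_plug(pp);      /* hypothetical: report the error */
    } else {
        timer_mod(pp->poll_timer, now + PLUG_POLL_MS);
    }
}

Any fixed deadline is of course arbitrary, which is essentially the objection about overcommitted environments that Igor raises further down.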
On Fri, Oct 23, 2020 at 19:27:55 +0200, Igor Mammedov wrote: > On Fri, 23 Oct 2020 11:54:40 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > [...] > > > > Rather than adding_device_allowed, something like "query slot" > > might be helpful for debugging. That would help user figure out > > e.g. why isn't device visible without any races. > > Would the new command be useful though? What we end up with is a broken guest > (if I read commit message right) and a user who has no idea if > device_add was successful or not. > So what should the user do in this case > - wait till it explodes? > - can the user remove it, or would it be stuck there forever? > - poll the slot before hotplug, manually? > > (if this is the case then failing device_add cleanly doesn't sound bad, > it looks similar to another error we have "/* Check if hot-plug is disabled on the slot */" > in pcie_cap_slot_pre_plug_cb) > > CCing libvirt, as it concerns not only QEMU. The only reason a separate command might make sense is if libvirt would want to provide a specific error to the user/upper management layer that the operation failed due to a transient failure (so that it can be retried later). We don't really want to go to a policy decision of how long to wait in such case, so unless qemu waits libvirt will plainly want to report an error. That said IMO 'device_add' should still fail if it's certain that the device won't be plugged in. This will fix any client which is currently in use. Adding a separate command is worth only for pre-checking for saner error handling. > > > The above will at least mimic the mechanics of the phys world. The operator > > > looks at the indicator, > > > the management software checks if adding the device is allowed. > > > Since it is a corner case I would prefer the device_add to fail rather than > > > introducing a new interface, > > > but that's just me. > > > > > > Thanks, > > > Marcel > > > > I think we want QEMU management interface to be reasonably > > abstract and agnostic if possible. Pushing knowledge of hardware > > detail to management will just lead to pain IMHO. > > We supported device_add which practically never fails for years, > > For CPUs and RAM, device_add can fail so maybe management is also > prepared to handle errors on PCI hotplug path. While it was me who implemented device_add for cpu/memory I don't remember any more whether we are really prepared for it. We certainly want to do it if there's a possibility to do it.
On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote: > On Fri, 23 Oct 2020 09:26:48 +0300 > Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote: > > > Hi Michael, > > > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > [...] > > > Simplistic does not mean wrong or incorrect. > > I fail to see why it is not enough. > > > > What QEMU can do better? Wait an unbounded time for the blinking to finish? > > It certainly shouldn't wait an unbounded time. But a wait with timeout > seems worth investigating to me. If it's helpful, I'd add a query to check state so management can figure out why doesn't guest see device yet. But otherwise just buffer the request until such time as we can deliver it to guest ... > > What if we have a buggy guest with a kernel stuck in blinking? > > Is QEMU's responsibility to emulate the operator itself? Because the > > operator > > is the one who is supposed to wait. > > > > Thanks, > > Marcel > > > > [...] > > > -- > David Gibson <dgibson@redhat.com> > Principal Software Engineer, Virtualization, Red Hat
On Fri, Oct 23, 2020 at 09:26:48AM +0300, Marcel Apfelbaum wrote: > Hi Michael, > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > [...] > > Simplistic does not mean wrong or incorrect. > I fail to see why it is not enough. The failure patch requires management to retry later. A more elaborate scheme will fix the bug without need for management changes. > What QEMU can do better? Wait an unbounded time for the blinking to finish? > What if we have a buggy guest with a kernel stuck in blinking? Then it won't see the new device ever but does it even matter? It's stuck ... I'd ack adding a query command to see what is going on with the device. Can be generic, implementable on top of ACPI too. > Is QEMU's responsibility to emulate the operator itself? Because the operator > is the one who is supposed to wait. I think these details are immaterial for users. They don't read pci spec. > > Thanks, > Marcel > > [...]
On Tue, 27 Oct 2020 07:26:44 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote: > > On Fri, 23 Oct 2020 09:26:48 +0300 > > Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote: > > > > > Hi Michael, > > > > > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > [...] > > > Simplistic does not mean wrong or incorrect. > > > I fail to see why it is not enough. > > > > > > What QEMU can do better? Wait an unbounded time for the blinking to finish? > > > > It certainly shouldn't wait an unbounded time. But a wait with timeout > > seems worth investigating to me. racy, timeout is bound to break once it's in an overcommitted env. > If it's helpful, I'd add a query to check state > so management can figure out why doesn't guest see device yet. that means mgmt would have to poll it and forward it to user somehow. > But otherwise just buffer the request until such time as > we can deliver it to guest ... I have more questions wrt the suggestion/workflow: * at what place would you suggest buffering it? * what would be the request in this case, i.e. create PCI device anyways and try to signal hotplug event later? * what would baremetal do in such case? * what to do in case guest is never ready, what should the user do in such case? * can such a device be removed? not sure that all of this is worth the effort and added complexity. alternatively: maybe ports can send QMP events about their state changes, which the end user would be able to see + an error like in this patch. On top of it, mgmt could build a better UX, like retry/notify logic if that's what user really wishes for and configures (it would be up to user to define behaviour). > > > What if we have a buggy guest with a kernel stuck in blinking? > > > Is QEMU's responsibility to emulate the operator itself? Because the > > > operator > > > is the one who is supposed to wait. > > > > > > Thanks, > > > Marcel > > > > > > [...] > > -- > > David Gibson <dgibson@redhat.com> > > Principal Software Engineer, Virtualization, Red Hat
On Tue, Oct 27, 2020 at 01:54:26PM +0100, Igor Mammedov wrote: > On Tue, 27 Oct 2020 07:26:44 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote: > > > On Fri, 23 Oct 2020 09:26:48 +0300 > > > Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote: > > > > > > > Hi Michael, > > > > > > > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > [...] > [...] > > > > Simplistic does not mean wrong or incorrect. > > > > I fail to see why it is not enough. > > > > > > > > What QEMU can do better? Wait an unbounded time for the blinking to finish? > > > > > > It certainly shouldn't wait an unbounded time. But a wait with timeout > > > seems worth investigating to me. > racy, timeout is bound to break once it's in overcommited env. > > > If it's helpful, I'd add a query to check state > > so management can figure out why doesn't guest see device yet. > that means mgmt would have to poll it and forward it to user > somehow. > > > But otherwise just buffer the request until such time as > > we can deliver it to guest ... > I have more questions wrt the suggestion/workflow: > * at what place would you suggest buffering it? PCIESlot maybe? > * what would be the request in this case, i.e. create PCI device anyways > and try to signal hotplug event later? that was my idea, yes. > * what would baremethal do in such case? exactly the same, human would wait until blinking stops. > * what to do in case guest is never ready, what user should do in such case? As in guest is stuck? Do we care? It can't use the device. > * can be such device be removed? why not? device_del could complete immediately ... > not sure that all of this is worth of the effort and added complexity. > > alternatively: > maybe ports can send QMP events about it's state changes, which end user would > be able to see + error like in this patch. > > On top of it, mgmt could build a better UIx, like retry/notify logic if > that's what user really wishes for and configures (it would be up to user to > define behaviour). I'd say let's get expected behaviour for existing commands first. We can add events and stuff on top. > > > > What if we have a buggy guest with a kernel stuck in blinking? > > > > Is QEMU's responsibility to emulate the operator itself? Because the > > > > operator > > > > is the one who is supposed to wait. > > > > > > > > > > > > Thanks, > > > > Marcel > > > > > > > > [...] > > > > > > > > > -- > > > David Gibson <dgibson@redhat.com> > > > Principal Software Engineer, Virtualization, Red Hat > > > > > >
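For reference in the exchange that follows, Michael's buffer-in-PCIESlot idea might take roughly this shape. The pending_dev field and the hook into the SLTCTL write path are hypothetical; PCIESlot, PCIE_SLOT(), pcie_cap_slot_event() and the PCI_EXP_HP_EV_* flags do exist inside hw/pci (the event function is internal to hw/pci/pcie.c, so the sketch assumes it lives there too):

/* Hypothetical: remember the pending device on the slot instead of
 * failing, and raise the hot-plug interrupt only once the guest's own
 * SLTCTL write moves the Power Indicator out of Blinking.
 */
static void pcie_cap_slot_flush_pending(PCIDevice *hotplug_pdev)
{
    PCIESlot *s = PCIE_SLOT(hotplug_pdev);
    uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
    uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);

    if (s->pending_dev &&    /* hypothetical new PCIESlot field */
        (sltctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_BLINK) {
        s->pending_dev = NULL;
        /* attention button pressed + presence detect changed */
        pcie_cap_slot_event(hotplug_pdev,
                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
    }
}

David's objection below is precisely that this alone would not be enough: the device is already visible to a config-space scan at realize time, before any such event fires.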
On Tue, 27 Oct 2020 13:54:26 +0100 Igor Mammedov <imammedo@redhat.com> wrote: > On Tue, 27 Oct 2020 07:26:44 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > [...] > [...] > [...] > [...] > [...] > > > > > > It certainly shouldn't wait an unbounded time. But a wait with timeout > > > seems worth investigating to me. > racy, timeout is bound to break once it's in overcommited env. Hm. That's no less true at the management layer than it is at the qemu layer. > > If it's helpful, I'd add a query to check state > > so management can figure out why doesn't guest see device yet. > that means mgmt would have to poll it and forward it to user > somehow. If that even makes sense. In the case of Kata, it's supposed to be autonomously creating the VM, so there's nothing meaningful it can forward to the user other than "failed to create the container because of some hotplug problem that means nothing to you". > [...] > I have more questions wrt the suggestion/workflow: > * at what place would you suggest buffering it? > * what would be the request in this case, i.e. create PCI device anyways > and try to signal hotplug event later? > * what would baremethal do in such case? > * what to do in case guest is never ready, what user should do in such case? > * can be such device be removed? > > not sure that all of this is worth of the effort and added complexity. > > alternatively: > maybe ports can send QMP events about it's state changes, which end user would > be able to see + error like in this patch. > > On top of it, mgmt could build a better UIx, like retry/notify logic if > that's what user really wishes for and configures (it would be up to user to > define behaviour). That kind of makes sense if the user is explicitly requesting hotplugs, but that's not necessarily the case. -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat
On Tue, 27 Oct 2020 09:02:06 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Oct 27, 2020 at 01:54:26PM +0100, Igor Mammedov wrote: > [...] > > I have more questions wrt the suggestion/workflow: > > * at what place would you suggest buffering it? > > PCIESlot maybe? > > > * what would be the request in this case, i.e. create PCI device anyways > > and try to signal hotplug event later? > > > that was my idea, yes. Actually, I don't think that will quite work. A whole chunk of the problem here is that because the device is realized, the guest sees it as part of its general scan *before* the hotplug event (ABP and PDC interrupts) appears. That makes the guest misinterpret the ABP as an *unplug* request. So delaying the interrupt without delaying the realize (or at least filtering config space access to it based on... something) will just trigger the same problem AFAICT. > > * what would baremetal do in such case? > > exactly the same, human would wait until blinking stops. > > > * what to do in case guest is never ready, what should the user do in such case? > > As in guest is stuck? Do we care? It can't use the device. > > > * can such a device be removed? > > why not? device_del could complete immediately ... > > [...] > > I'd say let's get expected behaviour for existing commands first. > We can add events and stuff on top. > > [...] > -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat
On Wed, 28 Oct 2020 14:31:35 +1100 David Gibson <dgibson@redhat.com> wrote: > On Tue, 27 Oct 2020 13:54:26 +0100 > Igor Mammedov <imammedo@redhat.com> wrote: > > > On Tue, 27 Oct 2020 07:26:44 -0400 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > [...] > > [...] > > [...] > > [...] > > [...] > > > > > > > > It certainly shouldn't wait an unbounded time. But a wait with timeout > > > > seems worth investigating to me. > > racy, timeout is bound to break once it's in overcommited env. > > Hm. That's no less true at the management layer than it is at the qemu > layer. true, but it's user policy which is defined by user not by QEMU. > > > > If it's helpful, I'd add a query to check state > > > so management can figure out why doesn't guest see device yet. > > that means mgmt would have to poll it and forward it to user > > somehow. > > If that even makes sense. In the case of Kata, it's supposed to be > autonomously creating the VM, so there's nothing meaningful it can > forward to the user other than "failed to create the container because > of some hotplug problem that means nothing to you". > > > [...] > > I have more questions wrt the suggestion/workflow: > > * at what place would you suggest buffering it? > > * what would be the request in this case, i.e. create PCI device anyways > > and try to signal hotplug event later? > > * what would baremethal do in such case? > > * what to do in case guest is never ready, what user should do in such case? > > * can be such device be removed? > > > > not sure that all of this is worth of the effort and added complexity. > > > > alternatively: > > maybe ports can send QMP events about it's state changes, which end user would > > be able to see + error like in this patch. > > > > On top of it, mgmt could build a better UIx, like retry/notify logic if > > that's what user really wishes for and configures (it would be up to user to > > define behaviour). > > That kind of makes sense if the user is explicitly requesting hotplugs, > but that's not necessarily the case. user doesn't have to be a human, it could be some mgmt layer that would automate retry logic, depending on what actually user needs for particular task (i.e. fail immediately, retry N time then fail, retry with time out - then fail, don't care - succeed, ...). The point is for QEMU to provide means for mgmt to implement whatever policy user would need. PS: but then, I know close to nothing about PCI, so all of above might be nonsense.
On Wed, Oct 28, 2020 at 04:39:45PM +0100, Igor Mammedov wrote: > On Wed, 28 Oct 2020 14:31:35 +1100 > David Gibson <dgibson@redhat.com> wrote: > > > On Tue, 27 Oct 2020 13:54:26 +0100 > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > On Tue, 27 Oct 2020 07:26:44 -0400 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > [...] > > > [...] > > > [...] > > > [...] > > > [...] > > > > > > > > > > It certainly shouldn't wait an unbounded time. But a wait with timeout > > > > > seems worth investigating to me. > > > racy, timeout is bound to break once it's in overcommited env. > > > > Hm. That's no less true at the management layer than it is at the qemu > > layer. > true, but it's user policy which is defined by user not by QEMU. > > > > > > > If it's helpful, I'd add a query to check state > > > > so management can figure out why doesn't guest see device yet. > > > that means mgmt would have to poll it and forward it to user > > > somehow. > > > > If that even makes sense. In the case of Kata, it's supposed to be > > autonomously creating the VM, so there's nothing meaningful it can > > forward to the user other than "failed to create the container because > > of some hotplug problem that means nothing to you". > > > > > [...] > > > I have more questions wrt the suggestion/workflow: > > > * at what place would you suggest buffering it? > > > * what would be the request in this case, i.e. create PCI device anyways > > > and try to signal hotplug event later? > > > * what would baremethal do in such case? > > > * what to do in case guest is never ready, what user should do in such case? > > > * can be such device be removed? > > > > > > not sure that all of this is worth of the effort and added complexity. > > > > > > alternatively: > > > maybe ports can send QMP events about it's state changes, which end user would > > > be able to see + error like in this patch. > > > > > > On top of it, mgmt could build a better UIx, like retry/notify logic if > > > that's what user really wishes for and configures (it would be up to user to > > > define behaviour). > > > > That kind of makes sense if the user is explicitly requesting hotplugs, > > but that's not necessarily the case. > user doesn't have to be a human, it could be some mgmt layer that would > automate retry logic, depending on what actually user needs for particular task > (i.e. fail immediately, retry N time then fail, retry with time out - then fail, > don't care - succeed, ...). The point is for QEMU to provide means for mgmt to > implement whatever policy user would need. We are not coming up with new APIs here. Let's make existing ones work reliably first. We can talk about a flag where it fails instead of deferring hotplug, separately. > PS: > but then, I know close to nothing about PCI, so all of above might be nonsense.
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
>
> During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking" expressing a "power transition" mode.
>
> Any hotplug operation during the "power transition" mode is not permitted
> or at least not expected by the Guest OS leading to strange failures.
>
> Detect and refuse hotplug operations in such case.
>
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>

Going back to this, I have another question: could we get
a bit more detail on when we get into this situation?
When does the guest start blinking the indicator without us
first starting a hotplug operation?

> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }
>
> --
> 2.17.2
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
>
> During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking" expressing a "power transition" mode.
>
> Any hotplug operation during the "power transition" mode is not permitted
> or at least not expected by the Guest OS leading to strange failures.
>
> Detect and refuse hotplug operations in such case.
>
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }

I wonder if hw/pci/shpc.c is free from this issue?

Roman.
Hi Roman,

On Wed, Nov 11, 2020 at 6:10 PM Roman Kagan <rvkagan@yandex-team.ru> wrote:
> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > From: Marcel Apfelbaum <marcel@redhat.com>
> >
> > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> > the "Slot Control Register" has the "Power Indicator Control"
> > set to "Blinking" expressing a "power transition" mode.
> >
> > Any hotplug operation during the "power transition" mode is not permitted
> > or at least not expected by the Guest OS leading to strange failures.
> >
> > Detect and refuse hotplug operations in such case.
> >
> > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > ---
> >  hw/pci/pcie.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 5b48bae0f6..2fe5c1473f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >
> >      /* Check if hot-plug is disabled on the slot */
> >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          return;
> >      }
> >
> > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > +                   DEVICE(hotplug_pdev)->id);
> > +        return;
> > +    }
> > +
> >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >  }
>
> I wonder if hw/pci/shpc.c is free from this issue?

I did not encounter the issue with shpc.

However, the SHPC spec describes the Power Indicator behavior for hotplug
only, leaving the boot sequence out of it:
 "Although the Standard Usage Model presented in Section 2.2.1.2 requires
 a specific relationship between the Power Indicator and the state of the
 slot, this relationship is enforced by software, not by the SHPC."

It looks like it depends on the software implementation.

Thanks,
Marcel

> Roman.
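If such a guard ever turned out to be needed for SHPC, it could look roughly
like the sketch below. The LED accessor and constants are placeholder names,
not the actual hw/pci/shpc.c definitions, and, as noted above, the SHPC spec
leaves the indicator/slot-state relationship to software, so the check may
simply not apply there.

/* Illustrative only: an shpc.c analogue of the PCIe guard.
 * shpc_get_power_led() and SHPC_LED_BLINK are hypothetical names. */
static bool shpc_slot_in_power_transition(SHPCDevice *shpc, int slot)
{
    /* A blinking Power Indicator conventionally means a slot operation
     * is in progress; for SHPC that convention is enforced by the
     * controlling software, not by the controller itself. */
    return shpc_get_power_led(shpc, slot) == SHPC_LED_BLINK;
}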
Hi Michael,

On Wed, Nov 11, 2020 at 2:35 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > From: Marcel Apfelbaum <marcel@redhat.com>
> >
> > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> > the "Slot Control Register" has the "Power Indicator Control"
> > set to "Blinking" expressing a "power transition" mode.
> >
> > Any hotplug operation during the "power transition" mode is not permitted
> > or at least not expected by the Guest OS leading to strange failures.
> >
> > Detect and refuse hotplug operations in such case.
> >
> > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>
> Going back to this, I have another question: could we get
> a bit more detail on when we get into this situation?
> When does the guest start blinking the indicator without us
> first starting a hotplug operation?

While David has more insight on the kernel behavior during the mentioned
issue, it seems the PCI subsystem sets the Power Indicator to "blinking"
as part of the init sequence of the PCIe Root Port.
The kernel will turn it off as soon as it finishes the init sequence.

A hotplug operation during the "init" sequence will surprise the Guest OS.
I'll let David give us more info.

Thank you,
Marcel

> > ---
> >  hw/pci/pcie.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 5b48bae0f6..2fe5c1473f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >
> >      /* Check if hot-plug is disabled on the slot */
> >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          return;
> >      }
> >
> > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > +                   DEVICE(hotplug_pdev)->id);
> > +        return;
> > +    }
> > +
> >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >  }
> >
> > --
> > 2.17.2
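As a conceptual illustration of the sequence Marcel describes (not actual
Linux pciehp code), the guest driver's slot bring-up amounts to something
like the following; a device_add that lands between the two indicator writes
is exactly the window the patch refuses:

/* Conceptual guest-side sequence, loosely modelled on Linux pciehp.
 * "port" is the root port's struct pci_dev, "exp_cap" its PCI Express
 * capability offset; error handling and settle delays are omitted. */
u16 sltctl;

pci_read_config_word(port, exp_cap + PCI_EXP_SLTCTL, &sltctl);

/* Mark the slot as being in a power transition: indicator -> Blink. */
sltctl &= ~PCI_EXP_SLTCTL_PIC;
sltctl |= PCI_EXP_SLTCTL_PWR_IND_BLINK;
pci_write_config_word(port, exp_cap + PCI_EXP_SLTCTL, sltctl);

/* ... power up the slot and wait for it to settle ... */

/* Transition finished: indicator -> On.  Only from this point does the
 * guest expect hotplug events on the slot. */
sltctl &= ~PCI_EXP_SLTCTL_PIC;
sltctl |= PCI_EXP_SLTCTL_PWR_IND_ON;
pci_write_config_word(port, exp_cap + PCI_EXP_SLTCTL, sltctl);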
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 5b48bae0f6..2fe5c1473f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
     uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
     uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
+    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
 
     /* Check if hot-plug is disabled on the slot */
     if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
@@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
+    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
+        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
+                   DEVICE(hotplug_pdev)->id);
+        return;
+    }
+
     pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
 }