Message ID | 20210219154030.10892-1-jgross@suse.com |
---|---|
Series | xen/events: bug fixes and some diagnostic aids |

Hi Juergen,

On 19/02/2021 15:40, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
>
> In order to avoid this keep three different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking and temporary masking. The event channel should only
> be able to generate an interrupt if all flags are cleared.
>
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

--
Julien Grall

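
The commit message describes three independent mask reasons, with the channel able to raise an interrupt only when none of them is set. A minimal user-space sketch of that idea follows. It is not the kernel code: the enum, struct, and function names are all hypothetical, and the real patch may name and store the flags differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three mask reasons named in the commit message. */
enum mask_reason {
    MASK_EXPLICIT    = 1u << 0, /* "normal" masking/unmasking */
    MASK_TEMPORARY   = 1u << 1, /* temporary masking, e.g. while moving to another cpu */
    MASK_EOI_PENDING = 1u << 2, /* eoi related masking */
};

/* Toy stand-in for the per-channel state; not a kernel structure. */
struct evtchn_model {
    unsigned int mask_reasons; /* OR of enum mask_reason bits */
};

/* Record one reason for masking; other reasons are left untouched. */
void model_mask(struct evtchn_model *ch, enum mask_reason r)
{
    ch->mask_reasons |= (unsigned int)r;
}

/* Drop one reason. Returns true only if no reason is left. */
bool model_unmask(struct evtchn_model *ch, enum mask_reason r)
{
    ch->mask_reasons &= ~(unsigned int)r;
    return ch->mask_reasons == 0;
}

/* The channel may generate an interrupt only if all flags are cleared. */
bool model_can_fire(const struct evtchn_model *ch)
{
    return ch->mask_reasons == 0;
}

/* Migration while an eoi is pending must leave the channel masked. */
void model_migration_check(void)
{
    struct evtchn_model ch = { 0 };

    model_mask(&ch, MASK_EOI_PENDING);   /* eoi is pending */
    model_mask(&ch, MASK_TEMPORARY);     /* migration starts */
    model_unmask(&ch, MASK_TEMPORARY);   /* migration ends */
    assert(!model_can_fire(&ch));        /* still masked by the pending eoi */

    model_unmask(&ch, MASK_EOI_PENDING); /* eoi arrives */
    assert(model_can_fire(&ch));
}
```

With a single mask bit, the unmask at the end of migration would also discard the pending-eoi masking, which is the situation the patch sets out to avoid.
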
On 2021-02-19 15:40, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
>
> In order to avoid this keep three different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking and temporary masking. The event channel should only
> be able to generate an interrupt if all flags are cleared.
>
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>

I tested this patch series backported to a 4.19 kernel and found that
when doing a reboot loop of Windows with PV drivers, occasionally it will
end up in a state with some event channels pending and masked in dom0
which breaks networking in the guest.

The issue seems to have been introduced with this patch, though at first
glance it appears correct. I haven't yet looked into why it is happening.
Have you seen anything like this with this patch?

Thanks,
Ross

On 23.02.21 10:26, Ross Lagerwall wrote:
> On 2021-02-19 15:40, Juergen Gross wrote:
>> An event channel should be kept masked when an eoi is pending for it.
>> When being migrated to another cpu it might be unmasked, though.
>>
>> In order to avoid this keep three different flags for each event channel
>> to be able to distinguish "normal" masking/unmasking from eoi related
>> masking/unmasking and temporary masking. The event channel should only
>> be able to generate an interrupt if all flags are cleared.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> I tested this patch series backported to a 4.19 kernel and found that
> when doing a reboot loop of Windows with PV drivers, occasionally it will
> end up in a state with some event channels pending and masked in dom0
> which breaks networking in the guest.
>
> The issue seems to have been introduced with this patch, though at first
> glance it appears correct. I haven't yet looked into why it is happening.
> Have you seen anything like this with this patch?

Sorry it took so long, but now I was able to look into this issue.

I have managed to reproduce it with a pv Linux guest. I'm now adding
some debug code to understand what is happening there.

Juergen

On 23.02.21 10:26, Ross Lagerwall wrote:
> On 2021-02-19 15:40, Juergen Gross wrote:
>> An event channel should be kept masked when an eoi is pending for it.
>> When being migrated to another cpu it might be unmasked, though.
>>
>> In order to avoid this keep three different flags for each event channel
>> to be able to distinguish "normal" masking/unmasking from eoi related
>> masking/unmasking and temporary masking. The event channel should only
>> be able to generate an interrupt if all flags are cleared.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> I tested this patch series backported to a 4.19 kernel and found that
> when doing a reboot loop of Windows with PV drivers, occasionally it will
> end up in a state with some event channels pending and masked in dom0
> which breaks networking in the guest.
>
> The issue seems to have been introduced with this patch, though at first
> glance it appears correct. I haven't yet looked into why it is happening.
> Have you seen anything like this with this patch?

I have found the issue.

lateeoi_mask_ack_dynirq() must not set the "eoi" mask reason flag, as
this callback will be called when the handler will not be called later,
so there will never be a call of xen_irq_lateeoi() to unmask the event
channel again.

Juergen
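
The failure described above can be sketched with a small user-space model. Everything in it is an assumption made for illustration: the flag values, the struct, and the function names are hypothetical and are not the kernel's. Recording only an "explicit" reason in the mask-and-ack path is one conceivable alternative, not necessarily the fix that was applied. The model also assumes the channel is later unmasked through the ordinary unmask path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define REASON_EXPLICIT    0x01u
#define REASON_EOI_PENDING 0x02u

struct chan {
    unsigned int reasons; /* channel is masked while any bit is set */
};

/* Mask-and-ack variant that records an eoi-pending reason (the problem case). */
void mask_ack_with_eoi_reason(struct chan *c)
{
    c->reasons |= REASON_EOI_PENDING;
}

/* Variant that records only an explicit mask (assumed alternative). */
void mask_ack_explicit_only(struct chan *c)
{
    c->reasons |= REASON_EXPLICIT;
}

/* Ordinary unmask: clears the explicit reason and nothing else. */
void plain_unmask(struct chan *c)
{
    c->reasons &= ~REASON_EXPLICIT;
}

bool is_masked(const struct chan *c)
{
    return c->reasons != 0;
}

/*
 * Mask-and-ack followed by an ordinary unmask, with no handler call in
 * between, so nothing ever clears REASON_EOI_PENDING.
 * Returns true if the channel ends up stuck in the masked state.
 */
bool stuck_after_mask_ack(void (*mask_ack)(struct chan *))
{
    struct chan c = { 0 };

    mask_ack(&c);
    plain_unmask(&c);
    return is_masked(&c);
}

/* Self-check of both variants. */
void stuck_channel_check(void)
{
    assert(stuck_after_mask_ack(mask_ack_with_eoi_reason));
    assert(!stuck_after_mask_ack(mask_ack_explicit_only));
}
```

In this model the first variant leaves the channel masked for good, which would match the "pending and masked" state Ross reported, while the second variant returns it to the unmasked state.
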