Message ID | 1416937469-8162-1-git-send-email-julien.grall@linaro.org |
---|---|
State | Superseded, archived |
Headers | show |
Hi Stefano, On 27/11/14 10:51, Stefano Stabellini wrote: > On Thu, 27 Nov 2014, Ian Campbell wrote: >> On Tue, 2014-11-25 at 17:44 +0000, Julien Grall wrote: >>> ARMv8 model may not disable correctly the timer interrupt when Xen >> >> "correct disable" >> >>> context switch to an idle vCPU. Therefore Xen may receive a spurious >> >> "context switches" and s/spurious/unexpected/ (since spurious has a >> specific meaning in the h/w which does not match what is happening here) >> >>> timer interrupt. As the idle domain doesn't have vGIC, Xen will crash >>> when trying to inject the interrupt with the following stack trace. >>> >>> (XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC) >>> (XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR) >>> (XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0 >>> (XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54 >>> (XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220 >>> (XEN) [<0000000000244864>] gic_interrupt+0x50/0xec >>> (XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c >>> (XEN) [<0000000000255240>] hyp_irq+0x5c/0x60 >>> (XEN) [<0000000000241084>] context_switch+0xb8/0xc4 >>> (XEN) [<000000000022482c>] schedule+0x684/0x6d0 >>> (XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8 >>> (XEN) [<00000000002278d4>] do_softirq+0x14/0x1c >>> (XEN) [<0000000000240fac>] idle_loop+0x134/0x154 >>> (XEN) [<000000000024c160>] start_secondary+0x14c/0x15c >>> (XEN) [<0000000000000001>] 0000000000000001 >>> >>> While we receive spurious virtual timer interrupt, this could be safely >>> ignore for the time being. A proper fix need to be found for Xen 4.6. >>> >>> Signed-off-by: Julien Grall <julien.grall@linaro.org> >> >> Acked-by: Ian Campbell <ian.campbell@citrix.com> >> >> Although I wonder if we should log, perhaps rate limited or only once. >> >> Also, I've some grammar nits (above and below) which I can fix on commit >> if there is no resend... >> >>> >>> --- >>> >>> This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may >>> randomly crash when running Xen. >> >> CCing Konrad. >> >>> This patch don't inject the virtual timer interrupt if the current VCPU >>> is the idle one. Entering in this function with the idle VCPU is already >>> a bug itself. For now, I think this patch is the safest way to resolve >>> the problem. >>> >>> Meanwhile, I'm investigating with ARM to see wheter the bug comes from >>> Xen or the model. > > It is worth noting that there are no bad side effects of this change: > the vtimer_interrupt is always supposed to be received on non-idle > domains. As Julien wrote, the fact that we are receiving a > vtimer_interrupt in the idle_domain is a bug, one that probably comes > from the ARM model not emulating hardware correctly. ARM says: "The v8A ARM ARM says that the signal output will be disabled if , so the signal will be set to 0. However, how this is treated by the GIC depends on its configuration." So I'm not so sure it's a model bug. Regards,
Hi Ian, On 27/11/14 10:40, Ian Campbell wrote: > On Tue, 2014-11-25 at 17:44 +0000, Julien Grall wrote: >> ARMv8 model may not disable correctly the timer interrupt when Xen > > "correct disable" > >> context switch to an idle vCPU. Therefore Xen may receive a spurious > > "context switches" and s/spurious/unexpected/ (since spurious has a > specific meaning in the h/w which does not match what is happening here) > >> timer interrupt. As the idle domain doesn't have vGIC, Xen will crash >> when trying to inject the interrupt with the following stack trace. >> >> (XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC) >> (XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR) >> (XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0 >> (XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54 >> (XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220 >> (XEN) [<0000000000244864>] gic_interrupt+0x50/0xec >> (XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c >> (XEN) [<0000000000255240>] hyp_irq+0x5c/0x60 >> (XEN) [<0000000000241084>] context_switch+0xb8/0xc4 >> (XEN) [<000000000022482c>] schedule+0x684/0x6d0 >> (XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8 >> (XEN) [<00000000002278d4>] do_softirq+0x14/0x1c >> (XEN) [<0000000000240fac>] idle_loop+0x134/0x154 >> (XEN) [<000000000024c160>] start_secondary+0x14c/0x15c >> (XEN) [<0000000000000001>] 0000000000000001 >> >> While we receive spurious virtual timer interrupt, this could be safely >> ignore for the time being. A proper fix need to be found for Xen 4.6. >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > Although I wonder if we should log, perhaps rate limited or only once. I don't think the printk is necessary, receiving this unexpected interrupt is harmless from the perspective that the guest will still work when the vCPU will run again. > Also, I've some grammar nits (above and below) which I can fix on commit > if there is no resend... Thanks. Regards,
Hi, On 25/11/14 17:44, Julien Grall wrote: > ARMv8 model may not disable correctly the timer interrupt when Xen > context switch to an idle vCPU. Therefore Xen may receive a spurious > timer interrupt. As the idle domain doesn't have vGIC, Xen will crash > when trying to inject the interrupt with the following stack trace. > > (XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC) > (XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR) > (XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0 > (XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54 > (XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220 > (XEN) [<0000000000244864>] gic_interrupt+0x50/0xec > (XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c > (XEN) [<0000000000255240>] hyp_irq+0x5c/0x60 > (XEN) [<0000000000241084>] context_switch+0xb8/0xc4 > (XEN) [<000000000022482c>] schedule+0x684/0x6d0 > (XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8 > (XEN) [<00000000002278d4>] do_softirq+0x14/0x1c > (XEN) [<0000000000240fac>] idle_loop+0x134/0x154 > (XEN) [<000000000024c160>] start_secondary+0x14c/0x15c > (XEN) [<0000000000000001>] 0000000000000001 > > While we receive spurious virtual timer interrupt, this could be safely > ignore for the time being. A proper fix need to be found for Xen 4.6. Thanks to ARM, they found the root of the problem. When the timer interrupt is edge-triggered, disabling the timer output signal, the state of the interrupt won't change in the GIC. I propose to reword the commit message into: xen/arm: Handle platforms with edge-triggered virtual timer Some platforms (such as the ARMv8 model) uses edge-triggered interrupt for the virtual timer. Even if the timer output signal is masked in the context switch, the GIC will keep track that of any raised interrupt while the IRQs has been disabled. As soon as the IRQs are re-enabled, the virtual interrupt timer will be injected to Xen. The interrupt handler doesn't except to the receive the IRQ and will crash if an idle vCPU is running: (XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC) (XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR) (XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0 (XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54 (XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220 (XEN) [<0000000000244864>] gic_interrupt+0x50/0xec (XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c (XEN) [<0000000000255240>] hyp_irq+0x5c/0x60 (XEN) [<0000000000241084>] context_switch+0xb8/0xc4 (XEN) [<000000000022482c>] schedule+0x684/0x6d0 (XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8 (XEN) [<00000000002278d4>] do_softirq+0x14/0x1c (XEN) [<0000000000240fac>] idle_loop+0x134/0x154 (XEN) [<000000000024c160>] start_secondary+0x14c/0x15c (XEN) [<0000000000000001>] 0000000000000001 The proper solution would be context/switching the virtual interrupt state at the GIC level. This would also avoid masking the output signal and requires specific handling in the guest OS. Sadly, this solution require some refactoring which would miss the Xen 4.5 release. For now implement a temporary solution which ignore the interrupt when the idle VCPU is running. Regards,
Hi Ian, On 28/11/14 11:47, Ian Campbell wrote: > On Thu, 2014-11-27 at 18:02 +0000, Julien Grall wrote: >> state at the GIC level. This would also avoid masking the output signal >> and requires specific handling in the guest OS. > > "which requires"? > > It doesn't seem quite right to me otherwise, since context switching the > virq state *removes* the need to have the guest do anything other than > what it would do on native. I though the "avoid" would apply for both "masking" and "requires". > Assuming this is what you meant I propose (fixing some grammar etc as I > go): Thanks for the correction, I will use this version. Shall I put your signed-off-by? Regards,
Hi Ian, On 28/11/14 12:32, Ian Campbell wrote: > On Thu, 2014-11-27 at 18:02 +0000, Julien Grall wrote: >> I propose to reword the commit message into: > > You'll want to change the code comment too I think. It was my plan. I will send a new version during the day. Regards,
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index a6436f1..83c74cb 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -169,6 +169,14 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) { + /* + * ARMv8 model may not disable correctly the timer interrupt when + * Xen context switch to an idle vCPU. Therefore Xen may receive + * timer interrupt. + */ + if ( is_idle_vcpu(current) ) + return; + current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0); WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0); vgic_vcpu_inject_irq(current, current->arch.virt_timer.irq);
ARMv8 model may not disable correctly the timer interrupt when Xen context switch to an idle vCPU. Therefore Xen may receive a spurious timer interrupt. As the idle domain doesn't have vGIC, Xen will crash when trying to inject the interrupt with the following stack trace. (XEN) [<0000000000228388>] _spin_lock_irqsave+0x28/0x94 (PC) (XEN) [<0000000000228380>] _spin_lock_irqsave+0x20/0x94 (LR) (XEN) [<0000000000250510>] vgic_vcpu_inject_irq+0x40/0x1b0 (XEN) [<000000000024bcd0>] vtimer_interrupt+0x4c/0x54 (XEN) [<0000000000247010>] do_IRQ+0x1a4/0x220 (XEN) [<0000000000244864>] gic_interrupt+0x50/0xec (XEN) [<000000000024fbac>] do_trap_irq+0x20/0x2c (XEN) [<0000000000255240>] hyp_irq+0x5c/0x60 (XEN) [<0000000000241084>] context_switch+0xb8/0xc4 (XEN) [<000000000022482c>] schedule+0x684/0x6d0 (XEN) [<000000000022785c>] __do_softirq+0xcc/0xe8 (XEN) [<00000000002278d4>] do_softirq+0x14/0x1c (XEN) [<0000000000240fac>] idle_loop+0x134/0x154 (XEN) [<000000000024c160>] start_secondary+0x14c/0x15c (XEN) [<0000000000000001>] 0000000000000001 While we receive spurious virtual timer interrupt, this could be safely ignore for the time being. A proper fix need to be found for Xen 4.6. Signed-off-by: Julien Grall <julien.grall@linaro.org> --- This patch is a bug fix candidate for Xen 4.5. Any ARMv8 model may randomly crash when running Xen. This patch don't inject the virtual timer interrupt if the current VCPU is the idle one. Entering in this function with the idle VCPU is already a bug itself. For now, I think this patch is the safest way to resolve the problem. Meanwhile, I'm investigating with ARM to see wheter the bug comes from Xen or the model. --- xen/arch/arm/time.c | 8 ++++++++ 1 file changed, 8 insertions(+)