Message ID | alpine.DEB.2.02.1411191537340.12596@kaball.uk.xensource.com |
---|---|
State | New |
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> > >      else
>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> > >
>> >> > >  }
>> >> >
>> >> > Yes, exactly
>> >>
>> >> I tried, hang still occurs with this change
>> >
>> > We need to figure out why during the hang you still have all the LRs
>> > busy even if you are getting maintenance interrupts that should cause
>> > them to be cleared.
>> >
>>
>> I see that I have free LRs during maintenance interrupt
>>
>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> (XEN) GICH_LRs (vcpu 0) mask=0
>> (XEN) HW_LR[0]=9a015856
>> (XEN) HW_LR[1]=0
>> (XEN) HW_LR[2]=0
>> (XEN) HW_LR[3]=0
>> (XEN) Inflight irq=86 lr=0
>> (XEN) Inflight irq=2 lr=255
>> (XEN) Pending irq=2
>>
>> But I see that after I got hang - maintenance interrupts are generated
>> continuously. Platform continues printing the same log till reboot.
>
> Exactly the same log? As in the one above you just pasted?
> That is very very suspicious.

Yes exactly the same log. And looks like it means that LRs are flushed
correctly.

>
> I am thinking that we are not handling GICH_HCR_UIE correctly and
> something we do in Xen, maybe writing to an LR register, might trigger a
> new maintenance interrupt immediately causing an infinite loop.
>

Yes, this is what I'm thinking about. Taking in account all collected
debug info it looks like once LRs are overloaded with SGIs -
maintenance interrupt occurs.
And then it is not handled properly, and occurs again and again - so
platform hangs inside its handler.

> Could you please try this patch? It disable GICH_HCR_UIE immediately on
> hypervisor entry.
>

Now trying.

>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 4d2a92d..6ae8dc4 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>
> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -821,12 +823,8 @@ void gic_inject(void)
>
>      gic_restore_pending_irqs(current);
>
> -
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
>  }
>
>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
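The control flow that the patch above proposes can be sketched in isolation. This is only a minimal model, not Xen code: `fake_gich_hcr` is a hypothetical plain variable standing in for the memory-mapped GICH_HCR register, and the `model_*` helpers are illustrative names. The only detail borrowed from the GICv2 register layout is that UIE is bit 1 of GICH_HCR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UIE is bit 1 of GICH_HCR in the GICv2 architecture. */
#define GICH_HCR_UIE (1u << 1)

/* Hypothetical stand-in for the memory-mapped GICH_HCR register. */
static uint32_t fake_gich_hcr;

/* Models the line the patch adds to gic_clear_lrs(): drop UIE on entry. */
static void model_hypervisor_entry(void)
{
    fake_gich_hcr &= ~GICH_HCR_UIE;
}

/* Models the patched tail of gic_inject(): UIE is only ever set here. */
static void model_gic_inject(bool lr_pending_nonempty, bool all_lrs_full)
{
    if (lr_pending_nonempty && all_lrs_full)
        fake_gich_hcr |= GICH_HCR_UIE;
}

static bool uie_enabled(void)
{
    return (fake_gich_hcr & GICH_HCR_UIE) != 0;
}

/* One trip through the hypervisor: entry, then injection before
 * returning to the guest. Returns the resulting UIE setting. */
static bool model_round_trip(bool lr_pending_nonempty, bool all_lrs_full)
{
    model_hypervisor_entry();
    model_gic_inject(lr_pending_nonempty, all_lrs_full);
    /* Entry always clears UIE, so the removed else-branch is redundant. */
    assert(uie_enabled() == (lr_pending_nonempty && all_lrs_full));
    return uie_enabled();
}
```

In this model UIE is set on return to the guest only when interrupts are still queued and every LR is occupied, and it is always clear while the hypervisor itself is running, which is the behaviour the patch description asks for.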
On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> Could you please try this patch? It disable GICH_HCR_UIE immediately on
>> hypervisor entry.
>>
>
> Now trying.
>

Heh - I don't see hangs with this patch :) But also I see that
maintenance interrupt doesn't occur (and no hang as result)
Stefano - is this expected?

> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Heh - I don't see hangs with this patch :) But also I see that
> maintenance interrupt doesn't occur (and no hang as result)
> Stefano - is this expected?

No maintenance interrupts at all? That's strange. You should be
receiving them when LRs are full and you still have interrupts pending
to be added to them.

You could add another printk here to see if you should be receiving
them:

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+    {
+        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
+    }
 }
On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> No maintenance interrupts at all? That's strange. You should be
> receiving them when LRs are full and you still have interrupts pending
> to be added to them.
>
> You could add another printk here to see if you should be receiving
> them:
>
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> +    {
> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
> +    }
>  }
>

Requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But does not occur
Gic dump during interrupt requesting:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN) HW_LR[0]=3a00001f
(XEN) HW_LR[1]=9a015856
(XEN) HW_LR[2]=1a00001b
(XEN) HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2

On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Requested properly:
>
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>
> But does not occur
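The HW_LR values in the dump taken while the maintenance interrupt was being requested (mask=f) can be decoded by hand. The sketch below assumes the GICv2 GICH_LRn layout (VirtualID in bits [9:0], PhysicalID in bits [19:10], Priority in bits [27:23], State in bits [29:28], HW in bit 31); `struct lr_fields`, `decode_lr` and `lr_state_name` are hypothetical helper names, not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State field of a GICv2 list register, bits [29:28]. */
enum lr_state {
    LR_STATE_INVALID = 0,
    LR_STATE_PENDING = 1,
    LR_STATE_ACTIVE = 2,
    LR_STATE_PENDING_ACTIVE = 3
};

struct lr_fields {
    unsigned int virq;      /* VirtualID, bits [9:0] */
    unsigned int pirq;      /* PhysicalID, bits [19:10]; 0 unless hw is set */
    unsigned int priority;  /* bits [27:23], top five bits of the priority */
    enum lr_state state;    /* bits [29:28] */
    bool hw;                /* bit 31: backed by a physical interrupt */
};

/* Split one raw list register value into its fields. */
static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq = lr & 0x3ffu;
    f.hw = (lr >> 31) != 0;
    f.pirq = f.hw ? ((lr >> 10) & 0x3ffu) : 0;
    f.priority = (lr >> 23) & 0x1fu;
    f.state = (enum lr_state)((lr >> 28) & 0x3u);
    return f;
}

/* Human-readable name for the state field. */
static const char *lr_state_name(enum lr_state s)
{
    switch (s) {
    case LR_STATE_INVALID:        return "invalid";
    case LR_STATE_PENDING:        return "pending";
    case LR_STATE_ACTIVE:         return "active";
    case LR_STATE_PENDING_ACTIVE: return "pending+active";
    }
    assert(!"state is a 2-bit field, so this is unreachable");
    return "?";
}
```

Under that layout, 3a00001f is virtual IRQ 31 in the pending+active state, 9a015856 is hardware IRQ 86 pending, 1a00001b is virtual IRQ 27 pending, and 9a00e439 is hardware IRQ 57 pending. The IRQ numbers agree with the "Inflight irq=... lr=..." lines in the same dump, and all four LRs hold a valid entry.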
BTW - shouldn't this flag GICH_LR_MAINTENANCE_IRQ be set after
maintenance interrupt requesting ?

On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Gic dump during interrupt requesting:
>
> (XEN) GICH_LRs (vcpu 0) mask=f
> (XEN) HW_LR[0]=3a00001f
> (XEN) HW_LR[1]=9a015856
> (XEN) HW_LR[2]=1a00001b
> (XEN) HW_LR[3]=9a00e439
> (XEN) Inflight irq=31 lr=0
> (XEN) Inflight irq=86 lr=1
> (XEN) Inflight irq=27 lr=2
> (XEN) Inflight irq=57 lr=3
> (XEN) Inflight irq=2 lr=255
> (XEN) Pending irq=2
No, that's for requesting a maintenance interrupt for a specific irq
when it is EOI'ed by the guest. In our case we are requesting
maintenance interrupts via UIE: a single global maintenance interrupt
when most LRs become free.

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> BTW - shouldn't this flag GICH_LR_MAINTENANCE_IRQ be set after
> maintenance interrupt requesting ?
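The distinction drawn in the reply above, a per-LR request versus the global UIE request, can be illustrated with a small sketch. It assumes the GICv2 definitions: the underflow condition behind UIE holds when none, or only one, of the list registers contains a valid entry, and bit 19 of a list register (meaningful only when the HW bit is clear) asks for a maintenance interrupt when that one interrupt is EOI'ed. The names `LR_EOI`, `underflow_condition` and `lr_requests_eoi_irq` are hypothetical; `LR_EOI` is assumed to be the bit the thread calls GICH_LR_MAINTENANCE_IRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LR_STATE_SHIFT 28
#define LR_STATE_MASK  0x3u
#define LR_HW          (1u << 31)
/* Per-LR request: maintenance interrupt when this entry is EOI'ed.
 * Assumed to match GICH_LR_MAINTENANCE_IRQ; only meaningful if HW == 0. */
#define LR_EOI         (1u << 19)

/* An LR is valid when its state field is anything other than invalid (0). */
static bool lr_is_valid(uint32_t lr)
{
    return ((lr >> LR_STATE_SHIFT) & LR_STATE_MASK) != 0;
}

/* Global condition behind UIE: zero or one valid list register. */
static bool underflow_condition(const uint32_t *lrs, size_t n)
{
    size_t valid = 0;

    assert(lrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (lr_is_valid(lrs[i]))
            valid++;
    return valid <= 1;
}

/* Per-interrupt condition: this software-injected entry asks for an
 * interrupt on EOI. */
static bool lr_requests_eoi_irq(uint32_t lr)
{
    return !(lr & LR_HW) && (lr & LR_EOI);
}
```

Applied to the two dumps in this thread, the four-entry dump (mask=f) has all LRs valid, so the underflow condition is false until the guest drains them. The dump from the hang has only HW_LR[0] valid, so the condition is true for as long as UIE stays set. That would be consistent with the repeating maintenance interrupt reported earlier, although the thread itself does not confirm it as the root cause. None of the dumped entries has bit 19 set.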
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -821,12 +823,8 @@ void gic_inject(void)
 
     gic_restore_pending_irqs(current);
 
-
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
 }
 
 static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)