Message ID | alpine.DEB.2.02.1411191134080.27247@kaball.uk.xensource.com |
---|---|
State | New |
Headers | show |
On Wed, Nov 19, 2014 at 1:42 PM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote: > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini >> <stefano.stabellini@eu.citrix.com> wrote: >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: >> >> Hi Stefano, >> >> >> >> Thank you for your support. >> >> >> >> You are right - with latest change you've proposed I got a continuous >> >> prints during platform hang: >> >> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 >> >> >> >> Looks line issue needs further deeper debugging. >> > >> > Cool! You could simply print what irqs are in all LRs when they are >> > full, for example you could call gic_dump_info. That would tell us what >> > is taking all the LRs space we have. >> > >> > How many LRs are available on omap5 anyway? >> >> :) Already done this: >> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0 >> (XEN) GICH_LRs (vcpu 0) mask=f >> (XEN) HW_LR[0]=1a00001f >> (XEN) HW_LR[1]=9a00e439 >> (XEN) HW_LR[2]=1a000002 >> (XEN) HW_LR[3]=9a015856 >> (XEN) Inflight irq=31 lr=0 >> (XEN) Inflight irq=57 lr=1 >> (XEN) Inflight irq=2 lr=2 >> (XEN) Inflight irq=86 lr=3 >> (XEN) Inflight irq=27 lr=255 >> (XEN) Pending irq=27 > > 27 should be the virtual timer if I remember correctly. > > So it looks like there is not actually anything wrong, is just that you > have too much inflight irqs? It should cause problems because in that > case GICH_HCR_UIE should be set and you should get a maintenance > interrupt when LRs become available (actually when "none, or only one, > of the List register entries is marked as a valid interrupt"). > > Maybe GICH_HCR_UIE is the one that doesn't work properly. It might be > worth checking that you are receiving maintenance interrupts: > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index b7516c0..b3eaa44 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > * on return to guest that is going to clear the old LRs and inject > * new interrupts. > */ > + > + gdprintk(XENLOG_DEBUG, "maintenance interrupt\n"); > } > I observe this print during hang, so maintenance interrupt occurs. Can I perform some kind of LRs cleanup inside its handler? > void gic_dump_info(struct vcpu *v) > > > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you > should still be receiving maintenance interrupts when one or more LRs > become available. Sorry didn't get, do you mean this change ? @@ -759,9 +760,9 @@ void gic_inject(void) if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) - GICH[GICH_HCR] |= GICH_HCR_UIE; + GICH[GICH_HCR] |= GICH_HCR_NPIE; else - GICH[GICH_HCR] &= ~GICH_HCR_UIE; + GICH[GICH_HCR] &= ~GICH_HCR_NPIE; } > > >> > >> > I doubt you have so much interrupt traffic to actually fill all the LRs, >> > so I am thinking that a few LRs might not be cleared properly (that >> > should happen on hypervisor entry, gic_update_one_lr should take care of >> > it). >> >> This actually explains why this happens during domU start - SGI >> traffic might be very heavy this time >> >> > >> > >> >> Regards, >> >> Andrii >> >> >> >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini >> >> <stefano.stabellini@eu.citrix.com> wrote: >> >> > Hello Andrii, >> >> > we are getting closer :-) >> >> > >> >> > It would help if you post the output with GIC_DEBUG defined but without >> >> > the other change that "fixes" the issue. >> >> > >> >> > I think the problem is probably due to software irqs. >> >> > You are getting too many >> >> > >> >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending >> >> > >> >> > messages. That means you are loosing virtual SGIs (guest VCPU to guest >> >> > VCPU). It would be best to investigate why, especially if you get many >> >> > more of the same messages without the MAINTENANCE_IRQ change I >> >> > suggested. >> >> > >> >> > This patch might also help understading the problem more: >> >> > >> >> > >> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >> > index b7516c0..5eaeca2 100644 >> >> > --- a/xen/arch/arm/gic.c >> >> > +++ b/xen/arch/arm/gic.c >> >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) >> >> > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) >> >> > { >> >> > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); >> >> > - if ( i >= nr_lrs ) return; >> >> > + if ( i >= nr_lrs ) >> >> > + { >> >> > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n", >> >> > + p->irq, v->domain->domain_id, v->vcpu_id); >> >> > + continue; >> >> > + } >> >> > >> >> > spin_lock_irqsave(&gic.lock, flags); >> >> > gic_set_lr(i, p, GICH_LR_PENDING); >> >> > >> >> > >> >> > >> >> > >> >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> Hi Stefano, >> >> >> >> >> >> No hangs with this change. >> >> >> Complete log is the following: >> >> >> >> >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) >> >> >> DRA752 ES1.0 >> >> >> <ethaddr> not set. Validating first E-fuse MAC >> >> >> cpsw >> >> >> - UART enabled - >> >> >> - CPU 00000000 booting - >> >> >> - Xen starting in Hyp mode - >> >> >> - Zero BSS - >> >> >> - Setting up control registers - >> >> >> - Turning on paging - >> >> >> - Ready - >> >> >> (XEN) Checking for initrd in /chosen >> >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff >> >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff >> >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff >> >> >> (XEN) >> >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa >> >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 >> >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 >> >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 >> >> >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 >> >> >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 >> >> >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 >> >> >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 >> >> >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 >> >> >> (XEN) >> >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 >> >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 >> >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 >> >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) >> >> >> (XEN) Dom heap: 344064 pages >> >> >> (XEN) Domain heap initialised >> >> >> (XEN) Looking for UART console serial0 >> >> >> Xen 4.5-unstable >> >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) >> >> >> (arm-linux-gnueabihf-gcc (crosstool-NG >> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 >> >> >> 20130328 (prerelease)) debu4 >> >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty >> >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 >> >> >> (XEN) 32-bit Execution: >> >> >> (XEN) Processor Features: 00001131:00011011 >> >> >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle >> >> >> (XEN) Extensions: GenericTimer Security >> >> >> (XEN) Debug Features: 02010555 >> >> >> (XEN) Auxiliary Features: 00000000 >> >> >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 >> >> >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 >> >> >> (XEN) Platform: TI DRA7 >> >> >> (XEN) /psci method must be smc, but is: "hvc" >> >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) >> >> >> (XEN) Set AuxCoreBoot0 to 0x20 >> >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 >> >> >> (XEN) Using generic timer at 6144 KHz >> >> >> (XEN) GIC initialization: >> >> >> (XEN) gic_dist_addr=0000000048211000 >> >> >> (XEN) gic_cpu_addr=0000000048212000 >> >> >> (XEN) gic_hyp_addr=0000000048214000 >> >> >> (XEN) gic_vcpu_addr=0000000048216000 >> >> >> (XEN) gic_maintenance_irq=25 >> >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). >> >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit) >> >> >> (XEN) I/O virtualisation disabled >> >> >> (XEN) Allocated console ring of 16 KiB. >> >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 >> >> >> (XEN) Bringing up CPU1 >> >> >> - CPU 00000001 booting - >> >> >> - Xen starting in Hyp mode - >> >> >> - Setting up control registers - >> >> >> - Turning on paging - >> >> >> - Ready - >> >> >> (XEN) CPU 1 booted. >> >> >> (XEN) Brought up 2 CPUs >> >> >> (XEN) *** LOADING DOMAIN 0 *** >> >> >> (XEN) Loading kernel from boot module 2 >> >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) >> >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48 >> >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 >> >> >> (XEN) Std. Loglevel: All >> >> >> (XEN) Guest Loglevel: All >> >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch >> >> >> input to Xen) >> >> >> (XEN) Freed 272kB init memory. >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is >> >> >> already pending in LR0 >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is >> >> >> already pending in LR0 >> >> >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property >> >> >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property >> >> >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory >> >> >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 >> >> >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on >> >> >> /ocp/i2c@48072000/camera_ov10635 >> >> >> [ 0.437500] ldo3: operation not allowed >> >> >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator >> >> >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe >> >> >> [ 0.468750] ov1063x 1-0030: No deserializer node found >> >> >> [ 0.468750] ov1063x 1-0030: No serializer node found >> >> >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! >> >> >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 >> >> >> [ 0.578125] ahci ahci.0.auto: can't get clock >> >> >> [ 0.898437] ldc_module_init >> >> >> [ 1.304687] Missing dual_emac_res_vlan in DT. >> >> >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave >> >> >> [ 1.312500] Missing dual_emac_res_vlan in DT. >> >> >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave >> >> >> [ 1.382812] Freeing init memory: 236K >> >> >> sh: write error: No such device >> >> >> Cannot identify '/dev/camera0': 2, No such file or directory >> >> >> Parsing config from /xen/images/DomUAndroid.cfg >> >> >> XSM Disabled: seclabel not supported >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> >> >> dom1 access to irq 53: Function not implemented >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> >> >> dom1 access to irq 71: Function not implemented >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> >> >> dom1 access to irq 173: Function not implemented >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give >> >> >> dom1 access to irq 174: Function not implemented >> >> >> Turning on vfb in domain 1 >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> >> >> still lr_pending >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> >> >> still lr_pending >> >> >> Parsing config from /xen/images/DomUQNX.cfg >> >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to >> >> >> inject irq=2 into d0v0, when it is still lr_pending >> >> >> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> >> >> still lr_pending >> >> >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid >> >> >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22 >> >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader >> >> >> found: Invalid kernel >> >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image >> >> >> failed: No such file or directory >> >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot >> >> >> (re-)build domain: -3 >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> >> >> still lr_pending >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> >> >> still lr_pending >> >> >> Turning on 'vsnd' in domain '1' (dev_id: '0') >> >> >> Turning on vkbd in domain 1 >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> >> >> still lr_pending >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is >> >> >> still lr_pending >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is >> >> >> still lr_pending >> >> >> >> >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 >> >> >> trying to inject irq=2 into d0v0, when it is still lr_pending >> >> >> >> >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi >> >> >> <andrii.tseglytskyi@globallogic.com> wrote: >> >> >> > OK got it. Give me a few mins >> >> >> > >> >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini >> >> >> > <stefano.stabellini@eu.citrix.com> wrote: >> >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only >> >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding >> >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. >> >> >> >> >> >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid >> >> >> >> other potential bugs introduced later. >> >> >> >> >> >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> >>> What if I try on top of current master branch the following code: >> >> >> >>> >> >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c >> >> >> >>> index 31fb81a..6764ab7 100644 >> >> >> >>> --- a/xen/arch/arm/gic-v2.c >> >> >> >>> +++ b/xen/arch/arm/gic-v2.c >> >> >> >>> @@ -36,6 +36,8 @@ >> >> >> >>> #include <asm/io.h> >> >> >> >>> #include <asm/gic.h> >> >> >> >>> >> >> >> >>> +#define GIC_DEBUG 1 >> >> >> >>> + >> >> >> >>> /* >> >> >> >>> * LR register definitions are GIC v2 specific. >> >> >> >>> * Moved these definitions from header file to here >> >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >> >> >>> index bcaded9..c03d6a6 100644 >> >> >> >>> --- a/xen/arch/arm/gic.c >> >> >> >>> +++ b/xen/arch/arm/gic.c >> >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); >> >> >> >>> >> >> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << >> >> >> >>> gic_hw_ops->info->nr_lrs) - 1)) >> >> >> >>> >> >> >> >>> -#undef GIC_DEBUG >> >> >> >>> +#define GIC_DEBUG 1 >> >> >> >>> >> >> >> >>> static void gic_update_one_lr(struct vcpu *v, int i); >> >> >> >>> >> >> >> >>> It is equivalent to what you proposing - my code contains >> >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will >> >> >> >>> be executed: >> >> >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function >> >> >> >>> >> >> >> >>> regards, >> >> >> >>> Andrii >> >> >> >>> >> >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote: >> >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: >> >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and >> >> >> >>> >> everything works fine >> >> >> >>> >> The following 2 patches fixes xen/master for my platform. >> >> >> >>> >> >> >> >> >>> >> Stefano, could you please take a look to these changes? >> >> >> >>> >> >> >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca >> >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> >> >> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 >> >> >> >>> >> >> >> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag >> >> >> >>> >> >> >> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 >> >> >> >>> >> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> >> >> >> >>> >> >> >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c >> >> >> >>> >> index 31fb81a..093ecdb 100644 >> >> >> >>> >> --- a/xen/arch/arm/gic-v2.c >> >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c >> >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct >> >> >> >>> >> pending_irq *p, >> >> >> >>> >> << GICH_V2_LR_PRIORITY_SHIFT) | >> >> >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << >> >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); >> >> >> >>> >> >> >> >> >>> >> - if ( p->desc != NULL ) >> >> >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) >> >> >> >>> >> { >> >> >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) >> >> >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; >> >> >> >>> >> - else >> >> >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & >> >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) >> >> >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); >> >> >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; >> >> >> >>> >> + } >> >> >> >>> >> + else if ( p->desc != NULL ) >> >> >> >>> >> + { >> >> >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK ) >> >> >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); >> >> >> >>> >> } >> >> >> >>> >> >> >> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); >> >> >> >>> > >> >> >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it >> >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need >> >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not >> >> >> >>> > working correctly on OMAP5. This changes might only be better at >> >> >> >>> > "hiding" the real issue. >> >> >> >>> > >> >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding >> >> >> >>> > maintenance interrupts doesn't work for software interrupts. >> >> >> >>> > The commit that should make them work correctly after the >> >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 >> >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll >> >> >> >>> > see that is going to set a software irq as PENDING if it is already ACTIVE. >> >> >> >>> > Maybe that doesn't work correctly on OMAP5. >> >> >> >>> > >> >> >> >>> > Could you try this patch on top of >> >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand >> >> >> >>> > if the problem is specifically with software irqs. >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c >> >> >> >>> > index b7516c0..d8a17c9 100644 >> >> >> >>> > --- a/xen/arch/arm/gic.c >> >> >> >>> > +++ b/xen/arch/arm/gic.c >> >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); >> >> >> >>> > /* Maximum cpu interface per GIC */ >> >> >> >>> > #define NR_GIC_CPU_IF 8 >> >> >> >>> > >> >> >> >>> > -#undef GIC_DEBUG >> >> >> >>> > +#define GIC_DEBUG 1 >> >> >> >>> > >> >> >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); >> >> >> >>> > >> >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p, >> >> >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); >> >> >> >>> > if ( p->desc != NULL ) >> >> >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT); >> >> >> >>> > + else >> >> >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; >> >> >> >>> > >> >> >> >>> > GICH[GICH_LR + lr] = lr_val; >> >> >> >>> > >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> -- >> >> >> >>> >> >> >> >>> Andrii Tseglytskyi | Embedded Dev >> >> >> >>> GlobalLogic >> >> >> >>> www.globallogic.com >> >> >> >>> >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > >> >> >> > Andrii Tseglytskyi | Embedded Dev >> >> >> > GlobalLogic >> >> >> > www.globallogic.com >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> >> >> Andrii Tseglytskyi | Embedded Dev >> >> >> GlobalLogic >> >> >> www.globallogic.com >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Andrii Tseglytskyi | Embedded Dev >> >> GlobalLogic >> >> www.globallogic.com >> >> >> >> >> >> -- >> >> Andrii Tseglytskyi | Embedded Dev >> GlobalLogic >> www.globallogic.com >>
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: > On Wed, Nov 19, 2014 at 1:42 PM, Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: > > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: > >> On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini > >> <stefano.stabellini@eu.citrix.com> wrote: > >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: > >> >> Hi Stefano, > >> >> > >> >> Thank you for your support. > >> >> > >> >> You are right - with latest change you've proposed I got a continuous > >> >> prints during platform hang: > >> >> > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 > >> >> > >> >> Looks line issue needs further deeper debugging. > >> > > >> > Cool! You could simply print what irqs are in all LRs when they are > >> > full, for example you could call gic_dump_info. That would tell us what > >> > is taking all the LRs space we have. > >> > > >> > How many LRs are available on omap5 anyway? > >> > >> :) Already done this: > >> > >> > >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0 > >> (XEN) GICH_LRs (vcpu 0) mask=f > >> (XEN) HW_LR[0]=1a00001f > >> (XEN) HW_LR[1]=9a00e439 > >> (XEN) HW_LR[2]=1a000002 > >> (XEN) HW_LR[3]=9a015856 > >> (XEN) Inflight irq=31 lr=0 > >> (XEN) Inflight irq=57 lr=1 > >> (XEN) Inflight irq=2 lr=2 > >> (XEN) Inflight irq=86 lr=3 > >> (XEN) Inflight irq=27 lr=255 > >> (XEN) Pending irq=27 > > > > 27 should be the virtual timer if I remember correctly. > > > > So it looks like there is not actually anything wrong, is just that you > > have too much inflight irqs? It should cause problems because in that > > case GICH_HCR_UIE should be set and you should get a maintenance > > interrupt when LRs become available (actually when "none, or only one, > > of the List register entries is marked as a valid interrupt"). > > > > Maybe GICH_HCR_UIE is the one that doesn't work properly. It might be > > worth checking that you are receiving maintenance interrupts: > > > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index b7516c0..b3eaa44 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > > * on return to guest that is going to clear the old LRs and inject > > * new interrupts. > > */ > > + > > + gdprintk(XENLOG_DEBUG, "maintenance interrupt\n"); > > } > > > > I observe this print during hang, so maintenance interrupt occurs. > Can I perform some kind of LRs cleanup inside its handler? It should happen automatically because on hypervisor entry gic_clear_lrs gets called. In fact by the time you see this message the LRs should have already been cleared. > > void gic_dump_info(struct vcpu *v) > > > > > > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you > > should still be receiving maintenance interrupts when one or more LRs > > become available. > > Sorry didn't get, do you mean this change ? > > @@ -759,9 +760,9 @@ void gic_inject(void) > > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > - GICH[GICH_HCR] |= GICH_HCR_UIE; > + GICH[GICH_HCR] |= GICH_HCR_NPIE; > else > - GICH[GICH_HCR] &= ~GICH_HCR_UIE; > + GICH[GICH_HCR] &= ~GICH_HCR_NPIE; > > } Yes, exactly > > > > > > >> > > >> > I doubt you have so much interrupt traffic to actually fill all the LRs, > >> > so I am thinking that a few LRs might not be cleared properly (that > >> > should happen on hypervisor entry, gic_update_one_lr should take care of > >> > it). > >> > >> This actually explains why this happens during domU start - SGI > >> traffic might be very heavy this time > >> > >> > > >> > > >> >> Regards, > >> >> Andrii > >> >> > >> >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini > >> >> <stefano.stabellini@eu.citrix.com> wrote: > >> >> > Hello Andrii, > >> >> > we are getting closer :-) > >> >> > > >> >> > It would help if you post the output with GIC_DEBUG defined but without > >> >> > the other change that "fixes" the issue. > >> >> > > >> >> > I think the problem is probably due to software irqs. > >> >> > You are getting too many > >> >> > > >> >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending > >> >> > > >> >> > messages. That means you are loosing virtual SGIs (guest VCPU to guest > >> >> > VCPU). It would be best to investigate why, especially if you get many > >> >> > more of the same messages without the MAINTENANCE_IRQ change I > >> >> > suggested. > >> >> > > >> >> > This patch might also help understading the problem more: > >> >> > > >> >> > > >> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > >> >> > index b7516c0..5eaeca2 100644 > >> >> > --- a/xen/arch/arm/gic.c > >> >> > +++ b/xen/arch/arm/gic.c > >> >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) > >> >> > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) > >> >> > { > >> >> > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); > >> >> > - if ( i >= nr_lrs ) return; > >> >> > + if ( i >= nr_lrs ) > >> >> > + { > >> >> > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n", > >> >> > + p->irq, v->domain->domain_id, v->vcpu_id); > >> >> > + continue; > >> >> > + } > >> >> > > >> >> > spin_lock_irqsave(&gic.lock, flags); > >> >> > gic_set_lr(i, p, GICH_LR_PENDING); > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> >> >> Hi Stefano, > >> >> >> > >> >> >> No hangs with this change. > >> >> >> Complete log is the following: > >> >> >> > >> >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) > >> >> >> DRA752 ES1.0 > >> >> >> <ethaddr> not set. Validating first E-fuse MAC > >> >> >> cpsw > >> >> >> - UART enabled - > >> >> >> - CPU 00000000 booting - > >> >> >> - Xen starting in Hyp mode - > >> >> >> - Zero BSS - > >> >> >> - Setting up control registers - > >> >> >> - Turning on paging - > >> >> >> - Ready - > >> >> >> (XEN) Checking for initrd in /chosen > >> >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff > >> >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff > >> >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff > >> >> >> (XEN) > >> >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa > >> >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 > >> >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 > >> >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 > >> >> >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 > >> >> >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 > >> >> >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 > >> >> >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 > >> >> >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 > >> >> >> (XEN) > >> >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 > >> >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 > >> >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 > >> >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) > >> >> >> (XEN) Dom heap: 344064 pages > >> >> >> (XEN) Domain heap initialised > >> >> >> (XEN) Looking for UART console serial0 > >> >> >> Xen 4.5-unstable > >> >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) > >> >> >> (arm-linux-gnueabihf-gcc (crosstool-NG > >> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 > >> >> >> 20130328 (prerelease)) debu4 > >> >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty > >> >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 > >> >> >> (XEN) 32-bit Execution: > >> >> >> (XEN) Processor Features: 00001131:00011011 > >> >> >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle > >> >> >> (XEN) Extensions: GenericTimer Security > >> >> >> (XEN) Debug Features: 02010555 > >> >> >> (XEN) Auxiliary Features: 00000000 > >> >> >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 > >> >> >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 > >> >> >> (XEN) Platform: TI DRA7 > >> >> >> (XEN) /psci method must be smc, but is: "hvc" > >> >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) > >> >> >> (XEN) Set AuxCoreBoot0 to 0x20 > >> >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 > >> >> >> (XEN) Using generic timer at 6144 KHz > >> >> >> (XEN) GIC initialization: > >> >> >> (XEN) gic_dist_addr=0000000048211000 > >> >> >> (XEN) gic_cpu_addr=0000000048212000 > >> >> >> (XEN) gic_hyp_addr=0000000048214000 > >> >> >> (XEN) gic_vcpu_addr=0000000048216000 > >> >> >> (XEN) gic_maintenance_irq=25 > >> >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). > >> >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit) > >> >> >> (XEN) I/O virtualisation disabled > >> >> >> (XEN) Allocated console ring of 16 KiB. > >> >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 > >> >> >> (XEN) Bringing up CPU1 > >> >> >> - CPU 00000001 booting - > >> >> >> - Xen starting in Hyp mode - > >> >> >> - Setting up control registers - > >> >> >> - Turning on paging - > >> >> >> - Ready - > >> >> >> (XEN) CPU 1 booted. > >> >> >> (XEN) Brought up 2 CPUs > >> >> >> (XEN) *** LOADING DOMAIN 0 *** > >> >> >> (XEN) Loading kernel from boot module 2 > >> >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) > >> >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48 > >> >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 > >> >> >> (XEN) Std. Loglevel: All > >> >> >> (XEN) Guest Loglevel: All > >> >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch > >> >> >> input to Xen) > >> >> >> (XEN) Freed 272kB init memory. > >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > >> >> >> already pending in LR0 > >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > >> >> >> already pending in LR0 > >> >> >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property > >> >> >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property > >> >> >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory > >> >> >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 > >> >> >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on > >> >> >> /ocp/i2c@48072000/camera_ov10635 > >> >> >> [ 0.437500] ldo3: operation not allowed > >> >> >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator > >> >> >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe > >> >> >> [ 0.468750] ov1063x 1-0030: No deserializer node found > >> >> >> [ 0.468750] ov1063x 1-0030: No serializer node found > >> >> >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! > >> >> >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 > >> >> >> [ 0.578125] ahci ahci.0.auto: can't get clock > >> >> >> [ 0.898437] ldc_module_init > >> >> >> [ 1.304687] Missing dual_emac_res_vlan in DT. > >> >> >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave > >> >> >> [ 1.312500] Missing dual_emac_res_vlan in DT. > >> >> >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave > >> >> >> [ 1.382812] Freeing init memory: 236K > >> >> >> sh: write error: No such device > >> >> >> Cannot identify '/dev/camera0': 2, No such file or directory > >> >> >> Parsing config from /xen/images/DomUAndroid.cfg > >> >> >> XSM Disabled: seclabel not supported > >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> >> >> dom1 access to irq 53: Function not implemented > >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> >> >> dom1 access to irq 71: Function not implemented > >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> >> >> dom1 access to irq 173: Function not implemented > >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > >> >> >> dom1 access to irq 174: Function not implemented > >> >> >> Turning on vfb in domain 1 > >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> >> >> still lr_pending > >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> >> >> still lr_pending > >> >> >> Parsing config from /xen/images/DomUQNX.cfg > >> >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to > >> >> >> inject irq=2 into d0v0, when it is still lr_pending > >> >> >> > >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> >> >> still lr_pending > >> >> >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid > >> >> >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22 > >> >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader > >> >> >> found: Invalid kernel > >> >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image > >> >> >> failed: No such file or directory > >> >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot > >> >> >> (re-)build domain: -3 > >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> >> >> still lr_pending > >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> >> >> still lr_pending > >> >> >> Turning on 'vsnd' in domain '1' (dev_id: '0') > >> >> >> Turning on vkbd in domain 1 > >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> >> >> still lr_pending > >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > >> >> >> still lr_pending > >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > >> >> >> still lr_pending > >> >> >> > >> >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 > >> >> >> trying to inject irq=2 into d0v0, when it is still lr_pending > >> >> >> > >> >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi > >> >> >> <andrii.tseglytskyi@globallogic.com> wrote: > >> >> >> > OK got it. Give me a few mins > >> >> >> > > >> >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini > >> >> >> > <stefano.stabellini@eu.citrix.com> wrote: > >> >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only > >> >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding > >> >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. > >> >> >> >> > >> >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid > >> >> >> >> other potential bugs introduced later. > >> >> >> >> > >> >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> >> >> >>> What if I try on top of current master branch the following code: > >> >> >> >>> > >> >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > >> >> >> >>> index 31fb81a..6764ab7 100644 > >> >> >> >>> --- a/xen/arch/arm/gic-v2.c > >> >> >> >>> +++ b/xen/arch/arm/gic-v2.c > >> >> >> >>> @@ -36,6 +36,8 @@ > >> >> >> >>> #include <asm/io.h> > >> >> >> >>> #include <asm/gic.h> > >> >> >> >>> > >> >> >> >>> +#define GIC_DEBUG 1 > >> >> >> >>> + > >> >> >> >>> /* > >> >> >> >>> * LR register definitions are GIC v2 specific. > >> >> >> >>> * Moved these definitions from header file to here > >> >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > >> >> >> >>> index bcaded9..c03d6a6 100644 > >> >> >> >>> --- a/xen/arch/arm/gic.c > >> >> >> >>> +++ b/xen/arch/arm/gic.c > >> >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > >> >> >> >>> > >> >> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << > >> >> >> >>> gic_hw_ops->info->nr_lrs) - 1)) > >> >> >> >>> > >> >> >> >>> -#undef GIC_DEBUG > >> >> >> >>> +#define GIC_DEBUG 1 > >> >> >> >>> > >> >> >> >>> static void gic_update_one_lr(struct vcpu *v, int i); > >> >> >> >>> > >> >> >> >>> It is equivalent to what you proposing - my code contains > >> >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will > >> >> >> >>> be executed: > >> >> >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function > >> >> >> >>> > >> >> >> >>> regards, > >> >> >> >>> Andrii > >> >> >> >>> > >> >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini > >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote: > >> >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > >> >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and > >> >> >> >>> >> everything works fine > >> >> >> >>> >> The following 2 patches fixes xen/master for my platform. > >> >> >> >>> >> > >> >> >> >>> >> Stefano, could you please take a look to these changes? > >> >> >> >>> >> > >> >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca > >> >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > >> >> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 > >> >> >> >>> >> > >> >> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag > >> >> >> >>> >> > >> >> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 > >> >> >> >>> >> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > >> >> >> >>> >> > >> >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > >> >> >> >>> >> index 31fb81a..093ecdb 100644 > >> >> >> >>> >> --- a/xen/arch/arm/gic-v2.c > >> >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c > >> >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct > >> >> >> >>> >> pending_irq *p, > >> >> >> >>> >> << GICH_V2_LR_PRIORITY_SHIFT) | > >> >> >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << > >> >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); > >> >> >> >>> >> > >> >> >> >>> >> - if ( p->desc != NULL ) > >> >> >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > >> >> >> >>> >> { > >> >> >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > >> >> >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > >> >> >> >>> >> - else > >> >> >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & > >> >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) > >> >> >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); > >> >> >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > >> >> >> >>> >> + } > >> >> >> >>> >> + else if ( p->desc != NULL ) > >> >> >> >>> >> + { > >> >> >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK ) > >> >> >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); > >> >> >> >>> >> } > >> >> >> >>> >> > >> >> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); > >> >> >> >>> > > >> >> >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it > >> >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need > >> >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not > >> >> >> >>> > working correctly on OMAP5. This changes might only be better at > >> >> >> >>> > "hiding" the real issue. > >> >> >> >>> > > >> >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding > >> >> >> >>> > maintenance interrupts doesn't work for software interrupts. > >> >> >> >>> > The commit that should make them work correctly after the > >> >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 > >> >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll > >> >> >> >>> > see that is going to set a software irq as PENDING if it is already ACTIVE. > >> >> >> >>> > Maybe that doesn't work correctly on OMAP5. > >> >> >> >>> > > >> >> >> >>> > Could you try this patch on top of > >> >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand > >> >> >> >>> > if the problem is specifically with software irqs. > >> >> >> >>> > > >> >> >> >>> > > >> >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > >> >> >> >>> > index b7516c0..d8a17c9 100644 > >> >> >> >>> > --- a/xen/arch/arm/gic.c > >> >> >> >>> > +++ b/xen/arch/arm/gic.c > >> >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); > >> >> >> >>> > /* Maximum cpu interface per GIC */ > >> >> >> >>> > #define NR_GIC_CPU_IF 8 > >> >> >> >>> > > >> >> >> >>> > -#undef GIC_DEBUG > >> >> >> >>> > +#define GIC_DEBUG 1 > >> >> >> >>> > > >> >> >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); > >> >> >> >>> > > >> >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p, > >> >> >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); > >> >> >> >>> > if ( p->desc != NULL ) > >> >> >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT); > >> >> >> >>> > + else > >> >> >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; > >> >> >> >>> > > >> >> >> >>> > GICH[GICH_LR + lr] = lr_val; > >> >> >> >>> > > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> -- > >> >> >> >>> > >> >> >> >>> Andrii Tseglytskyi | Embedded Dev > >> >> >> >>> GlobalLogic > >> >> >> >>> www.globallogic.com > >> >> >> >>> > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > > >> >> >> > Andrii Tseglytskyi | Embedded Dev > >> >> >> > GlobalLogic > >> >> >> > www.globallogic.com > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> > >> >> >> Andrii Tseglytskyi | Embedded Dev > >> >> >> GlobalLogic > >> >> >> www.globallogic.com > >> >> >> > >> >> > >> >> > >> >> > >> >> -- > >> >> > >> >> Andrii Tseglytskyi | Embedded Dev > >> >> GlobalLogic > >> >> www.globallogic.com > >> >> > >> > >> > >> > >> -- > >> > >> Andrii Tseglytskyi | Embedded Dev > >> GlobalLogic > >> www.globallogic.com > >> > > > > -- > > Andrii Tseglytskyi | Embedded Dev > GlobalLogic > www.globallogic.com >
On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: > So it looks like there is not actually anything wrong, is just that you > have too much inflight irqs? It should cause problems because in that > case GICH_HCR_UIE should be set and you should get a maintenance > interrupt when LRs become available (actually when "none, or only one, > of the List register entries is marked as a valid interrupt"). > > Maybe GICH_HCR_UIE is the one that doesn't work properly. How much testing did this aspect get when the no-maint-irq series originally went in? Did you manage to find a workload which filled all the LRs or try artificially limiting the number of LRs somehow in order to provoke it? I ask because my intuition is that this won't happen very much, meaning those code paths may not be as well tested... > It might be > worth checking that you are receiving maintenance interrupts: > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index b7516c0..b3eaa44 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > * on return to guest that is going to clear the old LRs and inject > * new interrupts. > */ > + > + gdprintk(XENLOG_DEBUG, "maintenance interrupt\n"); > } > > void gic_dump_info(struct vcpu *v) > > > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you > should still be receiving maintenance interrupts when one or more LRs > become available. > > > > > > > > I doubt you have so much interrupt traffic to actually fill all the LRs, > > > so I am thinking that a few LRs might not be cleared properly (that > > > should happen on hypervisor entry, gic_update_one_lr should take care of > > > it). > > > > This actually explains why this happens during domU start - SGI > > traffic might be very heavy this time > > > > > > > > > > >> Regards, > > >> Andrii > > >> > > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini > > >> <stefano.stabellini@eu.citrix.com> wrote: > > >> > Hello Andrii, > > >> > we are getting closer :-) > > >> > > > >> > It would help if you post the output with GIC_DEBUG defined but without > > >> > the other change that "fixes" the issue. > > >> > > > >> > I think the problem is probably due to software irqs. > > >> > You are getting too many > > >> > > > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending > > >> > > > >> > messages. That means you are loosing virtual SGIs (guest VCPU to guest > > >> > VCPU). It would be best to investigate why, especially if you get many > > >> > more of the same messages without the MAINTENANCE_IRQ change I > > >> > suggested. > > >> > > > >> > This patch might also help understading the problem more: > > >> > > > >> > > > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > >> > index b7516c0..5eaeca2 100644 > > >> > --- a/xen/arch/arm/gic.c > > >> > +++ b/xen/arch/arm/gic.c > > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) > > >> > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) > > >> > { > > >> > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); > > >> > - if ( i >= nr_lrs ) return; > > >> > + if ( i >= nr_lrs ) > > >> > + { > > >> > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n", > > >> > + p->irq, v->domain->domain_id, v->vcpu_id); > > >> > + continue; > > >> > + } > > >> > > > >> > spin_lock_irqsave(&gic.lock, flags); > > >> > gic_set_lr(i, p, GICH_LR_PENDING); > > >> > > > >> > > > >> > > > >> > > > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > >> >> Hi Stefano, > > >> >> > > >> >> No hangs with this change. > > >> >> Complete log is the following: > > >> >> > > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) > > >> >> DRA752 ES1.0 > > >> >> <ethaddr> not set. Validating first E-fuse MAC > > >> >> cpsw > > >> >> - UART enabled - > > >> >> - CPU 00000000 booting - > > >> >> - Xen starting in Hyp mode - > > >> >> - Zero BSS - > > >> >> - Setting up control registers - > > >> >> - Turning on paging - > > >> >> - Ready - > > >> >> (XEN) Checking for initrd in /chosen > > >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff > > >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff > > >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff > > >> >> (XEN) > > >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa > > >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 > > >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 > > >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 > > >> >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 > > >> >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 > > >> >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 > > >> >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 > > >> >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 > > >> >> (XEN) > > >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 > > >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 > > >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 > > >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) > > >> >> (XEN) Dom heap: 344064 pages > > >> >> (XEN) Domain heap initialised > > >> >> (XEN) Looking for UART console serial0 > > >> >> Xen 4.5-unstable > > >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) > > >> >> (arm-linux-gnueabihf-gcc (crosstool-NG > > >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 > > >> >> 20130328 (prerelease)) debu4 > > >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty > > >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 > > >> >> (XEN) 32-bit Execution: > > >> >> (XEN) Processor Features: 00001131:00011011 > > >> >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle > > >> >> (XEN) Extensions: GenericTimer Security > > >> >> (XEN) Debug Features: 02010555 > > >> >> (XEN) Auxiliary Features: 00000000 > > >> >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 > > >> >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 > > >> >> (XEN) Platform: TI DRA7 > > >> >> (XEN) /psci method must be smc, but is: "hvc" > > >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) > > >> >> (XEN) Set AuxCoreBoot0 to 0x20 > > >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 > > >> >> (XEN) Using generic timer at 6144 KHz > > >> >> (XEN) GIC initialization: > > >> >> (XEN) gic_dist_addr=0000000048211000 > > >> >> (XEN) gic_cpu_addr=0000000048212000 > > >> >> (XEN) gic_hyp_addr=0000000048214000 > > >> >> (XEN) gic_vcpu_addr=0000000048216000 > > >> >> (XEN) gic_maintenance_irq=25 > > >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). > > >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit) > > >> >> (XEN) I/O virtualisation disabled > > >> >> (XEN) Allocated console ring of 16 KiB. > > >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 > > >> >> (XEN) Bringing up CPU1 > > >> >> - CPU 00000001 booting - > > >> >> - Xen starting in Hyp mode - > > >> >> - Setting up control registers - > > >> >> - Turning on paging - > > >> >> - Ready - > > >> >> (XEN) CPU 1 booted. > > >> >> (XEN) Brought up 2 CPUs > > >> >> (XEN) *** LOADING DOMAIN 0 *** > > >> >> (XEN) Loading kernel from boot module 2 > > >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) > > >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48 > > >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 > > >> >> (XEN) Std. Loglevel: All > > >> >> (XEN) Guest Loglevel: All > > >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch > > >> >> input to Xen) > > >> >> (XEN) Freed 272kB init memory. > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > > >> >> already pending in LR0 > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > > >> >> already pending in LR0 > > >> >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property > > >> >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property > > >> >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory > > >> >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 > > >> >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on > > >> >> /ocp/i2c@48072000/camera_ov10635 > > >> >> [ 0.437500] ldo3: operation not allowed > > >> >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator > > >> >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe > > >> >> [ 0.468750] ov1063x 1-0030: No deserializer node found > > >> >> [ 0.468750] ov1063x 1-0030: No serializer node found > > >> >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! > > >> >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 > > >> >> [ 0.578125] ahci ahci.0.auto: can't get clock > > >> >> [ 0.898437] ldc_module_init > > >> >> [ 1.304687] Missing dual_emac_res_vlan in DT. > > >> >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave > > >> >> [ 1.312500] Missing dual_emac_res_vlan in DT. > > >> >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave > > >> >> [ 1.382812] Freeing init memory: 236K > > >> >> sh: write error: No such device > > >> >> Cannot identify '/dev/camera0': 2, No such file or directory > > >> >> Parsing config from /xen/images/DomUAndroid.cfg > > >> >> XSM Disabled: seclabel not supported > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > >> >> dom1 access to irq 53: Function not implemented > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > >> >> dom1 access to irq 71: Function not implemented > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > >> >> dom1 access to irq 173: Function not implemented > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > >> >> dom1 access to irq 174: Function not implemented > > >> >> Turning on vfb in domain 1 > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > >> >> still lr_pending > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > >> >> still lr_pending > > >> >> Parsing config from /xen/images/DomUQNX.cfg > > >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to > > >> >> inject irq=2 into d0v0, when it is still lr_pending > > >> >> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > >> >> still lr_pending > > >> >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid > > >> >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22 > > >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader > > >> >> found: Invalid kernel > > >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image > > >> >> failed: No such file or directory > > >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot > > >> >> (re-)build domain: -3 > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > >> >> still lr_pending > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > >> >> still lr_pending > > >> >> Turning on 'vsnd' in domain '1' (dev_id: '0') > > >> >> Turning on vkbd in domain 1 > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > >> >> still lr_pending > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > >> >> still lr_pending > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > >> >> still lr_pending > > >> >> > > >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 > > >> >> trying to inject irq=2 into d0v0, when it is still lr_pending > > >> >> > > >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi > > >> >> <andrii.tseglytskyi@globallogic.com> wrote: > > >> >> > OK got it. Give me a few mins > > >> >> > > > >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini > > >> >> > <stefano.stabellini@eu.citrix.com> wrote: > > >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only > > >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding > > >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. > > >> >> >> > > >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid > > >> >> >> other potential bugs introduced later. > > >> >> >> > > >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > >> >> >>> What if I try on top of current master branch the following code: > > >> >> >>> > > >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > > >> >> >>> index 31fb81a..6764ab7 100644 > > >> >> >>> --- a/xen/arch/arm/gic-v2.c > > >> >> >>> +++ b/xen/arch/arm/gic-v2.c > > >> >> >>> @@ -36,6 +36,8 @@ > > >> >> >>> #include <asm/io.h> > > >> >> >>> #include <asm/gic.h> > > >> >> >>> > > >> >> >>> +#define GIC_DEBUG 1 > > >> >> >>> + > > >> >> >>> /* > > >> >> >>> * LR register definitions are GIC v2 specific. > > >> >> >>> * Moved these definitions from header file to here > > >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > >> >> >>> index bcaded9..c03d6a6 100644 > > >> >> >>> --- a/xen/arch/arm/gic.c > > >> >> >>> +++ b/xen/arch/arm/gic.c > > >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > > >> >> >>> > > >> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << > > >> >> >>> gic_hw_ops->info->nr_lrs) - 1)) > > >> >> >>> > > >> >> >>> -#undef GIC_DEBUG > > >> >> >>> +#define GIC_DEBUG 1 > > >> >> >>> > > >> >> >>> static void gic_update_one_lr(struct vcpu *v, int i); > > >> >> >>> > > >> >> >>> It is equivalent to what you proposing - my code contains > > >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will > > >> >> >>> be executed: > > >> >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function > > >> >> >>> > > >> >> >>> regards, > > >> >> >>> Andrii > > >> >> >>> > > >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini > > >> >> >>> <stefano.stabellini@eu.citrix.com> wrote: > > >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and > > >> >> >>> >> everything works fine > > >> >> >>> >> The following 2 patches fixes xen/master for my platform. > > >> >> >>> >> > > >> >> >>> >> Stefano, could you please take a look to these changes? > > >> >> >>> >> > > >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca > > >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > >> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 > > >> >> >>> >> > > >> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag > > >> >> >>> >> > > >> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 > > >> >> >>> >> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > >> >> >>> >> > > >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > > >> >> >>> >> index 31fb81a..093ecdb 100644 > > >> >> >>> >> --- a/xen/arch/arm/gic-v2.c > > >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c > > >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct > > >> >> >>> >> pending_irq *p, > > >> >> >>> >> << GICH_V2_LR_PRIORITY_SHIFT) | > > >> >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << > > >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); > > >> >> >>> >> > > >> >> >>> >> - if ( p->desc != NULL ) > > >> >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > > >> >> >>> >> { > > >> >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > > >> >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > > >> >> >>> >> - else > > >> >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & > > >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) > > >> >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); > > >> >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > > >> >> >>> >> + } > > >> >> >>> >> + else if ( p->desc != NULL ) > > >> >> >>> >> + { > > >> >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK ) > > >> >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); > > >> >> >>> >> } > > >> >> >>> >> > > >> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); > > >> >> >>> > > > >> >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it > > >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need > > >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not > > >> >> >>> > working correctly on OMAP5. This changes might only be better at > > >> >> >>> > "hiding" the real issue. > > >> >> >>> > > > >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding > > >> >> >>> > maintenance interrupts doesn't work for software interrupts. > > >> >> >>> > The commit that should make them work correctly after the > > >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 > > >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll > > >> >> >>> > see that is going to set a software irq as PENDING if it is already ACTIVE. > > >> >> >>> > Maybe that doesn't work correctly on OMAP5. > > >> >> >>> > > > >> >> >>> > Could you try this patch on top of > > >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand > > >> >> >>> > if the problem is specifically with software irqs. > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > >> >> >>> > index b7516c0..d8a17c9 100644 > > >> >> >>> > --- a/xen/arch/arm/gic.c > > >> >> >>> > +++ b/xen/arch/arm/gic.c > > >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); > > >> >> >>> > /* Maximum cpu interface per GIC */ > > >> >> >>> > #define NR_GIC_CPU_IF 8 > > >> >> >>> > > > >> >> >>> > -#undef GIC_DEBUG > > >> >> >>> > +#define GIC_DEBUG 1 > > >> >> >>> > > > >> >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); > > >> >> >>> > > > >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p, > > >> >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); > > >> >> >>> > if ( p->desc != NULL ) > > >> >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT); > > >> >> >>> > + else > > >> >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; > > >> >> >>> > > > >> >> >>> > GICH[GICH_LR + lr] = lr_val; > > >> >> >>> > > > >> >> >>> > > >> >> >>> > > >> >> >>> > > >> >> >>> -- > > >> >> >>> > > >> >> >>> Andrii Tseglytskyi | Embedded Dev > > >> >> >>> GlobalLogic > > >> >> >>> www.globallogic.com > > >> >> >>> > > >> >> > > > >> >> > > > >> >> > > > >> >> > -- > > >> >> > > > >> >> > Andrii Tseglytskyi | Embedded Dev > > >> >> > GlobalLogic > > >> >> > www.globallogic.com > > >> >> > > >> >> > > >> >> > > >> >> -- > > >> >> > > >> >> Andrii Tseglytskyi | Embedded Dev > > >> >> GlobalLogic > > >> >> www.globallogic.com > > >> >> > > >> > > >> > > >> > > >> -- > > >> > > >> Andrii Tseglytskyi | Embedded Dev > > >> GlobalLogic > > >> www.globallogic.com > > >> > > > > > > > > -- > > > > Andrii Tseglytskyi | Embedded Dev > > GlobalLogic > > www.globallogic.com > >
On Wed, 19 Nov 2014, Ian Campbell wrote: > On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: > > So it looks like there is not actually anything wrong, is just that you > > have too much inflight irqs? It should cause problems because in that > > case GICH_HCR_UIE should be set and you should get a maintenance > > interrupt when LRs become available (actually when "none, or only one, > > of the List register entries is marked as a valid interrupt"). > > > > Maybe GICH_HCR_UIE is the one that doesn't work properly. > > How much testing did this aspect get when the no-maint-irq series > originally went in? Did you manage to find a workload which filled all > the LRs or try artificially limiting the number of LRs somehow in order > to provoke it? > > I ask because my intuition is that this won't happen very much, meaning > those code paths may not be as well tested... I did test it by artificially limiting the number of LRs to 1. However there have been many iterations of that series and I didn't run this test at every iteration. > > > It might be > > worth checking that you are receiving maintenance interrupts: > > > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index b7516c0..b3eaa44 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > > * on return to guest that is going to clear the old LRs and inject > > * new interrupts. > > */ > > + > > + gdprintk(XENLOG_DEBUG, "maintenance interrupt\n"); > > } > > > > void gic_dump_info(struct vcpu *v) > > > > > > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you > > should still be receiving maintenance interrupts when one or more LRs > > become available. > > > > > > > > > > > > I doubt you have so much interrupt traffic to actually fill all the LRs, > > > > so I am thinking that a few LRs might not be cleared properly (that > > > > should happen on hypervisor entry, gic_update_one_lr should take care of > > > > it). > > > > > > This actually explains why this happens during domU start - SGI > > > traffic might be very heavy this time > > > > > > > > > > > > > > >> Regards, > > > >> Andrii > > > >> > > > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini > > > >> <stefano.stabellini@eu.citrix.com> wrote: > > > >> > Hello Andrii, > > > >> > we are getting closer :-) > > > >> > > > > >> > It would help if you post the output with GIC_DEBUG defined but without > > > >> > the other change that "fixes" the issue. > > > >> > > > > >> > I think the problem is probably due to software irqs. > > > >> > You are getting too many > > > >> > > > > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending > > > >> > > > > >> > messages. That means you are loosing virtual SGIs (guest VCPU to guest > > > >> > VCPU). It would be best to investigate why, especially if you get many > > > >> > more of the same messages without the MAINTENANCE_IRQ change I > > > >> > suggested. > > > >> > > > > >> > This patch might also help understading the problem more: > > > >> > > > > >> > > > > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > > >> > index b7516c0..5eaeca2 100644 > > > >> > --- a/xen/arch/arm/gic.c > > > >> > +++ b/xen/arch/arm/gic.c > > > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) > > > >> > list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) > > > >> > { > > > >> > i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs); > > > >> > - if ( i >= nr_lrs ) return; > > > >> > + if ( i >= nr_lrs ) > > > >> > + { > > > >> > + gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n", > > > >> > + p->irq, v->domain->domain_id, v->vcpu_id); > > > >> > + continue; > > > >> > + } > > > >> > > > > >> > spin_lock_irqsave(&gic.lock, flags); > > > >> > gic_set_lr(i, p, GICH_LR_PENDING); > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > > >> >> Hi Stefano, > > > >> >> > > > >> >> No hangs with this change. > > > >> >> Complete log is the following: > > > >> >> > > > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) > > > >> >> DRA752 ES1.0 > > > >> >> <ethaddr> not set. Validating first E-fuse MAC > > > >> >> cpsw > > > >> >> - UART enabled - > > > >> >> - CPU 00000000 booting - > > > >> >> - Xen starting in Hyp mode - > > > >> >> - Zero BSS - > > > >> >> - Setting up control registers - > > > >> >> - Turning on paging - > > > >> >> - Ready - > > > >> >> (XEN) Checking for initrd in /chosen > > > >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff > > > >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff > > > >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff > > > >> >> (XEN) > > > >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa > > > >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000 > > > >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000 > > > >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000 > > > >> >> (XEN) RESVD[0]: 00000000ba300000 - 00000000bfd00000 > > > >> >> (XEN) RESVD[1]: 0000000095800000 - 0000000095900000 > > > >> >> (XEN) RESVD[2]: 0000000098a00000 - 0000000098b00000 > > > >> >> (XEN) RESVD[3]: 0000000095f00000 - 0000000098a00000 > > > >> >> (XEN) RESVD[4]: 0000000095900000 - 0000000095f00000 > > > >> >> (XEN) > > > >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 > > > >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 > > > >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000 > > > >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages) > > > >> >> (XEN) Dom heap: 344064 pages > > > >> >> (XEN) Domain heap initialised > > > >> >> (XEN) Looking for UART console serial0 > > > >> >> Xen 4.5-unstable > > > >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) > > > >> >> (arm-linux-gnueabihf-gcc (crosstool-NG > > > >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 > > > >> >> 20130328 (prerelease)) debu4 > > > >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty > > > >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2 > > > >> >> (XEN) 32-bit Execution: > > > >> >> (XEN) Processor Features: 00001131:00011011 > > > >> >> (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle > > > >> >> (XEN) Extensions: GenericTimer Security > > > >> >> (XEN) Debug Features: 02010555 > > > >> >> (XEN) Auxiliary Features: 00000000 > > > >> >> (XEN) Memory Model Features: 10201105 20000000 01240000 02102211 > > > >> >> (XEN) ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000 > > > >> >> (XEN) Platform: TI DRA7 > > > >> >> (XEN) /psci method must be smc, but is: "hvc" > > > >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c) > > > >> >> (XEN) Set AuxCoreBoot0 to 0x20 > > > >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 > > > >> >> (XEN) Using generic timer at 6144 KHz > > > >> >> (XEN) GIC initialization: > > > >> >> (XEN) gic_dist_addr=0000000048211000 > > > >> >> (XEN) gic_cpu_addr=0000000048212000 > > > >> >> (XEN) gic_hyp_addr=0000000048214000 > > > >> >> (XEN) gic_vcpu_addr=0000000048216000 > > > >> >> (XEN) gic_maintenance_irq=25 > > > >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b). > > > >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit) > > > >> >> (XEN) I/O virtualisation disabled > > > >> >> (XEN) Allocated console ring of 16 KiB. > > > >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 > > > >> >> (XEN) Bringing up CPU1 > > > >> >> - CPU 00000001 booting - > > > >> >> - Xen starting in Hyp mode - > > > >> >> - Setting up control registers - > > > >> >> - Turning on paging - > > > >> >> - Ready - > > > >> >> (XEN) CPU 1 booted. > > > >> >> (XEN) Brought up 2 CPUs > > > >> >> (XEN) *** LOADING DOMAIN 0 *** > > > >> >> (XEN) Loading kernel from boot module 2 > > > >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0) > > > >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48 > > > >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8 > > > >> >> (XEN) Std. Loglevel: All > > > >> >> (XEN) Guest Loglevel: All > > > >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch > > > >> >> input to Xen) > > > >> >> (XEN) Freed 272kB init memory. > > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > > > >> >> already pending in LR0 > > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is > > > >> >> already pending in LR0 > > > >> >> [ 0.000000] /cpus/cpu@0 missing clock-frequency property > > > >> >> [ 0.000000] /cpus/cpu@1 missing clock-frequency property > > > >> >> [ 0.140625] omap-gpmc omap-gpmc: failed to reserve memory > > > >> >> [ 0.187500] omap_l3_noc ocp.3: couldn't find resource 2 > > > >> >> [ 0.273437] i2c i2c-1: of_i2c: invalid reg on > > > >> >> /ocp/i2c@48072000/camera_ov10635 > > > >> >> [ 0.437500] ldo3: operation not allowed > > > >> >> [ 0.437500] omapdss HDMI error: can't set the voltage regulator > > > >> >> [ 0.468750] tfc_s9700 display0: tfc_s9700_probe probe > > > >> >> [ 0.468750] ov1063x 1-0030: No deserializer node found > > > >> >> [ 0.468750] ov1063x 1-0030: No serializer node found > > > >> >> [ 0.468750] ov1063x 1-0030: Failed writing register 0x0103! > > > >> >> [ 0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30 > > > >> >> [ 0.578125] ahci ahci.0.auto: can't get clock > > > >> >> [ 0.898437] ldc_module_init > > > >> >> [ 1.304687] Missing dual_emac_res_vlan in DT. > > > >> >> [ 1.304687] Using 1 as Reserved VLAN for 0 slave > > > >> >> [ 1.312500] Missing dual_emac_res_vlan in DT. > > > >> >> [ 1.320312] Using 2 as Reserved VLAN for 1 slave > > > >> >> [ 1.382812] Freeing init memory: 236K > > > >> >> sh: write error: No such device > > > >> >> Cannot identify '/dev/camera0': 2, No such file or directory > > > >> >> Parsing config from /xen/images/DomUAndroid.cfg > > > >> >> XSM Disabled: seclabel not supported > > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > > >> >> dom1 access to irq 53: Function not implemented > > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > > >> >> dom1 access to irq 71: Function not implemented > > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > > >> >> dom1 access to irq 173: Function not implemented > > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet > > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give > > > >> >> dom1 access to irq 174: Function not implemented > > > >> >> Turning on vfb in domain 1 > > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > > >> >> still lr_pending > > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > > >> >> still lr_pending > > > >> >> Parsing config from /xen/images/DomUQNX.cfg > > > >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to > > > >> >> inject irq=2 into d0v0, when it is still lr_pending > > > >> >> > > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > > >> >> still lr_pending > > > >> >> [ 4.304687] dra7-evm-sound sound.17: cpu dai node is invalid > > > >> >> [ 4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22 > > > >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader > > > >> >> found: Invalid kernel > > > >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image > > > >> >> failed: No such file or directory > > > >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot > > > >> >> (re-)build domain: -3 > > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > > >> >> still lr_pending > > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > > >> >> still lr_pending > > > >> >> Turning on 'vsnd' in domain '1' (dev_id: '0') > > > >> >> Turning on vkbd in domain 1 > > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > > >> >> still lr_pending > > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is > > > >> >> still lr_pending > > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is > > > >> >> still lr_pending > > > >> >> > > > >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 > > > >> >> trying to inject irq=2 into d0v0, when it is still lr_pending > > > >> >> > > > >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi > > > >> >> <andrii.tseglytskyi@globallogic.com> wrote: > > > >> >> > OK got it. Give me a few mins > > > >> >> > > > > >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini > > > >> >> > <stefano.stabellini@eu.citrix.com> wrote: > > > >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only > > > >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding > > > >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs. > > > >> >> >> > > > >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid > > > >> >> >> other potential bugs introduced later. > > > >> >> >> > > > >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > > >> >> >>> What if I try on top of current master branch the following code: > > > >> >> >>> > > > >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > > > >> >> >>> index 31fb81a..6764ab7 100644 > > > >> >> >>> --- a/xen/arch/arm/gic-v2.c > > > >> >> >>> +++ b/xen/arch/arm/gic-v2.c > > > >> >> >>> @@ -36,6 +36,8 @@ > > > >> >> >>> #include <asm/io.h> > > > >> >> >>> #include <asm/gic.h> > > > >> >> >>> > > > >> >> >>> +#define GIC_DEBUG 1 > > > >> >> >>> + > > > >> >> >>> /* > > > >> >> >>> * LR register definitions are GIC v2 specific. > > > >> >> >>> * Moved these definitions from header file to here > > > >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > > >> >> >>> index bcaded9..c03d6a6 100644 > > > >> >> >>> --- a/xen/arch/arm/gic.c > > > >> >> >>> +++ b/xen/arch/arm/gic.c > > > >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > > > >> >> >>> > > > >> >> >>> #define lr_all_full() (this_cpu(lr_mask) == ((1 << > > > >> >> >>> gic_hw_ops->info->nr_lrs) - 1)) > > > >> >> >>> > > > >> >> >>> -#undef GIC_DEBUG > > > >> >> >>> +#define GIC_DEBUG 1 > > > >> >> >>> > > > >> >> >>> static void gic_update_one_lr(struct vcpu *v, int i); > > > >> >> >>> > > > >> >> >>> It is equivalent to what you proposing - my code contains > > > >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will > > > >> >> >>> be executed: > > > >> >> >>> lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function > > > >> >> >>> > > > >> >> >>> regards, > > > >> >> >>> Andrii > > > >> >> >>> > > > >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini > > > >> >> >>> <stefano.stabellini@eu.citrix.com> wrote: > > > >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: > > > >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and > > > >> >> >>> >> everything works fine > > > >> >> >>> >> The following 2 patches fixes xen/master for my platform. > > > >> >> >>> >> > > > >> >> >>> >> Stefano, could you please take a look to these changes? > > > >> >> >>> >> > > > >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca > > > >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > > >> >> >>> >> Date: Tue Nov 18 14:20:42 2014 +0200 > > > >> >> >>> >> > > > >> >> >>> >> xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag > > > >> >> >>> >> > > > >> >> >>> >> Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434 > > > >> >> >>> >> Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > > >> >> >>> >> > > > >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c > > > >> >> >>> >> index 31fb81a..093ecdb 100644 > > > >> >> >>> >> --- a/xen/arch/arm/gic-v2.c > > > >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c > > > >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct > > > >> >> >>> >> pending_irq *p, > > > >> >> >>> >> << GICH_V2_LR_PRIORITY_SHIFT) | > > > >> >> >>> >> ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << > > > >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT)); > > > >> >> >>> >> > > > >> >> >>> >> - if ( p->desc != NULL ) > > > >> >> >>> >> + if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > > > >> >> >>> >> { > > > >> >> >>> >> - if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) ) > > > >> >> >>> >> - lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > > > >> >> >>> >> - else > > > >> >> >>> >> - lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & > > > >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK ) > > > >> >> >>> >> - << GICH_V2_LR_PHYSICAL_SHIFT); > > > >> >> >>> >> + lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; > > > >> >> >>> >> + } > > > >> >> >>> >> + else if ( p->desc != NULL ) > > > >> >> >>> >> + { > > > >> >> >>> >> + lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK ) > > > >> >> >>> >> + << GICH_V2_LR_PHYSICAL_SHIFT); > > > >> >> >>> >> } > > > >> >> >>> >> > > > >> >> >>> >> writel_gich(lr_reg, GICH_LR + lr * 4); > > > >> >> >>> > > > > >> >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it > > > >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need > > > >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not > > > >> >> >>> > working correctly on OMAP5. This changes might only be better at > > > >> >> >>> > "hiding" the real issue. > > > >> >> >>> > > > > >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding > > > >> >> >>> > maintenance interrupts doesn't work for software interrupts. > > > >> >> >>> > The commit that should make them work correctly after the > > > >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121 > > > >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll > > > >> >> >>> > see that is going to set a software irq as PENDING if it is already ACTIVE. > > > >> >> >>> > Maybe that doesn't work correctly on OMAP5. > > > >> >> >>> > > > > >> >> >>> > Could you try this patch on top of > > > >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand > > > >> >> >>> > if the problem is specifically with software irqs. > > > >> >> >>> > > > > >> >> >>> > > > > >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > > >> >> >>> > index b7516c0..d8a17c9 100644 > > > >> >> >>> > --- a/xen/arch/arm/gic.c > > > >> >> >>> > +++ b/xen/arch/arm/gic.c > > > >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id); > > > >> >> >>> > /* Maximum cpu interface per GIC */ > > > >> >> >>> > #define NR_GIC_CPU_IF 8 > > > >> >> >>> > > > > >> >> >>> > -#undef GIC_DEBUG > > > >> >> >>> > +#define GIC_DEBUG 1 > > > >> >> >>> > > > > >> >> >>> > static void gic_update_one_lr(struct vcpu *v, int i); > > > >> >> >>> > > > > >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p, > > > >> >> >>> > ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT); > > > >> >> >>> > if ( p->desc != NULL ) > > > >> >> >>> > lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT); > > > >> >> >>> > + else > > > >> >> >>> > + lr_val |= GICH_LR_MAINTENANCE_IRQ; > > > >> >> >>> > > > > >> >> >>> > GICH[GICH_LR + lr] = lr_val; > > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> -- > > > >> >> >>> > > > >> >> >>> Andrii Tseglytskyi | Embedded Dev > > > >> >> >>> GlobalLogic > > > >> >> >>> www.globallogic.com > > > >> >> >>> > > > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > -- > > > >> >> > > > > >> >> > Andrii Tseglytskyi | Embedded Dev > > > >> >> > GlobalLogic > > > >> >> > www.globallogic.com > > > >> >> > > > >> >> > > > >> >> > > > >> >> -- > > > >> >> > > > >> >> Andrii Tseglytskyi | Embedded Dev > > > >> >> GlobalLogic > > > >> >> www.globallogic.com > > > >> >> > > > >> > > > >> > > > >> > > > >> -- > > > >> > > > >> Andrii Tseglytskyi | Embedded Dev > > > >> GlobalLogic > > > >> www.globallogic.com > > > >> > > > > > > > > > > > > -- > > > > > > Andrii Tseglytskyi | Embedded Dev > > > GlobalLogic > > > www.globallogic.com > > > > >
On 11/19/2014 12:17 PM, Stefano Stabellini wrote: > On Wed, 19 Nov 2014, Ian Campbell wrote: >> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: >>> So it looks like there is not actually anything wrong, is just that you >>> have too much inflight irqs? It should cause problems because in that >>> case GICH_HCR_UIE should be set and you should get a maintenance >>> interrupt when LRs become available (actually when "none, or only one, >>> of the List register entries is marked as a valid interrupt"). >>> >>> Maybe GICH_HCR_UIE is the one that doesn't work properly. >> >> How much testing did this aspect get when the no-maint-irq series >> originally went in? Did you manage to find a workload which filled all >> the LRs or try artificially limiting the number of LRs somehow in order >> to provoke it? >> >> I ask because my intuition is that this won't happen very much, meaning >> those code paths may not be as well tested... > > I did test it by artificially limiting the number of LRs to 1. > However there have been many iterations of that series and I didn't run > this test at every iteration. am I the only to think this may not be related to this bug? All the LRs are full with IRQ of the same priority. So it's valid. As gic_restore_pending_irqs is called every time that we return to the guest. It could be anything else. It would be interesting to see why we are trapping all the time in Xen. Regards,
Hi Stefano, > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > > - GICH[GICH_HCR] |= GICH_HCR_UIE; > > + GICH[GICH_HCR] |= GICH_HCR_NPIE; > > else > > - GICH[GICH_HCR] &= ~GICH_HCR_UIE; > > + GICH[GICH_HCR] &= ~GICH_HCR_NPIE; > > > > } > > Yes, exactly I tried, hang still occurs with this change Regards, Andrii
Hi Julien, On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote: > On 11/19/2014 12:17 PM, Stefano Stabellini wrote: >> On Wed, 19 Nov 2014, Ian Campbell wrote: >>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: >>>> So it looks like there is not actually anything wrong, is just that you >>>> have too much inflight irqs? It should cause problems because in that >>>> case GICH_HCR_UIE should be set and you should get a maintenance >>>> interrupt when LRs become available (actually when "none, or only one, >>>> of the List register entries is marked as a valid interrupt"). >>>> >>>> Maybe GICH_HCR_UIE is the one that doesn't work properly. >>> >>> How much testing did this aspect get when the no-maint-irq series >>> originally went in? Did you manage to find a workload which filled all >>> the LRs or try artificially limiting the number of LRs somehow in order >>> to provoke it? >>> >>> I ask because my intuition is that this won't happen very much, meaning >>> those code paths may not be as well tested... >> >> I did test it by artificially limiting the number of LRs to 1. >> However there have been many iterations of that series and I didn't run >> this test at every iteration. > > am I the only to think this may not be related to this bug? All the LRs > are full with IRQ of the same priority. So it's valid. > > As gic_restore_pending_irqs is called every time that we return to the > guest. It could be anything else. > > It would be interesting to see why we are trapping all the time in Xen. > I may perform any test if you have some specific scenario. > Regards, > > -- > Julien Grall
On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote: > Hi Julien, > > On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote: >> On 11/19/2014 12:17 PM, Stefano Stabellini wrote: >>> On Wed, 19 Nov 2014, Ian Campbell wrote: >>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: >>>>> So it looks like there is not actually anything wrong, is just that you >>>>> have too much inflight irqs? It should cause problems because in that >>>>> case GICH_HCR_UIE should be set and you should get a maintenance >>>>> interrupt when LRs become available (actually when "none, or only one, >>>>> of the List register entries is marked as a valid interrupt"). >>>>> >>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly. >>>> >>>> How much testing did this aspect get when the no-maint-irq series >>>> originally went in? Did you manage to find a workload which filled all >>>> the LRs or try artificially limiting the number of LRs somehow in order >>>> to provoke it? >>>> >>>> I ask because my intuition is that this won't happen very much, meaning >>>> those code paths may not be as well tested... >>> >>> I did test it by artificially limiting the number of LRs to 1. >>> However there have been many iterations of that series and I didn't run >>> this test at every iteration. >> >> am I the only to think this may not be related to this bug? All the LRs >> are full with IRQ of the same priority. So it's valid. >> >> As gic_restore_pending_irqs is called every time that we return to the >> guest. It could be anything else. >> >> It would be interesting to see why we are trapping all the time in Xen. >> > > I may perform any test if you have some specific scenario. I have no specific scenario in my mind :/. It looks like I'm able to reproduce it on my ARM board by the restricted the number of LRs to 1. I will investigate. Regards,
On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall <julien.grall@linaro.org> wrote: > On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote: >> Hi Julien, >> >> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote: >>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote: >>>> On Wed, 19 Nov 2014, Ian Campbell wrote: >>>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote: >>>>>> So it looks like there is not actually anything wrong, is just that you >>>>>> have too much inflight irqs? It should cause problems because in that >>>>>> case GICH_HCR_UIE should be set and you should get a maintenance >>>>>> interrupt when LRs become available (actually when "none, or only one, >>>>>> of the List register entries is marked as a valid interrupt"). >>>>>> >>>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly. >>>>> >>>>> How much testing did this aspect get when the no-maint-irq series >>>>> originally went in? Did you manage to find a workload which filled all >>>>> the LRs or try artificially limiting the number of LRs somehow in order >>>>> to provoke it? >>>>> >>>>> I ask because my intuition is that this won't happen very much, meaning >>>>> those code paths may not be as well tested... >>>> >>>> I did test it by artificially limiting the number of LRs to 1. >>>> However there have been many iterations of that series and I didn't run >>>> this test at every iteration. >>> >>> am I the only to think this may not be related to this bug? All the LRs >>> are full with IRQ of the same priority. So it's valid. >>> >>> As gic_restore_pending_irqs is called every time that we return to the >>> guest. It could be anything else. >>> >>> It would be interesting to see why we are trapping all the time in Xen. >>> >> >> I may perform any test if you have some specific scenario. > > I have no specific scenario in my mind :/. > > It looks like I'm able to reproduce it on my ARM board by the restricted > the number of LRs to 1. > Do you mean that you got a hang with current xen/master branch ? Regards, Andrii > I will investigate. > > Regards, > > -- > Julien Grall
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: > Hi Stefano, > > > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > > > - GICH[GICH_HCR] |= GICH_HCR_UIE; > > > + GICH[GICH_HCR] |= GICH_HCR_NPIE; > > > else > > > - GICH[GICH_HCR] &= ~GICH_HCR_UIE; > > > + GICH[GICH_HCR] &= ~GICH_HCR_NPIE; > > > > > > } > > > > Yes, exactly > > I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. Could you please call gic_dump_info(current) from maintenance_interrupt, and post the output during the hang? Remove the other gic_dump_info to avoid confusion, we want to understand what is the status of the LRs after clearing them upon receiving a maintenance interrupt at busy times.
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index b7516c0..b3eaa44 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r * on return to guest that is going to clear the old LRs and inject * new interrupts. */ + + gdprintk(XENLOG_DEBUG, "maintenance interrupt\n"); } void gic_dump_info(struct vcpu *v)