diff mbox

hw/arm/virt: fix pl011 and pl031 irq flags

Message ID 1410274423-9461-1-git-send-email-peter.maydell@linaro.org
State Superseded
Headers show

Commit Message

Peter Maydell Sept. 9, 2014, 2:53 p.m. UTC
The pl011 and pl031 devices both use level triggered interrupts,
but the device tree we construct was incorrectly telling the
kernel to configure the GIC to treat them as edge triggered.
This meant that output from the pl011 would hang after a while.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
---
Thanks to Christoffer Dall for figuring out the cause of the hangs here.

 hw/arm/virt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Christoffer Dall Sept. 10, 2014, 10:43 a.m. UTC | #1
On Tue, Sep 09, 2014 at 03:53:43PM +0100, Peter Maydell wrote:
> The pl011 and pl031 devices both use level triggered interrupts,
> but the device tree we construct was incorrectly telling the
> kernel to configure the GIC to treat them as edge triggered.
> This meant that output from the pl011 would hang after a while.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Cc: qemu-stable@nongnu.org
> ---
> Thanks to Christoffer Dall for figuring out the cause of the hangs here.
> 
>  hw/arm/virt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index e8f231e..1b343f0 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -371,7 +371,7 @@ static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
>                                       2, base, 2, size);
>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>                                 GIC_FDT_IRQ_TYPE_SPI, irq,
> -                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
> +                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
>                                 vbi->clock_phandle, vbi->clock_phandle);
>      qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
> @@ -398,7 +398,7 @@ static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
>                                   2, base, 2, size);
>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>                             GIC_FDT_IRQ_TYPE_SPI, irq,
> -                           GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
> +                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
>      qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
>      g_free(nodename);
> -- 
> 1.9.1
> 
I've been trying to figure out why we would see this particular hang
with SMP and not UP (or maybe it is just very much more unlikely to
happen with UP), but haven't been able to come up with a sequence of
events to support this yet.  It also worries me that we weren't seeing
this with KVM, since it indicates that we're either doing something
wrong in the KVM or QEMU GIC emulation code, potentially.

In any case, this patch is correct, so:

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Ard Biesheuvel Sept. 10, 2014, 10:48 a.m. UTC | #2
On 10 September 2014 12:43, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Sep 09, 2014 at 03:53:43PM +0100, Peter Maydell wrote:
>> The pl011 and pl031 devices both use level triggered interrupts,
>> but the device tree we construct was incorrectly telling the
>> kernel to configure the GIC to treat them as edge triggered.
>> This meant that output from the pl011 would hang after a while.
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> Cc: qemu-stable@nongnu.org
>> ---
>> Thanks to Christoffer Dall for figuring out the cause of the hangs here.
>>
>>  hw/arm/virt.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index e8f231e..1b343f0 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -371,7 +371,7 @@ static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
>>                                       2, base, 2, size);
>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>                                 GIC_FDT_IRQ_TYPE_SPI, irq,
>> -                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>> +                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
>>                                 vbi->clock_phandle, vbi->clock_phandle);
>>      qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
>> @@ -398,7 +398,7 @@ static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
>>                                   2, base, 2, size);
>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>                             GIC_FDT_IRQ_TYPE_SPI, irq,
>> -                           GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>> +                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
>>      qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
>>      g_free(nodename);
>> --
>> 1.9.1
>>
> I've been trying to figure out why we would see this particular hang
> with SMP and not UP (or maybe it is just very much more unlikely to
> happen with UP), but haven't been able to come up with a sequence of
> events to support this yet.  It also worries me that we weren't seeing
> this with KVM, since it indicates that we're either doing something
> wrong in the KVM or QEMU GIC emulation code, potentially.
>
> In any case, this patch is correct, so:
>
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

I am still seeing lockups and self detected stalls even with this
patch, and sometimes just hitting a key into the console will get
things moving again.
So I am not convinced yet whether this fixes something fundamentally,
or just works around it by taking an alternative code path through the
kernel which doesn't trigger the same root bug.
Christoffer Dall Sept. 10, 2014, 11:32 a.m. UTC | #3
On Wed, Sep 10, 2014 at 12:48 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 10 September 2014 12:43, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> On Tue, Sep 09, 2014 at 03:53:43PM +0100, Peter Maydell wrote:
>>> The pl011 and pl031 devices both use level triggered interrupts,
>>> but the device tree we construct was incorrectly telling the
>>> kernel to configure the GIC to treat them as edge triggered.
>>> This meant that output from the pl011 would hang after a while.
>>>
>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>> Cc: qemu-stable@nongnu.org
>>> ---
>>> Thanks to Christoffer Dall for figuring out the cause of the hangs here.
>>>
>>>  hw/arm/virt.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index e8f231e..1b343f0 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -371,7 +371,7 @@ static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
>>>                                       2, base, 2, size);
>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>>                                 GIC_FDT_IRQ_TYPE_SPI, irq,
>>> -                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>>> +                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
>>>                                 vbi->clock_phandle, vbi->clock_phandle);
>>>      qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
>>> @@ -398,7 +398,7 @@ static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
>>>                                   2, base, 2, size);
>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>>                             GIC_FDT_IRQ_TYPE_SPI, irq,
>>> -                           GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>>> +                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
>>>      qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
>>>      g_free(nodename);
>>> --
>>> 1.9.1
>>>
>> I've been trying to figure out why we would see this particular hang
>> with SMP and not UP (or maybe it is just very much more unlikely to
>> happen with UP), but haven't been able to come up with a sequence of
>> events to support this yet.  It also worries me that we weren't seeing
>> this with KVM, since it indicates that we're either doing something
>> wrong in the KVM or QEMU GIC emulation code, potentially.
>>
>> In any case, this patch is correct, so:
>>
>> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>
> I am still seeing lockups and self detected stalls even with this
> patch, and sometimes just hitting a key into the console will get
> things moving again.
> So I am not convinced yet whether this fixes something fundamentally,
> or just works around it by taking an alternative code path through the
> kernel which doesn't trigger the same root bug.

As far as I can tell, this fix won't change any other behavior in the
kernel than configuring the GICD for that IRQ to level/edge on system
boot.

I also saw the lockups of the system that went away when giving the
pl011 some inputs, but I thought I only saw this before the fix -
admittedly I didn't test this very thoroughly.

I wonder if we are losing timer interrupts due to the same artifact....?

-Christoffer
Peter Crosthwaite Sept. 10, 2014, 12:42 p.m. UTC | #4
On Wed, Sep 10, 2014 at 9:32 PM, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Wed, Sep 10, 2014 at 12:48 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 10 September 2014 12:43, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>>> On Tue, Sep 09, 2014 at 03:53:43PM +0100, Peter Maydell wrote:
>>>> The pl011 and pl031 devices both use level triggered interrupts,
>>>> but the device tree we construct was incorrectly telling the
>>>> kernel to configure the GIC to treat them as edge triggered.
>>>> This meant that output from the pl011 would hang after a while.
>>>>
>>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>>>> Cc: qemu-stable@nongnu.org
>>>> ---
>>>> Thanks to Christoffer Dall for figuring out the cause of the hangs here.
>>>>
>>>>  hw/arm/virt.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>> index e8f231e..1b343f0 100644
>>>> --- a/hw/arm/virt.c
>>>> +++ b/hw/arm/virt.c
>>>> @@ -371,7 +371,7 @@ static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
>>>>                                       2, base, 2, size);
>>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>>>                                 GIC_FDT_IRQ_TYPE_SPI, irq,
>>>> -                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>>>> +                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
>>>>                                 vbi->clock_phandle, vbi->clock_phandle);
>>>>      qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
>>>> @@ -398,7 +398,7 @@ static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
>>>>                                   2, base, 2, size);
>>>>      qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
>>>>                             GIC_FDT_IRQ_TYPE_SPI, irq,
>>>> -                           GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>>>> +                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>>>>      qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
>>>>      qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
>>>>      g_free(nodename);
>>>> --
>>>> 1.9.1
>>>>
>>> I've been trying to figure out why we would see this particular hang
>>> with SMP and not UP (or maybe it is just very much more unlikely to
>>> happen with UP), but haven't been able to come up with a sequence of
>>> events to support this yet.  It also worries me that we weren't seeing
>>> this with KVM, since it indicates that we're either doing something
>>> wrong in the KVM or QEMU GIC emulation code, potentially.
>>>
>>> In any case, this patch is correct, so:
>>>
>>> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>>
>> I am still seeing lockups and self detected stalls even with this
>> patch, and sometimes just hitting a key into the console will get
>> things moving again.
>> So I am not convinced yet whether this fixes something fundamentally,
>> or just works around it by taking an alternative code path through the
>> kernel which doesn't trigger the same root bug.
>
> As far as I can tell, this fix won't change any other behavior in the
> kernel than configuring the GICD for that IRQ to level/edge on system
> boot.
>
> I also saw the lockups of the system that went away when giving the
> pl011 some inputs, but I thought I only saw this before the fix -
> admittedly I didn't test this very thoroughly.
>
> I wonder if we are losing timer interrupts due to the same artifact....?
>

I recently tried to debug SMP lockups on vexpress with bisection pointing at:

commit 6d327171551a12b937c5718073b984
8d0274c74d
Author: Alex Bligh <alex@alex.org.uk>
Date:   Wed Aug 21 16:02:59 2013 +0100

    aio / timers: Remove alarm timers

    Remove alarm timers from qemu-timers.c now we use g_poll / ppoll
    instead.

    Signed-off-by: Alex Bligh <alex@alex.org.uk>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Unfortunately this patch predates the virt machine, but I am highly
suspicious of the same root cause - self detected stalling was amongst
a range of symptoms I was seeing. My best theory is a TCG mainloop bug
which is consistent with your works-on-KVM observation.

I had a whole bunch of patches that added traces to arm timer and
interrupts etc. I should probably send them through. My basic
conclusion was that that CPU exit_request were comming in pairs with
only a short delay between them (for the timer most likely).

Looking at the tcg mainloop:

static void tcg_exec_all(void)
{
    int r;

    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
    qemu_clock_warp(QEMU_CLOCK_VIRTUAL);

    if (next_cpu == NULL) {
        next_cpu = first_cpu;
    }
    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
        CPUState *cpu = next_cpu;
        CPUArchState *env = cpu->env_ptr;

        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);

        if (cpu_can_run(cpu)) {
            r = tcg_cpu_exec(env);
            if (r == EXCP_DEBUG) {
                cpu_handle_guest_debug(cpu);
                break;
            }
        } else if (cpu->stop || cpu->stopped) {
            break;
        }
    }
    exit_request = 0;
}

The for loop implements a round robin schedule. But with the double
exit requests I was seeing, the first exit_request would cause the
currently executing CPU to exit, then the second exit_request would
cause the second CPU to exit straight away with no code execution
moving schedule back to the first. This pattern occured consistently
for many cycles starving the second CPU long term. This often leading
to secondary CPU probe failure.

That qemu_clock_enable and the qemu_clock_warp are a little suspicous
to me. Can they be the source of exit_requests causing CPU starvation?

Regards,
Peter

> -Christoffer
>
diff mbox

Patch

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e8f231e..1b343f0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -371,7 +371,7 @@  static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
                                      2, base, 2, size);
     qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
                                GIC_FDT_IRQ_TYPE_SPI, irq,
-                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
     qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
                                vbi->clock_phandle, vbi->clock_phandle);
     qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
@@ -398,7 +398,7 @@  static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
                                  2, base, 2, size);
     qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
                            GIC_FDT_IRQ_TYPE_SPI, irq,
-                           GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
     qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
     qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
     g_free(nodename);