[8/9] virtio-gpu: fix hang under TCG when unmapping blob

Message ID	20250428125918.449346-9-alex.bennee@linaro.org
State	Superseded
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; From: =?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org> To: qemu-devel@nongnu.org Cc: Thomas Huth <thuth@redhat.com>, Alexandre Iooss <erdnaxe@crans.org>, "Michael S. Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, =?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>, David Hildenbrand <david@redhat.com>, Pierrick Bouvier <pierrick.bouvier@linaro.org>, qemu-arm@nongnu.org, =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= <philmd@linaro.org>, Peter Xu <peterx@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Mahmoud Mandour <ma.mandourr@gmail.com>, Manos Pitsidianakis <manos.pitsidianakis@linaro.org>, qemu-stable@nongnu.org Subject: [PATCH 8/9] virtio-gpu: fix hang under TCG when unmapping blob Date: Mon, 28 Apr 2025 13:59:17 +0100 Message-Id: <20250428125918.449346-9-alex.bennee@linaro.org> In-Reply-To: <20250428125918.449346-1-alex.bennee@linaro.org> References: <20250428125918.449346-1-alex.bennee@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2a00:1450:4864:20::62b; envelope-from=alex.bennee@linaro.org; helo=mail-ej1-x62b.google.com X-Spam_score_int: 12 X-Spam_score: 1.2 X-Spam_bar: + X-Spam_report: (1.2 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_SBL_CSS=3.335, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org
Series	Maintainer updates for May (testing, plugins, virtio-gpu) \| expand [0/9] Maintainer updates for May (testing, plugins, virtio-gpu) [1/9] tests/docker: expose $HOME/.cache as docker volume [2/9] gitlab: disable debug info on CI builds [3/9] tests/tcg: make aarch64 boot.S handle different starting modes [4/9] contrib/plugins: add a scaling factor to the ips arg [5/9] contrib/plugins: allow setting of instructions per quantum [6/9] MAINTAINERS: add myself to virtio-gpu for Odd Fixes [7/9] hw/display: re-arrange memory region tracking [8/9] virtio-gpu: fix hang under TCG when unmapping blob [9/9] virtio-gpu: refactor async blob unmapping

Alex Bennée April 28, 2025, 12:59 p.m. UTC

From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>

This commit fixes an indefinite hang when using VIRTIO GPU blob objects
under TCG in certain conditions.

The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
MemoryRegion and attaches it to an offset on a PCI BAR of the
VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
it.

Because virglrenderer commands are not thread-safe they are only
called on the main context and QEMU performs the cleanup in three steps
to prevent a use-after-free scenario where the guest can access the
region after it’s unmapped:

1. From the main context, the region’s field finish_unmapping is false
   by default, so it sets a variable cmd_suspended, increases the
   renderer_blocked variable, deletes the blob subregion, and unparents
   the blob subregion causing its reference count to decrement.

2. From an RCU context, the MemoryView gets freed, the FlatView gets
   recalculated, the free callback of the blob region
   virtio_gpu_virgl_hostmem_region_free is called which sets the
   region’s field finish_unmapping to true, allowing the main thread
   context to finish replying to the command

3. From the main context, the command is processed again, but this time
   finish_unmapping is true, so virgl_renderer_resource_unmap can be
   called and a response is sent to the guest.

It happens so that under TCG, if the guest has no timers configured (and
thus no interrupt will cause the CPU to exit), the RCU thread does not
have enough time to grab the locks and recalculate the FlatView.

That’s not a big problem in practice since most guests will assume a
response will happen later in time and go on to do different things,
potentially triggering interrupts and allowing the RCU context to run.
If the guest waits for the unmap command to complete though, it blocks
indefinitely. Attaching to the QEMU monitor and force quitting the guest
allows the cleanup to continue.

There's no reason why the FlatView recalculation can't occur right away
when we delete the blob subregion, however. It does not, because when we
create the subregion we set the object as its own parent:

    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);

The extra reference is what prevents freeing the memory region object in
the memory transaction of deleting the subregion.

This commit changes the owner object to the device, which removes the
extra owner reference in the memory region and causes the MR to be
freed right away in the main context.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
Cc: qemu-stable@nongnu.org
---
 hw/display/virtio-gpu-virgl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Dmitry Osipenko April 29, 2025, 6:48 p.m. UTC | #1

On 4/28/25 15:59, Alex BennÃ©e wrote:
> From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> 
> This commit fixes an indefinite hang when using VIRTIO GPU blob objects
> under TCG in certain conditions.
> 
> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
> MemoryRegion and attaches it to an offset on a PCI BAR of the
> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
> it.
> 
> Because virglrenderer commands are not thread-safe they are only
> called on the main context and QEMU performs the cleanup in three steps
> to prevent a use-after-free scenario where the guest can access the
> region after it’s unmapped:
> 
> 1. From the main context, the region’s field finish_unmapping is false
>    by default, so it sets a variable cmd_suspended, increases the
>    renderer_blocked variable, deletes the blob subregion, and unparents
>    the blob subregion causing its reference count to decrement.
> 
> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
>    recalculated, the free callback of the blob region
>    virtio_gpu_virgl_hostmem_region_free is called which sets the
>    region’s field finish_unmapping to true, allowing the main thread
>    context to finish replying to the command
> 
> 3. From the main context, the command is processed again, but this time
>    finish_unmapping is true, so virgl_renderer_resource_unmap can be
>    called and a response is sent to the guest.
> 
> It happens so that under TCG, if the guest has no timers configured (and
> thus no interrupt will cause the CPU to exit), the RCU thread does not
> have enough time to grab the locks and recalculate the FlatView.
> 
> That’s not a big problem in practice since most guests will assume a
> response will happen later in time and go on to do different things,
> potentially triggering interrupts and allowing the RCU context to run.
> If the guest waits for the unmap command to complete though, it blocks
> indefinitely. Attaching to the QEMU monitor and force quitting the guest
> allows the cleanup to continue.
> 
> There's no reason why the FlatView recalculation can't occur right away
> when we delete the blob subregion, however. It does not, because when we
> create the subregion we set the object as its own parent:
> 
>     memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
> 
> The extra reference is what prevents freeing the memory region object in
> the memory transaction of deleting the subregion.
> 
> This commit changes the owner object to the device, which removes the
> extra owner reference in the memory region and causes the MR to be
> freed right away in the main context.
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
> Cc: qemu-stable@nongnu.org
> ---
>  hw/display/virtio-gpu-virgl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index 71a7500de9..8fbe4e70cc 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
>      vmr->g = g;
>      mr = g_new0(MemoryRegion, 1);
>  
> -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
>      memory_region_add_subregion(&b->hostmem, offset, mr);
>      memory_region_set_enabled(mr, true);
>  

This change makes QEMU to crash.

AFAICT, it effectively reverts code to old bugged version [1] that was
rejected in the past.

+Akihiko Odaki

[1]
https://lore.kernel.org/qemu-devel/20230915111130.24064-10-ray.huang@amd.com/

Alex Bennée April 29, 2025, 9:19 p.m. UTC | #2

Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 4/28/25 15:59, Alex BennÃ©e wrote:
>> From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>> 
>> This commit fixes an indefinite hang when using VIRTIO GPU blob objects
>> under TCG in certain conditions.
>> 
>> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
>> MemoryRegion and attaches it to an offset on a PCI BAR of the
>> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
>> it.
>> 
>> Because virglrenderer commands are not thread-safe they are only
>> called on the main context and QEMU performs the cleanup in three steps
>> to prevent a use-after-free scenario where the guest can access the
>> region after it’s unmapped:
>> 
>> 1. From the main context, the region’s field finish_unmapping is false
>>    by default, so it sets a variable cmd_suspended, increases the
>>    renderer_blocked variable, deletes the blob subregion, and unparents
>>    the blob subregion causing its reference count to decrement.
>> 
>> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
>>    recalculated, the free callback of the blob region
>>    virtio_gpu_virgl_hostmem_region_free is called which sets the
>>    region’s field finish_unmapping to true, allowing the main thread
>>    context to finish replying to the command
>> 
>> 3. From the main context, the command is processed again, but this time
>>    finish_unmapping is true, so virgl_renderer_resource_unmap can be
>>    called and a response is sent to the guest.
>> 
>> It happens so that under TCG, if the guest has no timers configured (and
>> thus no interrupt will cause the CPU to exit), the RCU thread does not
>> have enough time to grab the locks and recalculate the FlatView.
>> 
>> That’s not a big problem in practice since most guests will assume a
>> response will happen later in time and go on to do different things,
>> potentially triggering interrupts and allowing the RCU context to run.
>> If the guest waits for the unmap command to complete though, it blocks
>> indefinitely. Attaching to the QEMU monitor and force quitting the guest
>> allows the cleanup to continue.
>> 
>> There's no reason why the FlatView recalculation can't occur right away
>> when we delete the blob subregion, however. It does not, because when we
>> create the subregion we set the object as its own parent:
>> 
>>     memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>> 
>> The extra reference is what prevents freeing the memory region object in
>> the memory transaction of deleting the subregion.
>> 
>> This commit changes the owner object to the device, which removes the
>> extra owner reference in the memory region and causes the MR to be
>> freed right away in the main context.
>> 
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>> Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
>> Cc: qemu-stable@nongnu.org
>> ---
>>  hw/display/virtio-gpu-virgl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
>> index 71a7500de9..8fbe4e70cc 100644
>> --- a/hw/display/virtio-gpu-virgl.c
>> +++ b/hw/display/virtio-gpu-virgl.c
>> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
>>      vmr->g = g;
>>      mr = g_new0(MemoryRegion, 1);
>>  
>> -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
>>      memory_region_add_subregion(&b->hostmem, offset, mr);
>>      memory_region_set_enabled(mr, true);
>>  
>
> This change makes QEMU to crash.

What is your command line to cause the crash?

>
> AFAICT, it effectively reverts code to old bugged version [1] that was
> rejected in the past.
>
> +Akihiko Odaki
>
> [1]
> https://lore.kernel.org/qemu-devel/20230915111130.24064-10-ray.huang@amd.com/

Dmitry Osipenko April 29, 2025, 9:26 p.m. UTC | #3

On 4/30/25 00:19, Alex Bennée wrote:
>> This change makes QEMU to crash.
> What is your command line to cause the crash?

I applied this patch on top of native context v11, ran AMD nctx and got a crash on SDDM startup.

(gdb) bt
#0  0x00007ffff5411b54 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff53b8f9e in raise () at /lib64/libc.so.6
#2  0x00007ffff53a0942 in abort () at /lib64/libc.so.6
#3  0x00007ffff6cbf18c in g_assertion_message[cold] () at /lib64/libglib-2.0.so.0
#4  0x00007ffff6d2ea07 in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
#5  0x0000555555a42820 in object_finalize (data=0x555557c9d290) at ../qom/object.c:732
#6  object_unref (objptr=0x555557c9d290) at ../qom/object.c:1231
#7  0x00005555559f3df3 in memory_region_unref (mr=<optimized out>) at ../system/memory.c:1854
#8  0x0000555555a003a7 in phys_section_destroy (mr=0x555559ef5b60) at ../system/physmem.c:1035
#9  phys_sections_free (map=0x555559c2dd80) at ../system/physmem.c:1048
#10 address_space_dispatch_free (d=0x555559c2dd70) at ../system/physmem.c:2692
#11 0x00005555559f1d33 in flatview_destroy (view=0x55555a54a720) at ../system/memory.c:295
#12 0x0000555555c278cf in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:301
#13 0x0000555555c1cc68 in qemu_thread_start (args=0x555557993d30) at ../util/qemu-thread-posix.c:541
#14 0x00007ffff540fba8 in start_thread () at /lib64/libc.so.6
#15 0x00007ffff5493b8c in __clone3 () at /lib64/libc.so.6

Alex Bennée April 30, 2025, 10:24 a.m. UTC | #4

Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 4/30/25 00:19, Alex Bennée wrote:
>>> This change makes QEMU to crash.
>> What is your command line to cause the crash?
>
> I applied this patch on top of native context v11, ran AMD nctx and
> got a crash on SDDM startup.

Did you also include the pre-cursor patch which splits MemoryRegion out
of the container struct. The aim here is to allow MemoryRegion counting
to be handled without worrying about other structure lifetimes.

>
> (gdb) bt
> #0  0x00007ffff5411b54 in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x00007ffff53b8f9e in raise () at /lib64/libc.so.6
> #2  0x00007ffff53a0942 in abort () at /lib64/libc.so.6
> #3  0x00007ffff6cbf18c in g_assertion_message[cold] () at /lib64/libglib-2.0.so.0
> #4  0x00007ffff6d2ea07 in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
> #5  0x0000555555a42820 in object_finalize (data=0x555557c9d290) at ../qom/object.c:732
> #6  object_unref (objptr=0x555557c9d290) at ../qom/object.c:1231
> #7  0x00005555559f3df3 in memory_region_unref (mr=<optimized out>) at ../system/memory.c:1854
> #8  0x0000555555a003a7 in phys_section_destroy (mr=0x555559ef5b60) at ../system/physmem.c:1035
> #9  phys_sections_free (map=0x555559c2dd80) at ../system/physmem.c:1048
> #10 address_space_dispatch_free (d=0x555559c2dd70) at ../system/physmem.c:2692
> #11 0x00005555559f1d33 in flatview_destroy (view=0x55555a54a720) at ../system/memory.c:295
> #12 0x0000555555c278cf in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:301
> #13 0x0000555555c1cc68 in qemu_thread_start (args=0x555557993d30) at ../util/qemu-thread-posix.c:541
> #14 0x00007ffff540fba8 in start_thread () at /lib64/libc.so.6
> #15 0x00007ffff5493b8c in __clone3 () at /lib64/libc.so.6

Dmitry Osipenko April 30, 2025, 8:42 p.m. UTC | #5

On 4/30/25 13:24, Alex Bennée wrote:
>> On 4/30/25 00:19, Alex Bennée wrote:
>>>> This change makes QEMU to crash.
>>> What is your command line to cause the crash?
>> I applied this patch on top of native context v11, ran AMD nctx and
>> got a crash on SDDM startup.
> Did you also include the pre-cursor patch which splits MemoryRegion out
> of the container struct. The aim here is to allow MemoryRegion counting
> to be handled without worrying about other structure lifetimes.

Very good catch, I indeed missed that other patch. Reapplied all the
patches and QEMU doesn't crash anymore. Now the code changes look sane
to me.

Will be great if Akihiko Odaki could comment too.

Akihiko Odaki May 4, 2025, 8:19 a.m. UTC | #6

On 2025/04/30 3:48, Dmitry Osipenko wrote:
> On 4/28/25 15:59, Alex BennÃ©e wrote:
>> From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>
>> This commit fixes an indefinite hang when using VIRTIO GPU blob objects
>> under TCG in certain conditions.
>>
>> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
>> MemoryRegion and attaches it to an offset on a PCI BAR of the
>> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
>> it.
>>
>> Because virglrenderer commands are not thread-safe they are only
>> called on the main context and QEMU performs the cleanup in three steps
>> to prevent a use-after-free scenario where the guest can access the
>> region after it’s unmapped:
>>
>> 1. From the main context, the region’s field finish_unmapping is false
>>     by default, so it sets a variable cmd_suspended, increases the
>>     renderer_blocked variable, deletes the blob subregion, and unparents
>>     the blob subregion causing its reference count to decrement.
>>
>> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
>>     recalculated, the free callback of the blob region
>>     virtio_gpu_virgl_hostmem_region_free is called which sets the
>>     region’s field finish_unmapping to true, allowing the main thread
>>     context to finish replying to the command
>>
>> 3. From the main context, the command is processed again, but this time
>>     finish_unmapping is true, so virgl_renderer_resource_unmap can be
>>     called and a response is sent to the guest.
>>
>> It happens so that under TCG, if the guest has no timers configured (and
>> thus no interrupt will cause the CPU to exit), the RCU thread does not
>> have enough time to grab the locks and recalculate the FlatView.
>>
>> That’s not a big problem in practice since most guests will assume a
>> response will happen later in time and go on to do different things,
>> potentially triggering interrupts and allowing the RCU context to run.
>> If the guest waits for the unmap command to complete though, it blocks
>> indefinitely. Attaching to the QEMU monitor and force quitting the guest
>> allows the cleanup to continue.

The RCU thread should be free to kick in if the guest is waiting and 
idle. That may be a problem that should be analyzed and fixed.

>>
>> There's no reason why the FlatView recalculation can't occur right away
>> when we delete the blob subregion, however. It does not, because when we
>> create the subregion we set the object as its own parent:
>>
>>      memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>>
>> The extra reference is what prevents freeing the memory region object in
>> the memory transaction of deleting the subregion.
>>
>> This commit changes the owner object to the device, which removes the
>> extra owner reference in the memory region and causes the MR to be
>> freed right away in the main context.
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>> Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
>> Cc: qemu-stable@nongnu.org
>> ---
>>   hw/display/virtio-gpu-virgl.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
>> index 71a7500de9..8fbe4e70cc 100644
>> --- a/hw/display/virtio-gpu-virgl.c
>> +++ b/hw/display/virtio-gpu-virgl.c
>> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
>>       vmr->g = g;
>>       mr = g_new0(MemoryRegion, 1);
>>   
>> -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
>>       memory_region_add_subregion(&b->hostmem, offset, mr);
>>       memory_region_set_enabled(mr, true);
>>   
> 
> This change makes QEMU to crash.
> 
> AFAICT, it effectively reverts code to old bugged version [1] that was
> rejected in the past.
> 
> +Akihiko Odaki
> 
> [1]
> https://lore.kernel.org/qemu-devel/20230915111130.24064-10-ray.huang@amd.com/

I think you are right.

Changing the owner indeed makes the MR to be freed right away, but this 
is because it makes a reference by the FlatView to the MR dangling and 
can lead to use-after-free.

Setting the owner to the device implies that the device keeps the 
storage of the MR and. The references to the MR must be counted by the 
device to keep the storage available in such a case.

The MR itself doesn't count the references. It is fine as long as the MR 
is kept alive by being parented by the device, but 
virtio_gpu_virgl_unmap_resource_blob() unparents the MR and breaks the 
assumption. That's why the device shouldn't be the the owner of the MR.

Regards,
Akihiko Odaki

Alex Bennée May 6, 2025, 10:12 a.m. UTC | #7

Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> On 2025/04/30 3:48, Dmitry Osipenko wrote:
>> On 4/28/25 15:59, Alex BennÃ©e wrote:
>>> From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>>
>>> This commit fixes an indefinite hang when using VIRTIO GPU blob objects
>>> under TCG in certain conditions.
>>>
>>> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
>>> MemoryRegion and attaches it to an offset on a PCI BAR of the
>>> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
>>> it.
>>>
>>> Because virglrenderer commands are not thread-safe they are only
>>> called on the main context and QEMU performs the cleanup in three steps
>>> to prevent a use-after-free scenario where the guest can access the
>>> region after it’s unmapped:
>>>
>>> 1. From the main context, the region’s field finish_unmapping is false
>>>     by default, so it sets a variable cmd_suspended, increases the
>>>     renderer_blocked variable, deletes the blob subregion, and unparents
>>>     the blob subregion causing its reference count to decrement.
>>>
>>> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
>>>     recalculated, the free callback of the blob region
>>>     virtio_gpu_virgl_hostmem_region_free is called which sets the
>>>     region’s field finish_unmapping to true, allowing the main thread
>>>     context to finish replying to the command
>>>
>>> 3. From the main context, the command is processed again, but this time
>>>     finish_unmapping is true, so virgl_renderer_resource_unmap can be
>>>     called and a response is sent to the guest.
>>>
>>> It happens so that under TCG, if the guest has no timers configured (and
>>> thus no interrupt will cause the CPU to exit), the RCU thread does not
>>> have enough time to grab the locks and recalculate the FlatView.
>>>
>>> That’s not a big problem in practice since most guests will assume a
>>> response will happen later in time and go on to do different things,
>>> potentially triggering interrupts and allowing the RCU context to run.
>>> If the guest waits for the unmap command to complete though, it blocks
>>> indefinitely. Attaching to the QEMU monitor and force quitting the guest
>>> allows the cleanup to continue.
>
> The RCU thread should be free to kick in if the guest is waiting and
> idle. That may be a problem that should be analyzed and fixed.
>
>>>
>>> There's no reason why the FlatView recalculation can't occur right away
>>> when we delete the blob subregion, however. It does not, because when we
>>> create the subregion we set the object as its own parent:
>>>
>>>      memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>>>
>>> The extra reference is what prevents freeing the memory region object in
>>> the memory transaction of deleting the subregion.
>>>
>>> This commit changes the owner object to the device, which removes the
>>> extra owner reference in the memory region and causes the MR to be
>>> freed right away in the main context.
>>>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>>> Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
>>> Cc: qemu-stable@nongnu.org
>>> ---
>>>   hw/display/virtio-gpu-virgl.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
>>> index 71a7500de9..8fbe4e70cc 100644
>>> --- a/hw/display/virtio-gpu-virgl.c
>>> +++ b/hw/display/virtio-gpu-virgl.c
>>> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
>>>       vmr->g = g;
>>>       mr = g_new0(MemoryRegion, 1);
>>>   -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size,
>>> data);
>>> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
>>>       memory_region_add_subregion(&b->hostmem, offset, mr);
>>>       memory_region_set_enabled(mr, true);
>>>   
>> This change makes QEMU to crash.
>> AFAICT, it effectively reverts code to old bugged version [1] that
>> was
>> rejected in the past.
>> +Akihiko Odaki
>> [1]
>> https://lore.kernel.org/qemu-devel/20230915111130.24064-10-ray.huang@amd.com/
>
> I think you are right.
>
> Changing the owner indeed makes the MR to be freed right away, but
> this is because it makes a reference by the FlatView to the MR
> dangling and can lead to use-after-free.
>
> Setting the owner to the device implies that the device keeps the
> storage of the MR and. The references to the MR must be counted by the
> device to keep the storage available in such a case.

Well the lifetime of the region is effectively controlled by the virgl
resource manager. There is a disconnect between when we can finally free
the MR and when the virglrenderer is done with it.

> The MR itself doesn't count the references. It is fine as long as the
> MR is kept alive by being parented by the device, but
> virtio_gpu_virgl_unmap_resource_blob() unparents the MR and breaks the
> assumption. That's why the device shouldn't be the the owner of the
> MR.

Then who should? Ultimately some object needs to "own" the MR until we
know it can be freed (so no longer mapped into the guest and no longer
referenced by virglrenderer). The normal MR refcounting isn't enough
because the lifetime extends beyond when it is "visible" in the FlatView.

>
> Regards,
> Akihiko Odaki

Akihiko Odaki May 8, 2025, 11:44 a.m. UTC | #8

On 2025/05/06 19:12, Alex Bennée wrote:
> Akihiko Odaki <akihiko.odaki@daynix.com> writes:
> 
>> On 2025/04/30 3:48, Dmitry Osipenko wrote:
>>> On 4/28/25 15:59, Alex BennÃ©e wrote:
>>>> From: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>>>
>>>> This commit fixes an indefinite hang when using VIRTIO GPU blob objects
>>>> under TCG in certain conditions.
>>>>
>>>> The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB VIRTIO command creates a
>>>> MemoryRegion and attaches it to an offset on a PCI BAR of the
>>>> VirtIOGPUdevice. The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB command unmaps
>>>> it.
>>>>
>>>> Because virglrenderer commands are not thread-safe they are only
>>>> called on the main context and QEMU performs the cleanup in three steps
>>>> to prevent a use-after-free scenario where the guest can access the
>>>> region after it’s unmapped:
>>>>
>>>> 1. From the main context, the region’s field finish_unmapping is false
>>>>      by default, so it sets a variable cmd_suspended, increases the
>>>>      renderer_blocked variable, deletes the blob subregion, and unparents
>>>>      the blob subregion causing its reference count to decrement.
>>>>
>>>> 2. From an RCU context, the MemoryView gets freed, the FlatView gets
>>>>      recalculated, the free callback of the blob region
>>>>      virtio_gpu_virgl_hostmem_region_free is called which sets the
>>>>      region’s field finish_unmapping to true, allowing the main thread
>>>>      context to finish replying to the command
>>>>
>>>> 3. From the main context, the command is processed again, but this time
>>>>      finish_unmapping is true, so virgl_renderer_resource_unmap can be
>>>>      called and a response is sent to the guest.
>>>>
>>>> It happens so that under TCG, if the guest has no timers configured (and
>>>> thus no interrupt will cause the CPU to exit), the RCU thread does not
>>>> have enough time to grab the locks and recalculate the FlatView.
>>>>
>>>> That’s not a big problem in practice since most guests will assume a
>>>> response will happen later in time and go on to do different things,
>>>> potentially triggering interrupts and allowing the RCU context to run.
>>>> If the guest waits for the unmap command to complete though, it blocks
>>>> indefinitely. Attaching to the QEMU monitor and force quitting the guest
>>>> allows the cleanup to continue.
>>
>> The RCU thread should be free to kick in if the guest is waiting and
>> idle. That may be a problem that should be analyzed and fixed.
>>
>>>>
>>>> There's no reason why the FlatView recalculation can't occur right away
>>>> when we delete the blob subregion, however. It does not, because when we
>>>> create the subregion we set the object as its own parent:
>>>>
>>>>       memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>>>>
>>>> The extra reference is what prevents freeing the memory region object in
>>>> the memory transaction of deleting the subregion.
>>>>
>>>> This commit changes the owner object to the device, which removes the
>>>> extra owner reference in the memory region and causes the MR to be
>>>> freed right away in the main context.
>>>>
>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>>> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>>>> Message-Id: <20250410122643.1747913-3-manos.pitsidianakis@linaro.org>
>>>> Cc: qemu-stable@nongnu.org
>>>> ---
>>>>    hw/display/virtio-gpu-virgl.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
>>>> index 71a7500de9..8fbe4e70cc 100644
>>>> --- a/hw/display/virtio-gpu-virgl.c
>>>> +++ b/hw/display/virtio-gpu-virgl.c
>>>> @@ -112,7 +112,7 @@ virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
>>>>        vmr->g = g;
>>>>        mr = g_new0(MemoryRegion, 1);
>>>>    -    memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size,
>>>> data);
>>>> +    memory_region_init_ram_ptr(mr, OBJECT(g), "blob", size, data);
>>>>        memory_region_add_subregion(&b->hostmem, offset, mr);
>>>>        memory_region_set_enabled(mr, true);
>>>>    
>>> This change makes QEMU to crash.
>>> AFAICT, it effectively reverts code to old bugged version [1] that
>>> was
>>> rejected in the past.
>>> +Akihiko Odaki
>>> [1]
>>> https://lore.kernel.org/qemu-devel/20230915111130.24064-10-ray.huang@amd.com/
>>
>> I think you are right.
>>
>> Changing the owner indeed makes the MR to be freed right away, but
>> this is because it makes a reference by the FlatView to the MR
>> dangling and can lead to use-after-free.
>>
>> Setting the owner to the device implies that the device keeps the
>> storage of the MR and. The references to the MR must be counted by the
>> device to keep the storage available in such a case.
> 
> Well the lifetime of the region is effectively controlled by the virgl
> resource manager. There is a disconnect between when we can finally free
> the MR and when the virglrenderer is done with it.
> 
>> The MR itself doesn't count the references. It is fine as long as the
>> MR is kept alive by being parented by the device, but
>> virtio_gpu_virgl_unmap_resource_blob() unparents the MR and breaks the
>> assumption. That's why the device shouldn't be the the owner of the
>> MR.
> 
> Then who should? Ultimately some object needs to "own" the MR until we
> know it can be freed (so no longer mapped into the guest and no longer
> referenced by virglrenderer). The normal MR refcounting isn't enough
> because the lifetime extends beyond when it is "visible" in the FlatView.

The current code makes the MR "own" itself. This works because the 
references are kept counted by the MR's own reference counter just like 
a normal Object.

Regards,
Akihiko Odaki

[8/9] virtio-gpu: fix hang under TCG when unmapping blob

Commit Message

Comments

Patch