diff mbox series

[PATCH-for-6.0] net: tap: fix crash on hotplug

Message ID 3f6be9c84782a0943ea21a8a6f8a5d055b65f2d5.1619018363.git.crobinso@redhat.com
State New
Headers show
Series [PATCH-for-6.0] net: tap: fix crash on hotplug | expand

Commit Message

Cole Robinson April 21, 2021, 3:22 p.m. UTC
Attempting to hotplug a tap nic with libvirt will crash qemu:

$ sudo virsh attach-interface f32 network default
error: Failed to attach interface
error: Unable to read from monitor: Connection reset by peer

0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
206	        if (!s->nc.peer->do_not_pad) {
gdb$ bt

s->nc.peer may not be set at this point. This seems to be an
expected case, as qemu_send_packet_* explicitly checks for NULL
s->nc.peer later.

Fix it by checking for s->nc.peer here too. Padding is applied if
s->nc.peer is not set.

https://bugzilla.redhat.com/show_bug.cgi?id=1949786
Fixes: 969e50b61a2

Signed-off-by: Cole Robinson <crobinso@redhat.com>

---
* Or should we skip padding if nc.peer is unset? I didn't dig into it
* tap-win3.c and slirp.c may need a similar fix, but the slirp case
  didn't crash in a simple test.

 net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.31.1

Comments

Philippe Mathieu-Daudé April 21, 2021, 4:36 p.m. UTC | #1
Cc'ing Bin.

On 4/21/21 5:22 PM, Cole Robinson wrote:
> Attempting to hotplug a tap nic with libvirt will crash qemu:

> 

> $ sudo virsh attach-interface f32 network default

> error: Failed to attach interface

> error: Unable to read from monitor: Connection reset by peer

> 

> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> 206	        if (!s->nc.peer->do_not_pad) {

> gdb$ bt

> 

> s->nc.peer may not be set at this point. This seems to be an

> expected case, as qemu_send_packet_* explicitly checks for NULL

> s->nc.peer later.

> 

> Fix it by checking for s->nc.peer here too. Padding is applied if

> s->nc.peer is not set.

> 

> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> Fixes: 969e50b61a2

> 

> Signed-off-by: Cole Robinson <crobinso@redhat.com>

> ---

> * Or should we skip padding if nc.peer is unset? I didn't dig into it

> * tap-win3.c and slirp.c may need a similar fix, but the slirp case

>   didn't crash in a simple test.

> 

>  net/tap.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/net/tap.c b/net/tap.c

> index dd42ac6134..937559dbb8 100644

> --- a/net/tap.c

> +++ b/net/tap.c

> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

>              size -= s->host_vnet_hdr_len;

>          }

>  

> -        if (!s->nc.peer->do_not_pad) {

> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

>              if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {

>                  buf = min_pkt;

>                  size = min_pktsz;

>
Peter Maydell April 21, 2021, 7:54 p.m. UTC | #2
On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote:
>

> Attempting to hotplug a tap nic with libvirt will crash qemu:

>

> $ sudo virsh attach-interface f32 network default

> error: Failed to attach interface

> error: Unable to read from monitor: Connection reset by peer

>

> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> 206             if (!s->nc.peer->do_not_pad) {

> gdb$ bt

>

> s->nc.peer may not be set at this point. This seems to be an

> expected case, as qemu_send_packet_* explicitly checks for NULL

> s->nc.peer later.

>

> Fix it by checking for s->nc.peer here too. Padding is applied if

> s->nc.peer is not set.

>

> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> Fixes: 969e50b61a2


Is this a regression since 5.2 ? (I guess so given the Fixes tag.)

Also, I'm kind of irritated that this was reported to RH on the
15th and we only get a patch now after rc4. I really really don't
want to have to roll an rc5, so this now has a much higher
hill to climb to get into 6.0 than if it had been reported
(eg on the "Planning" wiki page) as a for-6.0 issue before rc4
was tagged...

thanks
-- PMM
Cole Robinson April 21, 2021, 10:27 p.m. UTC | #3
On 4/21/21 3:54 PM, Peter Maydell wrote:
> On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote:

>>

>> Attempting to hotplug a tap nic with libvirt will crash qemu:

>>

>> $ sudo virsh attach-interface f32 network default

>> error: Failed to attach interface

>> error: Unable to read from monitor: Connection reset by peer

>>

>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

>> 206             if (!s->nc.peer->do_not_pad) {

>> gdb$ bt

>>

>> s->nc.peer may not be set at this point. This seems to be an

>> expected case, as qemu_send_packet_* explicitly checks for NULL

>> s->nc.peer later.

>>

>> Fix it by checking for s->nc.peer here too. Padding is applied if

>> s->nc.peer is not set.

>>

>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

>> Fixes: 969e50b61a2

> 

> Is this a regression since 5.2 ? (I guess so given the Fixes tag.)

> 


Yes

> Also, I'm kind of irritated that this was reported to RH on the

> 15th and we only get a patch now after rc4.


Sorry about that, I was slow attempting the reproducer, only gave it a
spin today. I saw Jason had some reverts in rc3 so I guessed that would
fix things, I was surprised to see it still reproduced with rc4.

 I really really don't
> want to have to roll an rc5, so this now has a much higher

> hill to climb to get into 6.0 than if it had been reported

> (eg on the "Planning" wiki page) as a for-6.0 issue before rc4

> was tagged..

I'm not too in tune to rules of the rc releases TBH, I used the subject
prefix just to ensure this got attention. For Fedora's needs it's not a
big deal if this isn't in 6.0.0 GA. But AFAICT most nic hotplug via
libvirt will crash qemu 100% of the time so I imagine every distro will
want to immediately backport this patch.

Thanks,
Cole
Jason Wang April 22, 2021, 2:25 a.m. UTC | #4
在 2021/4/21 下午11:22, Cole Robinson 写道:
> Attempting to hotplug a tap nic with libvirt will crash qemu:

>

> $ sudo virsh attach-interface f32 network default

> error: Failed to attach interface

> error: Unable to read from monitor: Connection reset by peer

>

> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> 206	        if (!s->nc.peer->do_not_pad) {

> gdb$ bt

>

> s->nc.peer may not be set at this point. This seems to be an

> expected case, as qemu_send_packet_* explicitly checks for NULL

> s->nc.peer later.

>

> Fix it by checking for s->nc.peer here too. Padding is applied if

> s->nc.peer is not set.

>

> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> Fixes: 969e50b61a2

>

> Signed-off-by: Cole Robinson <crobinso@redhat.com>

> ---

> * Or should we skip padding if nc.peer is unset?



I think so, the padding is for the peer.


> I didn't dig into it

> * tap-win3.c and slirp.c may need a similar fix, but the slirp case

>    didn't crash in a simple test.



Yes, the reason is because there's no packet go through slirp I think.

Thanks.


>

>   net/tap.c | 2 +-

>   1 file changed, 1 insertion(+), 1 deletion(-)

>

> diff --git a/net/tap.c b/net/tap.c

> index dd42ac6134..937559dbb8 100644

> --- a/net/tap.c

> +++ b/net/tap.c

> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

>               size -= s->host_vnet_hdr_len;

>           }

>   

> -        if (!s->nc.peer->do_not_pad) {

> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

>               if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {

>                   buf = min_pkt;

>                   size = min_pktsz;
Bin Meng April 22, 2021, 4:29 a.m. UTC | #5
On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
<philmd@redhat.com> wrote:
>

> Cc'ing Bin.

>

> On 4/21/21 5:22 PM, Cole Robinson wrote:

> > Attempting to hotplug a tap nic with libvirt will crash qemu:

> >

> > $ sudo virsh attach-interface f32 network default

> > error: Failed to attach interface

> > error: Unable to read from monitor: Connection reset by peer

> >

> > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> > 206           if (!s->nc.peer->do_not_pad) {

> > gdb$ bt

> >

> > s->nc.peer may not be set at this point. This seems to be an

> > expected case, as qemu_send_packet_* explicitly checks for NULL

> > s->nc.peer later.

> >

> > Fix it by checking for s->nc.peer here too. Padding is applied if

> > s->nc.peer is not set.

> >

> > https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> > Fixes: 969e50b61a2

> >

> > Signed-off-by: Cole Robinson <crobinso@redhat.com>

> > ---

> > * Or should we skip padding if nc.peer is unset? I didn't dig into it

> > * tap-win3.c and slirp.c may need a similar fix, but the slirp case

> >   didn't crash in a simple test.

> >

> >  net/tap.c | 2 +-

> >  1 file changed, 1 insertion(+), 1 deletion(-)

> >

> > diff --git a/net/tap.c b/net/tap.c

> > index dd42ac6134..937559dbb8 100644

> > --- a/net/tap.c

> > +++ b/net/tap.c

> > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

> >              size -= s->host_vnet_hdr_len;

> >          }

> >

> > -        if (!s->nc.peer->do_not_pad) {

> > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {


I think we should do:

if (s->nc.peer && !s->nc.peer->do_not_pad)

> >              if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {

> >                  buf = min_pkt;

> >                  size = min_pktsz;

> >


And do the similar fix on tap-win32 and slirp codes.

Regards,
Bin
Peter Maydell April 22, 2021, 9:36 a.m. UTC | #6
On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:
>

> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé

> <philmd@redhat.com> wrote:

> >

> > Cc'ing Bin.

> >

> > On 4/21/21 5:22 PM, Cole Robinson wrote:

> > > Attempting to hotplug a tap nic with libvirt will crash qemu:

> > >

> > > $ sudo virsh attach-interface f32 network default

> > > error: Failed to attach interface

> > > error: Unable to read from monitor: Connection reset by peer

> > >

> > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> > > 206           if (!s->nc.peer->do_not_pad) {

> > > gdb$ bt

> > >

> > > s->nc.peer may not be set at this point. This seems to be an

> > > expected case, as qemu_send_packet_* explicitly checks for NULL

> > > s->nc.peer later.

> > >

> > > Fix it by checking for s->nc.peer here too. Padding is applied if

> > > s->nc.peer is not set.

> > >

> > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> > > Fixes: 969e50b61a2

> > >

> > > Signed-off-by: Cole Robinson <crobinso@redhat.com>

> > > ---

> > > * Or should we skip padding if nc.peer is unset? I didn't dig into it

> > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case

> > >   didn't crash in a simple test.

> > >

> > >  net/tap.c | 2 +-

> > >  1 file changed, 1 insertion(+), 1 deletion(-)

> > >

> > > diff --git a/net/tap.c b/net/tap.c

> > > index dd42ac6134..937559dbb8 100644

> > > --- a/net/tap.c

> > > +++ b/net/tap.c

> > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

> > >              size -= s->host_vnet_hdr_len;

> > >          }

> > >

> > > -        if (!s->nc.peer->do_not_pad) {

> > > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

>

> I think we should do:

>

> if (s->nc.peer && !s->nc.peer->do_not_pad)


Yes. If there is no peer then the qemu_send_packet() that we're about
to do is going to discard the packet anyway, so there's no point in
padding it.

Maybe consider

static inline bool net_peer_needs_padding(NetClientState *nc)
{
    return nc->peer && !nc->peer->do_not_pad;
}

since we want the same check in three places ?

thanks
-- PMM
Bin Meng April 22, 2021, 9:42 a.m. UTC | #7
On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>

> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:

> >

> > On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé

> > <philmd@redhat.com> wrote:

> > >

> > > Cc'ing Bin.

> > >

> > > On 4/21/21 5:22 PM, Cole Robinson wrote:

> > > > Attempting to hotplug a tap nic with libvirt will crash qemu:

> > > >

> > > > $ sudo virsh attach-interface f32 network default

> > > > error: Failed to attach interface

> > > > error: Unable to read from monitor: Connection reset by peer

> > > >

> > > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

> > > > 206           if (!s->nc.peer->do_not_pad) {

> > > > gdb$ bt

> > > >

> > > > s->nc.peer may not be set at this point. This seems to be an

> > > > expected case, as qemu_send_packet_* explicitly checks for NULL

> > > > s->nc.peer later.

> > > >

> > > > Fix it by checking for s->nc.peer here too. Padding is applied if

> > > > s->nc.peer is not set.

> > > >

> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786

> > > > Fixes: 969e50b61a2

> > > >

> > > > Signed-off-by: Cole Robinson <crobinso@redhat.com>

> > > > ---

> > > > * Or should we skip padding if nc.peer is unset? I didn't dig into it

> > > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case

> > > >   didn't crash in a simple test.

> > > >

> > > >  net/tap.c | 2 +-

> > > >  1 file changed, 1 insertion(+), 1 deletion(-)

> > > >

> > > > diff --git a/net/tap.c b/net/tap.c

> > > > index dd42ac6134..937559dbb8 100644

> > > > --- a/net/tap.c

> > > > +++ b/net/tap.c

> > > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

> > > >              size -= s->host_vnet_hdr_len;

> > > >          }

> > > >

> > > > -        if (!s->nc.peer->do_not_pad) {

> > > > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

> >

> > I think we should do:

> >

> > if (s->nc.peer && !s->nc.peer->do_not_pad)

>

> Yes. If there is no peer then the qemu_send_packet() that we're about

> to do is going to discard the packet anyway, so there's no point in

> padding it.

>

> Maybe consider

>

> static inline bool net_peer_needs_padding(NetClientState *nc)

> {

>     return nc->peer && !nc->peer->do_not_pad;

> }

>

> since we want the same check in three places ?


Sounds good to me.

Regards,
Bin
Cole Robinson April 22, 2021, 9:34 p.m. UTC | #8
On 4/22/21 5:42 AM, Bin Meng wrote:
> On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:

>>

>> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:

>>>

>>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé

>>> <philmd@redhat.com> wrote:

>>>>

>>>> Cc'ing Bin.

>>>>

>>>> On 4/21/21 5:22 PM, Cole Robinson wrote:

>>>>> Attempting to hotplug a tap nic with libvirt will crash qemu:

>>>>>

>>>>> $ sudo virsh attach-interface f32 network default

>>>>> error: Failed to attach interface

>>>>> error: Unable to read from monitor: Connection reset by peer

>>>>>

>>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

>>>>> 206           if (!s->nc.peer->do_not_pad) {

>>>>> gdb$ bt

>>>>>

>>>>> s->nc.peer may not be set at this point. This seems to be an

>>>>> expected case, as qemu_send_packet_* explicitly checks for NULL

>>>>> s->nc.peer later.

>>>>>

>>>>> Fix it by checking for s->nc.peer here too. Padding is applied if

>>>>> s->nc.peer is not set.

>>>>>

>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

>>>>> Fixes: 969e50b61a2

>>>>>

>>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com>

>>>>> ---

>>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it

>>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case

>>>>>   didn't crash in a simple test.

>>>>>

>>>>>  net/tap.c | 2 +-

>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>

>>>>> diff --git a/net/tap.c b/net/tap.c

>>>>> index dd42ac6134..937559dbb8 100644

>>>>> --- a/net/tap.c

>>>>> +++ b/net/tap.c

>>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

>>>>>              size -= s->host_vnet_hdr_len;

>>>>>          }

>>>>>

>>>>> -        if (!s->nc.peer->do_not_pad) {

>>>>> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

>>>

>>> I think we should do:

>>>

>>> if (s->nc.peer && !s->nc.peer->do_not_pad)

>>

>> Yes. If there is no peer then the qemu_send_packet() that we're about

>> to do is going to discard the packet anyway, so there's no point in

>> padding it.

>>

>> Maybe consider

>>

>> static inline bool net_peer_needs_padding(NetClientState *nc)

>> {

>>     return nc->peer && !nc->peer->do_not_pad;

>> }

>>

>> since we want the same check in three places ?

> 

> Sounds good to me.

> 


I did not get to this today. Bin/Jason/anyone want to write the patch, I
will test it tomorrow (US EDT time). If not I'll write the patch tomorrow.

Thanks,
Cole
Jason Wang April 23, 2021, 1:43 a.m. UTC | #9
在 2021/4/23 上午5:34, Cole Robinson 写道:
> On 4/22/21 5:42 AM, Bin Meng wrote:

>> On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:

>>> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:

>>>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé

>>>> <philmd@redhat.com> wrote:

>>>>> Cc'ing Bin.

>>>>>

>>>>> On 4/21/21 5:22 PM, Cole Robinson wrote:

>>>>>> Attempting to hotplug a tap nic with libvirt will crash qemu:

>>>>>>

>>>>>> $ sudo virsh attach-interface f32 network default

>>>>>> error: Failed to attach interface

>>>>>> error: Unable to read from monitor: Connection reset by peer

>>>>>>

>>>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206

>>>>>> 206           if (!s->nc.peer->do_not_pad) {

>>>>>> gdb$ bt

>>>>>>

>>>>>> s->nc.peer may not be set at this point. This seems to be an

>>>>>> expected case, as qemu_send_packet_* explicitly checks for NULL

>>>>>> s->nc.peer later.

>>>>>>

>>>>>> Fix it by checking for s->nc.peer here too. Padding is applied if

>>>>>> s->nc.peer is not set.

>>>>>>

>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786

>>>>>> Fixes: 969e50b61a2

>>>>>>

>>>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com>

>>>>>> ---

>>>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it

>>>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case

>>>>>>    didn't crash in a simple test.

>>>>>>

>>>>>>   net/tap.c | 2 +-

>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>>

>>>>>> diff --git a/net/tap.c b/net/tap.c

>>>>>> index dd42ac6134..937559dbb8 100644

>>>>>> --- a/net/tap.c

>>>>>> +++ b/net/tap.c

>>>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)

>>>>>>               size -= s->host_vnet_hdr_len;

>>>>>>           }

>>>>>>

>>>>>> -        if (!s->nc.peer->do_not_pad) {

>>>>>> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

>>>> I think we should do:

>>>>

>>>> if (s->nc.peer && !s->nc.peer->do_not_pad)

>>> Yes. If there is no peer then the qemu_send_packet() that we're about

>>> to do is going to discard the packet anyway, so there's no point in

>>> padding it.

>>>

>>> Maybe consider

>>>

>>> static inline bool net_peer_needs_padding(NetClientState *nc)

>>> {

>>>      return nc->peer && !nc->peer->do_not_pad;

>>> }

>>>

>>> since we want the same check in three places ?

>> Sounds good to me.

>>

> I did not get to this today. Bin/Jason/anyone want to write the patch,



I will send a patch soon.

Thanks


> I

> will test it tomorrow (US EDT time). If not I'll write the patch tomorrow.

>

> Thanks,

> Cole

>

>
diff mbox series

Patch

diff --git a/net/tap.c b/net/tap.c
index dd42ac6134..937559dbb8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -203,7 +203,7 @@  static void tap_send(void *opaque)
             size -= s->host_vnet_hdr_len;
         }
 
-        if (!s->nc.peer->do_not_pad) {
+        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
             if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
                 buf = min_pkt;
                 size = min_pktsz;