Message ID | 20210129002136.70865-1-weiwan@google.com |
---|---|
State | New |
Headers | show |
Series | [net] virtio-net: suppress bad irq warning for tx napi | expand |
On Thu, 28 Jan 2021 16:21:36 -0800 Wei Wang wrote: > With the implementation of napi-tx in virtio driver, we clean tx > descriptors from rx napi handler, for the purpose of reducing tx > complete interrupts. But this could introduce a race where tx complete > interrupt has been raised, but the handler found there is no work to do > because we have done the work in the previous rx interrupt handler. > This could lead to the following warning msg: > [ 3588.010778] irq 38: nobody cared (try booting with the > "irqpoll" option) > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > 5.3.0-19-generic #20~18.04.2-Ubuntu > [ 3588.017940] Call Trace: > [ 3588.017942] <IRQ> > [ 3588.017951] dump_stack+0x63/0x85 > [ 3588.017953] __report_bad_irq+0x35/0xc0 > [ 3588.017955] note_interrupt+0x24b/0x2a0 > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > [ 3588.017957] handle_irq_event+0x3b/0x60 > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > [ 3588.017961] handle_irq+0x20/0x30 > [ 3588.017964] do_IRQ+0x50/0xe0 > [ 3588.017966] common_interrupt+0xf/0xf > [ 3588.017966] </IRQ> > [ 3588.017989] handlers: > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > [ 3588.025099] Disabling IRQ #38 > > This patch adds a new param to struct vring_virtqueue, and we set it for > tx virtqueues if napi-tx is enabled, to suppress the warning in such > case. > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > Reported-by: Rick Jones <jonesrick@google.com> > Signed-off-by: Wei Wang <weiwan@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> Michael, Jason, does this one look okay to you?
On 2021/1/29 上午8:21, Wei Wang wrote: > With the implementation of napi-tx in virtio driver, we clean tx > descriptors from rx napi handler, for the purpose of reducing tx > complete interrupts. But this could introduce a race where tx complete > interrupt has been raised, but the handler found there is no work to do > because we have done the work in the previous rx interrupt handler. > This could lead to the following warning msg: > [ 3588.010778] irq 38: nobody cared (try booting with the > "irqpoll" option) > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > 5.3.0-19-generic #20~18.04.2-Ubuntu > [ 3588.017940] Call Trace: > [ 3588.017942] <IRQ> > [ 3588.017951] dump_stack+0x63/0x85 > [ 3588.017953] __report_bad_irq+0x35/0xc0 > [ 3588.017955] note_interrupt+0x24b/0x2a0 > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > [ 3588.017957] handle_irq_event+0x3b/0x60 > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > [ 3588.017961] handle_irq+0x20/0x30 > [ 3588.017964] do_IRQ+0x50/0xe0 > [ 3588.017966] common_interrupt+0xf/0xf > [ 3588.017966] </IRQ> > [ 3588.017989] handlers: > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > [ 3588.025099] Disabling IRQ #38 > > This patch adds a new param to struct vring_virtqueue, and we set it for > tx virtqueues if napi-tx is enabled, to suppress the warning in such > case. > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > Reported-by: Rick Jones <jonesrick@google.com> > Signed-off-by: Wei Wang <weiwan@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> Please use get_maintainer.pl to make sure Michael and me were cced. > --- > drivers/net/virtio_net.c | 19 ++++++++++++++----- > drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > include/linux/virtio.h | 2 ++ > 3 files changed, 32 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 508408fbe78f..e9a3f30864e8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > return; > } > > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > + * rx napi handler. Set work_steal to suppress bad irq warning for > + * IRQ_NONE case from tx complete interrupt handler. > + */ > + virtqueue_set_work_steal(vq, true); > + > return virtnet_napi_enable(vq, napi); Do we need to force the ordering between steal set and napi enable? > } > > -static void virtnet_napi_tx_disable(struct napi_struct *napi) > +static void virtnet_napi_tx_disable(struct virtqueue *vq, > + struct napi_struct *napi) > { > - if (napi->weight) > + if (napi->weight) { > napi_disable(napi); > + virtqueue_set_work_steal(vq, false); > + } > } > > static void refill_work(struct work_struct *work) > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) > for (i = 0; i < vi->max_queue_pairs; i++) { > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > > return 0; > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) > if (netif_running(vi->dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > } > } > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, > if (netif_running(dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > } > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 71e16b53e9c1..f7c5d697c302 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -105,6 +105,9 @@ struct vring_virtqueue { > /* Host publishes avail event idx */ > bool event; > > + /* Tx side napi work could be done from rx side. */ > + bool work_steal; So vring_vritqueue is a general structure, let's avoid mentioning network specific stuffs here. And we need a better name like "no_interrupt_check"? And we need a separate patch for virtio core changes. > + > /* Head of free buffer list. */ > unsigned int free_head; > /* Number we've added since last sync. */ > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( > vq->notify = notify; > vq->weak_barriers = weak_barriers; > vq->broken = false; > + vq->work_steal = false; > vq->last_used_idx = 0; > vq->num_added = 0; > vq->packed_ring = true; > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) > > if (!more_used(vq)) { > pr_debug("virtqueue interrupt with no work for %p\n", vq); Do we still need to keep this warning? > + if (vq->work_steal) > + return IRQ_HANDLED; So I wonder instead of doing trick like this, maybe it's time to unify TX/RX NAPI with the help of[1] (virtio-net use queue pairs). Thanks [1] https://lkml.org/lkml/2014/12/25/169 > + > return IRQ_NONE; > } > > @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, > vq->notify = notify; > vq->weak_barriers = weak_barriers; > vq->broken = false; > + vq->work_steal = false; > vq->last_used_idx = 0; > vq->num_added = 0; > vq->use_dma_api = vring_use_dma_api(vdev); > @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq) > } > EXPORT_SYMBOL_GPL(virtqueue_is_broken); > > +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val) > +{ > + struct vring_virtqueue *vq = to_vvq(_vq); > + > + vq->work_steal = val; > +} > +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal); > + > /* > * This should prevent the device from being used, allowing drivers to > * recover. You may need to grab appropriate locks to flush. > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index 55ea329fe72a..091c30f21ff9 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq); > > bool virtqueue_is_broken(struct virtqueue *vq); > > +void virtqueue_set_work_steal(struct virtqueue *vq, bool val); > + > const struct vring *virtqueue_get_vring(struct virtqueue *vq); > dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); > dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2021/1/29 上午8:21, Wei Wang wrote: > > With the implementation of napi-tx in virtio driver, we clean tx > > descriptors from rx napi handler, for the purpose of reducing tx > > complete interrupts. But this could introduce a race where tx complete > > interrupt has been raised, but the handler found there is no work to do > > because we have done the work in the previous rx interrupt handler. > > This could lead to the following warning msg: > > [ 3588.010778] irq 38: nobody cared (try booting with the > > "irqpoll" option) > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > [ 3588.017940] Call Trace: > > [ 3588.017942] <IRQ> > > [ 3588.017951] dump_stack+0x63/0x85 > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > [ 3588.017961] handle_irq+0x20/0x30 > > [ 3588.017964] do_IRQ+0x50/0xe0 > > [ 3588.017966] common_interrupt+0xf/0xf > > [ 3588.017966] </IRQ> > > [ 3588.017989] handlers: > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > [ 3588.025099] Disabling IRQ #38 > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > case. > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > Reported-by: Rick Jones <jonesrick@google.com> > > Signed-off-by: Wei Wang <weiwan@google.com> > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > Please use get_maintainer.pl to make sure Michael and me were cced. Will do. Sorry about that. I suggested just the virtualization list, my bad. > > > --- > > drivers/net/virtio_net.c | 19 ++++++++++++++----- > > drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > > include/linux/virtio.h | 2 ++ > > 3 files changed, 32 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 508408fbe78f..e9a3f30864e8 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > > return; > > } > > > > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > > + * rx napi handler. Set work_steal to suppress bad irq warning for > > + * IRQ_NONE case from tx complete interrupt handler. > > + */ > > + virtqueue_set_work_steal(vq, true); > > + > > return virtnet_napi_enable(vq, napi); > > > Do we need to force the ordering between steal set and napi enable? The warning only occurs after one hundred spurious interrupts, so not really. > > > } > > > > -static void virtnet_napi_tx_disable(struct napi_struct *napi) > > +static void virtnet_napi_tx_disable(struct virtqueue *vq, > > + struct napi_struct *napi) > > { > > - if (napi->weight) > > + if (napi->weight) { > > napi_disable(napi); > > + virtqueue_set_work_steal(vq, false); > > + } > > } > > > > static void refill_work(struct work_struct *work) > > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) > > for (i = 0; i < vi->max_queue_pairs; i++) { > > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > > > return 0; > > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) > > if (netif_running(vi->dev)) { > > for (i = 0; i < vi->max_queue_pairs; i++) { > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > } > > } > > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, > > if (netif_running(dev)) { > > for (i = 0; i < vi->max_queue_pairs; i++) { > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > } > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index 71e16b53e9c1..f7c5d697c302 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -105,6 +105,9 @@ struct vring_virtqueue { > > /* Host publishes avail event idx */ > > bool event; > > > > + /* Tx side napi work could be done from rx side. */ > > + bool work_steal; > > > So vring_vritqueue is a general structure, let's avoid mentioning > network specific stuffs here. And we need a better name like > "no_interrupt_check"? > > And we need a separate patch for virtio core changes. Ack. Will change. > > > + > > /* Head of free buffer list. */ > > unsigned int free_head; > > /* Number we've added since last sync. */ > > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( > > vq->notify = notify; > > vq->weak_barriers = weak_barriers; > > vq->broken = false; > > + vq->work_steal = false; > > vq->last_used_idx = 0; > > vq->num_added = 0; > > vq->packed_ring = true; > > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) > > > > if (!more_used(vq)) { > > pr_debug("virtqueue interrupt with no work for %p\n", vq); > > > Do we still need to keep this warning? Come to think of it, I would say no, in this case. > > > > + if (vq->work_steal) > > + return IRQ_HANDLED; > > > So I wonder instead of doing trick like this, maybe it's time to unify > TX/RX NAPI with the help of[1] (virtio-net use queue pairs). > > Thanks > > [1] https://lkml.org/lkml/2014/12/25/169 Interesting idea. It does sound like a good fit for this model. The patch in the Fixes line proved effective at suppressing unnecessary TX interrupts when processing in RX interrupt handler. So not sure how much will help in practice. Might be a nice project to evaluate separate for net-next at some point. Thanks for the review!
On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > With the implementation of napi-tx in virtio driver, we clean tx > descriptors from rx napi handler, for the purpose of reducing tx > complete interrupts. But this could introduce a race where tx complete > interrupt has been raised, but the handler found there is no work to do > because we have done the work in the previous rx interrupt handler. > This could lead to the following warning msg: > [ 3588.010778] irq 38: nobody cared (try booting with the > "irqpoll" option) > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > 5.3.0-19-generic #20~18.04.2-Ubuntu > [ 3588.017940] Call Trace: > [ 3588.017942] <IRQ> > [ 3588.017951] dump_stack+0x63/0x85 > [ 3588.017953] __report_bad_irq+0x35/0xc0 > [ 3588.017955] note_interrupt+0x24b/0x2a0 > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > [ 3588.017957] handle_irq_event+0x3b/0x60 > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > [ 3588.017961] handle_irq+0x20/0x30 > [ 3588.017964] do_IRQ+0x50/0xe0 > [ 3588.017966] common_interrupt+0xf/0xf > [ 3588.017966] </IRQ> > [ 3588.017989] handlers: > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > [ 3588.025099] Disabling IRQ #38 > > This patch adds a new param to struct vring_virtqueue, and we set it for > tx virtqueues if napi-tx is enabled, to suppress the warning in such > case. > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > Reported-by: Rick Jones <jonesrick@google.com> > Signed-off-by: Wei Wang <weiwan@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> This description does not make sense to me. irq X: nobody cared only triggers after an interrupt is unhandled repeatedly. So something causes a storm of useless tx interrupts here. Let's find out what it was please. What you are doing is just preventing linux from complaining. > --- > drivers/net/virtio_net.c | 19 ++++++++++++++----- > drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > include/linux/virtio.h | 2 ++ > 3 files changed, 32 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 508408fbe78f..e9a3f30864e8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > return; > } > > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > + * rx napi handler. Set work_steal to suppress bad irq warning for > + * IRQ_NONE case from tx complete interrupt handler. > + */ > + virtqueue_set_work_steal(vq, true); > + > return virtnet_napi_enable(vq, napi); > } > > -static void virtnet_napi_tx_disable(struct napi_struct *napi) > +static void virtnet_napi_tx_disable(struct virtqueue *vq, > + struct napi_struct *napi) > { > - if (napi->weight) > + if (napi->weight) { > napi_disable(napi); > + virtqueue_set_work_steal(vq, false); > + } > } > > static void refill_work(struct work_struct *work) > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) > for (i = 0; i < vi->max_queue_pairs; i++) { > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > > return 0; > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) > if (netif_running(vi->dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > } > } > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, > if (netif_running(dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > - virtnet_napi_tx_disable(&vi->sq[i].napi); > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > } > } > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 71e16b53e9c1..f7c5d697c302 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -105,6 +105,9 @@ struct vring_virtqueue { > /* Host publishes avail event idx */ > bool event; > > + /* Tx side napi work could be done from rx side. */ > + bool work_steal; > + > /* Head of free buffer list. */ > unsigned int free_head; > /* Number we've added since last sync. */ > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( > vq->notify = notify; > vq->weak_barriers = weak_barriers; > vq->broken = false; > + vq->work_steal = false; > vq->last_used_idx = 0; > vq->num_added = 0; > vq->packed_ring = true; > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) > > if (!more_used(vq)) { > pr_debug("virtqueue interrupt with no work for %p\n", vq); > + if (vq->work_steal) > + return IRQ_HANDLED; > + > return IRQ_NONE; > } > > @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, > vq->notify = notify; > vq->weak_barriers = weak_barriers; > vq->broken = false; > + vq->work_steal = false; > vq->last_used_idx = 0; > vq->num_added = 0; > vq->use_dma_api = vring_use_dma_api(vdev); > @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq) > } > EXPORT_SYMBOL_GPL(virtqueue_is_broken); > > +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val) > +{ > + struct vring_virtqueue *vq = to_vvq(_vq); > + > + vq->work_steal = val; > +} > +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal); > + > /* > * This should prevent the device from being used, allowing drivers to > * recover. You may need to grab appropriate locks to flush. > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index 55ea329fe72a..091c30f21ff9 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq); > > bool virtqueue_is_broken(struct virtqueue *vq); > > +void virtqueue_set_work_steal(struct virtqueue *vq, bool val); > + > const struct vring *virtqueue_get_vring(struct virtqueue *vq); > dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); > dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq); > -- > 2.30.0.365.g02bc693789-goog >
On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > With the implementation of napi-tx in virtio driver, we clean tx > > descriptors from rx napi handler, for the purpose of reducing tx > > complete interrupts. But this could introduce a race where tx complete > > interrupt has been raised, but the handler found there is no work to do > > because we have done the work in the previous rx interrupt handler. > > This could lead to the following warning msg: > > [ 3588.010778] irq 38: nobody cared (try booting with the > > "irqpoll" option) > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > [ 3588.017940] Call Trace: > > [ 3588.017942] <IRQ> > > [ 3588.017951] dump_stack+0x63/0x85 > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > [ 3588.017961] handle_irq+0x20/0x30 > > [ 3588.017964] do_IRQ+0x50/0xe0 > > [ 3588.017966] common_interrupt+0xf/0xf > > [ 3588.017966] </IRQ> > > [ 3588.017989] handlers: > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > [ 3588.025099] Disabling IRQ #38 > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > case. > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > Reported-by: Rick Jones <jonesrick@google.com> > > Signed-off-by: Wei Wang <weiwan@google.com> > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > This description does not make sense to me. > > irq X: nobody cared > only triggers after an interrupt is unhandled repeatedly. > > So something causes a storm of useless tx interrupts here. > > Let's find out what it was please. What you are doing is > just preventing linux from complaining. The traffic that causes this warning is a netperf tcp_stream with at least 128 flows between 2 hosts. And the warning gets triggered on the receiving host, which has a lot of rx interrupts firing on all queues, and a few tx interrupts. And I think the scenario is: when the tx interrupt gets fired, it gets coalesced with the rx interrupt. Basically, the rx and tx interrupts get triggered very close to each other, and gets handled in one round of do_IRQ(). And the rx irq handler gets called first, which calls virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() to try to do the work on the corresponding tx queue as well. That's why when tx interrupt handler gets called, it sees no work to do. And the reason for the rx handler to handle the tx work is here: https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > --- > > drivers/net/virtio_net.c | 19 ++++++++++++++----- > > drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > > include/linux/virtio.h | 2 ++ > > 3 files changed, 32 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 508408fbe78f..e9a3f30864e8 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > > return; > > } > > > > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > > + * rx napi handler. Set work_steal to suppress bad irq warning for > > + * IRQ_NONE case from tx complete interrupt handler. > > + */ > > + virtqueue_set_work_steal(vq, true); > > + > > return virtnet_napi_enable(vq, napi); > > } > > > > -static void virtnet_napi_tx_disable(struct napi_struct *napi) > > +static void virtnet_napi_tx_disable(struct virtqueue *vq, > > + struct napi_struct *napi) > > { > > - if (napi->weight) > > + if (napi->weight) { > > napi_disable(napi); > > + virtqueue_set_work_steal(vq, false); > > + } > > } > > > > static void refill_work(struct work_struct *work) > > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) > > for (i = 0; i < vi->max_queue_pairs; i++) { > > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > > > return 0; > > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) > > if (netif_running(vi->dev)) { > > for (i = 0; i < vi->max_queue_pairs; i++) { > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > } > > } > > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, > > if (netif_running(dev)) { > > for (i = 0; i < vi->max_queue_pairs; i++) { > > napi_disable(&vi->rq[i].napi); > > - virtnet_napi_tx_disable(&vi->sq[i].napi); > > + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); > > } > > } > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index 71e16b53e9c1..f7c5d697c302 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -105,6 +105,9 @@ struct vring_virtqueue { > > /* Host publishes avail event idx */ > > bool event; > > > > + /* Tx side napi work could be done from rx side. */ > > + bool work_steal; > > + > > /* Head of free buffer list. */ > > unsigned int free_head; > > /* Number we've added since last sync. */ > > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( > > vq->notify = notify; > > vq->weak_barriers = weak_barriers; > > vq->broken = false; > > + vq->work_steal = false; > > vq->last_used_idx = 0; > > vq->num_added = 0; > > vq->packed_ring = true; > > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) > > > > if (!more_used(vq)) { > > pr_debug("virtqueue interrupt with no work for %p\n", vq); > > + if (vq->work_steal) > > + return IRQ_HANDLED; > > + > > return IRQ_NONE; > > } > > > > @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, > > vq->notify = notify; > > vq->weak_barriers = weak_barriers; > > vq->broken = false; > > + vq->work_steal = false; > > vq->last_used_idx = 0; > > vq->num_added = 0; > > vq->use_dma_api = vring_use_dma_api(vdev); > > @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq) > > } > > EXPORT_SYMBOL_GPL(virtqueue_is_broken); > > > > +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val) > > +{ > > + struct vring_virtqueue *vq = to_vvq(_vq); > > + > > + vq->work_steal = val; > > +} > > +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal); > > + > > /* > > * This should prevent the device from being used, allowing drivers to > > * recover. You may need to grab appropriate locks to flush. > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > > index 55ea329fe72a..091c30f21ff9 100644 > > --- a/include/linux/virtio.h > > +++ b/include/linux/virtio.h > > @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq); > > > > bool virtqueue_is_broken(struct virtqueue *vq); > > > > +void virtqueue_set_work_steal(struct virtqueue *vq, bool val); > > + > > const struct vring *virtqueue_get_vring(struct virtqueue *vq); > > dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); > > dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq); > > -- > > 2.30.0.365.g02bc693789-goog > > >
On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > With the implementation of napi-tx in virtio driver, we clean tx > > > descriptors from rx napi handler, for the purpose of reducing tx > > > complete interrupts. But this could introduce a race where tx complete > > > interrupt has been raised, but the handler found there is no work to do > > > because we have done the work in the previous rx interrupt handler. > > > This could lead to the following warning msg: > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > "irqpoll" option) > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > [ 3588.017940] Call Trace: > > > [ 3588.017942] <IRQ> > > > [ 3588.017951] dump_stack+0x63/0x85 > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > [ 3588.017961] handle_irq+0x20/0x30 > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > [ 3588.017966] common_interrupt+0xf/0xf > > > [ 3588.017966] </IRQ> > > > [ 3588.017989] handlers: > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > [ 3588.025099] Disabling IRQ #38 > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > case. > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > Reported-by: Rick Jones <jonesrick@google.com> > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > This description does not make sense to me. > > > > irq X: nobody cared > > only triggers after an interrupt is unhandled repeatedly. > > > > So something causes a storm of useless tx interrupts here. > > > > Let's find out what it was please. What you are doing is > > just preventing linux from complaining. > > The traffic that causes this warning is a netperf tcp_stream with at > least 128 flows between 2 hosts. And the warning gets triggered on the > receiving host, which has a lot of rx interrupts firing on all queues, > and a few tx interrupts. > And I think the scenario is: when the tx interrupt gets fired, it gets > coalesced with the rx interrupt. Basically, the rx and tx interrupts > get triggered very close to each other, and gets handled in one round > of do_IRQ(). And the rx irq handler gets called first, which calls > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > to try to do the work on the corresponding tx queue as well. That's > why when tx interrupt handler gets called, it sees no work to do. > And the reason for the rx handler to handle the tx work is here: > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html Indeed. It's not a storm necessarily. The warning occurs after one hundred such events, since boot, which is a small number compared real interrupt load. Occasionally seeing an interrupt with no work is expected after 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As long as this rate of events is very low compared to useful interrupts, and total interrupt count is greatly reduced vs not having work stealing, it is a net win.
On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > complete interrupts. But this could introduce a race where tx complete > > > > interrupt has been raised, but the handler found there is no work to do > > > > because we have done the work in the previous rx interrupt handler. > > > > This could lead to the following warning msg: > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > "irqpoll" option) > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > [ 3588.017940] Call Trace: > > > > [ 3588.017942] <IRQ> > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > [ 3588.017966] </IRQ> > > > > [ 3588.017989] handlers: > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > case. > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > This description does not make sense to me. > > > > > > irq X: nobody cared > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > Let's find out what it was please. What you are doing is > > > just preventing linux from complaining. > > > > The traffic that causes this warning is a netperf tcp_stream with at > > least 128 flows between 2 hosts. And the warning gets triggered on the > > receiving host, which has a lot of rx interrupts firing on all queues, > > and a few tx interrupts. > > And I think the scenario is: when the tx interrupt gets fired, it gets > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > get triggered very close to each other, and gets handled in one round > > of do_IRQ(). And the rx irq handler gets called first, which calls > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > to try to do the work on the corresponding tx queue as well. That's > > why when tx interrupt handler gets called, it sees no work to do. > > And the reason for the rx handler to handle the tx work is here: > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > Indeed. It's not a storm necessarily. The warning occurs after one > hundred such events, since boot, which is a small number compared real > interrupt load. Sorry, this is wrong. It is the other call to __report_bad_irq from note_interrupt that applies here. > Occasionally seeing an interrupt with no work is expected after > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > long as this rate of events is very low compared to useful interrupts, > and total interrupt count is greatly reduced vs not having work > stealing, it is a net win.
On 2021/2/2 下午10:37, Willem de Bruijn wrote: > On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2021/1/29 上午8:21, Wei Wang wrote: >>> With the implementation of napi-tx in virtio driver, we clean tx >>> descriptors from rx napi handler, for the purpose of reducing tx >>> complete interrupts. But this could introduce a race where tx complete >>> interrupt has been raised, but the handler found there is no work to do >>> because we have done the work in the previous rx interrupt handler. >>> This could lead to the following warning msg: >>> [ 3588.010778] irq 38: nobody cared (try booting with the >>> "irqpoll" option) >>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted >>> 5.3.0-19-generic #20~18.04.2-Ubuntu >>> [ 3588.017940] Call Trace: >>> [ 3588.017942] <IRQ> >>> [ 3588.017951] dump_stack+0x63/0x85 >>> [ 3588.017953] __report_bad_irq+0x35/0xc0 >>> [ 3588.017955] note_interrupt+0x24b/0x2a0 >>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 >>> [ 3588.017957] handle_irq_event+0x3b/0x60 >>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 >>> [ 3588.017961] handle_irq+0x20/0x30 >>> [ 3588.017964] do_IRQ+0x50/0xe0 >>> [ 3588.017966] common_interrupt+0xf/0xf >>> [ 3588.017966] </IRQ> >>> [ 3588.017989] handlers: >>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt >>> [ 3588.025099] Disabling IRQ #38 >>> >>> This patch adds a new param to struct vring_virtqueue, and we set it for >>> tx virtqueues if napi-tx is enabled, to suppress the warning in such >>> case. >>> >>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") >>> Reported-by: Rick Jones <jonesrick@google.com> >>> Signed-off-by: Wei Wang <weiwan@google.com> >>> Signed-off-by: Willem de Bruijn <willemb@google.com> >> >> Please use get_maintainer.pl to make sure Michael and me were cced. > Will do. Sorry about that. I suggested just the virtualization list, my bad. > >>> --- >>> drivers/net/virtio_net.c | 19 ++++++++++++++----- >>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ >>> include/linux/virtio.h | 2 ++ >>> 3 files changed, 32 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>> index 508408fbe78f..e9a3f30864e8 100644 >>> --- a/drivers/net/virtio_net.c >>> +++ b/drivers/net/virtio_net.c >>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, >>> return; >>> } >>> >>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from >>> + * rx napi handler. Set work_steal to suppress bad irq warning for >>> + * IRQ_NONE case from tx complete interrupt handler. >>> + */ >>> + virtqueue_set_work_steal(vq, true); >>> + >>> return virtnet_napi_enable(vq, napi); >> >> Do we need to force the ordering between steal set and napi enable? > The warning only occurs after one hundred spurious interrupts, so not > really. Ok, so it looks like a hint. Then I wonder how much value do we need to introduce helper like virtqueue_set_work_steal() that allows the caller to toggle. How about disable the check forever during virtqueue initialization? > >>> } >>> >>> -static void virtnet_napi_tx_disable(struct napi_struct *napi) >>> +static void virtnet_napi_tx_disable(struct virtqueue *vq, >>> + struct napi_struct *napi) >>> { >>> - if (napi->weight) >>> + if (napi->weight) { >>> napi_disable(napi); >>> + virtqueue_set_work_steal(vq, false); >>> + } >>> } >>> >>> static void refill_work(struct work_struct *work) >>> @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) >>> for (i = 0; i < vi->max_queue_pairs; i++) { >>> xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); >>> napi_disable(&vi->rq[i].napi); >>> - virtnet_napi_tx_disable(&vi->sq[i].napi); >>> + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); >>> } >>> >>> return 0; >>> @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) >>> if (netif_running(vi->dev)) { >>> for (i = 0; i < vi->max_queue_pairs; i++) { >>> napi_disable(&vi->rq[i].napi); >>> - virtnet_napi_tx_disable(&vi->sq[i].napi); >>> + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); >>> } >>> } >>> } >>> @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, >>> if (netif_running(dev)) { >>> for (i = 0; i < vi->max_queue_pairs; i++) { >>> napi_disable(&vi->rq[i].napi); >>> - virtnet_napi_tx_disable(&vi->sq[i].napi); >>> + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); >>> } >>> } >>> >>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c >>> index 71e16b53e9c1..f7c5d697c302 100644 >>> --- a/drivers/virtio/virtio_ring.c >>> +++ b/drivers/virtio/virtio_ring.c >>> @@ -105,6 +105,9 @@ struct vring_virtqueue { >>> /* Host publishes avail event idx */ >>> bool event; >>> >>> + /* Tx side napi work could be done from rx side. */ >>> + bool work_steal; >> >> So vring_vritqueue is a general structure, let's avoid mentioning >> network specific stuffs here. And we need a better name like >> "no_interrupt_check"? >> >> And we need a separate patch for virtio core changes. > Ack. Will change. > >>> + >>> /* Head of free buffer list. */ >>> unsigned int free_head; >>> /* Number we've added since last sync. */ >>> @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( >>> vq->notify = notify; >>> vq->weak_barriers = weak_barriers; >>> vq->broken = false; >>> + vq->work_steal = false; >>> vq->last_used_idx = 0; >>> vq->num_added = 0; >>> vq->packed_ring = true; >>> @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) >>> >>> if (!more_used(vq)) { >>> pr_debug("virtqueue interrupt with no work for %p\n", vq); >> >> Do we still need to keep this warning? > Come to think of it, I would say no, in this case. > >> >>> + if (vq->work_steal) >>> + return IRQ_HANDLED; >> >> So I wonder instead of doing trick like this, maybe it's time to unify >> TX/RX NAPI with the help of[1] (virtio-net use queue pairs). >> >> Thanks >> >> [1] https://lkml.org/lkml/2014/12/25/169 > Interesting idea. It does sound like a good fit for this model. The > patch in the Fixes line proved effective at suppressing unnecessary TX > interrupts when processing in RX interrupt handler. So not sure how > much will help in practice. Might be a nice project to evaluate > separate for net-next at some point. Right, basically the idea is that if an irq is shared among several virtqueues, there's no need to check for more_used() there. Yes, we can try sometime in the future. (Or e.g we have more than 128 queue pairs). Thanks > > Thanks for the review! >
On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > because we have done the work in the previous rx interrupt handler. > > > > > This could lead to the following warning msg: > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > "irqpoll" option) > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > [ 3588.017940] Call Trace: > > > > > [ 3588.017942] <IRQ> > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > [ 3588.017966] </IRQ> > > > > > [ 3588.017989] handlers: > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > case. > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > irq X: nobody cared > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > Let's find out what it was please. What you are doing is > > > > just preventing linux from complaining. > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > and a few tx interrupts. > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > get triggered very close to each other, and gets handled in one round > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > to try to do the work on the corresponding tx queue as well. That's > > > why when tx interrupt handler gets called, it sees no work to do. > > > And the reason for the rx handler to handle the tx work is here: > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > hundred such events, since boot, which is a small number compared real > > interrupt load. > > Sorry, this is wrong. It is the other call to __report_bad_irq from > note_interrupt that applies here. > > > Occasionally seeing an interrupt with no work is expected after > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > long as this rate of events is very low compared to useful interrupts, > > and total interrupt count is greatly reduced vs not having work > > stealing, it is a net win. Right, but if 99900 out of 100000 interrupts were wasted, then it is surely an even greater win to disable interrupts while polling like this. Might be tricky to detect, disabling/enabling aggressively every time even if there's nothing in the queue is sure to cause lots of cache line bounces, and we don't want to enable callbacks if they were not enabled e.g. by start_xmit ... Some kind of counter? -- MST
On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > This could lead to the following warning msg: > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > "irqpoll" option) > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > [ 3588.017940] Call Trace: > > > > > > [ 3588.017942] <IRQ> > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > [ 3588.017966] </IRQ> > > > > > > [ 3588.017989] handlers: > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > case. > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > irq X: nobody cared > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > just preventing linux from complaining. > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > and a few tx interrupts. > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > get triggered very close to each other, and gets handled in one round > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > to try to do the work on the corresponding tx queue as well. That's > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > And the reason for the rx handler to handle the tx work is here: > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > hundred such events, since boot, which is a small number compared real > > > interrupt load. > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > note_interrupt that applies here. > > > > > Occasionally seeing an interrupt with no work is expected after > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > long as this rate of events is very low compared to useful interrupts, > > > and total interrupt count is greatly reduced vs not having work > > > stealing, it is a net win. > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > surely an even greater win to disable interrupts while polling like > this. Might be tricky to detect, disabling/enabling aggressively every > time even if there's nothing in the queue is sure to cause lots of cache > line bounces, and we don't want to enable callbacks if they were not > enabled e.g. by start_xmit ... Some kind of counter? Yes. It was known that the work stealing is more effective in some workloads than others. But a 99% spurious rate I had not anticipated. Most interesting is the number of interrupts suppressed as a result of the feature. That is not captured by this statistic. In any case, we'll take a step back to better understand behavior. And especially why this high spurious rate exhibits in this workload with many concurrent flows.
On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: > > > On 2021/2/2 下午10:37, Willem de Bruijn wrote: > > On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2021/1/29 上午8:21, Wei Wang wrote: > >>> With the implementation of napi-tx in virtio driver, we clean tx > >>> descriptors from rx napi handler, for the purpose of reducing tx > >>> complete interrupts. But this could introduce a race where tx complete > >>> interrupt has been raised, but the handler found there is no work to do > >>> because we have done the work in the previous rx interrupt handler. > >>> This could lead to the following warning msg: > >>> [ 3588.010778] irq 38: nobody cared (try booting with the > >>> "irqpoll" option) > >>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > >>> 5.3.0-19-generic #20~18.04.2-Ubuntu > >>> [ 3588.017940] Call Trace: > >>> [ 3588.017942] <IRQ> > >>> [ 3588.017951] dump_stack+0x63/0x85 > >>> [ 3588.017953] __report_bad_irq+0x35/0xc0 > >>> [ 3588.017955] note_interrupt+0x24b/0x2a0 > >>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > >>> [ 3588.017957] handle_irq_event+0x3b/0x60 > >>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 > >>> [ 3588.017961] handle_irq+0x20/0x30 > >>> [ 3588.017964] do_IRQ+0x50/0xe0 > >>> [ 3588.017966] common_interrupt+0xf/0xf > >>> [ 3588.017966] </IRQ> > >>> [ 3588.017989] handlers: > >>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > >>> [ 3588.025099] Disabling IRQ #38 > >>> > >>> This patch adds a new param to struct vring_virtqueue, and we set it for > >>> tx virtqueues if napi-tx is enabled, to suppress the warning in such > >>> case. > >>> > >>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > >>> Reported-by: Rick Jones <jonesrick@google.com> > >>> Signed-off-by: Wei Wang <weiwan@google.com> > >>> Signed-off-by: Willem de Bruijn <willemb@google.com> > >> > >> Please use get_maintainer.pl to make sure Michael and me were cced. > > Will do. Sorry about that. I suggested just the virtualization list, my bad. > > > >>> --- > >>> drivers/net/virtio_net.c | 19 ++++++++++++++----- > >>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > >>> include/linux/virtio.h | 2 ++ > >>> 3 files changed, 32 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>> index 508408fbe78f..e9a3f30864e8 100644 > >>> --- a/drivers/net/virtio_net.c > >>> +++ b/drivers/net/virtio_net.c > >>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > >>> return; > >>> } > >>> > >>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > >>> + * rx napi handler. Set work_steal to suppress bad irq warning for > >>> + * IRQ_NONE case from tx complete interrupt handler. > >>> + */ > >>> + virtqueue_set_work_steal(vq, true); > >>> + > >>> return virtnet_napi_enable(vq, napi); > >> > >> Do we need to force the ordering between steal set and napi enable? > > The warning only occurs after one hundred spurious interrupts, so not > > really. > > > Ok, so it looks like a hint. Then I wonder how much value do we need to > introduce helper like virtqueue_set_work_steal() that allows the caller > to toggle. How about disable the check forever during virtqueue > initialization? Yes, that is even simpler. We still need the helper, as the internal variables of vring_virtqueue are not accessible from virtio-net. An earlier patch added the variable to virtqueue itself, but I think it belongs in vring_virtqueue. And the helper is not a lot of code.
On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > This could lead to the following warning msg: > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > "irqpoll" option) > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > [ 3588.017940] Call Trace: > > > > > > > [ 3588.017942] <IRQ> > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > [ 3588.017966] </IRQ> > > > > > > > [ 3588.017989] handlers: > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > case. > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > irq X: nobody cared > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > just preventing linux from complaining. > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > and a few tx interrupts. > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > get triggered very close to each other, and gets handled in one round > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > hundred such events, since boot, which is a small number compared real > > > > interrupt load. > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > note_interrupt that applies here. > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > long as this rate of events is very low compared to useful interrupts, > > > > and total interrupt count is greatly reduced vs not having work > > > > stealing, it is a net win. > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > surely an even greater win to disable interrupts while polling like > > this. Might be tricky to detect, disabling/enabling aggressively every > > time even if there's nothing in the queue is sure to cause lots of cache > > line bounces, and we don't want to enable callbacks if they were not > > enabled e.g. by start_xmit ... Some kind of counter? > > Yes. It was known that the work stealing is more effective in some > workloads than others. But a 99% spurious rate I had not anticipated. > > Most interesting is the number of interrupts suppressed as a result of > the feature. That is not captured by this statistic. > > In any case, we'll take a step back to better understand behavior. And > especially why this high spurious rate exhibits in this workload with > many concurrent flows. I've been thinking about it. Imagine work stealing working perfectly. Each time we xmit a packet, it is stolen and freed. Since xmit enables callbacks (just in case!) we also get an interrupt, which is automatically spurious. My conclusion is that we shouldn't just work around it but instead (or additionally?) reduce the number of interrupts by disabling callbacks e.g. when a. we are currently stealing packets or b. we stole all packets This should be enough to reduce the chances below 99% ;) One annoying thing is that with split and event index, we do not disable interrupts. Could be worth revisiting, for now maybe just disable the event index feature? I am not sure it is actually worth it with stealing. -- MST
On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > This could lead to the following warning msg: > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > "irqpoll" option) > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > [ 3588.017989] handlers: > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > case. > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > and a few tx interrupts. > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > hundred such events, since boot, which is a small number compared real > > > > > interrupt load. > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > note_interrupt that applies here. > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > and total interrupt count is greatly reduced vs not having work > > > > > stealing, it is a net win. > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > surely an even greater win to disable interrupts while polling like > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > time even if there's nothing in the queue is sure to cause lots of cache > > > line bounces, and we don't want to enable callbacks if they were not > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > Yes. It was known that the work stealing is more effective in some > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > Most interesting is the number of interrupts suppressed as a result of > > the feature. That is not captured by this statistic. > > > > In any case, we'll take a step back to better understand behavior. And > > especially why this high spurious rate exhibits in this workload with > > many concurrent flows. > > > I've been thinking about it. Imagine work stealing working perfectly. > Each time we xmit a packet, it is stolen and freed. > Since xmit enables callbacks (just in case!) we also > get an interrupt, which is automatically spurious. > > My conclusion is that we shouldn't just work around it but instead > (or additionally?) > reduce the number of interrupts by disabling callbacks e.g. when > a. we are currently stealing packets > or > b. we stole all packets > Thinking along this line, that probably means, we should disable cb on the tx virtqueue, when scheduling the napi work on the rx side, and reenable it after the rx napi work is done? Also, I wonder if it is too late to disable cb at the point we start to steal pkts or have stolen all pkts. Because the steal work is done in the napi handler of the rx queue. But the tx interrupt must have been raised before that. Will we come back to process the tx interrupt again after we re-enabled the cb on the tx side? > This should be enough to reduce the chances below 99% ;) > > One annoying thing is that with split and event index, we do not disable > interrupts. Could be worth revisiting, for now maybe just disable the > event index feature? I am not sure it is actually worth it with > stealing. > > -- > MST >
On 2021/2/4 上午2:28, Willem de Bruijn wrote: > On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2021/2/2 下午10:37, Willem de Bruijn wrote: >>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: >>>> On 2021/1/29 上午8:21, Wei Wang wrote: >>>>> With the implementation of napi-tx in virtio driver, we clean tx >>>>> descriptors from rx napi handler, for the purpose of reducing tx >>>>> complete interrupts. But this could introduce a race where tx complete >>>>> interrupt has been raised, but the handler found there is no work to do >>>>> because we have done the work in the previous rx interrupt handler. >>>>> This could lead to the following warning msg: >>>>> [ 3588.010778] irq 38: nobody cared (try booting with the >>>>> "irqpoll" option) >>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted >>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu >>>>> [ 3588.017940] Call Trace: >>>>> [ 3588.017942] <IRQ> >>>>> [ 3588.017951] dump_stack+0x63/0x85 >>>>> [ 3588.017953] __report_bad_irq+0x35/0xc0 >>>>> [ 3588.017955] note_interrupt+0x24b/0x2a0 >>>>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 >>>>> [ 3588.017957] handle_irq_event+0x3b/0x60 >>>>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 >>>>> [ 3588.017961] handle_irq+0x20/0x30 >>>>> [ 3588.017964] do_IRQ+0x50/0xe0 >>>>> [ 3588.017966] common_interrupt+0xf/0xf >>>>> [ 3588.017966] </IRQ> >>>>> [ 3588.017989] handlers: >>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt >>>>> [ 3588.025099] Disabling IRQ #38 >>>>> >>>>> This patch adds a new param to struct vring_virtqueue, and we set it for >>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such >>>>> case. >>>>> >>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") >>>>> Reported-by: Rick Jones <jonesrick@google.com> >>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>> Signed-off-by: Willem de Bruijn <willemb@google.com> >>>> Please use get_maintainer.pl to make sure Michael and me were cced. >>> Will do. Sorry about that. I suggested just the virtualization list, my bad. >>> >>>>> --- >>>>> drivers/net/virtio_net.c | 19 ++++++++++++++----- >>>>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ >>>>> include/linux/virtio.h | 2 ++ >>>>> 3 files changed, 32 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>> index 508408fbe78f..e9a3f30864e8 100644 >>>>> --- a/drivers/net/virtio_net.c >>>>> +++ b/drivers/net/virtio_net.c >>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, >>>>> return; >>>>> } >>>>> >>>>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from >>>>> + * rx napi handler. Set work_steal to suppress bad irq warning for >>>>> + * IRQ_NONE case from tx complete interrupt handler. >>>>> + */ >>>>> + virtqueue_set_work_steal(vq, true); >>>>> + >>>>> return virtnet_napi_enable(vq, napi); >>>> Do we need to force the ordering between steal set and napi enable? >>> The warning only occurs after one hundred spurious interrupts, so not >>> really. >> >> Ok, so it looks like a hint. Then I wonder how much value do we need to >> introduce helper like virtqueue_set_work_steal() that allows the caller >> to toggle. How about disable the check forever during virtqueue >> initialization? > Yes, that is even simpler. > > We still need the helper, as the internal variables of vring_virtqueue > are not accessible from virtio-net. An earlier patch added the > variable to virtqueue itself, but I think it belongs in > vring_virtqueue. And the helper is not a lot of code. It's better to do this before the allocating the irq. But it looks not easy unless we extend find_vqs(). Thanks >
On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > "irqpoll" option) > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > case. > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > and a few tx interrupts. > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > interrupt load. > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > note_interrupt that applies here. > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > stealing, it is a net win. > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > surely an even greater win to disable interrupts while polling like > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > line bounces, and we don't want to enable callbacks if they were not > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > Yes. It was known that the work stealing is more effective in some > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > the feature. That is not captured by this statistic. > > > > > > In any case, we'll take a step back to better understand behavior. And > > > especially why this high spurious rate exhibits in this workload with > > > many concurrent flows. > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > Each time we xmit a packet, it is stolen and freed. > > Since xmit enables callbacks (just in case!) we also > > get an interrupt, which is automatically spurious. > > > > My conclusion is that we shouldn't just work around it but instead > > (or additionally?) > > reduce the number of interrupts by disabling callbacks e.g. when > > a. we are currently stealing packets > > or > > b. we stole all packets Agreed. This might prove a significant performance gain at the same time :) > > > Thinking along this line, that probably means, we should disable cb on > the tx virtqueue, when scheduling the napi work on the rx side, and > reenable it after the rx napi work is done? > Also, I wonder if it is too late to disable cb at the point we start > to steal pkts or have stolen all pkts. The earlier the better. I see no benefit to delay until the rx handler actually runs. > Because the steal work is done > in the napi handler of the rx queue. But the tx interrupt must have > been raised before that. Will we come back to process the tx interrupt > again after we re-enabled the cb on the tx side? > > > This should be enough to reduce the chances below 99% ;) > > > > One annoying thing is that with split and event index, we do not disable > > interrupts. Could be worth revisiting, for now maybe just disable the > > event index feature? I am not sure it is actually worth it with > > stealing. With event index, we suppress interrupts when another interrupt is already pending from a previous packet, right? When the previous position of the producer is already beyond the consumer. It doesn't matter whether the previous packet triggered a tx interrupt or deferred to an already scheduled rx interrupt? From that seems fine to leave it out.
On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2021/2/4 上午2:28, Willem de Bruijn wrote: > > On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2021/2/2 下午10:37, Willem de Bruijn wrote: > >>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: > >>>> On 2021/1/29 上午8:21, Wei Wang wrote: > >>>>> With the implementation of napi-tx in virtio driver, we clean tx > >>>>> descriptors from rx napi handler, for the purpose of reducing tx > >>>>> complete interrupts. But this could introduce a race where tx complete > >>>>> interrupt has been raised, but the handler found there is no work to do > >>>>> because we have done the work in the previous rx interrupt handler. > >>>>> This could lead to the following warning msg: > >>>>> [ 3588.010778] irq 38: nobody cared (try booting with the > >>>>> "irqpoll" option) > >>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > >>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu > >>>>> [ 3588.017940] Call Trace: > >>>>> [ 3588.017942] <IRQ> > >>>>> [ 3588.017951] dump_stack+0x63/0x85 > >>>>> [ 3588.017953] __report_bad_irq+0x35/0xc0 > >>>>> [ 3588.017955] note_interrupt+0x24b/0x2a0 > >>>>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > >>>>> [ 3588.017957] handle_irq_event+0x3b/0x60 > >>>>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 > >>>>> [ 3588.017961] handle_irq+0x20/0x30 > >>>>> [ 3588.017964] do_IRQ+0x50/0xe0 > >>>>> [ 3588.017966] common_interrupt+0xf/0xf > >>>>> [ 3588.017966] </IRQ> > >>>>> [ 3588.017989] handlers: > >>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > >>>>> [ 3588.025099] Disabling IRQ #38 > >>>>> > >>>>> This patch adds a new param to struct vring_virtqueue, and we set it for > >>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such > >>>>> case. > >>>>> > >>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > >>>>> Reported-by: Rick Jones <jonesrick@google.com> > >>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>> Signed-off-by: Willem de Bruijn <willemb@google.com> > >>>> Please use get_maintainer.pl to make sure Michael and me were cced. > >>> Will do. Sorry about that. I suggested just the virtualization list, my bad. > >>> > >>>>> --- > >>>>> drivers/net/virtio_net.c | 19 ++++++++++++++----- > >>>>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > >>>>> include/linux/virtio.h | 2 ++ > >>>>> 3 files changed, 32 insertions(+), 5 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>>>> index 508408fbe78f..e9a3f30864e8 100644 > >>>>> --- a/drivers/net/virtio_net.c > >>>>> +++ b/drivers/net/virtio_net.c > >>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > >>>>> return; > >>>>> } > >>>>> > >>>>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > >>>>> + * rx napi handler. Set work_steal to suppress bad irq warning for > >>>>> + * IRQ_NONE case from tx complete interrupt handler. > >>>>> + */ > >>>>> + virtqueue_set_work_steal(vq, true); > >>>>> + > >>>>> return virtnet_napi_enable(vq, napi); > >>>> Do we need to force the ordering between steal set and napi enable? > >>> The warning only occurs after one hundred spurious interrupts, so not > >>> really. > >> > >> Ok, so it looks like a hint. Then I wonder how much value do we need to > >> introduce helper like virtqueue_set_work_steal() that allows the caller > >> to toggle. How about disable the check forever during virtqueue > >> initialization? > > Yes, that is even simpler. > > > > We still need the helper, as the internal variables of vring_virtqueue > > are not accessible from virtio-net. An earlier patch added the > > variable to virtqueue itself, but I think it belongs in > > vring_virtqueue. And the helper is not a lot of code. > > > It's better to do this before the allocating the irq. But it looks not > easy unless we extend find_vqs(). Can you elaborate why that is better? At virtnet_open the interrupts are not firing either. I have no preference. Just curious, especially if it complicates the patch.
On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > "irqpoll" option) > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > and a few tx interrupts. > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > interrupt load. > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > stealing, it is a net win. > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > surely an even greater win to disable interrupts while polling like > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > the feature. That is not captured by this statistic. > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > especially why this high spurious rate exhibits in this workload with > > > > many concurrent flows. > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > Each time we xmit a packet, it is stolen and freed. > > > Since xmit enables callbacks (just in case!) we also > > > get an interrupt, which is automatically spurious. > > > > > > My conclusion is that we shouldn't just work around it but instead > > > (or additionally?) > > > reduce the number of interrupts by disabling callbacks e.g. when > > > a. we are currently stealing packets > > > or > > > b. we stole all packets > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > Thinking along this line, that probably means, we should disable cb on > > the tx virtqueue, when scheduling the napi work on the rx side, and > > reenable it after the rx napi work is done? > > Also, I wonder if it is too late to disable cb at the point we start > > to steal pkts or have stolen all pkts. > > The earlier the better. I see no benefit to delay until the rx handler > actually runs. > I've been thinking more on this. I think the fundamental issue here is that the rx napi handler virtnet_poll() does the tx side work by calling virtnet_poll_cleantx() without any notification to the tx side. I am thinking, in virtnet_poll(), instead of directly call virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to schedule the tx side napi, and let the tx napi handler do the cleaning work. This way, we automatically call virtqueue_disable_cb() on the tx vq, and after the tx work is done, virtqueue_napi_complete() is called to re-enable the cb on the tx side. This way, the tx side knows what has been done, and will likely reduce the # of spurious tx interrupts? And I don't think there is much cost in doing that, since napi_schedule() basically queues the tx napi to the back of its napi_list, and serves it right after the rx napi handler is done. What do you guys think? I could quickly test it up to see if it solves the issue. > > Because the steal work is done > > in the napi handler of the rx queue. But the tx interrupt must have > > been raised before that. Will we come back to process the tx interrupt > > again after we re-enabled the cb on the tx side? > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > One annoying thing is that with split and event index, we do not disable > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > event index feature? I am not sure it is actually worth it with > > > stealing. > > With event index, we suppress interrupts when another interrupt is > already pending from a previous packet, right? When the previous > position of the producer is already beyond the consumer. It doesn't > matter whether the previous packet triggered a tx interrupt or > deferred to an already scheduled rx interrupt? From that seems fine to > leave it out.
On 2021/2/5 上午4:50, Willem de Bruijn wrote: > On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2021/2/4 上午2:28, Willem de Bruijn wrote: >>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: >>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote: >>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: >>>>>> On 2021/1/29 上午8:21, Wei Wang wrote: >>>>>>> With the implementation of napi-tx in virtio driver, we clean tx >>>>>>> descriptors from rx napi handler, for the purpose of reducing tx >>>>>>> complete interrupts. But this could introduce a race where tx complete >>>>>>> interrupt has been raised, but the handler found there is no work to do >>>>>>> because we have done the work in the previous rx interrupt handler. >>>>>>> This could lead to the following warning msg: >>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the >>>>>>> "irqpoll" option) >>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted >>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu >>>>>>> [ 3588.017940] Call Trace: >>>>>>> [ 3588.017942] <IRQ> >>>>>>> [ 3588.017951] dump_stack+0x63/0x85 >>>>>>> [ 3588.017953] __report_bad_irq+0x35/0xc0 >>>>>>> [ 3588.017955] note_interrupt+0x24b/0x2a0 >>>>>>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 >>>>>>> [ 3588.017957] handle_irq_event+0x3b/0x60 >>>>>>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 >>>>>>> [ 3588.017961] handle_irq+0x20/0x30 >>>>>>> [ 3588.017964] do_IRQ+0x50/0xe0 >>>>>>> [ 3588.017966] common_interrupt+0xf/0xf >>>>>>> [ 3588.017966] </IRQ> >>>>>>> [ 3588.017989] handlers: >>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt >>>>>>> [ 3588.025099] Disabling IRQ #38 >>>>>>> >>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for >>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such >>>>>>> case. >>>>>>> >>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") >>>>>>> Reported-by: Rick Jones <jonesrick@google.com> >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com> >>>>>> Please use get_maintainer.pl to make sure Michael and me were cced. >>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad. >>>>> >>>>>>> --- >>>>>>> drivers/net/virtio_net.c | 19 ++++++++++++++----- >>>>>>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ >>>>>>> include/linux/virtio.h | 2 ++ >>>>>>> 3 files changed, 32 insertions(+), 5 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>>>> index 508408fbe78f..e9a3f30864e8 100644 >>>>>>> --- a/drivers/net/virtio_net.c >>>>>>> +++ b/drivers/net/virtio_net.c >>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, >>>>>>> return; >>>>>>> } >>>>>>> >>>>>>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from >>>>>>> + * rx napi handler. Set work_steal to suppress bad irq warning for >>>>>>> + * IRQ_NONE case from tx complete interrupt handler. >>>>>>> + */ >>>>>>> + virtqueue_set_work_steal(vq, true); >>>>>>> + >>>>>>> return virtnet_napi_enable(vq, napi); >>>>>> Do we need to force the ordering between steal set and napi enable? >>>>> The warning only occurs after one hundred spurious interrupts, so not >>>>> really. >>>> Ok, so it looks like a hint. Then I wonder how much value do we need to >>>> introduce helper like virtqueue_set_work_steal() that allows the caller >>>> to toggle. How about disable the check forever during virtqueue >>>> initialization? >>> Yes, that is even simpler. >>> >>> We still need the helper, as the internal variables of vring_virtqueue >>> are not accessible from virtio-net. An earlier patch added the >>> variable to virtqueue itself, but I think it belongs in >>> vring_virtqueue. And the helper is not a lot of code. >> >> It's better to do this before the allocating the irq. But it looks not >> easy unless we extend find_vqs(). > Can you elaborate why that is better? At virtnet_open the interrupts > are not firing either. I think you meant NAPI actually? > > I have no preference. Just curious, especially if it complicates the patch. > My understanding is that. It's probably ok for net. But we probably need to document the assumptions to make sure it was not abused in other drivers. Introduce new parameters for find_vqs() can help to eliminate the subtle stuffs but I agree it looks like a overkill. (Btw, I forget the numbers but wonder how much difference if we simple remove the free_old_xmits() from the rx NAPI path?) Thanks
On Sun, Feb 7, 2021 at 10:29 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2021/2/5 上午4:50, Willem de Bruijn wrote: > > On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2021/2/4 上午2:28, Willem de Bruijn wrote: > >>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: > >>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote: > >>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: > >>>>>> On 2021/1/29 上午8:21, Wei Wang wrote: > >>>>>>> With the implementation of napi-tx in virtio driver, we clean tx > >>>>>>> descriptors from rx napi handler, for the purpose of reducing tx > >>>>>>> complete interrupts. But this could introduce a race where tx complete > >>>>>>> interrupt has been raised, but the handler found there is no work to do > >>>>>>> because we have done the work in the previous rx interrupt handler. > >>>>>>> This could lead to the following warning msg: > >>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the > >>>>>>> "irqpoll" option) > >>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > >>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu > >>>>>>> [ 3588.017940] Call Trace: > >>>>>>> [ 3588.017942] <IRQ> > >>>>>>> [ 3588.017951] dump_stack+0x63/0x85 > >>>>>>> [ 3588.017953] __report_bad_irq+0x35/0xc0 > >>>>>>> [ 3588.017955] note_interrupt+0x24b/0x2a0 > >>>>>>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > >>>>>>> [ 3588.017957] handle_irq_event+0x3b/0x60 > >>>>>>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 > >>>>>>> [ 3588.017961] handle_irq+0x20/0x30 > >>>>>>> [ 3588.017964] do_IRQ+0x50/0xe0 > >>>>>>> [ 3588.017966] common_interrupt+0xf/0xf > >>>>>>> [ 3588.017966] </IRQ> > >>>>>>> [ 3588.017989] handlers: > >>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > >>>>>>> [ 3588.025099] Disabling IRQ #38 > >>>>>>> > >>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for > >>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such > >>>>>>> case. > >>>>>>> > >>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > >>>>>>> Reported-by: Rick Jones <jonesrick@google.com> > >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com> > >>>>>> Please use get_maintainer.pl to make sure Michael and me were cced. > >>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad. > >>>>> > >>>>>>> --- > >>>>>>> drivers/net/virtio_net.c | 19 ++++++++++++++----- > >>>>>>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ > >>>>>>> include/linux/virtio.h | 2 ++ > >>>>>>> 3 files changed, 32 insertions(+), 5 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>>>>>> index 508408fbe78f..e9a3f30864e8 100644 > >>>>>>> --- a/drivers/net/virtio_net.c > >>>>>>> +++ b/drivers/net/virtio_net.c > >>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, > >>>>>>> return; > >>>>>>> } > >>>>>>> > >>>>>>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from > >>>>>>> + * rx napi handler. Set work_steal to suppress bad irq warning for > >>>>>>> + * IRQ_NONE case from tx complete interrupt handler. > >>>>>>> + */ > >>>>>>> + virtqueue_set_work_steal(vq, true); > >>>>>>> + > >>>>>>> return virtnet_napi_enable(vq, napi); > >>>>>> Do we need to force the ordering between steal set and napi enable? > >>>>> The warning only occurs after one hundred spurious interrupts, so not > >>>>> really. > >>>> Ok, so it looks like a hint. Then I wonder how much value do we need to > >>>> introduce helper like virtqueue_set_work_steal() that allows the caller > >>>> to toggle. How about disable the check forever during virtqueue > >>>> initialization? > >>> Yes, that is even simpler. > >>> > >>> We still need the helper, as the internal variables of vring_virtqueue > >>> are not accessible from virtio-net. An earlier patch added the > >>> variable to virtqueue itself, but I think it belongs in > >>> vring_virtqueue. And the helper is not a lot of code. > >> > >> It's better to do this before the allocating the irq. But it looks not > >> easy unless we extend find_vqs(). > > Can you elaborate why that is better? At virtnet_open the interrupts > > are not firing either. > > > I think you meant NAPI actually? I meant interrupt: we don't have to worry about the spurious interrupt warning when no interrupts will be firing. Until virtnet_open completes, the device is down. > > > > > I have no preference. Just curious, especially if it complicates the patch. > > > > My understanding is that. It's probably ok for net. But we probably need > to document the assumptions to make sure it was not abused in other drivers. > > Introduce new parameters for find_vqs() can help to eliminate the subtle > stuffs but I agree it looks like a overkill. > > (Btw, I forget the numbers but wonder how much difference if we simple > remove the free_old_xmits() from the rx NAPI path?) The committed patchset did not record those numbers, but I found them in an earlier iteration: [PATCH net-next 0/3] virtio-net tx napi https://lists.openwall.net/netdev/2017/04/02/55 It did seem to significantly reduce compute cycles ("Gcyc") at the time. For instance: TCP_RR Latency (us): 1x: p50 24 24 21 p99 27 27 27 Gcycles 299 432 308 I'm concerned that removing it now may cause a regression report in a few months. That is higher risk than the spurious interrupt warning that was only reported after years of use.
On 2021/2/9 上午3:08, Willem de Bruijn wrote: > On Sun, Feb 7, 2021 at 10:29 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2021/2/5 上午4:50, Willem de Bruijn wrote: >>> On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote: >>>> On 2021/2/4 上午2:28, Willem de Bruijn wrote: >>>>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote: >>>>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote: >>>>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote: >>>>>>>> On 2021/1/29 上午8:21, Wei Wang wrote: >>>>>>>>> With the implementation of napi-tx in virtio driver, we clean tx >>>>>>>>> descriptors from rx napi handler, for the purpose of reducing tx >>>>>>>>> complete interrupts. But this could introduce a race where tx complete >>>>>>>>> interrupt has been raised, but the handler found there is no work to do >>>>>>>>> because we have done the work in the previous rx interrupt handler. >>>>>>>>> This could lead to the following warning msg: >>>>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the >>>>>>>>> "irqpoll" option) >>>>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted >>>>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu >>>>>>>>> [ 3588.017940] Call Trace: >>>>>>>>> [ 3588.017942] <IRQ> >>>>>>>>> [ 3588.017951] dump_stack+0x63/0x85 >>>>>>>>> [ 3588.017953] __report_bad_irq+0x35/0xc0 >>>>>>>>> [ 3588.017955] note_interrupt+0x24b/0x2a0 >>>>>>>>> [ 3588.017956] handle_irq_event_percpu+0x54/0x80 >>>>>>>>> [ 3588.017957] handle_irq_event+0x3b/0x60 >>>>>>>>> [ 3588.017958] handle_edge_irq+0x83/0x1a0 >>>>>>>>> [ 3588.017961] handle_irq+0x20/0x30 >>>>>>>>> [ 3588.017964] do_IRQ+0x50/0xe0 >>>>>>>>> [ 3588.017966] common_interrupt+0xf/0xf >>>>>>>>> [ 3588.017966] </IRQ> >>>>>>>>> [ 3588.017989] handlers: >>>>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt >>>>>>>>> [ 3588.025099] Disabling IRQ #38 >>>>>>>>> >>>>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for >>>>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such >>>>>>>>> case. >>>>>>>>> >>>>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") >>>>>>>>> Reported-by: Rick Jones <jonesrick@google.com> >>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com> >>>>>>>> Please use get_maintainer.pl to make sure Michael and me were cced. >>>>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad. >>>>>>> >>>>>>>>> --- >>>>>>>>> drivers/net/virtio_net.c | 19 ++++++++++++++----- >>>>>>>>> drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ >>>>>>>>> include/linux/virtio.h | 2 ++ >>>>>>>>> 3 files changed, 32 insertions(+), 5 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>>>>>> index 508408fbe78f..e9a3f30864e8 100644 >>>>>>>>> --- a/drivers/net/virtio_net.c >>>>>>>>> +++ b/drivers/net/virtio_net.c >>>>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, >>>>>>>>> return; >>>>>>>>> } >>>>>>>>> >>>>>>>>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from >>>>>>>>> + * rx napi handler. Set work_steal to suppress bad irq warning for >>>>>>>>> + * IRQ_NONE case from tx complete interrupt handler. >>>>>>>>> + */ >>>>>>>>> + virtqueue_set_work_steal(vq, true); >>>>>>>>> + >>>>>>>>> return virtnet_napi_enable(vq, napi); >>>>>>>> Do we need to force the ordering between steal set and napi enable? >>>>>>> The warning only occurs after one hundred spurious interrupts, so not >>>>>>> really. >>>>>> Ok, so it looks like a hint. Then I wonder how much value do we need to >>>>>> introduce helper like virtqueue_set_work_steal() that allows the caller >>>>>> to toggle. How about disable the check forever during virtqueue >>>>>> initialization? >>>>> Yes, that is even simpler. >>>>> >>>>> We still need the helper, as the internal variables of vring_virtqueue >>>>> are not accessible from virtio-net. An earlier patch added the >>>>> variable to virtqueue itself, but I think it belongs in >>>>> vring_virtqueue. And the helper is not a lot of code. >>>> It's better to do this before the allocating the irq. But it looks not >>>> easy unless we extend find_vqs(). >>> Can you elaborate why that is better? At virtnet_open the interrupts >>> are not firing either. >> >> I think you meant NAPI actually? > I meant interrupt: we don't have to worry about the spurious interrupt > warning when no interrupts will be firing. Until virtnet_open > completes, the device is down. Ok. > > >>> I have no preference. Just curious, especially if it complicates the patch. >>> >> My understanding is that. It's probably ok for net. But we probably need >> to document the assumptions to make sure it was not abused in other drivers. >> >> Introduce new parameters for find_vqs() can help to eliminate the subtle >> stuffs but I agree it looks like a overkill. >> >> (Btw, I forget the numbers but wonder how much difference if we simple >> remove the free_old_xmits() from the rx NAPI path?) > The committed patchset did not record those numbers, but I found them > in an earlier iteration: > > [PATCH net-next 0/3] virtio-net tx napi > https://lists.openwall.net/netdev/2017/04/02/55 > > It did seem to significantly reduce compute cycles ("Gcyc") at the > time. For instance: > > TCP_RR Latency (us): > 1x: > p50 24 24 21 > p99 27 27 27 > Gcycles 299 432 308 > > I'm concerned that removing it now may cause a regression report in a > few months. That is higher risk than the spurious interrupt warning > that was only reported after years of use. Right. So if Michael is fine with this approach, I'm ok with it. But we probably need to a TODO to invent the interrupt handlers that can be used for more than one virtqueues. When MSI-X is enabled, the interrupt handler (vring_interrup()) assumes the interrupt is used by a single virtqueue. Thanks >
> >>> I have no preference. Just curious, especially if it complicates the patch. > >>> > >> My understanding is that. It's probably ok for net. But we probably need > >> to document the assumptions to make sure it was not abused in other drivers. > >> > >> Introduce new parameters for find_vqs() can help to eliminate the subtle > >> stuffs but I agree it looks like a overkill. > >> > >> (Btw, I forget the numbers but wonder how much difference if we simple > >> remove the free_old_xmits() from the rx NAPI path?) > > The committed patchset did not record those numbers, but I found them > > in an earlier iteration: > > > > [PATCH net-next 0/3] virtio-net tx napi > > https://lists.openwall.net/netdev/2017/04/02/55 > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the > > time. For instance: > > > > TCP_RR Latency (us): > > 1x: > > p50 24 24 21 > > p99 27 27 27 > > Gcycles 299 432 308 > > > > I'm concerned that removing it now may cause a regression report in a > > few months. That is higher risk than the spurious interrupt warning > > that was only reported after years of use. > > > Right. > > So if Michael is fine with this approach, I'm ok with it. But we > probably need to a TODO to invent the interrupt handlers that can be > used for more than one virtqueues. When MSI-X is enabled, the interrupt > handler (vring_interrup()) assumes the interrupt is used by a single > virtqueue. Thanks. The approach to schedule tx-napi from virtnet_poll_cleantx instead of cleaning directly in this rx-napi function was not effective at suppressing the warning, I understand. It should be easy to drop the spurious rate to a little under 99% percent, if only to suppress the warning. By probabilistically cleaning in virtnet_poll_cleantx only every 98/100, say. But that really is a hack. There does seem to be a huge potential for cycle savings if we can really avoid the many spurious interrupts. As scheduling napi_tx from virtnet_poll_cleantx is not effective, agreed that we should probably look at the more complete solution to handle both tx and rx virtqueues from the same IRQ. And revert this explicit warning suppression patch once we have that.
On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > > >>> I have no preference. Just curious, especially if it complicates the patch. > > >>> > > >> My understanding is that. It's probably ok for net. But we probably need > > >> to document the assumptions to make sure it was not abused in other drivers. > > >> > > >> Introduce new parameters for find_vqs() can help to eliminate the subtle > > >> stuffs but I agree it looks like a overkill. > > >> > > >> (Btw, I forget the numbers but wonder how much difference if we simple > > >> remove the free_old_xmits() from the rx NAPI path?) > > > The committed patchset did not record those numbers, but I found them > > > in an earlier iteration: > > > > > > [PATCH net-next 0/3] virtio-net tx napi > > > https://lists.openwall.net/netdev/2017/04/02/55 > > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the > > > time. For instance: > > > > > > TCP_RR Latency (us): > > > 1x: > > > p50 24 24 21 > > > p99 27 27 27 > > > Gcycles 299 432 308 > > > > > > I'm concerned that removing it now may cause a regression report in a > > > few months. That is higher risk than the spurious interrupt warning > > > that was only reported after years of use. > > > > > > Right. > > > > So if Michael is fine with this approach, I'm ok with it. But we > > probably need to a TODO to invent the interrupt handlers that can be > > used for more than one virtqueues. When MSI-X is enabled, the interrupt > > handler (vring_interrup()) assumes the interrupt is used by a single > > virtqueue. > > Thanks. > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of > cleaning directly in this rx-napi function was not effective at > suppressing the warning, I understand. Correct. I tried the approach to schedule tx napi instead of directly do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning still happens. > > It should be easy to drop the spurious rate to a little under 99% > percent, if only to suppress the warning. By probabilistically > cleaning in virtnet_poll_cleantx only every 98/100, say. But that > really is a hack. > > There does seem to be a huge potential for cycle savings if we can > really avoid the many spurious interrupts. > > As scheduling napi_tx from virtnet_poll_cleantx is not effective, > agreed that we should probably look at the more complete solution to > handle both tx and rx virtqueues from the same IRQ. > > And revert this explicit warning suppression patch once we have that. If everyone agrees, I will submit a new version of this patch to address previous comments, before a more complete solution is proposed.
On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote: > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > > >>> I have no preference. Just curious, especially if it complicates the patch. > > > >>> > > > >> My understanding is that. It's probably ok for net. But we probably need > > > >> to document the assumptions to make sure it was not abused in other drivers. > > > >> > > > >> Introduce new parameters for find_vqs() can help to eliminate the subtle > > > >> stuffs but I agree it looks like a overkill. > > > >> > > > >> (Btw, I forget the numbers but wonder how much difference if we simple > > > >> remove the free_old_xmits() from the rx NAPI path?) > > > > The committed patchset did not record those numbers, but I found them > > > > in an earlier iteration: > > > > > > > > [PATCH net-next 0/3] virtio-net tx napi > > > > https://lists.openwall.net/netdev/2017/04/02/55 > > > > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the > > > > time. For instance: > > > > > > > > TCP_RR Latency (us): > > > > 1x: > > > > p50 24 24 21 > > > > p99 27 27 27 > > > > Gcycles 299 432 308 > > > > > > > > I'm concerned that removing it now may cause a regression report in a > > > > few months. That is higher risk than the spurious interrupt warning > > > > that was only reported after years of use. > > > > > > > > > Right. > > > > > > So if Michael is fine with this approach, I'm ok with it. But we > > > probably need to a TODO to invent the interrupt handlers that can be > > > used for more than one virtqueues. When MSI-X is enabled, the interrupt > > > handler (vring_interrup()) assumes the interrupt is used by a single > > > virtqueue. > > > > Thanks. > > > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of > > cleaning directly in this rx-napi function was not effective at > > suppressing the warning, I understand. > > Correct. I tried the approach to schedule tx napi instead of directly > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning > still happens. Two questions here: is the device using packed or split vqs? And is event index enabled? I think one issue is that at the moment with split and event index we don't actually disable events at all. static void virtqueue_disable_cb_split(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; if (!vq->event) vq->split.vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->split.avail_flags_shadow); } } Can you try your napi patch + disable event index? -- MST
On Wed, Feb 10, 2021 at 1:14 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote: > > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > >>> I have no preference. Just curious, especially if it complicates the patch. > > > > >>> > > > > >> My understanding is that. It's probably ok for net. But we probably need > > > > >> to document the assumptions to make sure it was not abused in other drivers. > > > > >> > > > > >> Introduce new parameters for find_vqs() can help to eliminate the subtle > > > > >> stuffs but I agree it looks like a overkill. > > > > >> > > > > >> (Btw, I forget the numbers but wonder how much difference if we simple > > > > >> remove the free_old_xmits() from the rx NAPI path?) > > > > > The committed patchset did not record those numbers, but I found them > > > > > in an earlier iteration: > > > > > > > > > > [PATCH net-next 0/3] virtio-net tx napi > > > > > https://lists.openwall.net/netdev/2017/04/02/55 > > > > > > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the > > > > > time. For instance: > > > > > > > > > > TCP_RR Latency (us): > > > > > 1x: > > > > > p50 24 24 21 > > > > > p99 27 27 27 > > > > > Gcycles 299 432 308 > > > > > > > > > > I'm concerned that removing it now may cause a regression report in a > > > > > few months. That is higher risk than the spurious interrupt warning > > > > > that was only reported after years of use. > > > > > > > > > > > > Right. > > > > > > > > So if Michael is fine with this approach, I'm ok with it. But we > > > > probably need to a TODO to invent the interrupt handlers that can be > > > > used for more than one virtqueues. When MSI-X is enabled, the interrupt > > > > handler (vring_interrup()) assumes the interrupt is used by a single > > > > virtqueue. > > > > > > Thanks. > > > > > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of > > > cleaning directly in this rx-napi function was not effective at > > > suppressing the warning, I understand. > > > > Correct. I tried the approach to schedule tx napi instead of directly > > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning > > still happens. > > Two questions here: is the device using packed or split vqs? > And is event index enabled? > The device is indeed using split vqs with event index enabled. > I think one issue is that at the moment with split and event index we > don't actually disable events at all. > You mean we don't disable 'interrupts' right? What is the reason for that? > static void virtqueue_disable_cb_split(struct virtqueue *_vq) > { > struct vring_virtqueue *vq = to_vvq(_vq); > > if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { > vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; > if (!vq->event) > vq->split.vring.avail->flags = > cpu_to_virtio16(_vq->vdev, > vq->split.avail_flags_shadow); > } > } > > Can you try your napi patch + disable event index? > Thanks for the suggestion. I've run the reproducer with napi patch + disable event index, and so far, I did not see the warning getting triggered. Will keep it running for a bit longer. > > -- > MST >
On 2021/2/10 下午5:14, Michael S. Tsirkin wrote: > On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote: >> On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn >> <willemdebruijn.kernel@gmail.com> wrote: >>>>>>> I have no preference. Just curious, especially if it complicates the patch. >>>>>>> >>>>>> My understanding is that. It's probably ok for net. But we probably need >>>>>> to document the assumptions to make sure it was not abused in other drivers. >>>>>> >>>>>> Introduce new parameters for find_vqs() can help to eliminate the subtle >>>>>> stuffs but I agree it looks like a overkill. >>>>>> >>>>>> (Btw, I forget the numbers but wonder how much difference if we simple >>>>>> remove the free_old_xmits() from the rx NAPI path?) >>>>> The committed patchset did not record those numbers, but I found them >>>>> in an earlier iteration: >>>>> >>>>> [PATCH net-next 0/3] virtio-net tx napi >>>>> https://lists.openwall.net/netdev/2017/04/02/55 >>>>> >>>>> It did seem to significantly reduce compute cycles ("Gcyc") at the >>>>> time. For instance: >>>>> >>>>> TCP_RR Latency (us): >>>>> 1x: >>>>> p50 24 24 21 >>>>> p99 27 27 27 >>>>> Gcycles 299 432 308 >>>>> >>>>> I'm concerned that removing it now may cause a regression report in a >>>>> few months. That is higher risk than the spurious interrupt warning >>>>> that was only reported after years of use. >>>> >>>> Right. >>>> >>>> So if Michael is fine with this approach, I'm ok with it. But we >>>> probably need to a TODO to invent the interrupt handlers that can be >>>> used for more than one virtqueues. When MSI-X is enabled, the interrupt >>>> handler (vring_interrup()) assumes the interrupt is used by a single >>>> virtqueue. >>> Thanks. >>> >>> The approach to schedule tx-napi from virtnet_poll_cleantx instead of >>> cleaning directly in this rx-napi function was not effective at >>> suppressing the warning, I understand. >> Correct. I tried the approach to schedule tx napi instead of directly >> do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning >> still happens. > Two questions here: is the device using packed or split vqs? > And is event index enabled? > > I think one issue is that at the moment with split and event index we > don't actually disable events at all. Do we really have a way to disable that? (We don't have a flag like packed virtqueue) Or you mean the trick [1] when I post tx interrupt RFC? Thanks [1] https://lkml.org/lkml/2015/2/9/113 > > static void virtqueue_disable_cb_split(struct virtqueue *_vq) > { > struct vring_virtqueue *vq = to_vvq(_vq); > > if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { > vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; > if (!vq->event) > vq->split.vring.avail->flags = > cpu_to_virtio16(_vq->vdev, > vq->split.avail_flags_shadow); > } > } > > Can you try your napi patch + disable event index? > >
On Thu, Feb 18, 2021 at 01:39:19PM +0800, Jason Wang wrote: > > On 2021/2/10 下午5:14, Michael S. Tsirkin wrote: > > On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote: > > > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > I have no preference. Just curious, especially if it complicates the patch. > > > > > > > > > > > > > > > My understanding is that. It's probably ok for net. But we probably need > > > > > > > to document the assumptions to make sure it was not abused in other drivers. > > > > > > > > > > > > > > Introduce new parameters for find_vqs() can help to eliminate the subtle > > > > > > > stuffs but I agree it looks like a overkill. > > > > > > > > > > > > > > (Btw, I forget the numbers but wonder how much difference if we simple > > > > > > > remove the free_old_xmits() from the rx NAPI path?) > > > > > > The committed patchset did not record those numbers, but I found them > > > > > > in an earlier iteration: > > > > > > > > > > > > [PATCH net-next 0/3] virtio-net tx napi > > > > > > https://lists.openwall.net/netdev/2017/04/02/55 > > > > > > > > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the > > > > > > time. For instance: > > > > > > > > > > > > TCP_RR Latency (us): > > > > > > 1x: > > > > > > p50 24 24 21 > > > > > > p99 27 27 27 > > > > > > Gcycles 299 432 308 > > > > > > > > > > > > I'm concerned that removing it now may cause a regression report in a > > > > > > few months. That is higher risk than the spurious interrupt warning > > > > > > that was only reported after years of use. > > > > > > > > > > Right. > > > > > > > > > > So if Michael is fine with this approach, I'm ok with it. But we > > > > > probably need to a TODO to invent the interrupt handlers that can be > > > > > used for more than one virtqueues. When MSI-X is enabled, the interrupt > > > > > handler (vring_interrup()) assumes the interrupt is used by a single > > > > > virtqueue. > > > > Thanks. > > > > > > > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of > > > > cleaning directly in this rx-napi function was not effective at > > > > suppressing the warning, I understand. > > > Correct. I tried the approach to schedule tx napi instead of directly > > > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning > > > still happens. > > Two questions here: is the device using packed or split vqs? > > And is event index enabled? > > > > I think one issue is that at the moment with split and event index we > > don't actually disable events at all. > > > Do we really have a way to disable that? (We don't have a flag like packed > virtqueue) > > Or you mean the trick [1] when I post tx interrupt RFC? > > Thanks > > [1] https://lkml.org/lkml/2015/2/9/113 Something like this. Or basically any other value will do, e.g. move the index back to a value just signalled ... > > > > > static void virtqueue_disable_cb_split(struct virtqueue *_vq) > > { > > struct vring_virtqueue *vq = to_vvq(_vq); > > > > if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { > > vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; > > if (!vq->event) > > vq->split.vring.avail->flags = > > cpu_to_virtio16(_vq->vdev, > > vq->split.avail_flags_shadow); > > } > > } > > > > Can you try your napi patch + disable event index? > > > >
OK I started looking at this again. My idea is simple.
A. disable callbacks before we try to drain skbs
B. actually do disable callbacks even with event idx
To make B not regress, we need to
C. detect the common case of disable after event triggering and skip the write then.
I added a new event_triggered flag for that.
Completely untested - but then I could not see the warnings either.
Would be very much interested to know whether this patch helps
resolve the sruprious interrupt problem at all ...
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..a91a2d6d1ee3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;
if (__netif_tx_trylock(txq)) {
+ virtqueue_disable_cb(vq);
free_old_xmit_skbs(sq, true);
__netif_tx_unlock(txq);
}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..213bfe8b6051 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
+ /* Hint for event idx: already triggered no need to disable. */
+ bool event_triggered;
+
union {
/* Available for split ring */
struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
+ if (vq->event)
+ /* TODO: this is a hack. Figure out a cleaner value to write. */
+ vring_used_event(&vq->split.vring) = 0x0;
+ else
vq->split.vring.avail->flags =
cpu_to_virtio16(_vq->vdev,
vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+ vq->event_triggered = false;
vq->num_added = 0;
vq->packed_ring = true;
vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,14 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ /* If device triggered an event already it won't trigger one again:
+ * no need to disable.
+ */
+ if (vq->event_triggered) {
+ vq->event_triggered = false;
+ return;
+ }
+
if (vq->packed_ring)
virtqueue_disable_cb_packed(_vq);
else
@@ -2044,6 +2059,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
if (unlikely(vq->broken))
return IRQ_HANDLED;
+ /* Just a hint for performance: so it's ok that this can be racy! */
+ if (vq->event)
+ vq->event_triggered = true;
+
pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
if (vq->vq.callback)
vq->vq.callback(&vq->vq);
@@ -2083,6 +2102,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+ vq->event_triggered = false;
vq->num_added = 0;
vq->use_dma_api = vring_use_dma_api(vdev);
#ifdef DEBUG
On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote: > OK I started looking at this again. My idea is simple. > A. disable callbacks before we try to drain skbs > B. actually do disable callbacks even with event idx > > To make B not regress, we need to > C. detect the common case of disable after event triggering and skip the write then. > > I added a new event_triggered flag for that. > Completely untested - but then I could not see the warnings either. > Would be very much interested to know whether this patch helps > resolve the sruprious interrupt problem at all ... > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ... I wonder which cacheline it's best to use for this. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 82e520d2cb12..c23341b18eb5 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) return; if (__netif_tx_trylock(txq)) { + virtqueue_disable_cb(vq); free_old_xmit_skbs(sq, true); __netif_tx_unlock(txq); } diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 71e16b53e9c1..88f0b16b11b8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -113,6 +113,9 @@ struct vring_virtqueue { /* Last used index we've seen. */ u16 last_used_idx; + /* Hint for event idx: already triggered no need to disable. */ + bool event_triggered; + union { /* Available for split ring */ struct { @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq) if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; - if (!vq->event) + if (vq->event) + /* TODO: this is a hack. Figure out a cleaner value to write. */ + vring_used_event(&vq->split.vring) = 0x0; + else vq->split.vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->split.avail_flags_shadow); @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->weak_barriers = weak_barriers; vq->broken = false; vq->last_used_idx = 0; + vq->event_triggered = false; vq->num_added = 0; vq->packed_ring = true; vq->use_dma_api = vring_use_dma_api(vdev); @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + /* If device triggered an event already it won't trigger one again: + * no need to disable. + */ + if (vq->event_triggered) + return; + if (vq->packed_ring) virtqueue_disable_cb_packed(_vq); else @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + if (vq->event_triggered) + vq->event_triggered = false; + return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) : virtqueue_enable_cb_prepare_split(_vq); } @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + if (vq->event_triggered) + vq->event_triggered = false; + return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) : virtqueue_enable_cb_delayed_split(_vq); } @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq) if (unlikely(vq->broken)) return IRQ_HANDLED; + /* Just a hint for performance: so it's ok that this can be racy! */ + if (vq->event) + vq->event_triggered = true; + pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback); if (vq->vq.callback) vq->vq.callback(&vq->vq); @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->weak_barriers = weak_barriers; vq->broken = false; vq->last_used_idx = 0; + vq->event_triggered = false; vq->num_added = 0; vq->use_dma_api = vring_use_dma_api(vdev); #ifdef DEBUG
From: "Michael S. Tsirkin" <mst@redhat.com> Date: Mon, 12 Apr 2021 18:33:45 -0400 > On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote: >> OK I started looking at this again. My idea is simple. >> A. disable callbacks before we try to drain skbs >> B. actually do disable callbacks even with event idx >> >> To make B not regress, we need to >> C. detect the common case of disable after event triggering and skip the write then. >> >> I added a new event_triggered flag for that. >> Completely untested - but then I could not see the warnings either. >> Would be very much interested to know whether this patch helps >> resolve the sruprious interrupt problem at all ... >> >> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ... > I wonder which cacheline it's best to use for this. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Please make a fresh new submission if you want to use this approach, thanks.
On Mon, Apr 12, 2021 at 04:14:58PM -0700, David Miller wrote: > From: "Michael S. Tsirkin" <mst@redhat.com> > Date: Mon, 12 Apr 2021 18:33:45 -0400 > > > On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote: > >> OK I started looking at this again. My idea is simple. > >> A. disable callbacks before we try to drain skbs > >> B. actually do disable callbacks even with event idx > >> > >> To make B not regress, we need to > >> C. detect the common case of disable after event triggering and skip the write then. > >> > >> I added a new event_triggered flag for that. > >> Completely untested - but then I could not see the warnings either. > >> Would be very much interested to know whether this patch helps > >> resolve the sruprious interrupt problem at all ... > >> > >> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ... > > I wonder which cacheline it's best to use for this. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > Please make a fresh new submission if you want to use this approach, thanks. Absolutely. This is untested so I just sent this idea out for early feedback and hopefully help with testing on real hardware. Sorry about being unclear. -- MST
On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote: > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > > "irqpoll" option) > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > > and a few tx interrupts. > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > > interrupt load. > > > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > > stealing, it is a net win. > > > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > > surely an even greater win to disable interrupts while polling like > > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > > the feature. That is not captured by this statistic. > > > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > > especially why this high spurious rate exhibits in this workload with > > > > > many concurrent flows. > > > > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > > Each time we xmit a packet, it is stolen and freed. > > > > Since xmit enables callbacks (just in case!) we also > > > > get an interrupt, which is automatically spurious. > > > > > > > > My conclusion is that we shouldn't just work around it but instead > > > > (or additionally?) > > > > reduce the number of interrupts by disabling callbacks e.g. when > > > > a. we are currently stealing packets > > > > or > > > > b. we stole all packets > > > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > > > > Thinking along this line, that probably means, we should disable cb on > > > the tx virtqueue, when scheduling the napi work on the rx side, and > > > reenable it after the rx napi work is done? > > > Also, I wonder if it is too late to disable cb at the point we start > > > to steal pkts or have stolen all pkts. > > > > The earlier the better. I see no benefit to delay until the rx handler > > actually runs. > > > > I've been thinking more on this. I think the fundamental issue here is > that the rx napi handler virtnet_poll() does the tx side work by > calling virtnet_poll_cleantx() without any notification to the tx > side. > I am thinking, in virtnet_poll(), instead of directly call > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to > schedule the tx side napi, and let the tx napi handler do the cleaning > work. This way, we automatically call virtqueue_disable_cb() on the tx > vq, and after the tx work is done, virtqueue_napi_complete() is called > to re-enable the cb on the tx side. This way, the tx side knows what > has been done, and will likely reduce the # of spurious tx interrupts? > And I don't think there is much cost in doing that, since > napi_schedule() basically queues the tx napi to the back of its > napi_list, and serves it right after the rx napi handler is done. > What do you guys think? I could quickly test it up to see if it solves > the issue. Sure pls test. I think you will want to disable event index for now to make sure disable cb is not a nop (I am working on fixing that). > > > Because the steal work is done > > > in the napi handler of the rx queue. But the tx interrupt must have > > > been raised before that. Will we come back to process the tx interrupt > > > again after we re-enabled the cb on the tx side? > > > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > > > One annoying thing is that with split and event index, we do not disable > > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > > event index feature? I am not sure it is actually worth it with > > > > stealing. > > > > With event index, we suppress interrupts when another interrupt is > > already pending from a previous packet, right? When the previous > > position of the producer is already beyond the consumer. It doesn't > > matter whether the previous packet triggered a tx interrupt or > > deferred to an already scheduled rx interrupt? From that seems fine to > > leave it out.
On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote: > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > > > "irqpoll" option) > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > > > and a few tx interrupts. > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > > > interrupt load. > > > > > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > > > stealing, it is a net win. > > > > > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > > > surely an even greater win to disable interrupts while polling like > > > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > > > the feature. That is not captured by this statistic. > > > > > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > > > especially why this high spurious rate exhibits in this workload with > > > > > > many concurrent flows. > > > > > > > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > > > Each time we xmit a packet, it is stolen and freed. > > > > > Since xmit enables callbacks (just in case!) we also > > > > > get an interrupt, which is automatically spurious. > > > > > > > > > > My conclusion is that we shouldn't just work around it but instead > > > > > (or additionally?) > > > > > reduce the number of interrupts by disabling callbacks e.g. when > > > > > a. we are currently stealing packets > > > > > or > > > > > b. we stole all packets > > > > > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > > > > > > > Thinking along this line, that probably means, we should disable cb on > > > > the tx virtqueue, when scheduling the napi work on the rx side, and > > > > reenable it after the rx napi work is done? > > > > Also, I wonder if it is too late to disable cb at the point we start > > > > to steal pkts or have stolen all pkts. > > > > > > The earlier the better. I see no benefit to delay until the rx handler > > > actually runs. > > > > > > > I've been thinking more on this. I think the fundamental issue here is > > that the rx napi handler virtnet_poll() does the tx side work by > > calling virtnet_poll_cleantx() without any notification to the tx > > side. > > I am thinking, in virtnet_poll(), instead of directly call > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to > > schedule the tx side napi, and let the tx napi handler do the cleaning > > work. This way, we automatically call virtqueue_disable_cb() on the tx > > vq, and after the tx work is done, virtqueue_napi_complete() is called > > to re-enable the cb on the tx side. This way, the tx side knows what > > has been done, and will likely reduce the # of spurious tx interrupts? > > And I don't think there is much cost in doing that, since > > napi_schedule() basically queues the tx napi to the back of its > > napi_list, and serves it right after the rx napi handler is done. > > What do you guys think? I could quickly test it up to see if it solves > > the issue. > > > Sure pls test. I think you will want to disable event index > for now to make sure disable cb is not a nop (I am working on > fixing that). > Hi Michael and Jason, I'd like to follow up on this issue a bit more. I've done some more investigation into this issue: 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb aggressively"), we are still seeing this issue with a tcp_stream test with 240 flows. 2. We've tried with the following patch to suppress cleaning tx queue from rx napi handler for 10% of the time: diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 79bd2585ec6b..711768dbc617 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) return; if (__netif_tx_trylock(txq)) { + if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10)) + goto unlock; do { virtqueue_disable_cb(sq->vq); free_old_xmit_skbs(sq, true); @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) netif_tx_wake_queue(txq); +unlock: __netif_tx_unlock(txq); } } This also does not help. It turns out skipping 10% is just not enough. We have to skip for 50% of the time in order for the warning to be suppressed. And this does not seem to be a viable solution since how much we skip probably will depend on the traffic pattern. My questions here: 1. Michael mentioned that if we use split queues with event idx, the interrupts are not actually being disabled. Is this still the case? If so, is that also the cause for so many spurious interrupts? 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up virtio_disable_cb"). I am not quite sure, would that change help reduce the # of spurious interrupts we see if we use split queues with event idx? From my limited understanding, that patch skips calling virtqueue_disable_cb_split() if event_trigger is set for split queues. BTW, I have the setup to reproduce this issue easily. So do let me know if you have other ideas on how to fix it. Thanks. Wei > > > > Because the steal work is done > > > > in the napi handler of the rx queue. But the tx interrupt must have > > > > been raised before that. Will we come back to process the tx interrupt > > > > again after we re-enabled the cb on the tx side? > > > > > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > > > > > One annoying thing is that with split and event index, we do not disable > > > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > > > event index feature? I am not sure it is actually worth it with > > > > > stealing. > > > > > > With event index, we suppress interrupts when another interrupt is > > > already pending from a previous packet, right? When the previous > > > position of the producer is already beyond the consumer. It doesn't > > > matter whether the previous packet triggered a tx interrupt or > > > deferred to an already scheduled rx interrupt? From that seems fine to > > > leave it out. >
On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote: > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote: > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > > > > "irqpoll" option) > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > > > > and a few tx interrupts. > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > > > > interrupt load. > > > > > > > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > > > > stealing, it is a net win. > > > > > > > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > > > > surely an even greater win to disable interrupts while polling like > > > > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > > > > the feature. That is not captured by this statistic. > > > > > > > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > > > > especially why this high spurious rate exhibits in this workload with > > > > > > > many concurrent flows. > > > > > > > > > > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > > > > Each time we xmit a packet, it is stolen and freed. > > > > > > Since xmit enables callbacks (just in case!) we also > > > > > > get an interrupt, which is automatically spurious. > > > > > > > > > > > > My conclusion is that we shouldn't just work around it but instead > > > > > > (or additionally?) > > > > > > reduce the number of interrupts by disabling callbacks e.g. when > > > > > > a. we are currently stealing packets > > > > > > or > > > > > > b. we stole all packets > > > > > > > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > > > > > > > > > > Thinking along this line, that probably means, we should disable cb on > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and > > > > > reenable it after the rx napi work is done? > > > > > Also, I wonder if it is too late to disable cb at the point we start > > > > > to steal pkts or have stolen all pkts. > > > > > > > > The earlier the better. I see no benefit to delay until the rx handler > > > > actually runs. > > > > > > > > > > I've been thinking more on this. I think the fundamental issue here is > > > that the rx napi handler virtnet_poll() does the tx side work by > > > calling virtnet_poll_cleantx() without any notification to the tx > > > side. > > > I am thinking, in virtnet_poll(), instead of directly call > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to > > > schedule the tx side napi, and let the tx napi handler do the cleaning > > > work. This way, we automatically call virtqueue_disable_cb() on the tx > > > vq, and after the tx work is done, virtqueue_napi_complete() is called > > > to re-enable the cb on the tx side. This way, the tx side knows what > > > has been done, and will likely reduce the # of spurious tx interrupts? > > > And I don't think there is much cost in doing that, since > > > napi_schedule() basically queues the tx napi to the back of its > > > napi_list, and serves it right after the rx napi handler is done. > > > What do you guys think? I could quickly test it up to see if it solves > > > the issue. > > > > > > Sure pls test. I think you will want to disable event index > > for now to make sure disable cb is not a nop (I am working on > > fixing that). > > > > Hi Michael and Jason, > > I'd like to follow up on this issue a bit more. > I've done some more investigation into this issue: > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb > aggressively"), we are still seeing this issue with a tcp_stream test > with 240 flows. > 2. We've tried with the following patch to suppress cleaning tx queue > from rx napi handler for 10% of the time: > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 79bd2585ec6b..711768dbc617 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > return; > > if (__netif_tx_trylock(txq)) { > + if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10)) > + goto unlock; > do { > virtqueue_disable_cb(sq->vq); > free_old_xmit_skbs(sq, true); > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > netif_tx_wake_queue(txq); > > +unlock: > __netif_tx_unlock(txq); > } > } > This also does not help. It turns out skipping 10% is just not enough. > We have to skip for 50% of the time in order for the warning to be > suppressed. > And this does not seem to be a viable solution since how much we skip > probably will depend on the traffic pattern. > > My questions here: > 1. Michael mentioned that if we use split queues with event idx, the > interrupts are not actually being disabled. Is this still the case? If > so, is that also the cause for so many spurious interrupts? > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up > virtio_disable_cb"). I am not quite sure, would that change help > reduce the # of spurious interrupts we see if we use split queues with > event idx? From my limited understanding, that patch skips calling > virtqueue_disable_cb_split() if event_trigger is set for split queues. > > BTW, I have the setup to reproduce this issue easily. So do let me > know if you have other ideas on how to fix it. > > Thanks. > Wei I think that commit is needed to fix the issue, yes. My suggestion is to try v5.14 in its entirety rather than cherry-picking. If you see that the issue is fixed there I can point you to a list of commit to backport. > > > > > > Because the steal work is done > > > > > in the napi handler of the rx queue. But the tx interrupt must have > > > > > been raised before that. Will we come back to process the tx interrupt > > > > > again after we re-enabled the cb on the tx side? > > > > > > > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > > > > > > > One annoying thing is that with split and event index, we do not disable > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > > > > event index feature? I am not sure it is actually worth it with > > > > > > stealing. > > > > > > > > With event index, we suppress interrupts when another interrupt is > > > > already pending from a previous packet, right? When the previous > > > > position of the producer is already beyond the consumer. It doesn't > > > > matter whether the previous packet triggered a tx interrupt or > > > > deferred to an already scheduled rx interrupt? From that seems fine to > > > > leave it out. > >
On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote: > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote: > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > > > > > "irqpoll" option) > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > > > > > and a few tx interrupts. > > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > > > > > interrupt load. > > > > > > > > > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > > > > > stealing, it is a net win. > > > > > > > > > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > > > > > surely an even greater win to disable interrupts while polling like > > > > > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > > > > > the feature. That is not captured by this statistic. > > > > > > > > > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > > > > > especially why this high spurious rate exhibits in this workload with > > > > > > > > many concurrent flows. > > > > > > > > > > > > > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > > > > > Each time we xmit a packet, it is stolen and freed. > > > > > > > Since xmit enables callbacks (just in case!) we also > > > > > > > get an interrupt, which is automatically spurious. > > > > > > > > > > > > > > My conclusion is that we shouldn't just work around it but instead > > > > > > > (or additionally?) > > > > > > > reduce the number of interrupts by disabling callbacks e.g. when > > > > > > > a. we are currently stealing packets > > > > > > > or > > > > > > > b. we stole all packets > > > > > > > > > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > > > > > > > > > > > > > Thinking along this line, that probably means, we should disable cb on > > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and > > > > > > reenable it after the rx napi work is done? > > > > > > Also, I wonder if it is too late to disable cb at the point we start > > > > > > to steal pkts or have stolen all pkts. > > > > > > > > > > The earlier the better. I see no benefit to delay until the rx handler > > > > > actually runs. > > > > > > > > > > > > > I've been thinking more on this. I think the fundamental issue here is > > > > that the rx napi handler virtnet_poll() does the tx side work by > > > > calling virtnet_poll_cleantx() without any notification to the tx > > > > side. > > > > I am thinking, in virtnet_poll(), instead of directly call > > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to > > > > schedule the tx side napi, and let the tx napi handler do the cleaning > > > > work. This way, we automatically call virtqueue_disable_cb() on the tx > > > > vq, and after the tx work is done, virtqueue_napi_complete() is called > > > > to re-enable the cb on the tx side. This way, the tx side knows what > > > > has been done, and will likely reduce the # of spurious tx interrupts? > > > > And I don't think there is much cost in doing that, since > > > > napi_schedule() basically queues the tx napi to the back of its > > > > napi_list, and serves it right after the rx napi handler is done. > > > > What do you guys think? I could quickly test it up to see if it solves > > > > the issue. > > > > > > > > > Sure pls test. I think you will want to disable event index > > > for now to make sure disable cb is not a nop (I am working on > > > fixing that). > > > > > > > Hi Michael and Jason, > > > > I'd like to follow up on this issue a bit more. > > I've done some more investigation into this issue: > > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb > > aggressively"), we are still seeing this issue with a tcp_stream test > > with 240 flows. > > 2. We've tried with the following patch to suppress cleaning tx queue > > from rx napi handler for 10% of the time: > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 79bd2585ec6b..711768dbc617 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > > return; > > > > if (__netif_tx_trylock(txq)) { > > + if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10)) > > + goto unlock; > > do { > > virtqueue_disable_cb(sq->vq); > > free_old_xmit_skbs(sq, true); > > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > > netif_tx_wake_queue(txq); > > > > +unlock: > > __netif_tx_unlock(txq); > > } > > } > > This also does not help. It turns out skipping 10% is just not enough. > > We have to skip for 50% of the time in order for the warning to be > > suppressed. > > And this does not seem to be a viable solution since how much we skip > > probably will depend on the traffic pattern. > > > > My questions here: > > 1. Michael mentioned that if we use split queues with event idx, the > > interrupts are not actually being disabled. Is this still the case? If > > so, is that also the cause for so many spurious interrupts? > > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up > > virtio_disable_cb"). I am not quite sure, would that change help > > reduce the # of spurious interrupts we see if we use split queues with > > event idx? From my limited understanding, that patch skips calling > > virtqueue_disable_cb_split() if event_trigger is set for split queues. > > > > BTW, I have the setup to reproduce this issue easily. So do let me > > know if you have other ideas on how to fix it. > > > > Thanks. > > Wei > > I think that commit is needed to fix the issue, yes. > > My suggestion is to try v5.14 in its entirety > rather than cherry-picking. > > If you see that the issue is > fixed there I can point you to a list of commit to backport. > Thanks Michael. It does seem with 5.14 kernel, I no longer see this issue happening, which is great! Could you point me to a list of commits that fixed it? > > > > > > > > Because the steal work is done > > > > > > in the napi handler of the rx queue. But the tx interrupt must have > > > > > > been raised before that. Will we come back to process the tx interrupt > > > > > > again after we re-enabled the cb on the tx side? > > > > > > > > > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > > > > > > > > > One annoying thing is that with split and event index, we do not disable > > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > > > > > event index feature? I am not sure it is actually worth it with > > > > > > > stealing. > > > > > > > > > > With event index, we suppress interrupts when another interrupt is > > > > > already pending from a previous packet, right? When the previous > > > > > position of the producer is already beyond the consumer. It doesn't > > > > > matter whether the previous packet triggered a tx interrupt or > > > > > deferred to an already scheduled rx interrupt? From that seems fine to > > > > > leave it out. > > > >
On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote: > On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote: > > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote: > > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote: > > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote: > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote: > > > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx > > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx > > > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete > > > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do > > > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler. > > > > > > > > > > > > > > > This could lead to the following warning msg: > > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the > > > > > > > > > > > > > > > "irqpoll" option) > > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted > > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu > > > > > > > > > > > > > > > [ 3588.017940] Call Trace: > > > > > > > > > > > > > > > [ 3588.017942] <IRQ> > > > > > > > > > > > > > > > [ 3588.017951] dump_stack+0x63/0x85 > > > > > > > > > > > > > > > [ 3588.017953] __report_bad_irq+0x35/0xc0 > > > > > > > > > > > > > > > [ 3588.017955] note_interrupt+0x24b/0x2a0 > > > > > > > > > > > > > > > [ 3588.017956] handle_irq_event_percpu+0x54/0x80 > > > > > > > > > > > > > > > [ 3588.017957] handle_irq_event+0x3b/0x60 > > > > > > > > > > > > > > > [ 3588.017958] handle_edge_irq+0x83/0x1a0 > > > > > > > > > > > > > > > [ 3588.017961] handle_irq+0x20/0x30 > > > > > > > > > > > > > > > [ 3588.017964] do_IRQ+0x50/0xe0 > > > > > > > > > > > > > > > [ 3588.017966] common_interrupt+0xf/0xf > > > > > > > > > > > > > > > [ 3588.017966] </IRQ> > > > > > > > > > > > > > > > [ 3588.017989] handlers: > > > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt > > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for > > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such > > > > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") > > > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com> > > > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This description does not make sense to me. > > > > > > > > > > > > > > > > > > > > > > > > > > > > irq X: nobody cared > > > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly. > > > > > > > > > > > > > > > > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Let's find out what it was please. What you are doing is > > > > > > > > > > > > > > just preventing linux from complaining. > > > > > > > > > > > > > > > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at > > > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the > > > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues, > > > > > > > > > > > > > and a few tx interrupts. > > > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets > > > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts > > > > > > > > > > > > > get triggered very close to each other, and gets handled in one round > > > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls > > > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx() > > > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's > > > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do. > > > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here: > > > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html > > > > > > > > > > > > > > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one > > > > > > > > > > > > hundred such events, since boot, which is a small number compared real > > > > > > > > > > > > interrupt load. > > > > > > > > > > > > > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from > > > > > > > > > > > note_interrupt that applies here. > > > > > > > > > > > > > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after > > > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As > > > > > > > > > > > > long as this rate of events is very low compared to useful interrupts, > > > > > > > > > > > > and total interrupt count is greatly reduced vs not having work > > > > > > > > > > > > stealing, it is a net win. > > > > > > > > > > > > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is > > > > > > > > > > surely an even greater win to disable interrupts while polling like > > > > > > > > > > this. Might be tricky to detect, disabling/enabling aggressively every > > > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache > > > > > > > > > > line bounces, and we don't want to enable callbacks if they were not > > > > > > > > > > enabled e.g. by start_xmit ... Some kind of counter? > > > > > > > > > > > > > > > > > > Yes. It was known that the work stealing is more effective in some > > > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated. > > > > > > > > > > > > > > > > > > Most interesting is the number of interrupts suppressed as a result of > > > > > > > > > the feature. That is not captured by this statistic. > > > > > > > > > > > > > > > > > > In any case, we'll take a step back to better understand behavior. And > > > > > > > > > especially why this high spurious rate exhibits in this workload with > > > > > > > > > many concurrent flows. > > > > > > > > > > > > > > > > > > > > > > > > I've been thinking about it. Imagine work stealing working perfectly. > > > > > > > > Each time we xmit a packet, it is stolen and freed. > > > > > > > > Since xmit enables callbacks (just in case!) we also > > > > > > > > get an interrupt, which is automatically spurious. > > > > > > > > > > > > > > > > My conclusion is that we shouldn't just work around it but instead > > > > > > > > (or additionally?) > > > > > > > > reduce the number of interrupts by disabling callbacks e.g. when > > > > > > > > a. we are currently stealing packets > > > > > > > > or > > > > > > > > b. we stole all packets > > > > > > > > > > > > Agreed. This might prove a significant performance gain at the same time :) > > > > > > > > > > > > > > > > > > > > > Thinking along this line, that probably means, we should disable cb on > > > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and > > > > > > > reenable it after the rx napi work is done? > > > > > > > Also, I wonder if it is too late to disable cb at the point we start > > > > > > > to steal pkts or have stolen all pkts. > > > > > > > > > > > > The earlier the better. I see no benefit to delay until the rx handler > > > > > > actually runs. > > > > > > > > > > > > > > > > I've been thinking more on this. I think the fundamental issue here is > > > > > that the rx napi handler virtnet_poll() does the tx side work by > > > > > calling virtnet_poll_cleantx() without any notification to the tx > > > > > side. > > > > > I am thinking, in virtnet_poll(), instead of directly call > > > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to > > > > > schedule the tx side napi, and let the tx napi handler do the cleaning > > > > > work. This way, we automatically call virtqueue_disable_cb() on the tx > > > > > vq, and after the tx work is done, virtqueue_napi_complete() is called > > > > > to re-enable the cb on the tx side. This way, the tx side knows what > > > > > has been done, and will likely reduce the # of spurious tx interrupts? > > > > > And I don't think there is much cost in doing that, since > > > > > napi_schedule() basically queues the tx napi to the back of its > > > > > napi_list, and serves it right after the rx napi handler is done. > > > > > What do you guys think? I could quickly test it up to see if it solves > > > > > the issue. > > > > > > > > > > > > Sure pls test. I think you will want to disable event index > > > > for now to make sure disable cb is not a nop (I am working on > > > > fixing that). > > > > > > > > > > Hi Michael and Jason, > > > > > > I'd like to follow up on this issue a bit more. > > > I've done some more investigation into this issue: > > > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb > > > aggressively"), we are still seeing this issue with a tcp_stream test > > > with 240 flows. > > > 2. We've tried with the following patch to suppress cleaning tx queue > > > from rx napi handler for 10% of the time: > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index 79bd2585ec6b..711768dbc617 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > > > return; > > > > > > if (__netif_tx_trylock(txq)) { > > > + if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10)) > > > + goto unlock; > > > do { > > > virtqueue_disable_cb(sq->vq); > > > free_old_xmit_skbs(sq, true); > > > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > > > netif_tx_wake_queue(txq); > > > > > > +unlock: > > > __netif_tx_unlock(txq); > > > } > > > } > > > This also does not help. It turns out skipping 10% is just not enough. > > > We have to skip for 50% of the time in order for the warning to be > > > suppressed. > > > And this does not seem to be a viable solution since how much we skip > > > probably will depend on the traffic pattern. > > > > > > My questions here: > > > 1. Michael mentioned that if we use split queues with event idx, the > > > interrupts are not actually being disabled. Is this still the case? If > > > so, is that also the cause for so many spurious interrupts? > > > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up > > > virtio_disable_cb"). I am not quite sure, would that change help > > > reduce the # of spurious interrupts we see if we use split queues with > > > event idx? From my limited understanding, that patch skips calling > > > virtqueue_disable_cb_split() if event_trigger is set for split queues. > > > > > > BTW, I have the setup to reproduce this issue easily. So do let me > > > know if you have other ideas on how to fix it. > > > > > > Thanks. > > > Wei > > > > I think that commit is needed to fix the issue, yes. > > > > My suggestion is to try v5.14 in its entirety > > rather than cherry-picking. > > > > If you see that the issue is > > fixed there I can point you to a list of commit to backport. > > > > Thanks Michael. It does seem with 5.14 kernel, I no longer see this > issue happening, which is great! > Could you point me to a list of commits that fixed it? Try this: a7766ef18b33 virtio_net: disable cb aggressively 8d622d21d248 virtio: fix up virtio_disable_cb 22bc63c58e87 virtio_net: move txq wakeups under tx q lock 5a2f966d0f3f virtio_net: move tx vq operation under tx queue lock > > > > > > > > > > Because the steal work is done > > > > > > > in the napi handler of the rx queue. But the tx interrupt must have > > > > > > > been raised before that. Will we come back to process the tx interrupt > > > > > > > again after we re-enabled the cb on the tx side? > > > > > > > > > > > > > > > This should be enough to reduce the chances below 99% ;) > > > > > > > > > > > > > > > > One annoying thing is that with split and event index, we do not disable > > > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the > > > > > > > > event index feature? I am not sure it is actually worth it with > > > > > > > > stealing. > > > > > > > > > > > > With event index, we suppress interrupts when another interrupt is > > > > > > already pending from a previous packet, right? When the previous > > > > > > position of the producer is already beyond the consumer. It doesn't > > > > > > matter whether the previous packet triggered a tx interrupt or > > > > > > deferred to an already scheduled rx interrupt? From that seems fine to > > > > > > leave it out. > > > > > >
To me, the symptoms of this problem (a packet storm) smelt like the lack of bql leading to very bursty tcp behavior and enormous tcp RTT inflation in this driver which I observed 6+ months back. now that it seems to be fixed (?), could you re-run the netperf 128 stream test and share a packet capture? (try with pfifo_fast or fq_codel at the qdisc), and sch_fq as a control (which was working correctly). thx.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 508408fbe78f..e9a3f30864e8 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi, return; } + /* With napi_tx enabled, free_old_xmit_skbs() could be called from + * rx napi handler. Set work_steal to suppress bad irq warning for + * IRQ_NONE case from tx complete interrupt handler. + */ + virtqueue_set_work_steal(vq, true); + return virtnet_napi_enable(vq, napi); } -static void virtnet_napi_tx_disable(struct napi_struct *napi) +static void virtnet_napi_tx_disable(struct virtqueue *vq, + struct napi_struct *napi) { - if (napi->weight) + if (napi->weight) { napi_disable(napi); + virtqueue_set_work_steal(vq, false); + } } static void refill_work(struct work_struct *work) @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev) for (i = 0; i < vi->max_queue_pairs; i++) { xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); napi_disable(&vi->rq[i].napi); - virtnet_napi_tx_disable(&vi->sq[i].napi); + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); } return 0; @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev) if (netif_running(vi->dev)) { for (i = 0; i < vi->max_queue_pairs; i++) { napi_disable(&vi->rq[i].napi); - virtnet_napi_tx_disable(&vi->sq[i].napi); + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); } } } @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, if (netif_running(dev)) { for (i = 0; i < vi->max_queue_pairs; i++) { napi_disable(&vi->rq[i].napi); - virtnet_napi_tx_disable(&vi->sq[i].napi); + virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi); } } diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 71e16b53e9c1..f7c5d697c302 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -105,6 +105,9 @@ struct vring_virtqueue { /* Host publishes avail event idx */ bool event; + /* Tx side napi work could be done from rx side. */ + bool work_steal; + /* Head of free buffer list. */ unsigned int free_head; /* Number we've added since last sync. */ @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->notify = notify; vq->weak_barriers = weak_barriers; vq->broken = false; + vq->work_steal = false; vq->last_used_idx = 0; vq->num_added = 0; vq->packed_ring = true; @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq) if (!more_used(vq)) { pr_debug("virtqueue interrupt with no work for %p\n", vq); + if (vq->work_steal) + return IRQ_HANDLED; + return IRQ_NONE; } @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->notify = notify; vq->weak_barriers = weak_barriers; vq->broken = false; + vq->work_steal = false; vq->last_used_idx = 0; vq->num_added = 0; vq->use_dma_api = vring_use_dma_api(vdev); @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq) } EXPORT_SYMBOL_GPL(virtqueue_is_broken); +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + vq->work_steal = val; +} +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal); + /* * This should prevent the device from being used, allowing drivers to * recover. You may need to grab appropriate locks to flush. diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 55ea329fe72a..091c30f21ff9 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq); bool virtqueue_is_broken(struct virtqueue *vq); +void virtqueue_set_work_steal(struct virtqueue *vq, bool val); + const struct vring *virtqueue_get_vring(struct virtqueue *vq); dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);