Message ID | 1612614564-4220-3-git-send-email-si-wei.liu@oracle.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/3] mlx5_vdpa: should exclude header length and fcs from mtu | expand |
On 2021/2/6 下午8:29, Si-Wei Liu wrote: > While virtq is stopped, get_vq_state() is supposed to > be called to get sync'ed with the latest internal > avail_index from device. The saved avail_index is used > to restate the virtq once device is started. Commit > b35ccebe3ef7 introduced the clear_virtqueues() routine > to reset the saved avail_index, however, the index > gets cleared a bit earlier before get_vq_state() tries > to read it. This would cause consistency problems when > virtq is restarted, e.g. through a series of link down > and link up events. We could defer the clearing of > avail_index to until the device is to be started, > i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in > set_status(). > > Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> > --- > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index aa6f8cd..444ab58 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > if (!status) { > mlx5_vdpa_info(mvdev, "performing device reset\n"); > teardown_driver(ndev); > - clear_virtqueues(ndev); > mlx5_vdpa_destroy_mr(&ndev->mvdev); > ndev->mvdev.status = 0; > ++mvdev->generation; > @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > > if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { > if (status & VIRTIO_CONFIG_S_DRIVER_OK) { > + clear_virtqueues(ndev); > err = setup_driver(ndev); > if (err) { > mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote: > While virtq is stopped, get_vq_state() is supposed to > be called to get sync'ed with the latest internal > avail_index from device. The saved avail_index is used > to restate the virtq once device is started. Commit > b35ccebe3ef7 introduced the clear_virtqueues() routine > to reset the saved avail_index, however, the index > gets cleared a bit earlier before get_vq_state() tries > to read it. This would cause consistency problems when > virtq is restarted, e.g. through a series of link down > and link up events. We could defer the clearing of > avail_index to until the device is to be started, > i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in > set_status(). Not sure I understand the scenario. You are talking about reset of the device followed by up/down events on the interface. How can you trigger this? > > Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> > --- > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index aa6f8cd..444ab58 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > if (!status) { > mlx5_vdpa_info(mvdev, "performing device reset\n"); > teardown_driver(ndev); > - clear_virtqueues(ndev); > mlx5_vdpa_destroy_mr(&ndev->mvdev); > ndev->mvdev.status = 0; > ++mvdev->generation; > @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > > if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { > if (status & VIRTIO_CONFIG_S_DRIVER_OK) { > + clear_virtqueues(ndev); > err = setup_driver(ndev); > if (err) { > mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); > -- > 1.8.3.1 >
On 2/7/2021 9:48 PM, Eli Cohen wrote: > On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote: >> While virtq is stopped, get_vq_state() is supposed to >> be called to get sync'ed with the latest internal >> avail_index from device. The saved avail_index is used >> to restate the virtq once device is started. Commit >> b35ccebe3ef7 introduced the clear_virtqueues() routine >> to reset the saved avail_index, however, the index >> gets cleared a bit earlier before get_vq_state() tries >> to read it. This would cause consistency problems when >> virtq is restarted, e.g. through a series of link down >> and link up events. We could defer the clearing of >> avail_index to until the device is to be started, >> i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in >> set_status(). > > Not sure I understand the scenario. You are talking about reset of the > device followed by up/down events on the interface. How can you trigger > this? Currently it's not possible to trigger link up/down events with upstream QEMU due lack of config/control interrupt implementation. And live migration could be another scenario that cannot be satisfied because of inconsistent queue state. They share the same root of cause as captured here. -Siwei > >> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >> --- >> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c >> index aa6f8cd..444ab58 100644 >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c >> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) >> if (!status) { >> mlx5_vdpa_info(mvdev, "performing device reset\n"); >> teardown_driver(ndev); >> - clear_virtqueues(ndev); >> mlx5_vdpa_destroy_mr(&ndev->mvdev); >> ndev->mvdev.status = 0; >> ++mvdev->generation; >> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) >> >> if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { >> if (status & VIRTIO_CONFIG_S_DRIVER_OK) { >> + clear_virtqueues(ndev); >> err = setup_driver(ndev); >> if (err) { >> mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); >> -- >> 1.8.3.1 >>
On 2021/2/6 下午8:29, Si-Wei Liu wrote: > While virtq is stopped, get_vq_state() is supposed to > be called to get sync'ed with the latest internal > avail_index from device. The saved avail_index is used > to restate the virtq once device is started. Commit > b35ccebe3ef7 introduced the clear_virtqueues() routine > to reset the saved avail_index, however, the index > gets cleared a bit earlier before get_vq_state() tries > to read it. This would cause consistency problems when > virtq is restarted, e.g. through a series of link down > and link up events. We could defer the clearing of > avail_index to until the device is to be started, > i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in > set_status(). > > Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> > --- > drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index aa6f8cd..444ab58 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > if (!status) { > mlx5_vdpa_info(mvdev, "performing device reset\n"); > teardown_driver(ndev); > - clear_virtqueues(ndev); > mlx5_vdpa_destroy_mr(&ndev->mvdev); > ndev->mvdev.status = 0; > ++mvdev->generation; > @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) > > if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { > if (status & VIRTIO_CONFIG_S_DRIVER_OK) { > + clear_virtqueues(ndev); Rethink about this. As mentioned in another thread, this in fact breaks set_vq_state(). (See vhost_virtqueue_start() -> vhost_vdpa_set_vring_base() in qemu codes). The issue is that the avail idx is forgot, we need keep it. Thanks > err = setup_driver(ndev); > if (err) { > mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
On 2/8/2021 7:37 PM, Jason Wang wrote: > > On 2021/2/6 下午8:29, Si-Wei Liu wrote: >> While virtq is stopped, get_vq_state() is supposed to >> be called to get sync'ed with the latest internal >> avail_index from device. The saved avail_index is used >> to restate the virtq once device is started. Commit >> b35ccebe3ef7 introduced the clear_virtqueues() routine >> to reset the saved avail_index, however, the index >> gets cleared a bit earlier before get_vq_state() tries >> to read it. This would cause consistency problems when >> virtq is restarted, e.g. through a series of link down >> and link up events. We could defer the clearing of >> avail_index to until the device is to be started, >> i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in >> set_status(). >> >> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index >> after change map") >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >> --- >> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c >> b/drivers/vdpa/mlx5/net/mlx5_vnet.c >> index aa6f8cd..444ab58 100644 >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c >> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct >> vdpa_device *vdev, u8 status) >> if (!status) { >> mlx5_vdpa_info(mvdev, "performing device reset\n"); >> teardown_driver(ndev); >> - clear_virtqueues(ndev); >> mlx5_vdpa_destroy_mr(&ndev->mvdev); >> ndev->mvdev.status = 0; >> ++mvdev->generation; >> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct >> vdpa_device *vdev, u8 status) >> if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { >> if (status & VIRTIO_CONFIG_S_DRIVER_OK) { >> + clear_virtqueues(ndev); > > > Rethink about this. As mentioned in another thread, this in fact > breaks set_vq_state(). (See vhost_virtqueue_start() -> > vhost_vdpa_set_vring_base() in qemu codes). I assume that the clearing for vhost-vdpa would be done via (qemu code), vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | VIRTIO_CONFIG_S_DRIVER_OK) which is _after_ vhost_virtqueue_start() gets called to restore the avail_idx to h/w in vhost_dev_start(). What am I missing here? -Siwei > > The issue is that the avail idx is forgot, we need keep it. > > Thanks > > >> err = setup_driver(ndev); >> if (err) { >> mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); >
On 2021/2/10 上午8:26, Si-Wei Liu wrote: > > > On 2/8/2021 7:37 PM, Jason Wang wrote: >> >> On 2021/2/6 下午8:29, Si-Wei Liu wrote: >>> While virtq is stopped, get_vq_state() is supposed to >>> be called to get sync'ed with the latest internal >>> avail_index from device. The saved avail_index is used >>> to restate the virtq once device is started. Commit >>> b35ccebe3ef7 introduced the clear_virtqueues() routine >>> to reset the saved avail_index, however, the index >>> gets cleared a bit earlier before get_vq_state() tries >>> to read it. This would cause consistency problems when >>> virtq is restarted, e.g. through a series of link down >>> and link up events. We could defer the clearing of >>> avail_index to until the device is to be started, >>> i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in >>> set_status(). >>> >>> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index >>> after change map") >>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> >>> --- >>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c >>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c >>> index aa6f8cd..444ab58 100644 >>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c >>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c >>> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct >>> vdpa_device *vdev, u8 status) >>> if (!status) { >>> mlx5_vdpa_info(mvdev, "performing device reset\n"); >>> teardown_driver(ndev); >>> - clear_virtqueues(ndev); >>> mlx5_vdpa_destroy_mr(&ndev->mvdev); >>> ndev->mvdev.status = 0; >>> ++mvdev->generation; >>> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct >>> vdpa_device *vdev, u8 status) >>> if ((status ^ ndev->mvdev.status) & >>> VIRTIO_CONFIG_S_DRIVER_OK) { >>> if (status & VIRTIO_CONFIG_S_DRIVER_OK) { >>> + clear_virtqueues(ndev); >> >> >> Rethink about this. As mentioned in another thread, this in fact >> breaks set_vq_state(). (See vhost_virtqueue_start() -> >> vhost_vdpa_set_vring_base() in qemu codes). > I assume that the clearing for vhost-vdpa would be done via (qemu code), > > vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | > VIRTIO_CONFIG_S_DRIVER_OK) > > which is _after_ vhost_virtqueue_start() gets called to restore the > avail_idx to h/w in vhost_dev_start(). What am I missing here? > > -Siwei I think not. I thought clear_virtqueues() will clear hardware index but looks not. (I guess we need a better name other than clear_virtqueues(), e.g from the name it looks like the it will clear the hardware states) Thanks > > >> >> The issue is that the avail idx is forgot, we need keep it. >> >> Thanks >> >> >>> err = setup_driver(ndev); >>> if (err) { >>> mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); >> >
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index aa6f8cd..444ab58 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(&ndev->mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)