diff mbox series

scsi: storvsc: Prevent running tasklet for long

Message ID 1657035141-2132-1-git-send-email-ssengar@linux.microsoft.com
State New
Headers show
Series scsi: storvsc: Prevent running tasklet for long | expand

Commit Message

Saurabh Singh Sengar July 5, 2022, 3:32 p.m. UTC
There can be scenarios where packets in ring buffer are continuously
getting queued from upper layer and dequeued from storvsc interrupt
handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
executing as a tasklet) for a long duration. Theoretically its possible
that this loop executes forever. Add a condition to limit execution of
this tasklet for finite amount of time to avoid such hazardous scenarios.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Michael Kelley July 5, 2022, 5:39 p.m. UTC | #1
From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Tuesday, July 5, 2022 8:32 AM
> 
> There can be scenarios where packets in ring buffer are continuously
> getting queued from upper layer and dequeued from storvsc interrupt
> handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
> executing as a tasklet) for a long duration. Theoretically its possible
> that this loop executes forever. 

Actually, the "forever" part isn't possible.  The way foreach_vmbus_pkt()
works, it only runs until the ring buffer is empty, and it does not update
the ring buffer read index until the empty condition is reached.  So the
Hyper-V host is prevented from continuously feeding new packets into the
ring buffer while storvsc is reading them.  But ring buffer is pretty big, so
storvsc could still end up processing a lot of completion packets and take
an undesirable amount of time in the loop.  Hence the scenario is valid.

> Add a condition to limit execution of
> this tasklet for finite amount of time to avoid such hazardous scenarios.
> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index fe000da..0c428cb 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -60,6 +60,9 @@
>  #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
>  #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
> 
> +/* channel callback timeout in ms */
> +#define CALLBACK_TIMEOUT		5
> +
>  /*  Packet structure describing virtual storage requests. */
>  enum vstor_packet_operation {
>  	VSTOR_OPERATION_COMPLETE_IO		= 1,
> @@ -1204,6 +1207,7 @@ static void storvsc_on_channel_callback(void *context)
>  	struct hv_device *device;
>  	struct storvsc_device *stor_device;
>  	struct Scsi_Host *shost;
> +	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
> 
>  	if (channel->primary_channel != NULL)
>  		device = channel->primary_channel->device_obj;
> @@ -1224,6 +1228,9 @@ static void storvsc_on_channel_callback(void *context)
>  		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
>  			sizeof(enum vstor_packet_operation);
> 
> +		if (time_after(jiffies, expire))
> +			break;
> +

Per my comment on the commit message, breaking out of the
foreach_vmbus_pkt() loop leaves the ring indices unchanged.  I *think* we
want to close out the iteration via hv_pkt_iter_close() in the case where we
exit the loop early, so that Hyper-V can re-use the ring buffer space from
completions we've already processed.

Also when we re-enter storvsc_on_channel_callback() and again do
hv_pkt_iter_first(), that should be only after hv_pkt_iter_close() has been
called; i.e., the hv_pkt_iter_* functions should be matched up correctly.
The current implementation of these functions doesn't go awry
if they aren't matched up correctly, but we should nonetheless do them
correctly in case the implementation changes in the future.

Similar code in netvsc_poll() has the same behavior of not closing out
the hv_pkt_iter when the poll budget is exhausted, and we've had a recent
internal discussion about changing that.

Michael



>  		if (pktlen < minlen) {
>  			dev_err(&device->device,
>  				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",
> --
> 1.8.3.1
Saurabh Singh Sengar July 6, 2022, 9:53 a.m. UTC | #2
On Wed, Jul 06, 2022 at 02:44:42PM +0530, Praveen Kumar wrote:
> On 05-07-2022 21:02, Saurabh Sengar wrote:
> > There can be scenarios where packets in ring buffer are continuously
> > getting queued from upper layer and dequeued from storvsc interrupt
> > handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
> > executing as a tasklet) for a long duration. Theoretically its possible
> > that this loop executes forever. Add a condition to limit execution of
> > this tasklet for finite amount of time to avoid such hazardous scenarios.
> > 
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> >  drivers/scsi/storvsc_drv.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index fe000da..0c428cb 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -60,6 +60,9 @@
> >  #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
> >  #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
> >  
> > +/* channel callback timeout in ms */
> > +#define CALLBACK_TIMEOUT		5
> 
> If I may, it would be good if we have the CALLBACK_TIMEOUT configurable based upon user's requirement with default value to '5'.
> I assume, this value '5' fits best to the use-case which we are trying to resolve here. Thanks.

Agree, how about adding a sysfs entry for this parameter

> 
> > +
> >  /*  Packet structure describing virtual storage requests. */
> >  enum vstor_packet_operation {
> >  	VSTOR_OPERATION_COMPLETE_IO		= 1,
> > @@ -1204,6 +1207,7 @@ static void storvsc_on_channel_callback(void *context)
> >  	struct hv_device *device;
> >  	struct storvsc_device *stor_device;
> >  	struct Scsi_Host *shost;
> > +	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
> >  
> >  	if (channel->primary_channel != NULL)
> >  		device = channel->primary_channel->device_obj;
> > @@ -1224,6 +1228,9 @@ static void storvsc_on_channel_callback(void *context)
> >  		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
> >  			sizeof(enum vstor_packet_operation);
> >  
> > +		if (time_after(jiffies, expire))
> > +			break;
> > +
> >  		if (pktlen < minlen) {
> >  			dev_err(&device->device,
> >  				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",
> 
> Regards,
> 
> ~Praveen.
Saurabh Singh Sengar July 8, 2022, 10:42 a.m. UTC | #3
On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> From: Praveen Kumar
> > Sent: 06 July 2022 10:15
> > 
> > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > There can be scenarios where packets in ring buffer are continuously
> > > getting queued from upper layer and dequeued from storvsc interrupt
> > > handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
> > > executing as a tasklet) for a long duration. Theoretically its possible
> > > that this loop executes forever. Add a condition to limit execution of
> > > this tasklet for finite amount of time to avoid such hazardous scenarios.
> 
> Does this really make much difference?
> 
> I'd guess the tasklet gets immediately rescheduled as soon as
> the upper layer queues another packet?
> 
> Or do you get a different 'bug' where it is never woken again
> because the ring is stuck full?
> 
> 	David

My initial understanding was that staying in a tasklet for "too long" may not be a
good idea, however I was not sure what the "too long" value be, thus we are thinking
to provide this parameter as a configurable sysfs entry. I couldn't find any linux
doc justifying this, so please correct me here if I am mistaken.
We have also considered the networking drivers NAPI budget feature while deciding
this approach, where softirq exits once the budget is crossed. This budget feature
act as a performance tuning parameter for driver, and also can help with ring buffer
overflow. I believe similar reasons are true for scsi softirq as well.

NAPI budget Ref : https://wiki.linuxfoundation.org/networking/napi.

- Saurabh


> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
David Laight July 8, 2022, 12:41 p.m. UTC | #4
From: Saurabh Singh Sengar
> Sent: 08 July 2022 11:42
> 
> On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> > From: Praveen Kumar
> > > Sent: 06 July 2022 10:15
> > >
> > > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > > There can be scenarios where packets in ring buffer are continuously
> > > > getting queued from upper layer and dequeued from storvsc interrupt
> > > > handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
> > > > executing as a tasklet) for a long duration. Theoretically its possible
> > > > that this loop executes forever. Add a condition to limit execution of
> > > > this tasklet for finite amount of time to avoid such hazardous scenarios.
> >
> > Does this really make much difference?
> >
> > I'd guess the tasklet gets immediately rescheduled as soon as
> > the upper layer queues another packet?
> >
> > Or do you get a different 'bug' where it is never woken again
> > because the ring is stuck full?
> >
> > 	David
> 
> My initial understanding was that staying in a tasklet for "too long" may not be a
> good idea, however I was not sure what the "too long" value be, thus we are thinking
> to provide this parameter as a configurable sysfs entry. I couldn't find any linux
> doc justifying this, so please correct me here if I am mistaken.
> We have also considered the networking drivers NAPI budget feature while deciding
> this approach, where softirq exits once the budget is crossed. This budget feature
> act as a performance tuning parameter for driver, and also can help with ring buffer
> overflow. I believe similar reasons are true for scsi softirq as well.
> 
> NAPI budget Ref : https://wiki.linuxfoundation.org/networking/napi.

The NAPI 'budget' just changes where the system loops.
Instead of looping inside the ethernet driver rx processing
it first loops through the other functions of that 'napi' and
then loops in the softirq code.
All that just allows other 'softint' functions to run.
The softint code itself will defer to a kernel thread.
The softint code got patched (by Eric) to make it defer to
the thread less often (to avoid dropping packets), but that
change got reverted (well the original check was added back)
so the problem Eric was fixing got re-introduced.

If you are trying to stop rx ring buffer overflow then you
do need it to loop. If the softint code ever defers to a thread
you lose big-time.

The only way I've managed to receive 500,000 packets/sec is
using threaded napi and setting the napi thread to run under
the RT scheduler.

If you are looping from the softint scheduler you probably need to
return within a few microseconds - otherwise the ethernet
receive will start dropping packets at moderate loads.

OTOH I don't know where 'tasklet' get scheduler from.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Jeremi Piotrowski July 8, 2022, 12:43 p.m. UTC | #5
On Fri, Jul 08, 2022 at 03:42:03AM -0700, Saurabh Singh Sengar wrote:
> On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> > From: Praveen Kumar
> > > Sent: 06 July 2022 10:15
> > > 
> > > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > > There can be scenarios where packets in ring buffer are continuously
> > > > getting queued from upper layer and dequeued from storvsc interrupt
> > > > handler, such scenarios can hold the foreach_vmbus_pkt loop (which is
> > > > executing as a tasklet) for a long duration. Theoretically its possible
> > > > that this loop executes forever. Add a condition to limit execution of
> > > > this tasklet for finite amount of time to avoid such hazardous scenarios.
> > 
> > Does this really make much difference?
> > 
> > I'd guess the tasklet gets immediately rescheduled as soon as
> > the upper layer queues another packet?
> > 
> > Or do you get a different 'bug' where it is never woken again
> > because the ring is stuck full?
> > 
> > 	David
> 
> My initial understanding was that staying in a tasklet for "too long" may not be a
> good idea, however I was not sure what the "too long" value be, thus we are thinking
> to provide this parameter as a configurable sysfs entry. I couldn't find any linux
> doc justifying this, so please correct me here if I am mistaken.

Staying in tasklet for "too long" is only an issue if you have other imporant
work to do. You might be interested in improving fairness/latency of various
kinds of workloads vs. storvsc:
* different storage devices
* storvsc vs. netdevs
* storvsc vs. userspace

Which one are you trying to address? Or is performance the highest concern?
Then you would likely prefer to keep polling as long as possible.

> We have also considered the networking drivers NAPI budget feature while deciding
> this approach, where softirq exits once the budget is crossed. This budget feature
> act as a performance tuning parameter for driver, and also can help with ring buffer
> overflow. I believe similar reasons are true for scsi softirq as well.
> 
> NAPI budget Ref : https://wiki.linuxfoundation.org/networking/napi.
> 
> - Saurabh

Reading code here https://elixir.bootlin.com/linux/latest/source/drivers/hv/connection.c#L448,
it looks like if you restricted storvsc to only process a finite amount of
packets per call you would achieve the *budget* effect. You would get called
again if there are more packets to consume and there is already a timeout in
that function. Having two different timeouts at these 2 levels will have weird
interactions.

There is also the irq_poll facility that exists for the block layer and serves
a similar purpose as NAPI. You would need to switch to using HV_CALL_ISR.

Jeremi

> 
> 
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index fe000da..0c428cb 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -60,6 +60,9 @@ 
 #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
 #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
 
+/* channel callback timeout in ms */
+#define CALLBACK_TIMEOUT		5
+
 /*  Packet structure describing virtual storage requests. */
 enum vstor_packet_operation {
 	VSTOR_OPERATION_COMPLETE_IO		= 1,
@@ -1204,6 +1207,7 @@  static void storvsc_on_channel_callback(void *context)
 	struct hv_device *device;
 	struct storvsc_device *stor_device;
 	struct Scsi_Host *shost;
+	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
 
 	if (channel->primary_channel != NULL)
 		device = channel->primary_channel->device_obj;
@@ -1224,6 +1228,9 @@  static void storvsc_on_channel_callback(void *context)
 		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
 			sizeof(enum vstor_packet_operation);
 
+		if (time_after(jiffies, expire))
+			break;
+
 		if (pktlen < minlen) {
 			dev_err(&device->device,
 				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",