Message ID | 1657035141-2132-1-git-send-email-ssengar@linux.microsoft.com
State      | New
Series     | scsi: storvsc: Prevent running tasklet for long
From: Saurabh Sengar <ssengar@linux.microsoft.com>
Sent: Tuesday, July 5, 2022 8:32 AM
>
> There can be scenarios where packets in the ring buffer are continuously
> getting queued from the upper layer and dequeued from the storvsc
> interrupt handler; such scenarios can hold the foreach_vmbus_pkt loop
> (which is executing as a tasklet) for a long duration. Theoretically it's
> possible that this loop executes forever.

Actually, the "forever" part isn't possible. The way foreach_vmbus_pkt()
works, it only runs until the ring buffer is empty, and it does not update
the ring buffer read index until the empty condition is reached. So the
Hyper-V host is prevented from continuously feeding new packets into the
ring buffer while storvsc is reading them. But the ring buffer is pretty
big, so storvsc could still end up processing a lot of completion packets
and take an undesirable amount of time in the loop. Hence the scenario is
valid.

> Add a condition to limit execution of this tasklet for a finite amount
> of time to avoid such hazardous scenarios.
>
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index fe000da..0c428cb 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -60,6 +60,9 @@
>  #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
>  #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
>
> +/* channel callback timeout in ms */
> +#define CALLBACK_TIMEOUT 5
> +
>  /* Packet structure describing virtual storage requests. */
>  enum vstor_packet_operation {
>  	VSTOR_OPERATION_COMPLETE_IO = 1,
> @@ -1204,6 +1207,7 @@ static void storvsc_on_channel_callback(void *context)
>  	struct hv_device *device;
>  	struct storvsc_device *stor_device;
>  	struct Scsi_Host *shost;
> +	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
>
>  	if (channel->primary_channel != NULL)
>  		device = channel->primary_channel->device_obj;
> @@ -1224,6 +1228,9 @@ static void storvsc_on_channel_callback(void *context)
>  		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
>  			sizeof(enum vstor_packet_operation);
>
> +		if (time_after(jiffies, expire))
> +			break;
> +
>  		if (pktlen < minlen) {
>  			dev_err(&device->device,
>  				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",

Per my comment on the commit message, breaking out of the
foreach_vmbus_pkt() loop leaves the ring indices unchanged. I *think* we
want to close out the iteration via hv_pkt_iter_close() in the case where
we exit the loop early, so that Hyper-V can re-use the ring buffer space
from completions we've already processed. Also, when we re-enter
storvsc_on_channel_callback() and again do hv_pkt_iter_first(), that
should happen only after hv_pkt_iter_close() has been called; i.e., the
hv_pkt_iter_* functions should be matched up correctly. The current
implementation of these functions doesn't go awry if they aren't matched
up correctly, but we should nonetheless do them correctly in case the
implementation changes in the future.

Similar code in netvsc_poll() has the same behavior of not closing out
the hv_pkt_iter when the poll budget is exhausted, and we've had a recent
internal discussion about changing that.

Michael

> --
> 1.8.3.1
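A minimal sketch of the early-exit handling Michael describes. This is
illustrative only, not a posted patch; it assumes (as is true in current
kernels) that hv_pkt_iter_next() itself calls hv_pkt_iter_close() when the
ring drains normally, so only the early-exit path needs an explicit close:

	foreach_vmbus_pkt(desc, channel) {
		if (time_after(jiffies, expire)) {
			/*
			 * Commit the read index for the packets already
			 * consumed so Hyper-V can reuse that ring buffer
			 * space, then bail out.  On normal termination
			 * hv_pkt_iter_next() performs this close for us.
			 */
			hv_pkt_iter_close(channel);
			break;
		}
		/* ... existing packet processing ... */
	}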
On Wed, Jul 06, 2022 at 02:44:42PM +0530, Praveen Kumar wrote:
> On 05-07-2022 21:02, Saurabh Sengar wrote:
> > There can be scenarios where packets in the ring buffer are continuously
> > getting queued from the upper layer and dequeued from the storvsc
> > interrupt handler; such scenarios can hold the foreach_vmbus_pkt loop
> > (which is executing as a tasklet) for a long duration. Theoretically it's
> > possible that this loop executes forever. Add a condition to limit
> > execution of this tasklet for a finite amount of time to avoid such
> > hazardous scenarios.
> >
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> >  drivers/scsi/storvsc_drv.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index fe000da..0c428cb 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -60,6 +60,9 @@
> >  #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
> >  #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
> >
> > +/* channel callback timeout in ms */
> > +#define CALLBACK_TIMEOUT 5
>
> If I may, it would be good to have CALLBACK_TIMEOUT configurable based
> upon the user's requirement, with a default value of '5'.
> I assume this value '5' best fits the use-case we are trying to resolve
> here. Thanks.

Agree; how about adding a sysfs entry for this parameter?

> > +
> >  /* Packet structure describing virtual storage requests. */
> >  enum vstor_packet_operation {
> >  	VSTOR_OPERATION_COMPLETE_IO = 1,
> > @@ -1204,6 +1207,7 @@ static void storvsc_on_channel_callback(void *context)
> >  	struct hv_device *device;
> >  	struct storvsc_device *stor_device;
> >  	struct Scsi_Host *shost;
> > +	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
> >
> >  	if (channel->primary_channel != NULL)
> >  		device = channel->primary_channel->device_obj;
> > @@ -1224,6 +1228,9 @@ static void storvsc_on_channel_callback(void *context)
> >  		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
> >  			sizeof(enum vstor_packet_operation);
> >
> > +		if (time_after(jiffies, expire))
> > +			break;
> > +
> >  		if (pktlen < minlen) {
> >  			dev_err(&device->device,
> >  				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",
>
> Regards,
>
> ~Praveen.
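A sketch of what such a tunable could look like, using a module parameter
(which also surfaces under sysfs); the name storvsc_callback_timeout and
its placement are invented here for illustration, not taken from any
posted patch:

	/* Hypothetical tunable replacing the CALLBACK_TIMEOUT #define.
	 * Mode 0644 makes it writable at
	 * /sys/module/hv_storvsc/parameters/storvsc_callback_timeout.
	 */
	static int storvsc_callback_timeout = 5;	/* ms */
	module_param(storvsc_callback_timeout, int, 0644);
	MODULE_PARM_DESC(storvsc_callback_timeout,
			 "Max time (ms) spent per channel callback invocation");

	/* ...and in storvsc_on_channel_callback(): */
	unsigned long expire = jiffies +
			       msecs_to_jiffies(storvsc_callback_timeout);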
On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> From: Praveen Kumar
> > Sent: 06 July 2022 10:15
> >
> > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > There can be scenarios where packets in the ring buffer are continuously
> > > getting queued from the upper layer and dequeued from the storvsc
> > > interrupt handler; such scenarios can hold the foreach_vmbus_pkt loop
> > > (which is executing as a tasklet) for a long duration. Theoretically it's
> > > possible that this loop executes forever. Add a condition to limit
> > > execution of this tasklet for a finite amount of time to avoid such
> > > hazardous scenarios.
>
> Does this really make much difference?
>
> I'd guess the tasklet gets immediately rescheduled as soon as
> the upper layer queues another packet?
>
> Or do you get a different 'bug' where it is never woken again
> because the ring is stuck full?
>
> 	David

My initial understanding was that staying in a tasklet for "too long" may
not be a good idea; however, I was not sure what the "too long" value
should be, which is why we are thinking of providing this parameter as a
configurable sysfs entry. I couldn't find any Linux doc justifying this,
so please correct me here if I am mistaken.

We have also considered the networking drivers' NAPI budget feature while
deciding on this approach, where the softirq exits once the budget is
crossed. This budget feature acts as a performance tuning parameter for a
driver, and can also help with ring buffer overflow. I believe similar
reasons hold for the scsi softirq as well.

NAPI budget ref: https://wiki.linuxfoundation.org/networking/napi

- Saurabh
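For reference, the NAPI budget pattern Saurabh mentions looks roughly like
the following generic poll skeleton; this is illustrative, not storvsc
code, and example_rx_one() is a placeholder for per-packet receive work:

	static int example_napi_poll(struct napi_struct *napi, int budget)
	{
		int done = 0;

		/* Process at most 'budget' packets per poll invocation */
		while (done < budget && example_rx_one(napi))
			done++;

		/* Ring drained before the budget ran out: re-arm interrupts */
		if (done < budget)
			napi_complete_done(napi, done);

		/* Returning done == budget makes the core schedule us again */
		return done;
	}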
From: Saurabh Singh Sengar
> Sent: 08 July 2022 11:42
>
> On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> > From: Praveen Kumar
> > > Sent: 06 July 2022 10:15
> > >
> > > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > > There can be scenarios where packets in the ring buffer are
> > > > continuously getting queued from the upper layer and dequeued from
> > > > the storvsc interrupt handler; such scenarios can hold the
> > > > foreach_vmbus_pkt loop (which is executing as a tasklet) for a long
> > > > duration. Theoretically it's possible that this loop executes
> > > > forever. Add a condition to limit execution of this tasklet for a
> > > > finite amount of time to avoid such hazardous scenarios.
> >
> > Does this really make much difference?
> >
> > I'd guess the tasklet gets immediately rescheduled as soon as
> > the upper layer queues another packet?
> >
> > Or do you get a different 'bug' where it is never woken again
> > because the ring is stuck full?
> >
> > 	David
>
> My initial understanding was that staying in a tasklet for "too long" may
> not be a good idea; however, I was not sure what the "too long" value
> should be, which is why we are thinking of providing this parameter as a
> configurable sysfs entry. I couldn't find any Linux doc justifying this,
> so please correct me here if I am mistaken.
> We have also considered the networking drivers' NAPI budget feature while
> deciding on this approach, where the softirq exits once the budget is
> crossed. This budget feature acts as a performance tuning parameter for a
> driver, and can also help with ring buffer overflow. I believe similar
> reasons hold for the scsi softirq as well.
>
> NAPI budget ref: https://wiki.linuxfoundation.org/networking/napi

The NAPI 'budget' just changes where the system loops.
Instead of looping inside the ethernet driver rx processing, it first
loops through the other functions of that 'napi' and then loops in the
softirq code. All that just allows other 'softint' functions to run.
The softint code itself will defer to a kernel thread.

The softint code got patched (by Eric) to make it defer to the thread
less often (to avoid dropping packets), but that change got reverted
(well, the original check was added back), so the problem Eric was fixing
got re-introduced.

If you are trying to stop rx ring buffer overflow then you do need it to
loop. If the softint code ever defers to a thread you lose big-time.
The only way I've managed to receive 500,000 packets/sec is using
threaded napi and setting the napi thread to run under the RT scheduler.

If you are looping from the softint scheduler you probably need to return
within a few microseconds - otherwise the ethernet receive will start
dropping packets at moderate loads.

OTOH I don't know where tasklets get scheduled from.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Fri, Jul 08, 2022 at 03:42:03AM -0700, Saurabh Singh Sengar wrote:
> On Wed, Jul 06, 2022 at 11:09:43AM +0000, David Laight wrote:
> > From: Praveen Kumar
> > > Sent: 06 July 2022 10:15
> > >
> > > On 05-07-2022 21:02, Saurabh Sengar wrote:
> > > > There can be scenarios where packets in the ring buffer are
> > > > continuously getting queued from the upper layer and dequeued from
> > > > the storvsc interrupt handler; such scenarios can hold the
> > > > foreach_vmbus_pkt loop (which is executing as a tasklet) for a long
> > > > duration. Theoretically it's possible that this loop executes
> > > > forever. Add a condition to limit execution of this tasklet for a
> > > > finite amount of time to avoid such hazardous scenarios.
> >
> > Does this really make much difference?
> >
> > I'd guess the tasklet gets immediately rescheduled as soon as
> > the upper layer queues another packet?
> >
> > Or do you get a different 'bug' where it is never woken again
> > because the ring is stuck full?
> >
> > 	David
>
> My initial understanding was that staying in a tasklet for "too long" may
> not be a good idea; however, I was not sure what the "too long" value
> should be, which is why we are thinking of providing this parameter as a
> configurable sysfs entry. I couldn't find any Linux doc justifying this,
> so please correct me here if I am mistaken.

Staying in a tasklet for "too long" is only an issue if you have other
important work to do. You might be interested in improving the
fairness/latency of various kinds of workloads vs. storvsc:

* different storage devices
* storvsc vs. netdevs
* storvsc vs. userspace

Which one are you trying to address? Or is performance the highest
concern? Then you would likely prefer to keep polling as long as
possible.

> We have also considered the networking drivers' NAPI budget feature while
> deciding on this approach, where the softirq exits once the budget is
> crossed. This budget feature acts as a performance tuning parameter for a
> driver, and can also help with ring buffer overflow. I believe similar
> reasons hold for the scsi softirq as well.
>
> NAPI budget ref: https://wiki.linuxfoundation.org/networking/napi
>
> - Saurabh

Reading the code here
https://elixir.bootlin.com/linux/latest/source/drivers/hv/connection.c#L448,
it looks like if you restricted storvsc to only process a finite number of
packets per call you would achieve the *budget* effect. You would get
called again if there are more packets to consume, and there is already a
timeout in that function. Having two different timeouts at these two
levels will have weird interactions.

There is also the irq_poll facility that exists for the block layer and
serves a similar purpose as NAPI. You would need to switch to using
HV_CALL_ISR.

Jeremi
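A rough sketch of the packet-count budget Jeremi suggests; STORVSC_BUDGET
is a made-up name and the value is arbitrary, shown only to illustrate the
shape of the change:

	#define STORVSC_BUDGET	64	/* hypothetical per-callback packet budget */

	int budget = STORVSC_BUDGET;

	foreach_vmbus_pkt(desc, channel) {
		if (budget-- == 0) {
			/* Commit consumed packets before bailing out early */
			hv_pkt_iter_close(channel);
			break;
		}
		/* ... existing packet processing ... */
	}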
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index fe000da..0c428cb 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -60,6 +60,9 @@
 #define VMSTOR_PROTO_VERSION_WIN8_1	VMSTOR_PROTO_VERSION(6, 0)
 #define VMSTOR_PROTO_VERSION_WIN10	VMSTOR_PROTO_VERSION(6, 2)
 
+/* channel callback timeout in ms */
+#define CALLBACK_TIMEOUT 5
+
 /* Packet structure describing virtual storage requests. */
 enum vstor_packet_operation {
 	VSTOR_OPERATION_COMPLETE_IO = 1,
@@ -1204,6 +1207,7 @@ static void storvsc_on_channel_callback(void *context)
 	struct hv_device *device;
 	struct storvsc_device *stor_device;
 	struct Scsi_Host *shost;
+	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);
 
 	if (channel->primary_channel != NULL)
 		device = channel->primary_channel->device_obj;
@@ -1224,6 +1228,9 @@ static void storvsc_on_channel_callback(void *context)
 		u32 minlen = rqst_id ? sizeof(struct vstor_packet) :
 			sizeof(enum vstor_packet_operation);
 
+		if (time_after(jiffies, expire))
+			break;
+
 		if (pktlen < minlen) {
 			dev_err(&device->device,
 				"Invalid pkt: id=%llu, len=%u, minlen=%u\n",
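The patch's timeout check relies on the standard wrap-safe jiffies idiom;
a self-contained illustration follows (more_work() and do_one_unit() are
placeholders, not kernel functions):

	#include <linux/jiffies.h>

	unsigned long expire = jiffies + msecs_to_jiffies(CALLBACK_TIMEOUT);

	while (more_work()) {
		if (time_after(jiffies, expire))
			break;		/* time budget exhausted */
		do_one_unit();
	}
	/*
	 * time_after() compares using signed arithmetic internally, so the
	 * test stays correct even when the jiffies counter wraps around.
	 */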
There can be scenarios where packets in the ring buffer are continuously
getting queued from the upper layer and dequeued from the storvsc
interrupt handler; such scenarios can hold the foreach_vmbus_pkt loop
(which is executing as a tasklet) for a long duration. Theoretically it's
possible that this loop executes forever. Add a condition to limit
execution of this tasklet for a finite amount of time to avoid such
hazardous scenarios.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 7 +++++++
 1 file changed, 7 insertions(+)