diff mbox series

[iproute2-next,v1] devlink: display elapsed time during flash update

Message ID 20200929215651.3538844-1-jacob.e.keller@intel.com
State New
Headers show
Series [iproute2-next,v1] devlink: display elapsed time during flash update | expand

Commit Message

Jacob Keller Sept. 29, 2020, 9:56 p.m. UTC
For some devices, updating the flash can take significant time during
operations where no status can meaningfully be reported. This can be
somewhat confusing to a user who sees devlink appear to hang on the
terminal waiting for the device to update.

Recent changes to the kernel interface allow such long running commands
to provide a timeout value indicating some upper bound on how long the
relevant action could take.

Provide a ticking counter of the time elapsed since the previous status
message in order to make it clear that the program is not simply stuck.

Display this message whenever the status message from the kernel
indicates a timeout value. Additionally also display the message if
we've received no status for more than couple of seconds. If we elapse
more than the timeout provided by the status message, replace the
timeout display with "timeout reached".

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---

This is a respin of an RFC at [1] based on feedback. This version works as I
would expect, 

Changes since RFC
* Add fflush, fixing jittery output
* Since we're only comparing the seconds value, use "> 2" instead of "> 3"
  so that we begin displaying the elapsed time at 3 seconds rather than 4
  seconds.
* store only the string length instead of the full message in the context
* Rename some variables for clarity
* If we have a timeout, always display the elapsed time, instead of waiting
  for a few seconds.
* Fix a typo in a comment referring to 1/20th when the code used 1/10th of a
  second timeout.

 devlink/devlink.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 94 insertions(+), 1 deletion(-)


base-commit: d2be31d9b671ec0b3e32f56f9c913e249ed048bd

Comments

Shannon Nelson Sept. 29, 2020, 10:44 p.m. UTC | #1
On 9/29/20 2:56 PM, Jacob Keller wrote:
> For some devices, updating the flash can take significant time during
> operations where no status can meaningfully be reported. This can be
> somewhat confusing to a user who sees devlink appear to hang on the
> terminal waiting for the device to update.
>
> Recent changes to the kernel interface allow such long running commands
> to provide a timeout value indicating some upper bound on how long the
> relevant action could take.
>
> Provide a ticking counter of the time elapsed since the previous status
> message in order to make it clear that the program is not simply stuck.
>
> Display this message whenever the status message from the kernel
> indicates a timeout value. Additionally also display the message if
> we've received no status for more than couple of seconds. If we elapse
> more than the timeout provided by the status message, replace the
> timeout display with "timeout reached".
>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>

Thanks, Jake.  In general this seems to work pretty well.  One thing, 
tho'...

Our fw download is slow (I won't go into the reasons here) so we're 
clicking through the Download x% over maybe 100+ seconds.  Since we send 
an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay 
there the whole time, looking a little odd:

     ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file 
ionic/dsc_fw_1.15.0-150.tar
     Preparing to flash
     Downloading  37% ( 0m 3s )
   ...
     Downloading  59% ( 0m 3s )
   ...
     Downloading  83% ( 0m 3s )

And at the end we see:

     Preparing to flash
     Downloading 100% ( 0m 3s )
     Installing ( 0m 43s : 25m 0s )
     Selecting ( 0m 5s : 0m 30s )
     Flash done

I can have the driver do updates more often in order to stay under the 3 
second limit and hide this, but it looks a bit funky, especially at the 
end where I know that 100% took a lot longer than 3 seconds.

sln
Jacob Keller Sept. 30, 2020, 9:20 p.m. UTC | #2
On 9/29/2020 3:44 PM, Shannon Nelson wrote:
> On 9/29/20 2:56 PM, Jacob Keller wrote:
>> For some devices, updating the flash can take significant time during
>> operations where no status can meaningfully be reported. This can be
>> somewhat confusing to a user who sees devlink appear to hang on the
>> terminal waiting for the device to update.
>>
>> Recent changes to the kernel interface allow such long running commands
>> to provide a timeout value indicating some upper bound on how long the
>> relevant action could take.
>>
>> Provide a ticking counter of the time elapsed since the previous status
>> message in order to make it clear that the program is not simply stuck.
>>
>> Display this message whenever the status message from the kernel
>> indicates a timeout value. Additionally also display the message if
>> we've received no status for more than couple of seconds. If we elapse
>> more than the timeout provided by the status message, replace the
>> timeout display with "timeout reached".
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> ---
>>
> 
> Thanks, Jake.  In general this seems to work pretty well.  One thing, 
> tho'...
> 
> Our fw download is slow (I won't go into the reasons here) so we're 
> clicking through the Download x% over maybe 100+ seconds.  Since we send 
> an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay 
> there the whole time, looking a little odd:
> 
>      ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file 
> ionic/dsc_fw_1.15.0-150.tar
>      Preparing to flash
>      Downloading  37% ( 0m 3s )
>    ...
>      Downloading  59% ( 0m 3s )
>    ...
>      Downloading  83% ( 0m 3s )
> 
> And at the end we see:
> 
>      Preparing to flash
>      Downloading 100% ( 0m 3s )
>      Installing ( 0m 43s : 25m 0s )
>      Selecting ( 0m 5s : 0m 30s )
>      Flash done
> 
> I can have the driver do updates more often in order to stay under the 3 
> second limit and hide this, but it looks a bit funky, especially at the 
> end where I know that 100% took a lot longer than 3 seconds.
> 

I think we have two options here:

1) never display an elapsed time when we have done/total information

or

2) treat elapsed time as a measure since the last status message
changed, refactoring this so that it shows the total time spent on that
status message.

Thoughts on this? I think I'm leaning towards (2) at the moment myself.
This might lead to displaying the timing info on many % calculations
though... Hmm

> sln
> 
>
Jakub Kicinski Sept. 30, 2020, 9:36 p.m. UTC | #3
On Wed, 30 Sep 2020 14:20:43 -0700 Jacob Keller wrote:
> > Thanks, Jake.  In general this seems to work pretty well.  One thing, 
> > tho'...
> > 
> > Our fw download is slow (I won't go into the reasons here) so we're 
> > clicking through the Download x% over maybe 100+ seconds.  Since we send 
> > an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay 
> > there the whole time, looking a little odd:
> > 
> >      ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file 
> > ionic/dsc_fw_1.15.0-150.tar
> >      Preparing to flash
> >      Downloading  37% ( 0m 3s )
> >    ...
> >      Downloading  59% ( 0m 3s )
> >    ...
> >      Downloading  83% ( 0m 3s )

I'm not sure how to interpret this - are you saying that the timer
doesn't tick up or that the FW happens to complete the operation right
around the 3sec mark?

> > And at the end we see:
> > 
> >      Preparing to flash
> >      Downloading 100% ( 0m 3s )
> >      Installing ( 0m 43s : 25m 0s )
> >      Selecting ( 0m 5s : 0m 30s )
> >      Flash done
> > 
> > I can have the driver do updates more often in order to stay under the 3 
> > second limit and hide this, but it looks a bit funky, especially at the 
> > end where I know that 100% took a lot longer than 3 seconds.
> >   
> 
> I think we have two options here:
> 
> 1) never display an elapsed time when we have done/total information
> 
> or
> 
> 2) treat elapsed time as a measure since the last status message
> changed, refactoring this so that it shows the total time spent on that
> status message.
> 
> Thoughts on this? I think I'm leaning towards (2) at the moment myself.
> This might lead to displaying the timing info on many % calculations
> though... Hmm

Is the time information useful after stage is complete? I'd just wipe
it before moving on to the next message.
Jacob Keller Sept. 30, 2020, 9:43 p.m. UTC | #4
On 9/30/2020 2:36 PM, Jakub Kicinski wrote:
> On Wed, 30 Sep 2020 14:20:43 -0700 Jacob Keller wrote:
>>> Thanks, Jake.  In general this seems to work pretty well.  One thing, 
>>> tho'...
>>>
>>> Our fw download is slow (I won't go into the reasons here) so we're 
>>> clicking through the Download x% over maybe 100+ seconds.  Since we send 
>>> an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay 
>>> there the whole time, looking a little odd:
>>>
>>>      ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file 
>>> ionic/dsc_fw_1.15.0-150.tar
>>>      Preparing to flash
>>>      Downloading  37% ( 0m 3s )
>>>    ...
>>>      Downloading  59% ( 0m 3s )
>>>    ...
>>>      Downloading  83% ( 0m 3s )
> 
> I'm not sure how to interpret this - are you saying that the timer
> doesn't tick up or that the FW happens to complete the operation right
> around the 3sec mark?
> 


The elapsed time is calculated from the last status message we receive.
In Shannon's case, the done/total % status messages come approximately
slow enough that the elapsed time message keeps popping up. Since it's
measuring from the last time we got a status message, it looks weird
because it resets to 3 seconds over and over and over.

>>> And at the end we see:
>>>
>>>      Preparing to flash
>>>      Downloading 100% ( 0m 3s )
>>>      Installing ( 0m 43s : 25m 0s )
>>>      Selecting ( 0m 5s : 0m 30s )
>>>      Flash done
>>>
>>> I can have the driver do updates more often in order to stay under the 3 
>>> second limit and hide this, but it looks a bit funky, especially at the 
>>> end where I know that 100% took a lot longer than 3 seconds.
>>>   
>>
>> I think we have two options here:
>>
>> 1) never display an elapsed time when we have done/total information
>>
>> or
>>
>> 2) treat elapsed time as a measure since the last status message
>> changed, refactoring this so that it shows the total time spent on that
>> status message.
>>
>> Thoughts on this? I think I'm leaning towards (2) at the moment myself.
>> This might lead to displaying the timing info on many % calculations
>> though... Hmm
> 
> Is the time information useful after stage is complete? I'd just wipe
> it before moving on to the next message.
> 

My point was about changing when we calculated elapsed time from to be
"since the status message changed" rather than "since the last time the
driver sent any status even if the message remains the same".

I think clearing the timing message is a good improvement either way, so
I'll do that too.
Shannon Nelson Sept. 30, 2020, 9:55 p.m. UTC | #5
On 9/30/20 2:43 PM, Jacob Keller wrote:
> On 9/30/2020 2:36 PM, Jakub Kicinski wrote:
>> On Wed, 30 Sep 2020 14:20:43 -0700 Jacob Keller wrote:
>>>> Thanks, Jake.  In general this seems to work pretty well.  One thing,
>>>> tho'...
>>>>
>>>> Our fw download is slow (I won't go into the reasons here) so we're
>>>> clicking through the Download x% over maybe 100+ seconds.  Since we send
>>>> an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay
>>>> there the whole time, looking a little odd:
>>>>
>>>>       ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file
>>>> ionic/dsc_fw_1.15.0-150.tar
>>>>       Preparing to flash
>>>>       Downloading  37% ( 0m 3s )
>>>>     ...
>>>>       Downloading  59% ( 0m 3s )
>>>>     ...
>>>>       Downloading  83% ( 0m 3s )
>> I'm not sure how to interpret this - are you saying that the timer
>> doesn't tick up or that the FW happens to complete the operation right
>> around the 3sec mark?
>>
>
> The elapsed time is calculated from the last status message we receive.
> In Shannon's case, the done/total % status messages come approximately
> slow enough that the elapsed time message keeps popping up. Since it's
> measuring from the last time we got a status message, it looks weird
> because it resets to 3 seconds over and over and over.
>
>>>> And at the end we see:
>>>>
>>>>       Preparing to flash
>>>>       Downloading 100% ( 0m 3s )
>>>>       Installing ( 0m 43s : 25m 0s )
>>>>       Selecting ( 0m 5s : 0m 30s )
>>>>       Flash done
>>>>
>>>> I can have the driver do updates more often in order to stay under the 3
>>>> second limit and hide this, but it looks a bit funky, especially at the
>>>> end where I know that 100% took a lot longer than 3 seconds.
>>>>    
>>> I think we have two options here:
>>>
>>> 1) never display an elapsed time when we have done/total information
>>>
>>> or
>>>
>>> 2) treat elapsed time as a measure since the last status message
>>> changed, refactoring this so that it shows the total time spent on that
>>> status message.
>>>
>>> Thoughts on this? I think I'm leaning towards (2) at the moment myself.
>>> This might lead to displaying the timing info on many % calculations
>>> though... Hmm
>> Is the time information useful after stage is complete? I'd just wipe
>> it before moving on to the next message.
>>
> My point was about changing when we calculated elapsed time from to be
> "since the status message changed" rather than "since the last time the
> driver sent any status even if the message remains the same".

This would be better, and is a bit like what I was imagining early on, 
but at this point I'm wondering if the display of the elapsed time is 
actually useful, or simply making it messier.

>
> I think clearing the timing message is a good improvement either way, so
> I'll do that too.
Yes.

sln
Jacob Keller Sept. 30, 2020, 10:02 p.m. UTC | #6
On 9/30/2020 2:55 PM, Shannon Nelson wrote:
> On 9/30/20 2:43 PM, Jacob Keller wrote:
>> On 9/30/2020 2:36 PM, Jakub Kicinski wrote:
>>> On Wed, 30 Sep 2020 14:20:43 -0700 Jacob Keller wrote:
>>>>> Thanks, Jake.  In general this seems to work pretty well.  One thing,
>>>>> tho'...
>>>>>
>>>>> Our fw download is slow (I won't go into the reasons here) so we're
>>>>> clicking through the Download x% over maybe 100+ seconds.  Since we send
>>>>> an update every 3% or so, we end up seeing the ( 0m 3s ) pop up and stay
>>>>> there the whole time, looking a little odd:
>>>>>
>>>>>       ./iproute2-5.8.0/devlink/devlink dev flash pci/0000:b5:00.0 file
>>>>> ionic/dsc_fw_1.15.0-150.tar
>>>>>       Preparing to flash
>>>>>       Downloading  37% ( 0m 3s )
>>>>>     ...
>>>>>       Downloading  59% ( 0m 3s )
>>>>>     ...
>>>>>       Downloading  83% ( 0m 3s )
>>> I'm not sure how to interpret this - are you saying that the timer
>>> doesn't tick up or that the FW happens to complete the operation right
>>> around the 3sec mark?
>>>
>>
>> The elapsed time is calculated from the last status message we receive.
>> In Shannon's case, the done/total % status messages come approximately
>> slow enough that the elapsed time message keeps popping up. Since it's
>> measuring from the last time we got a status message, it looks weird
>> because it resets to 3 seconds over and over and over.
>>
>>>>> And at the end we see:
>>>>>
>>>>>       Preparing to flash
>>>>>       Downloading 100% ( 0m 3s )
>>>>>       Installing ( 0m 43s : 25m 0s )
>>>>>       Selecting ( 0m 5s : 0m 30s )
>>>>>       Flash done
>>>>>
>>>>> I can have the driver do updates more often in order to stay under the 3
>>>>> second limit and hide this, but it looks a bit funky, especially at the
>>>>> end where I know that 100% took a lot longer than 3 seconds.
>>>>>    
>>>> I think we have two options here:
>>>>
>>>> 1) never display an elapsed time when we have done/total information
>>>>
>>>> or
>>>>
>>>> 2) treat elapsed time as a measure since the last status message
>>>> changed, refactoring this so that it shows the total time spent on that
>>>> status message.
>>>>
>>>> Thoughts on this? I think I'm leaning towards (2) at the moment myself.
>>>> This might lead to displaying the timing info on many % calculations
>>>> though... Hmm
>>> Is the time information useful after stage is complete? I'd just wipe
>>> it before moving on to the next message.
>>>
>> My point was about changing when we calculated elapsed time from to be
>> "since the status message changed" rather than "since the last time the
>> driver sent any status even if the message remains the same".
> 
> This would be better, and is a bit like what I was imagining early on, 
> but at this point I'm wondering if the display of the elapsed time is 
> actually useful, or simply making it messier.
> 
>>
>> I think clearing the timing message is a good improvement either way, so
>> I'll do that too.
> Yes.
> 

I think if we always clear the elapsed time on a new message it will
feel a lot cleaner overall.

Thanks,
Jake

> sln
> 
>
diff mbox series

Patch

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 0374175eda3d..7f3bd45c9a6e 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -33,6 +33,7 @@ 
 #include <sys/select.h>
 #include <sys/socket.h>
 #include <sys/types.h>
+#include <sys/time.h>
 #include <rt_names.h>
 
 #include "version.h"
@@ -3066,6 +3067,9 @@  static int cmd_dev_info(struct dl *dl)
 
 struct cmd_dev_flash_status_ctx {
 	struct dl *dl;
+	struct timeval time_of_last_status;
+	uint64_t status_msg_timeout;
+	size_t elapsed_time_msg_len;
 	char *last_msg;
 	char *last_component;
 	uint8_t not_first:1,
@@ -3083,6 +3087,14 @@  static int nullstrcmp(const char *str1, const char *str2)
 	return str1 ? 1 : -1;
 }
 
+static void cmd_dev_flash_clear_elapsed_time(struct cmd_dev_flash_status_ctx *ctx)
+{
+	int i;
+
+	for (i = 0; i < ctx->elapsed_time_msg_len; i++)
+		pr_out_tty("\b");
+}
+
 static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data)
 {
 	struct cmd_dev_flash_status_ctx *ctx = data;
@@ -3116,6 +3128,11 @@  static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data)
 		return MNL_CB_STOP;
 	}
 
+	cmd_dev_flash_clear_elapsed_time(ctx);
+	gettimeofday(&ctx->time_of_last_status, NULL);
+	ctx->status_msg_timeout = 0;
+	ctx->elapsed_time_msg_len = 0;
+
 	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG])
 		msg = mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG]);
 	if (tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT])
@@ -3124,6 +3141,8 @@  static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data)
 		done = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE]);
 	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL])
 		total = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL]);
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT])
+		ctx->status_msg_timeout = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT]);
 
 	if (!nullstrcmp(msg, ctx->last_msg) &&
 	    !nullstrcmp(component, ctx->last_component) &&
@@ -3155,11 +3174,72 @@  static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data)
 	return MNL_CB_STOP;
 }
 
+static void cmd_dev_flash_time_elapsed(struct cmd_dev_flash_status_ctx *ctx)
+{
+	struct timeval now, res;
+
+	gettimeofday(&now, NULL);
+	timersub(&now, &ctx->time_of_last_status, &res);
+
+	/* Only begin displaying an elapsed time message if we've waited a few
+	 * seconds with no response, or the status message included a timeout
+	 * value.
+	 */
+	if (res.tv_sec > 2 || ctx->status_msg_timeout) {
+		uint64_t elapsed_m, elapsed_s;
+		char msg[128];
+		size_t len;
+
+		/* clear the last elapsed time message, if we have one */
+		cmd_dev_flash_clear_elapsed_time(ctx);
+
+		elapsed_m = res.tv_sec / 60;
+		elapsed_s = res.tv_sec % 60;
+
+		/**
+		 * If we've elapsed a few seconds without receiving any status
+		 * notification from the device, we display a time elapsed
+		 * message. This has a few possible formats:
+		 *
+		 * 1) just time elapsed, when no timeout was provided
+		 *    " ( Xm Ys )"
+		 * 2) time elapsed out of a timeout that came from the device
+		 *    driver via DEVLINK_CMD_FLASH_UPDATE_STATUS_TIMEOUT
+		 *    " ( Xm Ys : Am Ys)"
+		 * 3) time elapsed if we still receive no status after
+		 *    reaching the provided timeout.
+		 *    " ( Xm Ys : timeout reached )"
+		 */
+		if (!ctx->status_msg_timeout) {
+			len = snprintf(msg, sizeof(msg),
+				       " ( %lum %lus )", elapsed_m, elapsed_s);
+		} else if (res.tv_sec <= ctx->status_msg_timeout) {
+			uint64_t timeout_m, timeout_s;
+
+			timeout_m = ctx->status_msg_timeout / 60;
+			timeout_s = ctx->status_msg_timeout % 60;
+
+			len = snprintf(msg, sizeof(msg),
+				       " ( %lum %lus : %lum %lus )",
+				       elapsed_m, elapsed_s, timeout_m, timeout_s);
+		} else {
+			len = snprintf(msg, sizeof(msg),
+				       " ( %lum %lus : timeout reached )", elapsed_m, elapsed_s);
+		}
+
+		ctx->elapsed_time_msg_len = len;
+
+		pr_out_tty("%s", msg);
+		fflush(stdout);
+	}
+}
+
 static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx,
 				     struct mnlg_socket *nlg_ntf,
 				     int pipe_r)
 {
 	int nlfd = mnlg_socket_get_fd(nlg_ntf);
+	struct timeval timeout;
 	fd_set fds[3];
 	int fdmax;
 	int i;
@@ -3174,7 +3254,14 @@  static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx,
 	if (nlfd >= fdmax)
 		fdmax = nlfd + 1;
 
-	while (select(fdmax, &fds[0], &fds[1], &fds[2], NULL) < 0) {
+	/* select only for a short while (1/10th of a second) in order to
+	 * allow periodically updating the screen with an elapsed time
+	 * indicator.
+	 */
+	timeout.tv_sec = 0;
+	timeout.tv_usec = 100000;
+
+	while (select(fdmax, &fds[0], &fds[1], &fds[2], &timeout) < 0) {
 		if (errno == EINTR)
 			continue;
 		pr_err("select() failed\n");
@@ -3196,6 +3283,7 @@  static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx,
 			return err2;
 		ctx->flash_done = 1;
 	}
+	cmd_dev_flash_time_elapsed(ctx);
 	return 0;
 }
 
@@ -3256,6 +3344,11 @@  static int cmd_dev_flash(struct dl *dl)
 	}
 	close(pipe_w);
 
+	/* initialize starting time to allow comparison for when to begin
+	 * displaying a time elapsed message.
+	 */
+	gettimeofday(&ctx.time_of_last_status, NULL);
+
 	do {
 		err = cmd_dev_flash_fds_process(&ctx, nlg_ntf, pipe_r);
 		if (err)