
[v2,3/3] serial: qcom-geni: do not kill the machine on fifo underrun

Message ID 20240704101805.30612-4-johan+linaro@kernel.org
State Accepted
Commit 2ac33975abda6921896e52372aec2be2cf51ab37
Series serial: qcom-geni: fix lockups

Commit Message

Johan Hovold July 4, 2024, 10:18 a.m. UTC
The Qualcomm GENI serial driver did not handle buffer flushing and used
to print discarded characters when the circular buffer was cleared.
Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
this instead resulted in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.

The underlying bugs have now been fixed, but make sure to output NUL
characters instead of killing the machine if a similar driver bug is
ever reintroduced.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/tty/serial/qcom_geni_serial.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
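The lockup described in the commit message comes from the loop's accounting. `uart_fifo_out()` returns the number of bytes it actually copied. Before the patch that count was assigned to `tx_bytes`, so once the kfifo had been emptied behind the driver's back, `remaining` stopped shrinking. The following is a minimal user-space model, not driver code. It assumes a hypothetical `fake_fifo` queue and `fake_fifo_out()` helper in place of the port's kfifo and `uart_fifo_out()`, a 4-byte `BYTES_PER_FIFO_WORD`, and an iteration cap (`max_iter`) that exists only so the model can report the hang instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U	/* assumed: one 32-bit TX FIFO word */

/* Hypothetical stand-in for the port's kfifo: a flat byte queue. */
struct fake_fifo {
	const unsigned char *data;
	unsigned int len;		/* bytes still queued */
};

/* Models uart_fifo_out(): copies up to @len bytes, returns bytes copied. */
static unsigned int fake_fifo_out(struct fake_fifo *f, unsigned char *buf,
				  unsigned int len)
{
	unsigned int n = len < f->len ? len : f->len;

	memcpy(buf, f->data, n);
	f->data += n;
	f->len -= n;
	return n;
}

/*
 * Models the loop in qcom_geni_serial_send_chunk_fifo(). @chunk is the byte
 * count the TX engine was told to send; @out collects the bytes the loop
 * accounts for. @use_return selects the pre-patch behaviour. Returns false
 * if the loop is still running after @max_iter words, which in the real
 * interrupt handler would be a hard lockup.
 */
static bool send_chunk(struct fake_fifo *f, unsigned int chunk,
		       unsigned char *out, bool use_return,
		       unsigned int max_iter)
{
	unsigned int remaining = chunk, iter = 0;
	unsigned char buf[BYTES_PER_FIFO_WORD];

	while (remaining) {
		unsigned int tx_bytes, copied;

		if (iter++ == max_iter)
			return false;

		memset(buf, 0, sizeof(buf));
		tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
			   remaining : BYTES_PER_FIFO_WORD;

		copied = fake_fifo_out(f, buf, tx_bytes);
		assert(copied <= tx_bytes);
		if (use_return)
			tx_bytes = copied;	/* pre-patch: 0 once the fifo is empty */

		memcpy(out, buf, tx_bytes);	/* a short copy leaves NUL padding */
		out += tx_bytes;
		remaining -= tx_bytes;		/* never shrinks if tx_bytes == 0 */
	}
	return true;
}
```

With the return value ignored, a 6-byte chunk backed by only two queued bytes ("hi") comes out as `h`, `i` and four NULs. With the pre-patch assignment, `remaining` sticks at 4 and the loop only ends because of the model's iteration cap.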

Comments

Doug Anderson July 8, 2024, 11:59 p.m. UTC | #1
Hi,

On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> The Qualcomm GENI serial driver did not handle buffer flushing and used
> to print discarded characters when the circular buffer was cleared.
> Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> this instead resulted in a hard lockup due to
> qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> interrupt handler.
>
> The underlying bugs have now been fixed, but make sure to output NUL
> characters instead of killing the machine if a similar driver bug is
> ever reintroduced.
>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/tty/serial/qcom_geni_serial.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index b2bbd2d79dbb..69a632fefc41 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
>                 memset(buf, 0, sizeof(buf));
>                 tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
>
> -               tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> +               uart_fifo_out(uport, buf, tx_bytes);

FWIW I would have rather we output something much more obviously wrong
in this case instead of a NUL byte. Maybe we should fill it with "@"
characters or something? As you said: the driver shouldn't get into
this error condition so it shouldn't matter, but if we have a bug in
the future I'd rather it be an obvious bug instead of a subtle bug.
I'm happy to post a patch or provide a Reviewed-by if you want to post
a patch. Let me know.

-Doug
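Doug's suggestion amounts to pre-filling each FIFO word with a visible marker rather than zero, so an underrun shows up as a run of '@' on the console instead of invisible NULs. Below is a user-space sketch of that idea, not the patch that was eventually posted. `fake_fifo`, `fake_fifo_out()` and `send_chunk_marked()` are hypothetical names. The model assumes a 4-byte FIFO word and that only `tx_bytes` of each word reach the wire.

```c
#include <assert.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U	/* assumed: one 32-bit TX FIFO word */

/* Hypothetical stand-in for the port's kfifo. */
struct fake_fifo {
	const unsigned char *data;
	unsigned int len;		/* bytes still queued */
};

/* Models uart_fifo_out(): copies up to @len bytes, returns bytes copied. */
static unsigned int fake_fifo_out(struct fake_fifo *f, unsigned char *buf,
				  unsigned int len)
{
	unsigned int n = len < f->len ? len : f->len;

	memcpy(buf, f->data, n);
	f->data += n;
	f->len -= n;
	return n;
}

/*
 * Patched loop, but each word is pre-filled with @marker instead of NUL,
 * so bytes the fifo fails to supply appear as marker characters.
 */
static void send_chunk_marked(struct fake_fifo *f, unsigned int chunk,
			      unsigned char *out, unsigned char marker)
{
	unsigned int remaining = chunk;
	unsigned char buf[BYTES_PER_FIFO_WORD];

	while (remaining) {
		unsigned int tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
					remaining : BYTES_PER_FIFO_WORD;
		unsigned int copied;

		memset(buf, marker, sizeof(buf));	/* one 32-bit fill per word */
		copied = fake_fifo_out(f, buf, tx_bytes);
		assert(copied <= tx_bytes);	/* a short copy leaves marker bytes */

		memcpy(out, buf, tx_bytes);
		out += tx_bytes;
		remaining -= tx_bytes;		/* always advances: no lockup */
	}
}
```

For a 6-byte chunk with only "hi" queued, the model emits "hi@@@@". The per-word cost is the same single 4-byte `memset()` the patched driver already performs; only the fill value changes.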
Johan Hovold July 9, 2024, 9:44 a.m. UTC | #2
On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> >
> > The Qualcomm GENI serial driver did not handle buffer flushing and used
> > to print discarded characters when the circular buffer was cleared.
> > Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > this instead resulted in a hard lockup due to
> > qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> > interrupt handler.
> >
> > The underlying bugs have now been fixed, but make sure to output NUL
> > characters instead of killing the machine if a similar driver bug is
> > ever reintroduced.
> >
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> > ---
> >  drivers/tty/serial/qcom_geni_serial.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> > index b2bbd2d79dbb..69a632fefc41 100644
> > --- a/drivers/tty/serial/qcom_geni_serial.c
> > +++ b/drivers/tty/serial/qcom_geni_serial.c
> > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> >                 memset(buf, 0, sizeof(buf));
> >                 tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> >
> > -               tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > +               uart_fifo_out(uport, buf, tx_bytes);
> 
> FWIW I would have rather we output something much more obviously wrong
> in this case instead of a NUL byte. Maybe we should fill it with "@"
> characters or something? As you said: the driver shouldn't get into
> this error condition so it shouldn't matter, but if we have a bug in
> the future I'd rather it be an obvious bug instead of a subtle bug.

Yeah, I've been running with a patch like that locally in my tests, and
went a bit back and forth whether I should post it. My reasoning for not
doing so was that the bugs have been fixed so we don't need to spend
cycles on memsetting the buffer to anything but NUL (I used 'X' in my
testing).

I guess that can be avoided by only padding the buffer if we ever hit an
underrun, but I still think it's questionable to spend the effort as
this is not something that should be needed. In any case, I didn't want
to spend time on it to fix the 6.10 regressions.

Killing the machine is perhaps an effective way to get attention to an
issue, but I'd much rather have an occasional NUL character in the log
*if* this ever becomes an issue at all again.

> I'm happy to post a patch or provide a Reviewed-by if you want to post
> a patch. Let me know.

If you feel strongly about this, I can either fill the buffer with
something other than NUL or add error handling for any such future
hypothetical bugs. What do you prefer?

Johan
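Johan's alternative of "only padding the buffer if we ever hit an underrun" would skip the per-word fill on the normal path and pad only when `uart_fifo_out()` returns fewer bytes than requested. The following is a hedged user-space sketch of that idea, with hypothetical `fake_fifo`, `fake_fifo_out()` and `send_chunk_pad_on_underrun()` standing in for the kfifo, `uart_fifo_out()` and the driver loop. It pads with 'X' to match the character Johan mentions using in testing, and returns the padded-byte count so a caller could report the bug.

```c
#include <assert.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U	/* assumed: one 32-bit TX FIFO word */

/* Hypothetical stand-in for the port's kfifo. */
struct fake_fifo {
	const unsigned char *data;
	unsigned int len;		/* bytes still queued */
};

/* Models uart_fifo_out(): copies up to @len bytes, returns bytes copied. */
static unsigned int fake_fifo_out(struct fake_fifo *f, unsigned char *buf,
				  unsigned int len)
{
	unsigned int n = len < f->len ? len : f->len;

	memcpy(buf, f->data, n);
	f->data += n;
	f->len -= n;
	return n;
}

/*
 * No unconditional memset(): the buffer is only padded with @marker when
 * the fifo comes up short. Returns the number of padded bytes.
 */
static unsigned int send_chunk_pad_on_underrun(struct fake_fifo *f,
					       unsigned int chunk,
					       unsigned char *out,
					       unsigned char marker)
{
	unsigned int remaining = chunk, padded = 0;
	unsigned char buf[BYTES_PER_FIFO_WORD];

	while (remaining) {
		unsigned int tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
					remaining : BYTES_PER_FIFO_WORD;
		unsigned int copied = fake_fifo_out(f, buf, tx_bytes);

		assert(copied <= tx_bytes);
		if (copied < tx_bytes) {	/* underrun: a driver bug */
			memset(buf + copied, marker, tx_bytes - copied);
			padded += tx_bytes - copied;
		}

		memcpy(out, buf, tx_bytes);	/* reads only initialised bytes */
		out += tx_bytes;
		remaining -= tx_bytes;
	}
	return padded;
}
```

A 6-byte chunk with only "hi" queued yields "hiXXXX" and a padded count of 4. The model copies just `tx_bytes` per word, whereas the driver writes a whole 32-bit word with `iowrite32_rep()`, so a real version would still have to decide what goes into the unused tail bytes of a final partial word.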
Johan Hovold July 9, 2024, 12:55 p.m. UTC | #3
On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:

> > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > >                 memset(buf, 0, sizeof(buf));
> > >                 tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > >
> > > -               tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > +               uart_fifo_out(uport, buf, tx_bytes);
> > 
> > FWIW I would have rather we output something much more obviously wrong
> > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > characters or something? As you said: the driver shouldn't get into
> > this error condition so it shouldn't matter, but if we have a bug in
> > the future I'd rather it be an obvious bug instead of a subtle bug.
> 
> Yeah, I've been running with a patch like that locally in my tests, and
> went a bit back and forth whether I should post it. My reasoning for not
> doing so was that the bugs have been fixed so we don't need to spend
> cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> testing).
> 
> I guess that can be avoided by only padding the buffer if we ever hit an
> underrun, but I still think it's questionable to spend the effort as
> this is not something that should be needed. In any case, I didn't want
> to spend time on it to fix the 6.10 regressions.
> 
> Killing the machine is perhaps an effective way to get attention to an
> issue, but I'd much rather have an occasional NUL character in the log
> *if* this ever becomes an issue at all again.
> 
> > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > a patch. Let me know.
> 
> If you feel strongly about this, I can either fill the buffer with
> something other than NUL or add error handling for any such future
> hypothetical bugs. What do you prefer?

Actually we just need to clear the buffer on entry, which would do away
with the unnecessary memset() that is there today. This should also give
you a printable indication that something is wrong in case a similar bug
is ever reintroduced (e.g. the last four characters would be repeated
until the transfer is complete instead of a fixed char like '@').

Perhaps that's good enough as a compromise?

Johan
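The compromise above clears the buffer once on entry instead of on every word. Because `uart_fifo_out()` copies nothing on an underrun, the buffer keeps the previous word's bytes, and those get resent until the transfer completes. The following is a user-space sketch of that behaviour, not the driver. `fake_fifo`, `fake_fifo_out()` and `send_chunk_clear_on_entry()` are hypothetical names, and a 4-byte FIFO word is assumed.

```c
#include <assert.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U	/* assumed: one 32-bit TX FIFO word */

/* Hypothetical stand-in for the port's kfifo. */
struct fake_fifo {
	const unsigned char *data;
	unsigned int len;		/* bytes still queued */
};

/* Models uart_fifo_out(): copies up to @len bytes, returns bytes copied. */
static unsigned int fake_fifo_out(struct fake_fifo *f, unsigned char *buf,
				  unsigned int len)
{
	unsigned int n = len < f->len ? len : f->len;

	memcpy(buf, f->data, n);
	f->data += n;
	f->len -= n;
	return n;
}

/* Buffer cleared once on entry; stale bytes are resent on an underrun. */
static void send_chunk_clear_on_entry(struct fake_fifo *f, unsigned int chunk,
				      unsigned char *out)
{
	unsigned int remaining = chunk;
	unsigned char buf[BYTES_PER_FIFO_WORD];

	memset(buf, 0, sizeof(buf));	/* cleared once, on entry */

	while (remaining) {
		unsigned int tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
					remaining : BYTES_PER_FIFO_WORD;
		unsigned int copied = fake_fifo_out(f, buf, tx_bytes);

		/* bytes at buf[copied] and beyond keep earlier data */
		assert(copied <= tx_bytes);
		memcpy(out, buf, tx_bytes);
		out += tx_bytes;
		remaining -= tx_bytes;
	}
}
```

With "abcdefgh" queued and a 12-byte chunk, the model emits "abcdefghefgh": the last four characters repeat, as described. If the underrun hits before anything has been copied, the output is NULs again, since the buffer still holds its initial zeros.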
Doug Anderson July 9, 2024, 11:30 p.m. UTC | #4
Hi,

On Tue, Jul 9, 2024 at 5:55 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> > On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> > > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > > >                 memset(buf, 0, sizeof(buf));
> > > >                 tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > > >
> > > > -               tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > > +               uart_fifo_out(uport, buf, tx_bytes);
> > >
> > > FWIW I would have rather we output something much more obviously wrong
> > > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > > characters or something? As you said: the driver shouldn't get into
> > > this error condition so it shouldn't matter, but if we have a bug in
> > > the future I'd rather it be an obvious bug instead of a subtle bug.
> >
> > Yeah, I've been running with a patch like that locally in my tests, and
> > went a bit back and forth whether I should post it. My reasoning for not
> > doing so was that the bugs have been fixed so we don't need to spend
> > cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> > testing).
> >
> > I guess that can be avoided by only padding the buffer if we ever hit an
> > underrun, but I still think it's questionable to spend the effort as
> > this is not something that should be needed. In any case, I didn't want
> > to spend time on it to fix the 6.10 regressions.
> >
> > Killing the machine is perhaps an effective way to get attention to an
> > issue, but I'd much rather have an occasional NUL character in the log
> > *if* this ever becomes an issue at all again.
> >
> > > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > > a patch. Let me know.
> >
> > If you feel strongly about this, I can either fill the buffer with
> > something other than NUL or add error handling for any such future
> > hypothetical bugs. What do you prefer?
>
> Actually we just need to clear the buffer on entry, which would do away
> with the unnecessary memset() that is there today. This should also give
> you a printable indication that something is wrong in case a similar bug
> is ever reintroduced (e.g. the last four characters would be repeated
> until the transfer is complete instead of a fixed char like '@').
>
> Perhaps that's good enough as a compromise?

IMO initializing 32 bits of data should be fine to do each time through
the loop. I've sent a patch:

https://lore.kernel.org/r/20240709162841.1.I93bf39f29d1887c46c74fbf8d4b937f6497cdfaa@changeid

-Doug

Patch

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index b2bbd2d79dbb..69a632fefc41 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -878,7 +878,7 @@  static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
 		memset(buf, 0, sizeof(buf));
 		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
 
-		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
+		uart_fifo_out(uport, buf, tx_bytes);
 
 		iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);