Message ID | 20240416123545.7098-4-wsa+renesas@sang-engineering.com |
---|---|
Headers | show |
Series | serial: sci: fix OOPS because of wrongly running hrtimer | expand |
Hi Wolfram, On Tue, Apr 16, 2024 at 2:35 PM Wolfram Sang <wsa+renesas@sang-engineering.com> wrote: > Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS. > Currently, the driver only runs the timer when 'chan_rx' is set before. > However, it is good defensive programming to make sure the hrtimer is > always stopped before clearing the 'chan_rx' pointer. > > Reported-by: Dirk Behme <dirk.behme@de.bosch.com> > Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com > Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port") > Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Thanks for your patch! > Locking needs to be double-checked here. This patch is mainly calling > for opinions. I do think you need to cancel the timer: even when not restarting the timer in sci_dma_rx_complete() due to a DMA failure, the previous timer may still be running, and will cause a NULL pointer dereference on s->chan_rx on timer expiry. > --- a/drivers/tty/serial/sh-sci.c > +++ b/drivers/tty/serial/sh-sci.c > @@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s) > { > unsigned int i; > > + hrtimer_cancel(&s->rx_timer); Is it safe to do this unconditionally on shutdown (cfr. the old check for s->chan_rx_saved)? > s->chan_rx = NULL; > for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++) > s->cookie_rx[i] = -EINVAL; > @@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port) > scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot)); > uart_port_unlock_irqrestore(port, flags); > > -#ifdef CONFIG_SERIAL_SH_SCI_DMA > - if (s->chan_rx_saved) { > - dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__, > - port->line); > - hrtimer_cancel(&s->rx_timer); > - } > -#endif > - > if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0) > del_timer_sync(&s->rx_fifo_timer); > sci_free_irq(s); Gr{oetje,eeting}s, Geert
Hi Geert, good news, I was able to trigger the DMA rx code path. I dunno what I did wrong last time. I started from scratch again and it worked easily by dd-ing random data to the second non-console debug port. > I do think you need to cancel the timer: even when not restarting > the timer in sci_dma_rx_complete() due to a DMA failure, the previous > timer may still be running, and will cause a NULL pointer dereference > on s->chan_rx on timer expiry. Taking locking into account, I think this patch is bogus. If we run into this NULL-pointer, we have a locking problem and cancelling the timer in sci_dma_rx_chan_invalidate() is not going to fix the underlying locking problem. sci_dma_rx_chan_invalidate() does not only clear the pointer but also the cookie_rx-array. sci_dma_rx_timer_fn() bails out via sci_dma_rx_find_active() if that array is cleared. It does so before accessing the chan_rx-pointer. So, it looks to me that should work once all calls to sci_dma_rx_chan_invalidate() are protected. And there is one path where this is not true, namely via sci_dma_rx_release() during shutdown. This is why I asked Dirk if the system was about to shutdown. Currently, I don't see any other problematic code path. > > -#ifdef CONFIG_SERIAL_SH_SCI_DMA > > - if (s->chan_rx_saved) { > > - dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__, > > - port->line); > > - hrtimer_cancel(&s->rx_timer); > > - } > > -#endif Also, this chunk needs to stay. I suggested in patch 1 to cancel the timer on successful dma_rx_complete, so the timer only runs when a DMA is in progress. Then, of course, we need to cancel it in shutdown. I hope I am not seeing "no wood for the trees" by now. I am not convinced that I actually found Dirk's race condition yet... All the best, Wolfram