Message ID | 20221216021504.457699-1-marex@denx.de |
---|---|
State | Superseded |
Headers | show |
Series | [v2] serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler | expand |
On 12/16/22 09:00, Sebastian Andrzej Siewior wrote: > On 2022-12-16 03:15:04 [+0100], Marek Vasut wrote: >> Avoid locking in hard interrupt context, move the entirety of hard IRQ >> context code into single IRQ handler, preempt-rt would move the handler >> into thread. This fixes the following splat with preempt-rt enabled: >> >> BUG: scheduling while atomic: (mount)/1289/0x00010001 >> Modules linked in: >> Preemption disabled at: >> [<c0119127>] irq_enter_rcu+0xb/0x42 >> CPU: 0 PID: 1289 Comm: (mount) Not tainted 6.1.0-rc7-rt5-stable-standard-00006-gd70aeccb9f0f #17 >> Hardware name: STM32 (Device Tree Support) >> unwind_backtrace from show_stack+0xb/0xc >> show_stack from dump_stack_lvl+0x2b/0x34 >> dump_stack_lvl from __schedule_bug+0x53/0x80 >> __schedule_bug from __schedule+0x47/0x404 >> __schedule from schedule_rtlock+0x15/0x34 >> schedule_rtlock from rtlock_slowlock_locked+0x1d7/0x57e >> rtlock_slowlock_locked from rt_spin_lock+0x29/0x3c >> rt_spin_lock from stm32_usart_interrupt+0xa9/0x110 >> stm32_usart_interrupt from __handle_irq_event_percpu+0x73/0x14e >> __handle_irq_event_percpu from handle_irq_event_percpu+0x9/0x22 >> handle_irq_event_percpu from handle_irq_event+0x53/0x76 >> handle_irq_event from handle_fasteoi_irq+0x65/0xa8 >> handle_fasteoi_irq from handle_irq_desc+0xf/0x18 >> handle_irq_desc from gic_handle_irq+0x45/0x54 >> gic_handle_irq from generic_handle_arch_irq+0x19/0x2c >> generic_handle_arch_irq from call_with_stack+0xd/0x10 > > Could this be replaced maybe with a proper description instead of > slapping the backtrace into the patch description? Sure, I'm not confident in the preempt-rt parts, thanks for the commit message update, I'll include it in V3. > Requesting an interrupt with IRQF_ONESHOT will run the primary handler > in the hard-IRQ context even in the force-threaded mode. The > force-threaded mode is used by PREEMPT_RT in order to avoid acquiring > sleeping locks (spinlock_t) in hard-IRQ context. This combination > makes it impossible and leads to "sleeping while atomic" warnings. > > Use one interrupt handler for both handlers (primary and secondary) > and drop the IRQF_ONESHOT flag which is not needed. > > Fixes: e359b4411c283 ("serial: stm32: fix threaded interrupt handling") > > As for your change, this should work. > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Thanks for all the help with this. > Should this DMA-mode need to be outsourced (due to $REASON) you can > request two handlers but then you need to avoid IRQF_ONESHOT and the > primary handler needs to disable the interrupt source in the UART > hardware. > > Also it might be worth checking if the DMA mode makes any sense if the > FIFO is so small. If you want to push a lot of data through the UART without refilling the small FIFO all the time and getting a lot of IRQs, that's where the DMA comes in. Maybe I misunderstood this comment ?
On 2022-12-16 12:52:43 [+0100], Marek Vasut wrote: > > Also it might be worth checking if the DMA mode makes any sense if the > > FIFO is so small. > > If you want to push a lot of data through the UART without refilling the > small FIFO all the time and getting a lot of IRQs, that's where the DMA > comes in. Maybe I misunderstood this comment ? I have no idea how this works in detail. However: if you FIFO which is 16 bytes deep then filling it means 16 writes and so one interrupt every 16 bytes. If the DMA engine is the "average slave dma support" then it gets programmed to fill 16 bytes and then issues an interrupt again. The "lucky" case if you can program say 512 bytes (or so) and the DMA engines itself is able to fill the FIFO 32 times without involving the CPU. The last case is clear win. If you have the 16 bytes-DMA case then you would have to check what is cheaper: programming the DMA engine for 16 bytes or stuffing it directly into the FIFO. If the DMA engine supports the larger case say 512 and refills the FIFO on its own, then using it makes sense. However this makes usually sense for larger transfers. Having a console on it usually leads to more overhead because you receive usually say two bytes a second (unless you are a fast typer). Sending depends on the usecase and the peak is usually during boot. People doing BT via UART usually want to use DMA because of the insane amount of data, that is pumped. Sebastian
On 12/16/22 17:40, Sebastian Andrzej Siewior wrote: > On 2022-12-16 12:52:43 [+0100], Marek Vasut wrote: >>> Also it might be worth checking if the DMA mode makes any sense if the >>> FIFO is so small. >> >> If you want to push a lot of data through the UART without refilling the >> small FIFO all the time and getting a lot of IRQs, that's where the DMA >> comes in. Maybe I misunderstood this comment ? > > I have no idea how this works in detail. However: if you FIFO which is > 16 bytes deep then filling it means 16 writes and so one interrupt every > 16 bytes. If the DMA engine is the "average slave dma support" then it > gets programmed to fill 16 bytes and then issues an interrupt again. The > "lucky" case if you can program say 512 bytes (or so) and the DMA > engines itself is able to fill the FIFO 32 times without involving the > CPU. The last case is clear win. Ah, no, on the STM32 the DMA should be capable of streaming arbitrary amount of data from DRAM to the UART FIFO if needed I think. > If you have the 16 bytes-DMA case then you would have to check what is > cheaper: programming the DMA engine for 16 bytes or stuffing it directly > into the FIFO. > If the DMA engine supports the larger case say 512 and refills the FIFO > on its own, then using it makes sense. However this makes usually sense > for larger transfers. Having a console on it usually leads to more > overhead because you receive usually say two bytes a second (unless you > are a fast typer). Sending depends on the usecase and the peak is > usually during boot. People doing BT via UART usually want to use DMA > because of the insane amount of data, that is pumped. Yes, I think we are in agreement here. The terse version was just a bit too terse. Thanks
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c index dfdbcf092facc..bbbab8dc2bfa9 100644 --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -752,8 +752,9 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr) struct tty_port *tport = &port->state->port; struct stm32_port *stm32_port = to_stm32_port(port); const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs; - u32 sr; + unsigned long flags; unsigned int size; + u32 sr; sr = readl_relaxed(port->membase + ofs->isr); @@ -793,27 +794,13 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr) } if ((sr & USART_SR_TXE) && !(stm32_port->tx_ch)) { - spin_lock(&port->lock); + spin_lock_irqsave(&port->lock, flags); stm32_usart_transmit_chars(port); - spin_unlock(&port->lock); + spin_unlock_irqrestore(&port->lock, flags); } - if (stm32_usart_rx_dma_enabled(port)) - return IRQ_WAKE_THREAD; - else - return IRQ_HANDLED; -} - -static irqreturn_t stm32_usart_threaded_interrupt(int irq, void *ptr) -{ - struct uart_port *port = ptr; - struct tty_port *tport = &port->state->port; - struct stm32_port *stm32_port = to_stm32_port(port); - unsigned int size; - unsigned long flags; - /* Receiver timeout irq for DMA RX */ - if (!stm32_port->throttled) { + if (stm32_usart_rx_dma_enabled(port) && !stm32_port->throttled) { spin_lock_irqsave(&port->lock, flags); size = stm32_usart_receive_chars(port, false); uart_unlock_and_check_sysrq_irqrestore(port, flags); @@ -1016,10 +1003,8 @@ static int stm32_usart_startup(struct uart_port *port) u32 val; int ret; - ret = request_threaded_irq(port->irq, stm32_usart_interrupt, - stm32_usart_threaded_interrupt, - IRQF_ONESHOT | IRQF_NO_SUSPEND, - name, port); + ret = request_irq(port->irq, stm32_usart_interrupt, + IRQF_NO_SUSPEND, name, port); if (ret) return ret;
Avoid locking in hard interrupt context, move the entirety of hard IRQ context code into single IRQ handler, preempt-rt would move the handler into thread. This fixes the following splat with preempt-rt enabled: BUG: scheduling while atomic: (mount)/1289/0x00010001 Modules linked in: Preemption disabled at: [<c0119127>] irq_enter_rcu+0xb/0x42 CPU: 0 PID: 1289 Comm: (mount) Not tainted 6.1.0-rc7-rt5-stable-standard-00006-gd70aeccb9f0f #17 Hardware name: STM32 (Device Tree Support) unwind_backtrace from show_stack+0xb/0xc show_stack from dump_stack_lvl+0x2b/0x34 dump_stack_lvl from __schedule_bug+0x53/0x80 __schedule_bug from __schedule+0x47/0x404 __schedule from schedule_rtlock+0x15/0x34 schedule_rtlock from rtlock_slowlock_locked+0x1d7/0x57e rtlock_slowlock_locked from rt_spin_lock+0x29/0x3c rt_spin_lock from stm32_usart_interrupt+0xa9/0x110 stm32_usart_interrupt from __handle_irq_event_percpu+0x73/0x14e __handle_irq_event_percpu from handle_irq_event_percpu+0x9/0x22 handle_irq_event_percpu from handle_irq_event+0x53/0x76 handle_irq_event from handle_fasteoi_irq+0x65/0xa8 handle_fasteoi_irq from handle_irq_desc+0xf/0x18 handle_irq_desc from gic_handle_irq+0x45/0x54 gic_handle_irq from generic_handle_arch_irq+0x19/0x2c generic_handle_arch_irq from call_with_stack+0xd/0x10 Signed-off-by: Marek Vasut <marex@denx.de> --- Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Erwan Le Ray <erwan.leray@foss.st.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Caron <valentin.caron@foss.st.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-stm32@st-md-mailman.stormreply.com To: linux-serial@vger.kernel.org --- V2: - Update patch subject, was: serial: stm32: Move hard IRQ handling to threaded interrupt context - Use request_irq() instead, rename the IRQ handler function --- drivers/tty/serial/stm32-usart.c | 29 +++++++---------------------- 1 file changed, 7 insertions(+), 22 deletions(-)