Message ID | 20241001125033.10625-3-johan+linaro@kernel.org |
---|---|
State | Superseded |
Series | serial: qcom-geni: fix receiver enable |
Hi,

On Tue, Oct 1, 2024 at 5:51 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> A commit adding back the stopping of tx on port shutdown failed to add
> back the locking which had also been removed by commit e83766334f96
> ("tty: serial: qcom_geni_serial: No need to stop tx/rx on UART
> shutdown").

Hmmm, when I look at that commit it makes me think that the problem
that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
stop tx/rx on UART shutdown") was fixing was re-introduced by commit
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
progress at shutdown"). ...and indeed, it was. :(

I can't interact with kgdb if I do this:

1. ssh over to DUT
2. Kill the console process (on ChromeOS stop console-ttyMSM0)
3. Drop in the debugger (echo g > /proc/sysrq-trigger)

This bug predates your series, but since it touches the same code
maybe you could fix it at the same time?

I will note that commit e83766334f96 ("tty: serial: qcom_geni_serial:
No need to stop tx/rx on UART shutdown") mentions that it wasn't
required for FIFO mode--only DMA...

Aside from the pre-existing bug, I agree that the locking should be there.

-Doug

On Thu, Oct 03, 2024 at 11:30:08AM -0700, Doug Anderson wrote:
> On Tue, Oct 1, 2024 at 5:51 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> >
> > A commit adding back the stopping of tx on port shutdown failed to add
> > back the locking which had also been removed by commit e83766334f96
> > ("tty: serial: qcom_geni_serial: No need to stop tx/rx on UART
> > shutdown").
>
> Hmmm, when I look at that commit it makes me think that the problem
> that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
> stop tx/rx on UART shutdown") was fixing was re-introduced by commit
> d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
> progress at shutdown"). ...and indeed, it was. :(
>
> I can't interact with kgdb if I do this:
>
> 1. ssh over to DUT
> 2. Kill the console process (on ChromeOS stop console-ttyMSM0)
> 3. Drop in the debugger (echo g > /proc/sysrq-trigger)

Yeah, don't do that then. ;)

Not sure how your "console process" works, but this should only happen
if you do not enable the serial console (console=ttyMSM0) and then try
to use a polled console (as enabling the console will prevent port
shutdown from being called).

That should probably just be disallowed.

The console code, and the polled console code bolted on top, is a bit
of a hack so corner cases like this are to be expected.

When the polled console code was introduced it was claimed that it
would have "absolutely zero impact as long as CONFIG_CONSOLE_POLL is
disabled". Perhaps I'm reading too much into it, but that statement is
clearly ignoring the maintenance cost...

Johan

Hi,

On Wed, Oct 9, 2024 at 7:10 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Oct 03, 2024 at 11:30:08AM -0700, Doug Anderson wrote:
> > On Tue, Oct 1, 2024 at 5:51 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> > >
> > > A commit adding back the stopping of tx on port shutdown failed to add
> > > back the locking which had also been removed by commit e83766334f96
> > > ("tty: serial: qcom_geni_serial: No need to stop tx/rx on UART
> > > shutdown").
> >
> > Hmmm, when I look at that commit it makes me think that the problem
> > that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
> > stop tx/rx on UART shutdown") was fixing was re-introduced by commit
> > d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
> > progress at shutdown"). ...and indeed, it was. :(
> >
> > I can't interact with kgdb if I do this:
> >
> > 1. ssh over to DUT
> > 2. Kill the console process (on ChromeOS stop console-ttyMSM0)
> > 3. Drop in the debugger (echo g > /proc/sysrq-trigger)
>
> Yeah, don't do that then. ;)

The problem is, I don't always have a choice. As talked about in the
message of commit e83766334f96 ("tty: serial: qcom_geni_serial: No
need to stop tx/rx on UART shutdown"), the above steps attempt to
simulate what happened organically: a crash in late shutdown. During
shutdown the agetty has been killed by the init system and I don't
have a choice about it. If I get a kernel crash then (which isn't
uncommon since shutdown code tends to trigger seldom-used code paths)
then I can't debug it. :(

We need to fix this.

> Not sure how your "console process" works, but this should only happen
> if you do not enable the serial console (console=ttyMSM0) and then try
> to use a polled console (as enabling the console will prevent port
> shutdown from being called).

That simply doesn't seem to be the case for me. The port shutdown
seems to be called. To confirm, I put a printout at the start of
qcom_geni_serial_shutdown(). I see in my /proc/cmdline:

console=ttyMSM0,115200n8

...and I indeed verify that I see console messages on my UART. I then run:

stop console-ttyMSM0

...and I see on the UART:

[ 92.916964] DOUG: qcom_geni_serial_shutdown
[ 92.922703] init: console-ttyMSM0 main process (611) killed by TERM signal

Console messages keep coming out the UART even though the agetty isn't
there. Now I (via ssh) drop into the debugger:

echo g > /proc/sysrq-trigger

I see the "kgdb" prompt but I can't interact with it because
qcom_geni_serial_shutdown() stopped RX.

-Doug

On Thu, Oct 10, 2024 at 03:30:05PM -0700, Doug Anderson wrote:
> On Wed, Oct 9, 2024 at 7:10 AM Johan Hovold <johan@kernel.org> wrote:
> > On Thu, Oct 03, 2024 at 11:30:08AM -0700, Doug Anderson wrote:
> > > Hmmm, when I look at that commit it makes me think that the problem
> > > that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
> > > stop tx/rx on UART shutdown") was fixing was re-introduced by commit
> > > d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
> > > progress at shutdown"). ...and indeed, it was. :(
> > >
> > > I can't interact with kgdb if I do this:
> > >
> > > 1. ssh over to DUT
> > > 2. Kill the console process (on ChromeOS stop console-ttyMSM0)
> > > 3. Drop in the debugger (echo g > /proc/sysrq-trigger)
> >
> > Yeah, don't do that then. ;)
>
> The problem is, I don't always have a choice. As talked about in the
> message of commit e83766334f96 ("tty: serial: qcom_geni_serial: No
> need to stop tx/rx on UART shutdown"), the above steps attempt to
> simulate what happened organically: a crash in late shutdown. During
> shutdown the agetty has been killed by the init system and I don't
> have a choice about it. If I get a kernel crash then (which isn't
> uncommon since shutdown code tends to trigger seldom-used code paths)
> then I can't debug it. :(

Ok, thanks for clarifying.

> > Not sure how your "console process" works, but this should only happen
> > if you do not enable the serial console (console=ttyMSM0) and then try
> > to use a polled console (as enabling the console will prevent port
> > shutdown from being called).
>
> That simply doesn't seem to be the case for me. The port shutdown
> seems to be called. To confirm, I put a printout at the start of
> qcom_geni_serial_shutdown(). I see in my /proc/cmdline:
>
> console=ttyMSM0,115200n8
>
> ...and I indeed verify that I see console messages on my UART. I then run:
>
> stop console-ttyMSM0
>
> ...and I see on the UART:
>
> [ 92.916964] DOUG: qcom_geni_serial_shutdown
> [ 92.922703] init: console-ttyMSM0 main process (611) killed by TERM signal
>
> Console messages keep coming out the UART even though the agetty isn't
> there.

And this is with a Chromium kernel, not mainline?

If you take a look at tty_port_shutdown() there's a hack in there for
consoles that was added back in 2010 and that prevents shutdown() from
being called for console ports.

But perhaps you manage to hit shutdown() via some other path. Serial
core is not yet using tty_port_hangup() so a hangup might trigger
that...

Could you check that with a dump_stack()?

> Now I (via ssh) drop into the debugger:
>
> echo g > /proc/sysrq-trigger
>
> I see the "kgdb" prompt but I can't interact with it because
> qcom_geni_serial_shutdown() stopped RX.

How about simply amending poll_get_char() so that it enables the
receiver if it's not already enabled?

Johan

Hi,

On Thu, Oct 10, 2024 at 11:51 PM Johan Hovold <johan@kernel.org> wrote:
>
> > > Not sure how your "console process" works, but this should only happen
> > > if you do not enable the serial console (console=ttyMSM0) and then try
> > > to use a polled console (as enabling the console will prevent port
> > > shutdown from being called).
> >
> > That simply doesn't seem to be the case for me. The port shutdown
> > seems to be called. To confirm, I put a printout at the start of
> > qcom_geni_serial_shutdown(). I see in my /proc/cmdline:
> >
> > console=ttyMSM0,115200n8
> >
> > ...and I indeed verify that I see console messages on my UART. I then run:
> >
> > stop console-ttyMSM0
> >
> > ...and I see on the UART:
> >
> > [ 92.916964] DOUG: qcom_geni_serial_shutdown
> > [ 92.922703] init: console-ttyMSM0 main process (611) killed by TERM signal
> >
> > Console messages keep coming out the UART even though the agetty isn't
> > there.
>
> And this is with a Chromium kernel, not mainline?

Who do you take me for?!?! :-P :-P :-P

Of course it's with mainline.

> If you take a look at tty_port_shutdown() there's a hack in there for
> consoles that was added back in 2010 and that prevents shutdown() from
> being called for console ports.
>
> But perhaps you manage to hit shutdown() via some other path. Serial
> core is not yet using tty_port_hangup() so a hangup might trigger
> that...
>
> Could you check that with a dump_stack()?

Sure. Typed from the agetty itself, here ya go. Git hash is not a
mainline git hash because I have your patches applied. "dirty" is
because of the printout / dump_stack().

lazor-rev9 ~ # stop console-ttyMSM0
[ 68.772828] DOUG: qcom_geni_serial_shutdown
[ 68.777365] CPU: 2 UID: 0 PID: 589 Comm: login Not tainted 6.12.0-rc1-g0bde0d120d58-dirty #1 ac8ed1a05abcc73f4fafe0543cbc76768ea594e1
[ 68.789737] Hardware name: Google Lazor (rev9) with LTE (DT)
[ 68.795581] Call trace:
[ 68.798124] dump_backtrace+0xf8/0x120
[ 68.802025] show_stack+0x24/0x38
[ 68.805463] dump_stack_lvl+0x40/0xc8
[ 68.809265] dump_stack+0x18/0x38
[ 68.812702] qcom_geni_serial_shutdown+0x38/0x110
[ 68.817578] uart_port_shutdown+0x48/0x68
[ 68.821736] uart_shutdown+0xcc/0x170
[ 68.825530] uart_hangup+0x54/0x158
[ 68.829154] __tty_hangup+0x20c/0x318
[ 68.832954] tty_vhangup_session+0x20/0x38
[ 68.837195] disassociate_ctty+0xe8/0x1a8
[ 68.841355] do_exit+0x10c/0x358
[ 68.844716] do_group_exit+0x9c/0xa8
[ 68.848441] get_signal+0x408/0x4d8
[ 68.852071] do_signal+0xa8/0x770
[ 68.855526] do_notify_resume+0x78/0x118
[ 68.859605] el0_svc+0x64/0x68
[ 68.862790] el0t_64_sync_handler+0x20/0x128
[ 68.867218] el0t_64_sync+0x1a8/0x1b0
[ 68.872933] init: console-ttyMSM0 main process (589) killed by TERM signal

> > Now I (via ssh) drop into the debugger:
> >
> > echo g > /proc/sysrq-trigger
> >
> > I see the "kgdb" prompt but I can't interact with it because
> > qcom_geni_serial_shutdown() stopped RX.
>
> How about simply amending poll_get_char() so that it enables the
> receiver if it's not already enabled?

Yeah, this would probably work.

-Doug

On Fri, Oct 11, 2024 at 07:30:30AM -0700, Doug Anderson wrote:
> On Thu, Oct 10, 2024 at 11:51 PM Johan Hovold <johan@kernel.org> wrote:
> >
> > > > Not sure how your "console process" works, but this should only happen
> > > > if you do not enable the serial console (console=ttyMSM0) and then try
> > > > to use a polled console (as enabling the console will prevent port
> > > > shutdown from being called).

> > And this is with a Chromium kernel, not mainline?
>
> Who do you take me for?!?! :-P :-P :-P Of course it's with mainline.

Heh. Just checking. I was sure about shutdown() not being called when
closing ports, but yeah, you can indeed hit this via hangup() as
serial core was only half-converted over to use the tty port
implementation in 2016.

> > If you take a look at tty_port_shutdown() there's a hack in there for
> > consoles that was added back in 2010 and that prevents shutdown() from
> > being called for console ports.
> >
> > But perhaps you manage to hit shutdown() via some other path. Serial
> > core is not yet using tty_port_hangup() so a hangup might trigger
> > that...
> >
> > Could you check that with a dump_stack()?

> lazor-rev9 ~ # stop console-ttyMSM0
> [ 68.812702] qcom_geni_serial_shutdown+0x38/0x110
> [ 68.817578] uart_port_shutdown+0x48/0x68
> [ 68.821736] uart_shutdown+0xcc/0x170
> [ 68.825530] uart_hangup+0x54/0x158
> [ 68.829154] __tty_hangup+0x20c/0x318
> [ 68.832954] tty_vhangup_session+0x20/0x38
> [ 68.837195] disassociate_ctty+0xe8/0x1a8
> [ 68.841355] do_exit+0x10c/0x358
> [ 68.844716] do_group_exit+0x9c/0xa8
> [ 68.848441] get_signal+0x408/0x4d8
> [ 68.852071] do_signal+0xa8/0x770

Thanks for confirming. I see this too when stopping a getty.

> > > Now I (via ssh) drop into the debugger:
> > >
> > > echo g > /proc/sysrq-trigger
> > >
> > > I see the "kgdb" prompt but I can't interact with it because
> > > qcom_geni_serial_shutdown() stopped RX.
> >
> > How about simply amending poll_get_char() so that it enables the
> > receiver if it's not already enabled?
>
> Yeah, this would probably work.

Seems we should clean up serial core so that it at least behaves
consistently on hangup and close.

Having someone think through and document how these polled consoles
are supposed to work would also be good and save people modifying
these drivers a lot of work.

If they are restricted to when the console is active, there would be
no need for most of poll_init(), and we already prevent the console
from being shut down on hangup() and close().

And then we now also have the detachable console mess to consider...

Johan
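
The idea floated in the thread, having the polled read path re-enable the receiver when it finds it stopped, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct toy_port`, `toy_shutdown()`, `toy_wire_rx()`, `toy_poll_get_char()` and `TOY_NO_POLL_CHAR` are hypothetical names invented for illustration and are not the driver's types, functions or register accesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a UART port; NOT the kernel's struct uart_port. */
struct toy_port {
	bool rx_enabled;	/* models whether the receiver is running */
	int fifo;		/* one pending byte, or -1 when empty */
};

#define TOY_NO_POLL_CHAR (-1)	/* hypothetical "nothing to read" value */

/* Models the effect of port shutdown on the receiver. */
static void toy_shutdown(struct toy_port *p)
{
	p->rx_enabled = false;
}

/* Models a byte arriving on the wire; it is dropped if RX is stopped. */
static void toy_wire_rx(struct toy_port *p, int ch)
{
	if (p->rx_enabled)
		p->fifo = ch;
}

/* Polled read that first restarts the receiver if it was stopped. */
static int toy_poll_get_char(struct toy_port *p)
{
	int ch;

	if (!p->rx_enabled)
		p->rx_enabled = true;

	if (p->fifo < 0)
		return TOY_NO_POLL_CHAR;

	ch = p->fifo;
	p->fifo = -1;
	return ch;
}
```

In this model a byte sent after `toy_shutdown()` is lost, the first `toy_poll_get_char()` call returns `TOY_NO_POLL_CHAR` but restarts the receiver, and bytes sent afterwards are returned by later polls. Whether the real hardware needs more than setting a flag to restart RX is outside what this sketch shows.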

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 9ea6bd09e665..b6a8729cee6d 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1096,10 +1096,12 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
 {
 	disable_irq(uport->irq);
 
+	uart_port_lock_irq(uport);
 	qcom_geni_serial_stop_tx(uport);
 	qcom_geni_serial_stop_rx(uport);
 
 	qcom_geni_serial_cancel_tx_cmd(uport);
+	uart_port_unlock_irq(uport);
 }
 
 static void qcom_geni_serial_flush_buffer(struct uart_port *uport)

A commit adding back the stopping of tx on port shutdown failed to add
back the locking which had also been removed by commit e83766334f96
("tty: serial: qcom_geni_serial: No need to stop tx/rx on UART
shutdown").

Holding the port lock is needed to serialise against the console code,
which may update the interrupt enable register and access the port
state.

Fixes: d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
Fixes: 947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
Cc: stable@vger.kernel.org # 6.3
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/tty/serial/qcom_geni_serial.c | 2 ++
 1 file changed, 2 insertions(+)
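
Why serialising against another writer of an interrupt enable register matters can be shown with a deterministic, single-threaded toy model of a lost update. It is a sketch only: `struct toy_uart`, the `TOY_IRQ_*` bits and the `toy_*` functions are hypothetical, and the save-and-restore pattern on the console side is an assumption made for illustration, not a description of what the driver's console code actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IRQ_RX 0x1u	/* hypothetical bit: receive interrupt */
#define TOY_IRQ_TX 0x2u	/* hypothetical bit: transmit interrupt */

struct toy_uart {
	uint32_t irq_en;	/* models an interrupt enable register */
	bool locked;		/* models the port lock being held */
};

/* Console path, step 1: take the lock, snapshot and modify the register. */
static uint32_t toy_console_begin(struct toy_uart *u)
{
	uint32_t saved = u->irq_en;

	u->locked = true;
	u->irq_en = saved | TOY_IRQ_TX;
	return saved;
}

/* Console path, step 2: restore the snapshot and drop the lock. */
static void toy_console_end(struct toy_uart *u, uint32_t saved)
{
	u->irq_en = saved;
	u->locked = false;
}

/*
 * Shutdown path: disable all interrupts. With take_lock set it refuses
 * to run while the lock is held and returns false; a real lock would
 * block until the holder releases it instead of returning.
 */
static bool toy_shutdown(struct toy_uart *u, bool take_lock)
{
	if (take_lock && u->locked)
		return false;
	u->irq_en = 0;
	return true;
}
```

If the unlocked shutdown runs between `toy_console_begin()` and `toy_console_end()`, the restore brings `TOY_IRQ_RX` back and the shutdown's write is lost. With `take_lock` set, shutdown has to wait until the console path is done, and the register stays cleared afterwards.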