mbox series

[00/18] serial: sc16is7xx: fixes, cleanups and improvements

Message ID 20231219171903.3530985-1-hugo@hugovil.com
Headers show
Series serial: sc16is7xx: fixes, cleanups and improvements | expand

Message

Hugo Villeneuve Dec. 19, 2023, 5:18 p.m. UTC
From: Hugo Villeneuve <hvilleneuve@dimonoff.com>

Hello,
this patch series brings a few fixes, clean-ups and improvements to the
sc16is7xx driver.

Some of the patches have been suggested by Andy Shevchenko following this
dicussion:

Link: https://lore.kernel.org/all/CAHp75VebCZckUrNraYQj9k=Mrn2kbYs1Lx26f5-8rKJ3RXeh-w@mail.gmail.com/

I have tested the changes on a custom board with two SC16IS752 DUART over
a SPI interface using a Variscite IMX8MN NANO SOM. The four UARTs are
configured in RS-485 mode.

I did not test the change on a SC16is7xx using I2C interface, as my custom
board is only using SPI.

Thank you.

Hugo Villeneuve (18):
  serial: sc16is7xx: fix segfault when removing driver
  serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of
    probe error
  serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
  serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
  serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield
  serial: sc16is7xx: use spi_get_device_match_data()
  serial: sc16is7xx: use i2c_get_match_data()
  serial: sc16is7xx: add driver name to struct uart_driver
  serial: sc16is7xx: add macro for max number of UART ports
  serial: sc16is7xx: use HZ_PER_MHZ macro to improve readability
  serial: sc16is7xx: add explicit return for some switch default cases
  serial: sc16is7xx: replace hardcoded divisor value with BIT() macro
  serial: sc16is7xx: use in_range() for DT properties bound checks
  serial: sc16is7xx: drop unneeded MODULE_ALIAS
  serial: sc16is7xx: pass R/W buffer in FIFO functions
  serial: sc16is7xx: reorder code to remove prototype declarations
  serial: sc16is7xx: refactor EFR lock
  serial: sc16is7xx: fix whitespace in sc16is7xx_startup() comments

 drivers/tty/serial/sc16is7xx.c | 392 ++++++++++++++++-----------------
 1 file changed, 191 insertions(+), 201 deletions(-)


base-commit: 43f012df3c1e979966524f79b5371fde6545488a

Comments

Andy Shevchenko Dec. 20, 2023, 3:38 p.m. UTC | #1
On Wed, Dec 20, 2023 at 05:34:29PM +0200, Andy Shevchenko wrote:
> On Tue, Dec 19, 2023 at 12:18:45PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>

...

> > +	dev_set_drvdata(dev, NULL);
> 
> I believe this is wrong approach to fix the issue as this one is prone
> to be cleaned up in the future as we don't do this call explicitly for
> the past ~15 years.

On top of that the ->remove() is not the only uart_remove_one_port() call.
It has a lot of other stuff to go with.

It seems that ->remove() doesn't check the bit in &sc16is7xx_lines, that
might be the proper fix for the issue you have.
Andy Shevchenko Dec. 20, 2023, 3:45 p.m. UTC | #2
On Tue, Dec 19, 2023 at 12:18:51PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> 
> Use preferred i2c_get_match_data() instead of device_get_match_data()
> and i2c_client_get_device_id() to get the driver match data.

As per SPI patch.
Andy Shevchenko Dec. 20, 2023, 3:52 p.m. UTC | #3
On Tue, Dec 19, 2023 at 12:18:56PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> 
> To better show why the limit is what it is, since we have only 16 bits for
> the divisor.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Andy Shevchenko Dec. 20, 2023, 3:57 p.m. UTC | #4
On Tue, Dec 19, 2023 at 12:18:59PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> 
> To simplify function by avoiding cast.
> 
> This is similar to what is done in max310x driver.

...

> ---
> If deemed appropriate for stable kernel backporting:

I don't think it's eligible.

> ---

I don't see the necessity of the change, OTOH it's harmless.
The problem is that commit message is basically "Yeah, we
do cargo cult." Because I haven't seen what casting you are
talking about.
Andy Shevchenko Dec. 20, 2023, 4:04 p.m. UTC | #5
On Tue, Dec 19, 2023 at 12:19:02PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> 
> Add missing space at end of comments.

...

> -	/* Reset FIFOs*/
> +	/* Reset FIFOs */
>  	val = SC16IS7XX_FCR_RXRESET_BIT | SC16IS7XX_FCR_TXRESET_BIT;
>  	sc16is7xx_port_write(port, SC16IS7XX_FCR_REG, val);
>  	udelay(5);

You can combine this with other comment style cleanups and spelling fixes
(if any). I.o.w. proof-read the code and check if there are any issues
besides noted ones.
Hugo Villeneuve Dec. 20, 2023, 4:21 p.m. UTC | #6
On Wed, 20 Dec 2023 17:54:55 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:58PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > The MODULE_DEVICE_TABLE already creates the proper aliases for the
> 
> MODULE_DEVICE_TABLE()

Done for V2.

Hugo.


> 
> > SPI driver.
> 
> With the above fixed
> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 
>
Hugo Villeneuve Dec. 20, 2023, 4:38 p.m. UTC | #7
On Wed, 20 Dec 2023 18:06:29 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:44PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > Hello,
> > this patch series brings a few fixes, clean-ups and improvements to the
> > sc16is7xx driver.
> > 
> > Some of the patches have been suggested by Andy Shevchenko following this
> > dicussion:
> > 
> > Link: https://lore.kernel.org/all/CAHp75VebCZckUrNraYQj9k=Mrn2kbYs1Lx26f5-8rKJ3RXeh-w@mail.gmail.com/
> 
> Thanks, good series (need a bit of additional work, though).
> What I really miss is the proper split of the driver. See
> 0f04a81784fe ("pinctrl: mcp23s08: Split to three parts: core, I²C, SPI")
> as an example of a such.

Hi Andy,
thank you for the reviews.

I was thinking of doing the split after this current serie, and I
will look at your example as a reference.

Hugo Villeneuve
Hugo Villeneuve Dec. 20, 2023, 8:53 p.m. UTC | #8
On Wed, 20 Dec 2023 18:04:55 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Dec 19, 2023 at 12:19:02PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > Add missing space at end of comments.
> 
> ...
> 
> > -	/* Reset FIFOs*/
> > +	/* Reset FIFOs */
> >  	val = SC16IS7XX_FCR_RXRESET_BIT | SC16IS7XX_FCR_TXRESET_BIT;
> >  	sc16is7xx_port_write(port, SC16IS7XX_FCR_REG, val);
> >  	udelay(5);
> 
> You can combine this with other comment style cleanups and spelling fixes
> (if any). I.o.w. proof-read the code and check if there are any issues
> besides noted ones.

Ok.

Hugo Villeneuve
Hugo Villeneuve Dec. 20, 2023, 9:38 p.m. UTC | #9
On Wed, 20 Dec 2023 18:03:18 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Dec 19, 2023 at 12:19:01PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > Move common code for EFR lock/unlock of mutex into functions for code reuse
> > and clarity.
> 
> ...
> 
> > @@ -333,6 +333,7 @@ struct sc16is7xx_one {
> >  	struct sc16is7xx_one_config	config;
> >  	bool				irda_mode;
> >  	unsigned int			old_mctrl;
> > +	u8				old_lcr; /* Value before EFR access. */
> >  };
> 
> Have you run `pahole`?
> I believe with
> 
> 	unsigned int			old_mctrl;
> 	u8				old_lcr; /* Value before EFR access. */
> 	bool				irda_mode;
> 
> layout it will take less memory.

Hi,
I did not know about this tool, nice.

$ pahole -C sc16is7xx_one drivers/tty/serial/sc16is7xx.o

Before:
    /* size: 752, cachelines: 12, members: 10 */

With your proposed change:
    /* size: 744, cachelines: 12, members: 10 */

Will add this modification for V2, as well as other issues
noted below.

Thank you,
Hugo


> > +/* In an amazing feat of design, the Enhanced Features Register (EFR)
> 
> /*
>  * This is NOT the style we use for multi-line
>  * comments in the serial subsystem. On contrary
>  * this comment can be used as a proper example.
>  * (Yes, I noticed it's an old comment, but take
>  *  a chance to fix it.)
>  */
> 
> > + * shares the address of the Interrupt Identification Register (IIR).
> > + * Access to EFR is switched on by writing a magic value (0xbf) to the
> > + * Line Control Register (LCR). Any interrupt firing during this time will
> > + * see the EFR where it expects the IIR to be, leading to
> > + * "Unexpected interrupt" messages.
> > + *
> > + * Prevent this possibility by claiming a mutex while accessing the EFR,
> > + * and claiming the same mutex from within the interrupt handler. This is
> > + * similar to disabling the interrupt, but that doesn't work because the
> > + * bulk of the interrupt processing is run as a workqueue job in thread
> > + * context.
> > + */
> 
> ...
> 
> > +	sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
> > +			     SC16IS7XX_LCR_CONF_MODE_B);
> 
> One line. (Yes, 81 character, but readability is as good as before.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 
> 
>
Hugo Villeneuve Dec. 20, 2023, 10:09 p.m. UTC | #10
On Wed, 20 Dec 2023 17:41:31 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:47PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > Commit 834449872105 ("sc16is7xx: Fix for multi-channel stall") changed
> > sc16is7xx_port_irq() from looping multiple times when there was still
> > interrupts to serve. It simply changed the do {} while(1) loop to a
> > do {} while(0) loop, which makes the loop itself now obsolete.
> > 
> > Clean the code by removing this obsolete do {} while(0) loop.
> 
> I'm just wondering if you used --histogram diff algo when prepared the patches.
> If no, use that by default.

Hi,
I did not, and effectively it makes the patch easier to follow.

I will use it as default from now on.

Hugo Villeneuve
Hugo Villeneuve Dec. 20, 2023, 10:22 p.m. UTC | #11
On Wed, 20 Dec 2023 17:44:52 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:50PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > Use preferred spi_get_device_match_data() instead of
> > device_get_match_data() and spi_get_device_id() to get the driver match
> > data.
> 
> ...
> 
> > +	devtype = (struct sc16is7xx_devtype *)spi_get_device_match_data(spi);
> 
> Nice one, but drop the casting. (Yes, make sure the assignee is const. It might
> require an additional change.)

Done, devtype was already const.

> 
> > +	if (!devtype) {
> > +		dev_err(&spi->dev, "Failed to match device\n");
> > +		return -ENODEV;
> 
> 		return dev_err_probe(...);
> 
> ?

Yes, done for V2.

Also done the same two changes for the I2C part.

Hugo.
Hugo Villeneuve Dec. 21, 2023, 3:56 p.m. UTC | #12
On Wed, 20 Dec 2023 17:40:42 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > If an error occurs during probing, the sc16is7xx_lines bitfield may be left
> > in a state that doesn't represent the correct state of lines allocation.
> > 
> > For example, in a system with two SC16 devices, if an error occurs only
> > during probing of channel (port) B of the second device, sc16is7xx_lines
> > final state will be 00001011b instead of the expected 00000011b.
> > 
> > This is caused in part because of the "i--" in the for/loop located in
> > the out_ports: error path.
> > 
> > Fix this by checking the return value of uart_add_one_port() and set line
> > allocation bit only if this was successful. This allows the refactor of
> > the obfuscated for(i--...) loop in the error path, and properly call
> > uart_remove_one_port() only when needed, and properly unset line allocation
> > bits.
> > 
> > Also use same mechanism in remove() when calling uart_remove_one_port().
> 
> Yes, this seems to be the correct one to fix the problem described in
> the patch 1. I dunno why the patch 1 even exists.

Hi,
this will indeed fix the problem described in patch 1.

However, if I remove patch 1, and I simulate the same probe error as
described in patch 1, now we get stuck forever when trying to 
remove the driver. This is something that I observed before and
that patch 1 also corrected.

The problem is caused in sc16is7xx_remove() when calling this function

    kthread_flush_worker(&s->kworker);

I am not sure how best to handle that without patch 1.

Hugo Villeneuve
Andy Shevchenko Dec. 21, 2023, 4:05 p.m. UTC | #13
On Thu, Dec 21, 2023 at 10:56:39AM -0500, Hugo Villeneuve wrote:
> On Wed, 20 Dec 2023 17:40:42 +0200
> Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:

...

> > Yes, this seems to be the correct one to fix the problem described in
> > the patch 1. I dunno why the patch 1 even exists.
> 
> Hi,
> this will indeed fix the problem described in patch 1.
> 
> However, if I remove patch 1, and I simulate the same probe error as
> described in patch 1, now we get stuck forever when trying to 
> remove the driver. This is something that I observed before and
> that patch 1 also corrected.
> 
> The problem is caused in sc16is7xx_remove() when calling this function
> 
>     kthread_flush_worker(&s->kworker);
> 
> I am not sure how best to handle that without patch 1.

So, it means we need to root cause this issue. Because patch 1 looks
really bogus.
Hugo Villeneuve Dec. 21, 2023, 4:13 p.m. UTC | #14
On Thu, 21 Dec 2023 10:56:39 -0500
Hugo Villeneuve <hugo@hugovil.com> wrote:

> On Wed, 20 Dec 2023 17:40:42 +0200
> Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> 
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > > 
> > > If an error occurs during probing, the sc16is7xx_lines bitfield may be left
> > > in a state that doesn't represent the correct state of lines allocation.
> > > 
> > > For example, in a system with two SC16 devices, if an error occurs only
> > > during probing of channel (port) B of the second device, sc16is7xx_lines
> > > final state will be 00001011b instead of the expected 00000011b.
> > > 
> > > This is caused in part because of the "i--" in the for/loop located in
> > > the out_ports: error path.
> > > 
> > > Fix this by checking the return value of uart_add_one_port() and set line
> > > allocation bit only if this was successful. This allows the refactor of
> > > the obfuscated for(i--...) loop in the error path, and properly call
> > > uart_remove_one_port() only when needed, and properly unset line allocation
> > > bits.
> > > 
> > > Also use same mechanism in remove() when calling uart_remove_one_port().
> > 
> > Yes, this seems to be the correct one to fix the problem described in
> > the patch 1. I dunno why the patch 1 even exists.
> 
> Hi,
> this will indeed fix the problem described in patch 1.
> 
> However, if I remove patch 1, and I simulate the same probe error as
> described in patch 1, now we get stuck forever when trying to 
> remove the driver. This is something that I observed before and
> that patch 1 also corrected.
> 
> The problem is caused in sc16is7xx_remove() when calling this function
> 
>     kthread_flush_worker(&s->kworker);
> 
> I am not sure how best to handle that without patch 1.

Also, if we manage to get past kthread_flush_worker() and 
kthread_stop() (commented out for testing purposes), we get another bug:

# rmmod sc16is7xx
...
crystal-duart-24m already disabled
WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
clk_core_disable+0x1b0/0x1e0
...
Call trace:
clk_core_disable+0x1b0/0x1e0
clk_disable+0x38/0x60
sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]

This one is caused by calling clk_disable_unprepare(). But
clk_disable_unprepare() has already been called in probe error handling
code. Patch 1 also fixed this...

Hugo Villeneuve
Andy Shevchenko Dec. 21, 2023, 4:16 p.m. UTC | #15
On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> On Thu, 21 Dec 2023 10:56:39 -0500
> Hugo Villeneuve <hugo@hugovil.com> wrote:
> > On Wed, 20 Dec 2023 17:40:42 +0200
> > Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

...

> > this will indeed fix the problem described in patch 1.
> > 
> > However, if I remove patch 1, and I simulate the same probe error as
> > described in patch 1, now we get stuck forever when trying to 
> > remove the driver. This is something that I observed before and
> > that patch 1 also corrected.
> > 
> > The problem is caused in sc16is7xx_remove() when calling this function
> > 
> >     kthread_flush_worker(&s->kworker);
> > 
> > I am not sure how best to handle that without patch 1.
> 
> Also, if we manage to get past kthread_flush_worker() and 
> kthread_stop() (commented out for testing purposes), we get another bug:
> 
> # rmmod sc16is7xx
> ...
> crystal-duart-24m already disabled
> WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
> clk_core_disable+0x1b0/0x1e0
> ...
> Call trace:
> clk_core_disable+0x1b0/0x1e0
> clk_disable+0x38/0x60
> sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
> 
> This one is caused by calling clk_disable_unprepare(). But
> clk_disable_unprepare() has already been called in probe error handling
> code. Patch 1 also fixed this...

Word "fixed" is incorrect. "Papered over" is what it did.
Hugo Villeneuve Dec. 21, 2023, 5:13 p.m. UTC | #16
On Thu, 21 Dec 2023 18:16:40 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> > On Thu, 21 Dec 2023 10:56:39 -0500
> > Hugo Villeneuve <hugo@hugovil.com> wrote:
> > > On Wed, 20 Dec 2023 17:40:42 +0200
> > > Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> 
> ...
> 
> > > this will indeed fix the problem described in patch 1.
> > > 
> > > However, if I remove patch 1, and I simulate the same probe error as
> > > described in patch 1, now we get stuck forever when trying to 
> > > remove the driver. This is something that I observed before and
> > > that patch 1 also corrected.
> > > 
> > > The problem is caused in sc16is7xx_remove() when calling this function
> > > 
> > >     kthread_flush_worker(&s->kworker);
> > > 
> > > I am not sure how best to handle that without patch 1.
> > 
> > Also, if we manage to get past kthread_flush_worker() and 
> > kthread_stop() (commented out for testing purposes), we get another bug:
> > 
> > # rmmod sc16is7xx
> > ...
> > crystal-duart-24m already disabled
> > WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090
> > clk_core_disable+0x1b0/0x1e0
> > ...
> > Call trace:
> > clk_core_disable+0x1b0/0x1e0
> > clk_disable+0x38/0x60
> > sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
> > 
> > This one is caused by calling clk_disable_unprepare(). But
> > clk_disable_unprepare() has already been called in probe error handling
> > code. Patch 1 also fixed this...
> 
> Word "fixed" is incorrect. "Papered over" is what it did.

Hi,
I just found the problem, and it was in my bug simulation, not the
driver itself. When I simulated the bug, I forgot to set "ret" to an
error code, and thus I returned 0 at the end of sc16is7xx_probe(). This
is why sc16is7xx_remove() was called when unloading driver, but
shouldn't have.

If I simulate my probe error and return "-EINVAL" at the end of
sc16is7xx_probe(), sc16is7xx_remove() is not called when
unloading the driver.

Sorry for the noise, so I will drop patch 1 and leave patch "fix invalid
sc16is7xx_lines bitfield in case of probe error" as it is, and
simply remove comments about Yury's patch.

Hugo.
Hugo Villeneuve Dec. 21, 2023, 7:35 p.m. UTC | #17
On Wed, 20 Dec 2023 17:57:59 +0200
Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

> On Tue, Dec 19, 2023 at 12:18:59PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > 
> > To simplify function by avoiding cast.
> > 
> > This is similar to what is done in max310x driver.
> 
> ...
> 
> > ---
> > If deemed appropriate for stable kernel backporting:
> 
> I don't think it's eligible.

No problem.


> > ---
> 
> I don't see the necessity of the change, OTOH it's harmless.
> The problem is that commit message is basically "Yeah, we
> do cargo cult." Because I haven't seen what casting you are
> talking about.

I'll try to reword the commit message.

And replace cast with something like "... to remove additional struct
sc16is7xx_port variable..."

Hugo Villeneuve