Message ID | bd91db46c50615bc1d1d62beb659fa7f62386446.1701446070.git.jan.kundrat@cesnet.cz |
---|---|
State | New |
Headers | show |
Series | tty: max310x: work around regmap->regcache data corruption | expand |
On Fri, 1 Dec 2023 15:51:51 +0100 Jan Kundrát <jan.kundrat@cesnet.cz> wrote: > The TL;DR summary is that the regmap_noinc_write spills over the data > that are correctly written to the HW also to the following registers in > the regcache. As a result, regcache then contains user-controlled > garbage which will be used later for bit updates on unrelated registers. > > This patch is a "wrong" fix; a real fix would involve fixing regmap > and/or regcache, but that code has too many indirections for my little > mind. > > I was investigating a regression that happened somewhere between 5.12.4 > (plus 14 of our patches) and v6.5.9 (plus 7 of our patches). Our > MAX14830 UART would work fine the first time, but when our application > opens the UART the second time it just wouldn't send anything over the > physical TX pin. With the help of a logical analyzer, I found out that > the kernel was sending value 0xcd to the MODE1 register, which on this > chip is a request to set the UART's TX pin to the Hi-Z mode and to > switch off RX completely. That's certainly not the intention of the > code, but that's what I was seeing on the physical SPI bus, and also in > the log when I instrumented the regmap layer. > > It turned out that one of the *data* bytes which were sent over the UART > was 0xdd, and that this *data byte* somehow ended up in the regcache's > idea about the value within the MODE1 register. When the UART is opened > up the next time and max310x_startup updates a single unrelated bit in > MODE1, that code consults the regcache, notices the 0xdd data byte in > there, and ends up sending 0xcd over SPI. > > Here's what dump_stack() shows: > > max310x spi1.2: regcache_write: reg 0x9 value 0xdd > max310x spi1.2: PWNED > CPU: 1 PID: 26 Comm: kworker/1:1 Not tainted 6.5.9-7-g9e090fe75fd8 #7 > Hardware name: Marvell Armada 380/385 (Device Tree) > Workqueue: events max310x_tx_proc > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x40/0x4c > dump_stack_lvl from regcache_write+0xc0/0xc4 > regcache_write from _regmap_raw_write_impl+0x178/0x828 > _regmap_raw_write_impl from _regmap_raw_write+0xb8/0x134 > _regmap_raw_write from regmap_noinc_write+0x130/0x178 > regmap_noinc_write from max310x_tx_proc+0xd4/0x1a4 > max310x_tx_proc from process_one_work+0x21c/0x4e4 > process_one_work from worker_thread+0x50/0x54c > worker_thread from kthread+0xe0/0xfc > kthread from ret_from_fork+0x14/0x28 > > Clearly, regmap_noinc_write of a register 0x00 (that's the TX FIFO on > this chip) has no business updating register 0x09, but that's what was > happening here. The regmap_config is already set up in a way that > register 0x00 is marked precious and volatile, so it has no business > going through the cache at all. Also, the documentation for > regmap_noinc_write suggests that this driver was using the regmap > infrastructure correctly, and that the real bug is somewhere in > regmap/regcache where a call to regmap_noinc_write end up updating an > unrelated register in regcache. Hi Jan, it is funny, as I am preparing to send a patch for the sc16is7xx driver to convert FIFO R/W to use the _noinc_ versions of regmap functions, inspired by your patch 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations"). I am testing on a custom board with two SC16IS752 in SPI mode. Here is our current FIFO write code: regcache_cache_bypass(one->regmap, true); regmap_raw_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); regcache_cache_bypass(one->regmap, false); I am converting it to _noinc_ version to be able to remove the manual (and ugly) cache control workaround to this: regmap_noinc_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); SC16IS7XX_THR_REG is already in precious and volatile, and I also have put it in the noinc list. To confirm that this works ok, I have put debug traces in some regmap functions, and escpecially a trace in regcache_write() to indicate if regmap is caching or not the register. Here is an example when writing 01234567890123456789 (20 bytes) to the Tx FIFO: sc16is7xx spi0.0: sc16is7xx_tx_proc(): entry sc16is7xx spi0.0: sc16is7xx_handle_tx(): entry sc16is7xx spi0.0: regcache_read() not caching volatile reg $08 spi0.0-port0: regmap_read: [08] 40 sc16is7xx spi0.0: regcache_write() not caching volatile reg $08 sc16is7xx spi0.0: _regmap_raw_write_impl() reg = $00 sc16is7xx spi0.0: _regmap_raw_write_impl() val_len = 20 sc16is7xx spi0.0: regcache_write() not caching volatile reg $00 spi0.0-port0: regmap_write: [00] 30 31 32 33 34 35 36 37 38 39... spi0.0-port0: regmap_write: [00] 36 37 38 39 ... With this I have confirmed that regmap _noinc_ works as intended, with regcache_write() indicating it is not caching the volatile register 00 (THR). I hope this can help you with your investigation, let me know if I can help more. Hugo Villeneuve. > Until regmap/regcache is fixed, let's just use an adapted version of the > old code that bypasses regmap altogether, and just sends out an SPI > transaction. > > This is related to my commit 3f42b142ea1171967e40e10e4b0241c0d6d28d41 > ("serial: max310x: fix IO data corruption in batched operations") which > introduced usage of regmap_noinc_write() to this driver. That commit is > a fixup of commit 285e76fc049c4d32c772eea9460a7ef28a193802 ("serial: > max310x: use regmap methods for SPI batch operations") which started > using regmap_raw_write(), which was however also a wrong function. > > Fixes: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") > Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations") > Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> > To: Mark Brown <broonie@kernel.org> > To: Cosmin Tanislav <cosmin.tanislav@analog.com> > To: linux-serial@vger.kernel.org > Cc: Andy Shevchenko <andy.shevchenko@gmail.com> > Cc: linux-kernel@vger.kernel.org > --- > drivers/tty/serial/max310x.c | 30 ++++++++++++++++++++++++------ > 1 file changed, 24 insertions(+), 6 deletions(-) > > diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c > index c44237470bee..79797b573723 100644 > --- a/drivers/tty/serial/max310x.c > +++ b/drivers/tty/serial/max310x.c > @@ -663,16 +663,34 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s, > > static void max310x_batch_write(struct uart_port *port, u8 *txbuf, unsigned int len) > { > - struct max310x_one *one = to_max310x_port(port); > - > - regmap_noinc_write(one->regmap, MAX310X_THR_REG, txbuf, len); > + const u8 header = (port->iobase * 0x20 + MAX310X_THR_REG) | MAX310X_WRITE_BIT; > + struct spi_transfer xfer[] = { > + { > + .tx_buf = &header, > + .len = 1, > + }, > + { > + .tx_buf = txbuf, > + .len = len, > + }, > + }; > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); > } > > static void max310x_batch_read(struct uart_port *port, u8 *rxbuf, unsigned int len) > { > - struct max310x_one *one = to_max310x_port(port); > - > - regmap_noinc_read(one->regmap, MAX310X_RHR_REG, rxbuf, len); > + const u8 header = port->iobase * 0x20 + MAX310X_RHR_REG; > + struct spi_transfer xfer[] = { > + { > + .tx_buf = &header, > + .len = 1, > + }, > + { > + .rx_buf = rxbuf, > + .len = len, > + }, > + }; > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); > } > > static void max310x_handle_rx(struct uart_port *port, unsigned int rxlen) > -- > 2.42.0 > > >
On Fri, 1 Dec 2023 13:27:36 -0500 Hugo Villeneuve <hugo@hugovil.com> wrote: > On Fri, 1 Dec 2023 15:51:51 +0100 > Jan Kundrát <jan.kundrat@cesnet.cz> wrote: > > > The TL;DR summary is that the regmap_noinc_write spills over the data > > that are correctly written to the HW also to the following registers in > > the regcache. As a result, regcache then contains user-controlled > > garbage which will be used later for bit updates on unrelated registers. > > > > This patch is a "wrong" fix; a real fix would involve fixing regmap > > and/or regcache, but that code has too many indirections for my little > > mind. > > > > I was investigating a regression that happened somewhere between 5.12.4 > > (plus 14 of our patches) and v6.5.9 (plus 7 of our patches). Our > > MAX14830 UART would work fine the first time, but when our application > > opens the UART the second time it just wouldn't send anything over the > > physical TX pin. With the help of a logical analyzer, I found out that > > the kernel was sending value 0xcd to the MODE1 register, which on this > > chip is a request to set the UART's TX pin to the Hi-Z mode and to > > switch off RX completely. That's certainly not the intention of the > > code, but that's what I was seeing on the physical SPI bus, and also in > > the log when I instrumented the regmap layer. > > > > It turned out that one of the *data* bytes which were sent over the UART > > was 0xdd, and that this *data byte* somehow ended up in the regcache's > > idea about the value within the MODE1 register. When the UART is opened > > up the next time and max310x_startup updates a single unrelated bit in > > MODE1, that code consults the regcache, notices the 0xdd data byte in > > there, and ends up sending 0xcd over SPI. > > > > Here's what dump_stack() shows: > > > > max310x spi1.2: regcache_write: reg 0x9 value 0xdd > > max310x spi1.2: PWNED > > CPU: 1 PID: 26 Comm: kworker/1:1 Not tainted 6.5.9-7-g9e090fe75fd8 #7 > > Hardware name: Marvell Armada 380/385 (Device Tree) > > Workqueue: events max310x_tx_proc > > unwind_backtrace from show_stack+0x10/0x14 > > show_stack from dump_stack_lvl+0x40/0x4c > > dump_stack_lvl from regcache_write+0xc0/0xc4 > > regcache_write from _regmap_raw_write_impl+0x178/0x828 > > _regmap_raw_write_impl from _regmap_raw_write+0xb8/0x134 > > _regmap_raw_write from regmap_noinc_write+0x130/0x178 > > regmap_noinc_write from max310x_tx_proc+0xd4/0x1a4 > > max310x_tx_proc from process_one_work+0x21c/0x4e4 > > process_one_work from worker_thread+0x50/0x54c > > worker_thread from kthread+0xe0/0xfc > > kthread from ret_from_fork+0x14/0x28 > > > > Clearly, regmap_noinc_write of a register 0x00 (that's the TX FIFO on > > this chip) has no business updating register 0x09, but that's what was > > happening here. The regmap_config is already set up in a way that > > register 0x00 is marked precious and volatile, so it has no business > > going through the cache at all. Also, the documentation for > > regmap_noinc_write suggests that this driver was using the regmap > > infrastructure correctly, and that the real bug is somewhere in > > regmap/regcache where a call to regmap_noinc_write end up updating an > > unrelated register in regcache. > > Hi Jan, > it is funny, as I am preparing to send a patch for the sc16is7xx driver > to convert FIFO R/W to use the _noinc_ versions of regmap functions, > inspired by your patch 3f42b142ea11 ("serial: max310x: fix IO data > corruption in batched operations"). > > I am testing on a custom board with two SC16IS752 in SPI mode. Hi, I ran the tests on Greg's tty-next tree. Hugo Villeneuve. > Here is our current FIFO write code: > > regcache_cache_bypass(one->regmap, true); > regmap_raw_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); > regcache_cache_bypass(one->regmap, false); > > I am converting it to _noinc_ version to be able to remove the manual > (and ugly) cache control workaround to this: > > regmap_noinc_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); > > SC16IS7XX_THR_REG is already in precious and volatile, and I also > have put it in the noinc list. > > To confirm that this works ok, I have put debug traces in some regmap > functions, and escpecially a trace in regcache_write() to indicate if > regmap is caching or not the register. > > Here is an example when writing 01234567890123456789 (20 bytes) to the > Tx FIFO: > > sc16is7xx spi0.0: sc16is7xx_tx_proc(): entry > sc16is7xx spi0.0: sc16is7xx_handle_tx(): entry > sc16is7xx spi0.0: regcache_read() not caching volatile reg $08 > spi0.0-port0: regmap_read: [08] 40 > sc16is7xx spi0.0: regcache_write() not caching volatile reg $08 > sc16is7xx spi0.0: _regmap_raw_write_impl() reg = $00 > sc16is7xx spi0.0: _regmap_raw_write_impl() val_len = 20 > sc16is7xx spi0.0: regcache_write() not caching volatile reg $00 > spi0.0-port0: regmap_write: [00] 30 31 32 33 34 35 36 37 38 39... > spi0.0-port0: regmap_write: [00] 36 37 38 39 > ... > > With this I have confirmed that regmap _noinc_ works as intended, with > regcache_write() indicating it is not caching the volatile register 00 > (THR). > > I hope this can help you with your investigation, let me know if I can > help more. > > Hugo Villeneuve. > > > > > Until regmap/regcache is fixed, let's just use an adapted version of the > > old code that bypasses regmap altogether, and just sends out an SPI > > transaction. > > > > This is related to my commit 3f42b142ea1171967e40e10e4b0241c0d6d28d41 > > ("serial: max310x: fix IO data corruption in batched operations") which > > introduced usage of regmap_noinc_write() to this driver. That commit is > > a fixup of commit 285e76fc049c4d32c772eea9460a7ef28a193802 ("serial: > > max310x: use regmap methods for SPI batch operations") which started > > using regmap_raw_write(), which was however also a wrong function. > > > > Fixes: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") > > Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations") > > Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> > > To: Mark Brown <broonie@kernel.org> > > To: Cosmin Tanislav <cosmin.tanislav@analog.com> > > To: linux-serial@vger.kernel.org > > Cc: Andy Shevchenko <andy.shevchenko@gmail.com> > > Cc: linux-kernel@vger.kernel.org > > --- > > drivers/tty/serial/max310x.c | 30 ++++++++++++++++++++++++------ > > 1 file changed, 24 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c > > index c44237470bee..79797b573723 100644 > > --- a/drivers/tty/serial/max310x.c > > +++ b/drivers/tty/serial/max310x.c > > @@ -663,16 +663,34 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s, > > > > static void max310x_batch_write(struct uart_port *port, u8 *txbuf, unsigned int len) > > { > > - struct max310x_one *one = to_max310x_port(port); > > - > > - regmap_noinc_write(one->regmap, MAX310X_THR_REG, txbuf, len); > > + const u8 header = (port->iobase * 0x20 + MAX310X_THR_REG) | MAX310X_WRITE_BIT; > > + struct spi_transfer xfer[] = { > > + { > > + .tx_buf = &header, > > + .len = 1, > > + }, > > + { > > + .tx_buf = txbuf, > > + .len = len, > > + }, > > + }; > > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); > > } > > > > static void max310x_batch_read(struct uart_port *port, u8 *rxbuf, unsigned int len) > > { > > - struct max310x_one *one = to_max310x_port(port); > > - > > - regmap_noinc_read(one->regmap, MAX310X_RHR_REG, rxbuf, len); > > + const u8 header = port->iobase * 0x20 + MAX310X_RHR_REG; > > + struct spi_transfer xfer[] = { > > + { > > + .tx_buf = &header, > > + .len = 1, > > + }, > > + { > > + .rx_buf = rxbuf, > > + .len = len, > > + }, > > + }; > > + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); > > } > > > > static void max310x_handle_rx(struct uart_port *port, unsigned int rxlen) > > -- > > 2.42.0 > > > > > > >
On Fri, 1 Dec 2023 21:41:48 +0000 Mark Brown <broonie@kernel.org> wrote: > On Fri, Dec 01, 2023 at 04:38:46PM -0500, Hugo Villeneuve wrote: > > Mark Brown <broonie@kernel.org> wrote: > > > > If you're working on that driver it'd also be good to update the current > > > use of cache bypass for the enhanced features/interrupt identification > > > register (and anything else in there, that did seem to be the only one) > > > to use regmap ranges instead - that'd remove the need for the efr_lock > > > and be a much more sensible/idiomatic use of the regmap APIs. > > > I will also look to remove the efr_lock, altough it has more > > implications since this ship has some registers that share a common > > address, and selected by bits in other registers, and I think this > > is why there is this efr_lock. > > Right, the registers sharing a common address with the register selected > by bits in another register is what regmap ranges support - the less > creative use of this is banked blocks of registers with a selector > register which picks which page bank is in use, that's moderately common > especially for TI. Hi Mark, thanks for the info, I was not aware of that, and will look into it. Hugo Villeneuve.
On Mon, Dec 04, 2023 at 11:29:05AM -0500, Hugo Villeneuve wrote: > Do you have an example of a driver which is using regmap ranges like it > should be done in this driver, that is using the exact same address for > two or more registers? I found an example, but it doesn't seem > applicable to the sc16is7xx driver because the two registers do not > share a common address, for example they have addresses like 0x01 and > 0x81, even though with the proper page selection, they finally map to > address 0x01. I don't understand what you mean here - you say that the addresses both have addresses 0x1 and 0x81 but map to address 0x1. What does the 0x81 refer to? The comments in the driver seemed to indicate that there was a single address which mapped to multiple underlying registers... Searching for struct regmap_range_cfg should show a lot of users in mainline.
On Mon, 4 Dec 2023 16:35:30 +0000 Mark Brown <broonie@kernel.org> wrote: > On Mon, Dec 04, 2023 at 11:29:05AM -0500, Hugo Villeneuve wrote: > > > Do you have an example of a driver which is using regmap ranges like it > > should be done in this driver, that is using the exact same address for > > two or more registers? I found an example, but it doesn't seem > > applicable to the sc16is7xx driver because the two registers do not > > share a common address, for example they have addresses like 0x01 and > > 0x81, even though with the proper page selection, they finally map to > > address 0x01. > > I don't understand what you mean here - you say that the addresses both > have addresses 0x1 and 0x81 but map to address 0x1. What does the 0x81 > refer to? The comments in the driver seemed to indicate that there was > a single address which mapped to multiple underlying registers... Hi, I was referring to an example in da9063-i2c.c where they have these two registers: #define DA9063_REG_STATUS_A 0x01 #define DA9063_REG_SEQ 0x81 To access one or the other, you must select page 0 or 1 in page config selection register at address 0x00. It makes sense to me for this case. But for the sc16is7xx, for example you have these two independent registers, sharing the exact same address: #define SC16IS7XX_IIR_REG (0x02) /* Interrupt Identification */ #define SC16IS7XX_FCR_REG (0x02) /* FIFO control */ I am not sure if regmap range can be used with this configuration. Assuming regmap range would be properly setup, when we call regmap_read(regmap, SC16IS7XX_IIR_REG, &val), how does regmap would know that we want to access SC16IS7XX_IIR_REG and not SC16IS7XX_FCR_REG? > Searching for struct regmap_range_cfg should show a lot of users in > mainline. Yes, I am trying to find a good example but I must download and read the datasheet for each one. If you could point to an IC/driver that uses regmap_range similar to IC sc16is7xx, it would really help. Thank you Hugo Villeneuve
On Mon, 4 Dec 2023 19:16:57 +0000 Mark Brown <broonie@kernel.org> wrote: > On Mon, Dec 04, 2023 at 01:59:22PM -0500, Hugo Villeneuve wrote: > > > Based on tas2770.c, I am assuming I could redefine the code to look > > like this: > > > /* SC16IS7XX register definitions */ > > #define SC16IS7XX_REG(page, reg) ((page * 128) + reg) > > > > #define SC16IS7XX_RHR_REG SC16IS7XX_REG(0, 0x00) /* RX FIFO */ > > #define SC16IS7XX_THR_REG SC16IS7XX_REG(0, 0x00) /* TX FIFO */ > > #define SC16IS7XX_IER_REG SC16IS7XX_REG(0, 0x01) /* Interrupt enable */ > > #define SC16IS7XX_IIR_REG SC16IS7XX_REG(0, 0x02) /* Interrupt Identification (read) */ > > #define SC16IS7XX_FCR_REG SC16IS7XX_REG(0, 0x02) /* FIFO control (write) */ > > #define SC16IS7XX_MCR_REG SC16IS7XX_REG(0, 0x04) /* Modem Control */ > > #define SC16IS7XX_LSR_REG SC16IS7XX_REG(0, 0x05) /* Line Status */ > > #define SC16IS7XX_MSR_REG SC16IS7XX_REG(0, 0x06) /* Modem Status */ > > #define SC16IS7XX_SPR_REG SC16IS7XX_REG(0, 0x07) /* Scratch Pad */ > > ... > > #define SC16IS7XX_EFR_REG SC16IS7XX_REG(1, 0x02) /* Enhanced Features */ > > #define SC16IS7XX_XON1_REG SC16IS7XX_REG(1, 0x04) /* Xon1 word */ > > #define SC16IS7XX_XON2_REG SC16IS7XX_REG(1, 0x05) /* Xon2 word */ > > #define SC16IS7XX_XOFF1_REG SC16IS7XX_REG(1, 0x06) /* Xoff1 word */ > > #define SC16IS7XX_XOFF2_REG SC16IS7XX_REG(1, 0x07) /* Xoff2 word */ > > > static const struct regmap_range_cfg sc16is7xx_regmap_ranges[] = { > > { > > .range_min = 0, > > .range_max = 1 * 128, > > .selector_reg = SC16IS7XX_LCR_REG, > > .selector_mask = 0xff, > > .selector_shift = 0, > > .window_start = 0, > > .window_len = 128, > > }, > > }; > > That looks plausible - I'd tend to make the range just completely > non-physical (eg, start at some unrepresentable value) so there's no > possible ambiguity with a physical address. I'm also not clear why > you've made the window be 128 registers wide if it's just this range I simply took what was in tas2770.c as a starting point. > from 2-7 that gets switched around, I'd do something more like > > #define SC16IS7XX_RANGE_BASE 0x1000 > #define SC16IS7XX_WINDOW_LEN 6 > #define SC16IS7XX_PAGED_REG(page, reg) \ > (SC16IS7XX_RANGE_BASE + (SC16IS7XX_WINDOW_LEN * page) + reg) > > .range_min = SC16IS7XX_RANGE_BASE, > .range_max = SC16IS7XX_RANGE_BASE + (2 * SC16IS7XX_WINDOW_LEN), > .window_start = 2, > .window_len = SC16IS7XX_WINDOW_LEN, > > You could apply a - 2 offset to reg as well to make the defines for the > registers clearer. The above means that any untranslated register you > see in I/O traces should be immediately obvious (and more likely to trip > up error handling and tell you about it if it goes wrong) and hopefully > also makes it easier to reason about what's going on. Ok. > > But here, selecting the proper "page" is not obvious, > > because to select page 1, we need to write a fixed value of 0xBF to > > the LCR register. > > > And when selecting page 0, we must write the previous value that was > > in LCR _before_ we made the switch to page 1... > > > How do we do that? > > That's one reason why having the range cover all the registers gets > confusing. That said the code does have special casing for a selector > register in the middle of the range since there were some devices that > just put the paging register in the middle of the range it controls for > some innovative region, the core will notice that this is a selector > register and not do any paging for that register. You are talking about the selector being inside the range, and I understand that because I have looked at _regmap_select_page() comments and logic. But that is not was my question was about. Here a pseudo code example to select "page" 1: 1. save original value of LCR register. 2. write 0xBF to LCR register 3. access desired register in page 1 4. restore original LCR value saved in step 1 How do you do that with regmap range? Hugo.
On Mon, 4 Dec 2023 19:48:05 +0000 Mark Brown <broonie@kernel.org> wrote: > On Mon, Dec 04, 2023 at 02:41:36PM -0500, Hugo Villeneuve wrote: > > > But that is not was my question was about. Here a pseudo code > > example to select "page" 1: > > > 1. save original value of LCR register. > > 2. write 0xBF to LCR register > > 3. access desired register in page 1 > > 4. restore original LCR value saved in step 1 > > > How do you do that with regmap range? > > Are you saying that the selector has other, non-selector functions? Yes! There is no bit or bit range in that register that is used to select a praticular set of registers or "page". It is only when writing the special magic value of $BF that the IC switches to "page" 1. And if the value is NOT $BF, then it switches back to "page" 0. When I told you if you could point to other IC/drivers that had the same configuration, I tough you were aware of this. That explains some of the confusion. > This is truly innovative hardware,... Well, I would not say innovative, but more "crappy" hardware design :) > ...generally the selector is just a > bitfield that you write paging values to. This is also what I am accustomed to normally. > You'd need to extend the core > so that it knows about this quirk, right now that's not possible and > we'll just leave the window pointing at whatever was last accessed. Ok. I am not sure that adding support for it would make sense, since I do not know of other ICs that could reuse this very specific and particular method for switching "paged" registers. Thank you, Hugo Villeneuve
On Mon, Dec 04, 2023 at 03:02:24PM -0500, Hugo Villeneuve wrote: > Mark Brown <broonie@kernel.org> wrote: > > This is truly innovative hardware,... > Well, I would not say innovative, but more "crappy" hardware design :) I didn't say it was *good* innovation. > > You'd need to extend the core > > so that it knows about this quirk, right now that's not possible and > > we'll just leave the window pointing at whatever was last accessed. > Ok. I am not sure that adding support for it would make sense, since I > do not know of other ICs that could reuse this very specific and > particular method for switching "paged" registers. Yeah, I'm drawing a blank there. The thing that springs to mind is optimisation with wanting to always be on a particular page for fast interrupt handling or something but that feels rather thin.
On Fri, 1 Dec 2023 18:34:38 +0000 Mark Brown <broonie@kernel.org> wrote: > On Fri, Dec 01, 2023 at 01:27:36PM -0500, Hugo Villeneuve wrote: > > > it is funny, as I am preparing to send a patch for the sc16is7xx driver > > to convert FIFO R/W to use the _noinc_ versions of regmap functions, > > inspired by your patch 3f42b142ea11 ("serial: max310x: fix IO data > > corruption in batched operations"). > > If you're working on that driver it'd also be good to update the current > use of cache bypass for the enhanced features/interrupt identification > register (and anything else in there, that did seem to be the only one) > to use regmap ranges instead - that'd remove the need for the efr_lock > and be a much more sensible/idiomatic use of the regmap APIs. Hi Mark, after our discussion about regmap range, it seems that the efr_lock will need to stay. In fact, all of this helped me to uncover another case where an additional lock would be needed. Hugo Villeneuve
diff --git a/drivers/tty/serial/max310x.c b/drivers/tty/serial/max310x.c index c44237470bee..79797b573723 100644 --- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -663,16 +663,34 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s, static void max310x_batch_write(struct uart_port *port, u8 *txbuf, unsigned int len) { - struct max310x_one *one = to_max310x_port(port); - - regmap_noinc_write(one->regmap, MAX310X_THR_REG, txbuf, len); + const u8 header = (port->iobase * 0x20 + MAX310X_THR_REG) | MAX310X_WRITE_BIT; + struct spi_transfer xfer[] = { + { + .tx_buf = &header, + .len = 1, + }, + { + .tx_buf = txbuf, + .len = len, + }, + }; + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); } static void max310x_batch_read(struct uart_port *port, u8 *rxbuf, unsigned int len) { - struct max310x_one *one = to_max310x_port(port); - - regmap_noinc_read(one->regmap, MAX310X_RHR_REG, rxbuf, len); + const u8 header = port->iobase * 0x20 + MAX310X_RHR_REG; + struct spi_transfer xfer[] = { + { + .tx_buf = &header, + .len = 1, + }, + { + .rx_buf = rxbuf, + .len = len, + }, + }; + spi_sync_transfer(to_spi_device(port->dev), xfer, ARRAY_SIZE(xfer)); } static void max310x_handle_rx(struct uart_port *port, unsigned int rxlen)
The TL;DR summary is that the regmap_noinc_write spills over the data that are correctly written to the HW also to the following registers in the regcache. As a result, regcache then contains user-controlled garbage which will be used later for bit updates on unrelated registers. This patch is a "wrong" fix; a real fix would involve fixing regmap and/or regcache, but that code has too many indirections for my little mind. I was investigating a regression that happened somewhere between 5.12.4 (plus 14 of our patches) and v6.5.9 (plus 7 of our patches). Our MAX14830 UART would work fine the first time, but when our application opens the UART the second time it just wouldn't send anything over the physical TX pin. With the help of a logical analyzer, I found out that the kernel was sending value 0xcd to the MODE1 register, which on this chip is a request to set the UART's TX pin to the Hi-Z mode and to switch off RX completely. That's certainly not the intention of the code, but that's what I was seeing on the physical SPI bus, and also in the log when I instrumented the regmap layer. It turned out that one of the *data* bytes which were sent over the UART was 0xdd, and that this *data byte* somehow ended up in the regcache's idea about the value within the MODE1 register. When the UART is opened up the next time and max310x_startup updates a single unrelated bit in MODE1, that code consults the regcache, notices the 0xdd data byte in there, and ends up sending 0xcd over SPI. Here's what dump_stack() shows: max310x spi1.2: regcache_write: reg 0x9 value 0xdd max310x spi1.2: PWNED CPU: 1 PID: 26 Comm: kworker/1:1 Not tainted 6.5.9-7-g9e090fe75fd8 #7 Hardware name: Marvell Armada 380/385 (Device Tree) Workqueue: events max310x_tx_proc unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x40/0x4c dump_stack_lvl from regcache_write+0xc0/0xc4 regcache_write from _regmap_raw_write_impl+0x178/0x828 _regmap_raw_write_impl from _regmap_raw_write+0xb8/0x134 _regmap_raw_write from regmap_noinc_write+0x130/0x178 regmap_noinc_write from max310x_tx_proc+0xd4/0x1a4 max310x_tx_proc from process_one_work+0x21c/0x4e4 process_one_work from worker_thread+0x50/0x54c worker_thread from kthread+0xe0/0xfc kthread from ret_from_fork+0x14/0x28 Clearly, regmap_noinc_write of a register 0x00 (that's the TX FIFO on this chip) has no business updating register 0x09, but that's what was happening here. The regmap_config is already set up in a way that register 0x00 is marked precious and volatile, so it has no business going through the cache at all. Also, the documentation for regmap_noinc_write suggests that this driver was using the regmap infrastructure correctly, and that the real bug is somewhere in regmap/regcache where a call to regmap_noinc_write end up updating an unrelated register in regcache. Until regmap/regcache is fixed, let's just use an adapted version of the old code that bypasses regmap altogether, and just sends out an SPI transaction. This is related to my commit 3f42b142ea1171967e40e10e4b0241c0d6d28d41 ("serial: max310x: fix IO data corruption in batched operations") which introduced usage of regmap_noinc_write() to this driver. That commit is a fixup of commit 285e76fc049c4d32c772eea9460a7ef28a193802 ("serial: max310x: use regmap methods for SPI batch operations") which started using regmap_raw_write(), which was however also a wrong function. Fixes: 3f42b142ea11 ("serial: max310x: fix IO data corruption in batched operations") Fixes: 285e76fc049c ("serial: max310x: use regmap methods for SPI batch operations") Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> To: Mark Brown <broonie@kernel.org> To: Cosmin Tanislav <cosmin.tanislav@analog.com> To: linux-serial@vger.kernel.org Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: linux-kernel@vger.kernel.org --- drivers/tty/serial/max310x.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-)