mbox series

[0/8] serial: imx: work-around for hardware RX flood, and then isr improvements

Message ID 20230113184334.287130-1-sorganov@gmail.com
Headers show
Series serial: imx: work-around for hardware RX flood, and then isr improvements | expand

Message

Sergey Organov Jan. 13, 2023, 6:43 p.m. UTC
These series contain a work-around for hardware RX flood in first 2
commits, and then further cleanups and optimizations of the receive IRQ
handler.

WARNING: the flood fix is extensively tested with older version of the
kernel with DMA turned off. The DMA path is tested only a bit, as DMA has
receive problems on my kernel version.

Sergey Organov (8):
  serial: imx: factor-out common code to imx_uart_soft_reset()
  serial: imx: work-around for hardware RX flood
  serial: imx: do not sysrq broken chars
  serial: imx: do not break from FIFO reading loop prematurely
  serial: imx: remove redundant USR2 read from FIFO reading loop
  serial: imx: stop using USR2 in FIFO reading loop
  serial: imx: use readl() to optimize FIFO reading loop
  serial: imx: refine local variables in rxint()

 drivers/tty/serial/imx.c | 227 ++++++++++++++++++++++++---------------
 1 file changed, 141 insertions(+), 86 deletions(-)

Comments

Ilpo Järvinen Jan. 16, 2023, 11:03 a.m. UTC | #1
On Fri, 13 Jan 2023, Sergey Organov wrote:

> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> we read registers that must not be cached.
> 
> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> ---
>  drivers/tty/serial/imx.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index be00362b8b67..f4236e8995fa 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  	struct imx_port *sport = dev_id;
>  	unsigned int rx, flg;
>  	struct tty_port *port = &sport->port.state->port;
> +	typeof(sport->port.membase) membase = sport->port.membase;
>  	u32 usr2;
>  
>  	/* If we received something, check for 0xff flood */
> -	usr2 = imx_uart_readl(sport, USR2);
> +	usr2 = readl(membase + USR2);
>  	if (usr2 & USR2_RDR)
>  		imx_uart_check_flood(sport, usr2);
>  
> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>  		flg = TTY_NORMAL;
>  		sport->port.icount.rx++;

I'd just make a uport local variable and use uport->membase + xx. There 
are plenty of sport->port constructs to replace with uport in that 
function anyway.
Johan Hovold Jan. 16, 2023, 3:24 p.m. UTC | #2
On Fri, Jan 13, 2023 at 09:43:29PM +0300, Sergey Organov wrote:
> Do not call uart_handle_sysrq_char() if we got any receive error along with
> the character, as we don't want random junk to be considered a sysrq.
> 
> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> ---
>  drivers/tty/serial/imx.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index e7fce31e460d..1c950112a598 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -911,9 +911,6 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  				continue;
>  		}
>  
> -		if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
> -			continue;
> -
>  		if (unlikely(rx & URXD_ERR)) {
>  			if (rx & URXD_BRK)
>  				sport->port.icount.brk++;
> @@ -942,7 +939,8 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  				flg = TTY_OVERRUN;
>  
>  			sport->port.sysrq = 0;
> -		}
> +		} else if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
> +			continue;

Nit: missing braces {}

Note that you could also place just place this after the block due to
the reset of the sysrq time stamp.

>  
>  		if (sport->port.ignore_status_mask & URXD_DUMMY_READ)
>  			goto out;

Johan
Uwe Kleine-König Jan. 17, 2023, 11:32 a.m. UTC | #3
On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> we read registers that must not be cached.
> 
> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> ---
>  drivers/tty/serial/imx.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index be00362b8b67..f4236e8995fa 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  	struct imx_port *sport = dev_id;
>  	unsigned int rx, flg;
>  	struct tty_port *port = &sport->port.state->port;
> +	typeof(sport->port.membase) membase = sport->port.membase;
>  	u32 usr2;
>  
>  	/* If we received something, check for 0xff flood */
> -	usr2 = imx_uart_readl(sport, USR2);
> +	usr2 = readl(membase + USR2);
>  	if (usr2 & USR2_RDR)
>  		imx_uart_check_flood(sport, usr2);
>  
> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>  		flg = TTY_NORMAL;
>  		sport->port.icount.rx++;

One of the motivations to introduce imx_uart_readl was to have a single
place to add a debug output to be able to inspect what the driver is
doing.

I wonder where your need for higher speed comes from and if the compiler
really generates more effective code with your change.

Please either drop the patch from your series or provide the differences
the compiler produces and a benchmark.

Best regards
Uwe
Sergey Organov Jan. 17, 2023, 1:22 p.m. UTC | #4
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
>> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> we read registers that must not be cached.
>> 
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>>  drivers/tty/serial/imx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> index be00362b8b67..f4236e8995fa 100644
>> --- a/drivers/tty/serial/imx.c
>> +++ b/drivers/tty/serial/imx.c
>> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>>  	struct imx_port *sport = dev_id;
>>  	unsigned int rx, flg;
>>  	struct tty_port *port = &sport->port.state->port;
>> +	typeof(sport->port.membase) membase = sport->port.membase;
>>  	u32 usr2;
>>  
>>  	/* If we received something, check for 0xff flood */
>> -	usr2 = imx_uart_readl(sport, USR2);
>> +	usr2 = readl(membase + USR2);
>>  	if (usr2 & USR2_RDR)
>>  		imx_uart_check_flood(sport, usr2);
>>  
>> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>>  		flg = TTY_NORMAL;
>>  		sport->port.icount.rx++;
>
> One of the motivations to introduce imx_uart_readl was to have a single
> place to add a debug output to be able to inspect what the driver is
> doing.
>
> I wonder where your need for higher speed comes from and if the compiler
> really generates more effective code with your change.

Mostly it's because I'm obviously slowing things down a bit with the
patch to fight the flood, so I feel obliged to get things back on par
with the origin. Then, higher speed, let alone the time spent with
interrupts disabled and/or spinlocks taken, is always one of generic
goals for me.

As for the generated code, with this patch I don't aim to affect code
generation, I rather avoid execution of part of existing code while
being on the most critical path. It should be quite obvious that not
executing some code is at least not slower than executing it.

>
> Please either drop the patch from your series or provide the differences
> the compiler produces and a benchmark.

If your only objection against this patch is the desire to keep a single
place to add debug output, I'll be happy to tune the resulting code to
still have one.

That said, before we make a decision, could you please tell why register
shadows that the imx_uart_readl/writel are dealing with are needed in
the first place? It looks like all the registers that are shadowed are
readable as well. What's going on here, and if it happens to be a
speed-up, do we have any benchmarks?

Thanks,
-- Sergey Organov
Sergey Organov Jan. 17, 2023, 1:42 p.m. UTC | #5
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:

> On Fri, 13 Jan 2023, Sergey Organov wrote:
>
>> We perform soft reset in 2 places, slightly differently for no sufficient
>> reasons, so move more generic variant to a function, and re-use the code.
>> 
>> Out of 2 repeat counters, 10 and 100, select 10, as the code works at
>> interrupts disabled, and in practice the reset happens immediately.
>> 
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>>  drivers/tty/serial/imx.c | 73 ++++++++++++++++++++--------------------
>>  1 file changed, 37 insertions(+), 36 deletions(-)
>> 
>> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> index 757825edb0cd..bf222d8568a9 100644
>> --- a/drivers/tty/serial/imx.c
>> +++ b/drivers/tty/serial/imx.c
>> @@ -397,6 +397,39 @@ static void start_hrtimer_ms(struct hrtimer *hrt, unsigned long msec)
>>         hrtimer_start(hrt, ms_to_ktime(msec), HRTIMER_MODE_REL);
>>  }
>>  
>> +/* called with port.lock taken and irqs off */
>> +static void imx_uart_soft_reset(struct imx_port *sport)
>> +{
>> +	int i = 10;
>> +	u32 ucr2, ubir, ubmr, uts;
>> +
>> +	/*
>> +	 * According to the Reference Manual description of the UART SRST bit:
>> +	 *
>> +	 * "Reset the transmit and receive state machines,
>> +	 * all FIFOs and register USR1, USR2, UBIR, UBMR, UBRC, URXD, UTXD
>> +	 * and UTS[6-3]".
>> +	 *
>> +	 * We don't need to restore the old values from USR1, USR2, URXD and
>> +	 * UTXD. UBRC is read only, so only save/restore the other three
>> +	 * registers.
>> +	 */
>> +	ubir = imx_uart_readl(sport, UBIR);
>> +	ubmr = imx_uart_readl(sport, UBMR);
>> +	uts = imx_uart_readl(sport, IMX21_UTS);
>> +
>> +	ucr2 = imx_uart_readl(sport, UCR2);
>> +	imx_uart_writel(sport, ucr2 & ~UCR2_SRST, UCR2);
>> +
>> +	while (!(imx_uart_readl(sport, UCR2) & UCR2_SRST) && (--i > 0))
>> +		udelay(1);
>
> This could use read_poll_timeout_atomic().

As this is just a factor-out of existing code that uses the loop, I'm not
sure if it's a good idea to change this along the way.

Do you want me to add yet another patch to the series? I'd rather not,
as I won't be able to test it on my outdated kernel anyway.

Thanks,
-- Sergey Organov
Sergey Organov Jan. 17, 2023, 5:35 p.m. UTC | #6
Johan Hovold <johan@kernel.org> writes:

> On Fri, Jan 13, 2023 at 09:43:29PM +0300, Sergey Organov wrote:
>> Do not call uart_handle_sysrq_char() if we got any receive error along with
>> the character, as we don't want random junk to be considered a sysrq.
>> 
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>>  drivers/tty/serial/imx.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> index e7fce31e460d..1c950112a598 100644
>> --- a/drivers/tty/serial/imx.c
>> +++ b/drivers/tty/serial/imx.c
>> @@ -911,9 +911,6 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>>  				continue;
>>  		}
>>  
>> -		if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
>> -			continue;
>> -
>>  		if (unlikely(rx & URXD_ERR)) {
>>  			if (rx & URXD_BRK)
>>  				sport->port.icount.brk++;
>> @@ -942,7 +939,8 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>>  				flg = TTY_OVERRUN;
>>  
>>  			sport->port.sysrq = 0;
>> -		}
>> +		} else if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
>> +			continue;
>
> Nit: missing braces {}
>
> Note that you could also place just place this after the block due to
> the reset of the sysrq time stamp.

Thanks, I think I'll opt for adding braces. Relying on the reset of the
timestamp feels a bit convoluted.
Sergey Organov Jan. 17, 2023, 5:43 p.m. UTC | #7
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:

> On Fri, 13 Jan 2023, Sergey Organov wrote:
>
>> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> we read registers that must not be cached.
>> 
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>>  drivers/tty/serial/imx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> index be00362b8b67..f4236e8995fa 100644
>> --- a/drivers/tty/serial/imx.c
>> +++ b/drivers/tty/serial/imx.c
>> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>>  	struct imx_port *sport = dev_id;
>>  	unsigned int rx, flg;
>>  	struct tty_port *port = &sport->port.state->port;
>> +	typeof(sport->port.membase) membase = sport->port.membase;
>>  	u32 usr2;
>>  
>>  	/* If we received something, check for 0xff flood */
>> -	usr2 = imx_uart_readl(sport, USR2);
>> +	usr2 = readl(membase + USR2);
>>  	if (usr2 & USR2_RDR)
>>  		imx_uart_check_flood(sport, usr2);
>>  
>> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>>  		flg = TTY_NORMAL;
>>  		sport->port.icount.rx++;
>
> I'd just make a uport local variable and use uport->membase + xx. There 
> are plenty of sport->port constructs to replace with uport in that 
> function anyway.

OK, thanks, will do it this way. Probably with global rename over this
function in a separate patch?
Sergey Organov Jan. 17, 2023, 5:48 p.m. UTC | #8
Sherry Sun <sherry.sun@nxp.com> writes:

>> -----Original Message-----
>> From: Sergey Organov <sorganov@gmail.com>
>> Sent: 2023年1月14日 2:44
>> To: linux-serial@vger.kernel.org
>> Cc: Fabio Estevam <festevam@gmail.com>; Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org>; Jiri Slaby <jirislaby@kernel.org>; Richard
>> Genoud <richard.genoud@gmail.com>; Sascha Hauer
>> <s.hauer@pengutronix.de>; Shawn Guo <shawnguo@kernel.org>;
>> tharvey@gateworks.com; Tomasz Moń <tomasz.mon@camlingroup.com>;
>> linux-arm-kernel@lists.infradead.org; dl-linux-imx <linux-imx@nxp.com>;
>> Pengutronix Kernel Team <kernel@pengutronix.de>; Sergey Organov
>> <sorganov@gmail.com>
>> Subject: [PATCH 7/8] serial: imx: use readl() to optimize FIFO reading loop
>> 
>> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know we
>> read registers that must not be cached.
>> 
>> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> ---
>>  drivers/tty/serial/imx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index
>> be00362b8b67..f4236e8995fa 100644
>> --- a/drivers/tty/serial/imx.c
>> +++ b/drivers/tty/serial/imx.c
>> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void
>> *dev_id)
>>  	struct imx_port *sport = dev_id;
>>  	unsigned int rx, flg;
>>  	struct tty_port *port = &sport->port.state->port;
>> +	typeof(sport->port.membase) membase = sport->port.membase;
>>  	u32 usr2;
>> 
>>  	/* If we received something, check for 0xff flood */
>> -	usr2 = imx_uart_readl(sport, USR2);
>> +	usr2 = readl(membase + USR2);
>>  	if (usr2 & USR2_RDR)
>>  		imx_uart_check_flood(sport, usr2);
>> 
>> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>
> Actually imx_uart_readl() only set shadow registers for UCRx and UFCR,
> for the USR2 and URXD0 that you used here, they will not be cached.

Sure, and that's why we here don't need to call imx_uart_readl(), that
only needlessly checks for shadowing, thus producing pure overhead.

Best Regards,
Sergey
Uwe Kleine-König Jan. 17, 2023, 9:27 p.m. UTC | #9
Hello Sergey,

On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> >> we read registers that must not be cached.
> >> 
> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> >> ---
> >>  drivers/tty/serial/imx.c | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> >> index be00362b8b67..f4236e8995fa 100644
> >> --- a/drivers/tty/serial/imx.c
> >> +++ b/drivers/tty/serial/imx.c
> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >>  	struct imx_port *sport = dev_id;
> >>  	unsigned int rx, flg;
> >>  	struct tty_port *port = &sport->port.state->port;
> >> +	typeof(sport->port.membase) membase = sport->port.membase;
> >>  	u32 usr2;
> >>  
> >>  	/* If we received something, check for 0xff flood */
> >> -	usr2 = imx_uart_readl(sport, USR2);
> >> +	usr2 = readl(membase + USR2);
> >>  	if (usr2 & USR2_RDR)
> >>  		imx_uart_check_flood(sport, usr2);
> >>  
> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
> >>  		flg = TTY_NORMAL;
> >>  		sport->port.icount.rx++;
> >
> > One of the motivations to introduce imx_uart_readl was to have a single
> > place to add a debug output to be able to inspect what the driver is
> > doing.
> >
> > I wonder where your need for higher speed comes from and if the compiler
> > really generates more effective code with your change.
> 
> Mostly it's because I'm obviously slowing things down a bit with the
> patch to fight the flood, so I feel obliged to get things back on par
> with the origin. Then, higher speed, let alone the time spent with
> interrupts disabled and/or spinlocks taken, is always one of generic
> goals for me.
> 
> As for the generated code, with this patch I don't aim to affect code
> generation, I rather avoid execution of part of existing code while
> being on the most critical path. It should be quite obvious that not
> executing some code is at least not slower than executing it.

That's true, but I think it doesn't apply here.

I would expect that the compiler "sees" for the call

	imx_uart_readl(sport, USR2)

that the 2nd argument is constant and that for that value of offset the
call is equivalent to readl(sport->port.membase + offset);

So I doubt you're making anything quicker here.

I tried the following patch on mainline (that is without the preceding
patches in this series):

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 757825edb0cd..cfc2f7057345 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
 	unsigned int rx, flg, ignored = 0;
 	struct tty_port *port = &sport->port.state->port;
 
-	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
+	while (readl(sport->port.membase + USR2) & USR2_RDR) {
 		u32 usr2;
 
 		flg = TTY_NORMAL;

and the resulting code didn't change at all. For a bigger change (i.e.
adding a variable for sport->port.membase and replacing two
imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
the resulting code is better or not.

So a change that explicitly doesn't execute the code that the compiler
optimizes away anyhow isn't a win. Together with the fact that your
patch makes register access use different idioms and so makes it harder
to understand for a human I'd say the net benefit of your patch is negative.

> > Please either drop the patch from your series or provide the differences
> > the compiler produces and a benchmark.
> 
> If your only objection against this patch is the desire to keep a single
> place to add debug output, I'll be happy to tune the resulting code to
> still have one.

I don't see the need to optimize it.

> That said, before we make a decision, could you please tell why register
> shadows that the imx_uart_readl/writel are dealing with are needed in
> the first place? It looks like all the registers that are shadowed are
> readable as well. What's going on here, and if it happens to be a
> speed-up, do we have any benchmarks?

Not sure I did benchmarks back then, probably not. The main motivation
was really to have that single access function. So I admit being guilty
to have implemented an optimization without hard numbers just assuming
that access to (cached) RAM is quicker than the register space.

Best regards,
Uwe
Johan Hovold Jan. 18, 2023, 8:06 a.m. UTC | #10
On Tue, Jan 17, 2023 at 08:35:48PM +0300, Sergey Organov wrote:
> Johan Hovold <johan@kernel.org> writes:
> > On Fri, Jan 13, 2023 at 09:43:29PM +0300, Sergey Organov wrote:

> >> @@ -911,9 +911,6 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >>  				continue;
> >>  		}
> >>  
> >> -		if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
> >> -			continue;
> >> -
> >>  		if (unlikely(rx & URXD_ERR)) {
> >>  			if (rx & URXD_BRK)
> >>  				sport->port.icount.brk++;
> >> @@ -942,7 +939,8 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >>  				flg = TTY_OVERRUN;
> >>  
> >>  			sport->port.sysrq = 0;
> >> -		}
> >> +		} else if (uart_handle_sysrq_char(&sport->port, (unsigned char)rx))
> >> +			continue;
> >
> > Nit: missing braces {}
> >
> > Note that you could also place just place this after the block due to
> > the reset of the sysrq time stamp.
> 
> Thanks, I think I'll opt for adding braces. Relying on the reset of the
> timestamp feels a bit convoluted.

I agree, it may be a bit too subtle.

Johan
Ilpo Järvinen Jan. 18, 2023, 8:24 a.m. UTC | #11
On Tue, 17 Jan 2023, Sergey Organov wrote:

> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:
> 
> > On Fri, 13 Jan 2023, Sergey Organov wrote:
> >
> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> >> we read registers that must not be cached.
> >> 
> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> >> ---
> >>  drivers/tty/serial/imx.c | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> >> index be00362b8b67..f4236e8995fa 100644
> >> --- a/drivers/tty/serial/imx.c
> >> +++ b/drivers/tty/serial/imx.c
> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >>  	struct imx_port *sport = dev_id;
> >>  	unsigned int rx, flg;
> >>  	struct tty_port *port = &sport->port.state->port;
> >> +	typeof(sport->port.membase) membase = sport->port.membase;
> >>  	u32 usr2;
> >>  
> >>  	/* If we received something, check for 0xff flood */
> >> -	usr2 = imx_uart_readl(sport, USR2);
> >> +	usr2 = readl(membase + USR2);
> >>  	if (usr2 & USR2_RDR)
> >>  		imx_uart_check_flood(sport, usr2);
> >>  
> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
> >>  		flg = TTY_NORMAL;
> >>  		sport->port.icount.rx++;
> >
> > I'd just make a uport local variable and use uport->membase + xx. There 
> > are plenty of sport->port constructs to replace with uport in that 
> > function anyway.
> 
> OK, thanks, will do it this way. Probably with global rename over this
> function in a separate patch?

Yes, it is better to have it in own patch.
Sergey Organov Jan. 18, 2023, 3:40 p.m. UTC | #12
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> Hello Sergey,
>
> On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
>> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
>> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> >> we read registers that must not be cached.
>> >> 
>> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> >> ---
>> >>  drivers/tty/serial/imx.c | 5 +++--
>> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> >> index be00362b8b67..f4236e8995fa 100644
>> >> --- a/drivers/tty/serial/imx.c
>> >> +++ b/drivers/tty/serial/imx.c
>> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >>  	struct imx_port *sport = dev_id;
>> >>  	unsigned int rx, flg;
>> >>  	struct tty_port *port = &sport->port.state->port;
>> >> +	typeof(sport->port.membase) membase = sport->port.membase;
>> >>  	u32 usr2;
>> >>  
>> >>  	/* If we received something, check for 0xff flood */
>> >> -	usr2 = imx_uart_readl(sport, USR2);
>> >> +	usr2 = readl(membase + USR2);
>> >>  	if (usr2 & USR2_RDR)
>> >>  		imx_uart_check_flood(sport, usr2);
>> >>  
>> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>> >>  		flg = TTY_NORMAL;
>> >>  		sport->port.icount.rx++;
>> >
>> > One of the motivations to introduce imx_uart_readl was to have a single
>> > place to add a debug output to be able to inspect what the driver is
>> > doing.
>> >
>> > I wonder where your need for higher speed comes from and if the compiler
>> > really generates more effective code with your change.
>> 
>> Mostly it's because I'm obviously slowing things down a bit with the
>> patch to fight the flood, so I feel obliged to get things back on par
>> with the origin. Then, higher speed, let alone the time spent with
>> interrupts disabled and/or spinlocks taken, is always one of generic
>> goals for me.
>> 
>> As for the generated code, with this patch I don't aim to affect code
>> generation, I rather avoid execution of part of existing code while
>> being on the most critical path. It should be quite obvious that not
>> executing some code is at least not slower than executing it.
>
> That's true, but I think it doesn't apply here.

Well, "at least not slower" still applies ;-)

>
> I would expect that the compiler "sees" for the call
>
> 	imx_uart_readl(sport, USR2)
>
> that the 2nd argument is constant and that for that value of offset the
> call is equivalent to readl(sport->port.membase + offset);
>
> So I doubt you're making anything quicker here.

Yep, it's nice compiler is clever enough to optimize-out the switch for
constant argument, though I still typically prefer to avoid over-relying
on optimizations. That said, I now tend to agree with your POV in this
particular case.

>
> I tried the following patch on mainline (that is without the preceding
> patches in this series):
>
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 757825edb0cd..cfc2f7057345 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  	unsigned int rx, flg, ignored = 0;
>  	struct tty_port *port = &sport->port.state->port;
>  
> -	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
> +	while (readl(sport->port.membase + USR2) & USR2_RDR) {
>  		u32 usr2;
>  
>  		flg = TTY_NORMAL;
>
> and the resulting code didn't change at all. For a bigger change (i.e.
> adding a variable for sport->port.membase and replacing two
> imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
> imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
> the resulting code is better or not.
>
> So a change that explicitly doesn't execute the code that the compiler
> optimizes away anyhow isn't a win. Together with the fact that your
> patch makes register access use different idioms and so makes it harder
> to understand for a human I'd say the net benefit of your patch is
> negative.

OK, you convinced me to drop it.

>
>> > Please either drop the patch from your series or provide the differences
>> > the compiler produces and a benchmark.
>> 
>> If your only objection against this patch is the desire to keep a single
>> place to add debug output, I'll be happy to tune the resulting code to
>> still have one.
>
> I don't see the need to optimize it.
>
>> That said, before we make a decision, could you please tell why register
>> shadows that the imx_uart_readl/writel are dealing with are needed in
>> the first place? It looks like all the registers that are shadowed are
>> readable as well. What's going on here, and if it happens to be a
>> speed-up, do we have any benchmarks?
>
> Not sure I did benchmarks back then, probably not. The main motivation
> was really to have that single access function. So I admit being guilty
> to have implemented an optimization without hard numbers just assuming
> that access to (cached) RAM is quicker than the register space.

Well, even if it is quicker, we still spend time writing to both RAM and
register, and then there is no gain for the data Tx/Rx registers that
aren't cached, yet are on most critical paths.

So, if this is just caching and doesn't change behavior, I'd suggest to
get rid of the shadowing altogether, making code simpler to follow.
Besides, if it were so, I'd had no temptation to replace the
imx_uart_readl() with raw readl().

Thanks,
-- Sergey
Sergey Organov Jan. 18, 2023, 3:43 p.m. UTC | #13
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:

> On Tue, 17 Jan 2023, Sergey Organov wrote:
>
>> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:
>> 
>> > On Fri, 13 Jan 2023, Sergey Organov wrote:
>> >
>> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> >> we read registers that must not be cached.
>> >> 
>> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> >> ---
>> >>  drivers/tty/serial/imx.c | 5 +++--
>> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> >> index be00362b8b67..f4236e8995fa 100644
>> >> --- a/drivers/tty/serial/imx.c
>> >> +++ b/drivers/tty/serial/imx.c
>> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >>  	struct imx_port *sport = dev_id;
>> >>  	unsigned int rx, flg;
>> >>  	struct tty_port *port = &sport->port.state->port;
>> >> +	typeof(sport->port.membase) membase = sport->port.membase;
>> >>  	u32 usr2;
>> >>  
>> >>  	/* If we received something, check for 0xff flood */
>> >> -	usr2 = imx_uart_readl(sport, USR2);
>> >> +	usr2 = readl(membase + USR2);
>> >>  	if (usr2 & USR2_RDR)
>> >>  		imx_uart_check_flood(sport, usr2);
>> >>  
>> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>> >>  		flg = TTY_NORMAL;
>> >>  		sport->port.icount.rx++;
>> >
>> > I'd just make a uport local variable and use uport->membase + xx. There 
>> > are plenty of sport->port constructs to replace with uport in that 
>> > function anyway.
>> 
>> OK, thanks, will do it this way. Probably with global rename over this
>> function in a separate patch?
>
> Yes, it is better to have it in own patch.

Well, it now seems that I'll drop this patch altogether, by agreement
with Uwe. Do you think introducing of 'uport' still worth it in this
one function? I figure it's probably not, provided the reset of the code
in the driver still doesn't use the idiom.

Thanks,
-- Sergey
Ilpo Järvinen Jan. 18, 2023, 7:29 p.m. UTC | #14
On Wed, 18 Jan 2023, Sergey Organov wrote:

> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:
> 
> > On Tue, 17 Jan 2023, Sergey Organov wrote:
> >
> >> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> writes:
> >> 
> >> > On Fri, 13 Jan 2023, Sergey Organov wrote:
> >> >
> >> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> >> >> we read registers that must not be cached.
> >> >> 
> >> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> >> >> ---
> >> >>  drivers/tty/serial/imx.c | 5 +++--
> >> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >> >> 
> >> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> >> >> index be00362b8b67..f4236e8995fa 100644
> >> >> --- a/drivers/tty/serial/imx.c
> >> >> +++ b/drivers/tty/serial/imx.c
> >> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >> >>  	struct imx_port *sport = dev_id;
> >> >>  	unsigned int rx, flg;
> >> >>  	struct tty_port *port = &sport->port.state->port;
> >> >> +	typeof(sport->port.membase) membase = sport->port.membase;
> >> >>  	u32 usr2;
> >> >>  
> >> >>  	/* If we received something, check for 0xff flood */
> >> >> -	usr2 = imx_uart_readl(sport, USR2);
> >> >> +	usr2 = readl(membase + USR2);
> >> >>  	if (usr2 & USR2_RDR)
> >> >>  		imx_uart_check_flood(sport, usr2);
> >> >>  
> >> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> >> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
> >> >>  		flg = TTY_NORMAL;
> >> >>  		sport->port.icount.rx++;
> >> >
> >> > I'd just make a uport local variable and use uport->membase + xx. There 
> >> > are plenty of sport->port constructs to replace with uport in that 
> >> > function anyway.
> >> 
> >> OK, thanks, will do it this way. Probably with global rename over this
> >> function in a separate patch?
> >
> > Yes, it is better to have it in own patch.
> 
> Well, it now seems that I'll drop this patch altogether, by agreement
> with Uwe. Do you think introducing of 'uport' still worth it in this
> one function? I figure it's probably not, provided the reset of the code
> in the driver still doesn't use the idiom.

I've no strong opinion either way. So feel free to leave them as they are 
now.
Uwe Kleine-König Jan. 19, 2023, 7:01 a.m. UTC | #15
Hello Sergey,

On Wed, Jan 18, 2023 at 06:40:17PM +0300, Sergey Organov wrote:
> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> > On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
> >> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> >> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
> >> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
> >> >> we read registers that must not be cached.
> >> >> 
> >> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
> >> >> ---
> >> >>  drivers/tty/serial/imx.c | 5 +++--
> >> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >> >> 
> >> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> >> >> index be00362b8b67..f4236e8995fa 100644
> >> >> --- a/drivers/tty/serial/imx.c
> >> >> +++ b/drivers/tty/serial/imx.c
> >> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >> >>  	struct imx_port *sport = dev_id;
> >> >>  	unsigned int rx, flg;
> >> >>  	struct tty_port *port = &sport->port.state->port;
> >> >> +	typeof(sport->port.membase) membase = sport->port.membase;
> >> >>  	u32 usr2;
> >> >>  
> >> >>  	/* If we received something, check for 0xff flood */
> >> >> -	usr2 = imx_uart_readl(sport, USR2);
> >> >> +	usr2 = readl(membase + USR2);
> >> >>  	if (usr2 & USR2_RDR)
> >> >>  		imx_uart_check_flood(sport, usr2);
> >> >>  
> >> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
> >> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
> >> >>  		flg = TTY_NORMAL;
> >> >>  		sport->port.icount.rx++;
> >> >
> >> > One of the motivations to introduce imx_uart_readl was to have a single
> >> > place to add a debug output to be able to inspect what the driver is
> >> > doing.
> >> >
> >> > I wonder where your need for higher speed comes from and if the compiler
> >> > really generates more effective code with your change.
> >> 
> >> Mostly it's because I'm obviously slowing things down a bit with the
> >> patch to fight the flood, so I feel obliged to get things back on par
> >> with the origin. Then, higher speed, let alone the time spent with
> >> interrupts disabled and/or spinlocks taken, is always one of generic
> >> goals for me.
> >> 
> >> As for the generated code, with this patch I don't aim to affect code
> >> generation, I rather avoid execution of part of existing code while
> >> being on the most critical path. It should be quite obvious that not
> >> executing some code is at least not slower than executing it.
> >
> > That's true, but I think it doesn't apply here.
> 
> Well, "at least not slower" still applies ;-)
> 
> >
> > I would expect that the compiler "sees" for the call
> >
> > 	imx_uart_readl(sport, USR2)
> >
> > that the 2nd argument is constant and that for that value of offset the
> > call is equivalent to readl(sport->port.membase + offset);
> >
> > So I doubt you're making anything quicker here.
> 
> Yep, it's nice compiler is clever enough to optimize-out the switch for
> constant argument, though I still typically prefer to avoid over-relying
> on optimizations. That said, I now tend to agree with your POV in this
> particular case.
> 
> >
> > I tried the following patch on mainline (that is without the preceding
> > patches in this series):
> >
> > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> > index 757825edb0cd..cfc2f7057345 100644
> > --- a/drivers/tty/serial/imx.c
> > +++ b/drivers/tty/serial/imx.c
> > @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
> >  	unsigned int rx, flg, ignored = 0;
> >  	struct tty_port *port = &sport->port.state->port;
> >  
> > -	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
> > +	while (readl(sport->port.membase + USR2) & USR2_RDR) {
> >  		u32 usr2;
> >  
> >  		flg = TTY_NORMAL;
> >
> > and the resulting code didn't change at all. For a bigger change (i.e.
> > adding a variable for sport->port.membase and replacing two
> > imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
> > imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
> > the resulting code is better or not.
> >
> > So a change that explicitly doesn't execute the code that the compiler
> > optimizes away anyhow isn't a win. Together with the fact that your
> > patch makes register access use different idioms and so makes it harder
> > to understand for a human I'd say the net benefit of your patch is
> > negative.
> 
> OK, you convinced me to drop it.
> 
> >
> >> > Please either drop the patch from your series or provide the differences
> >> > the compiler produces and a benchmark.
> >> 
> >> If your only objection against this patch is the desire to keep a single
> >> place to add debug output, I'll be happy to tune the resulting code to
> >> still have one.
> >
> > I don't see the need to optimize it.
> >
> >> That said, before we make a decision, could you please tell why register
> >> shadows that the imx_uart_readl/writel are dealing with are needed in
> >> the first place? It looks like all the registers that are shadowed are
> >> readable as well. What's going on here, and if it happens to be a
> >> speed-up, do we have any benchmarks?
> >
> > Not sure I did benchmarks back then, probably not. The main motivation
> > was really to have that single access function. So I admit being guilty
> > to have implemented an optimization without hard numbers just assuming
> > that access to (cached) RAM is quicker than the register space.
> 
> Well, even if it is quicker, we still spend time writing to both RAM and
> register, and then there is no gain for the data Tx/Rx registers that
> aren't cached, yet are on most critical paths.

Well, assuming we're saving some time for the ctrl registers, it's worth
keeping it even though there is no gain for RX/TX, right? There is no
overhead for RX/TX.

> So, if this is just caching and doesn't change behavior, I'd suggest to
> get rid of the shadowing altogether, making code simpler to follow.

Knowing it's subjective I don't think the shadowing is complicated.
Functions are using the driver specific readl and writel functions and
shadowing is limited to these two functions.

in sum today I wouldn't change if the code does shadow the registers or
not if there isn't at least a strong hint that the one or the other
variant is better. So if you still want to work on that you're welcome,
but I invite you to do some benchmarks first and not only assume one or
the other variant is better.

My (unproved) assumption is that for console usage there is hardly a
difference and with a workflow that needs more changing of control
settings (like half duplex rs485) shadowing is slightly better.

Best regards
Uwe
Sergey Organov Jan. 21, 2023, 6:04 p.m. UTC | #16
Hello Uwe,

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> Hello Sergey,
>
> On Wed, Jan 18, 2023 at 06:40:17PM +0300, Sergey Organov wrote:
>> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>> > On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
>> >> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>> >> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
>> >> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> >> >> we read registers that must not be cached.
>> >> >> 
>> >> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> >> >> ---
>> >> >>  drivers/tty/serial/imx.c | 5 +++--
>> >> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >> >> 
>> >> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> >> >> index be00362b8b67..f4236e8995fa 100644
>> >> >> --- a/drivers/tty/serial/imx.c
>> >> >> +++ b/drivers/tty/serial/imx.c
>> >> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >> >>  	struct imx_port *sport = dev_id;
>> >> >>  	unsigned int rx, flg;
>> >> >>  	struct tty_port *port = &sport->port.state->port;
>> >> >> +	typeof(sport->port.membase) membase = sport->port.membase;
>> >> >>  	u32 usr2;
>> >> >>  
>> >> >>  	/* If we received something, check for 0xff flood */
>> >> >> -	usr2 = imx_uart_readl(sport, USR2);
>> >> >> +	usr2 = readl(membase + USR2);
>> >> >>  	if (usr2 & USR2_RDR)
>> >> >>  		imx_uart_check_flood(sport, usr2);
>> >> >>  
>> >> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> >> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>> >> >>  		flg = TTY_NORMAL;
>> >> >>  		sport->port.icount.rx++;
>> >> >
>> >> > One of the motivations to introduce imx_uart_readl was to have a single
>> >> > place to add a debug output to be able to inspect what the driver is
>> >> > doing.
>> >> >
>> >> > I wonder where your need for higher speed comes from and if the compiler
>> >> > really generates more effective code with your change.
>> >> 
>> >> Mostly it's because I'm obviously slowing things down a bit with the
>> >> patch to fight the flood, so I feel obliged to get things back on par
>> >> with the origin. Then, higher speed, let alone the time spent with
>> >> interrupts disabled and/or spinlocks taken, is always one of generic
>> >> goals for me.
>> >> 
>> >> As for the generated code, with this patch I don't aim to affect code
>> >> generation, I rather avoid execution of part of existing code while
>> >> being on the most critical path. It should be quite obvious that not
>> >> executing some code is at least not slower than executing it.
>> >
>> > That's true, but I think it doesn't apply here.
>> 
>> Well, "at least not slower" still applies ;-)
>> 
>> >
>> > I would expect that the compiler "sees" for the call
>> >
>> > 	imx_uart_readl(sport, USR2)
>> >
>> > that the 2nd argument is constant and that for that value of offset the
>> > call is equivalent to readl(sport->port.membase + offset);
>> >
>> > So I doubt you're making anything quicker here.
>> 
>> Yep, it's nice compiler is clever enough to optimize-out the switch for
>> constant argument, though I still typically prefer to avoid over-relying
>> on optimizations. That said, I now tend to agree with your POV in this
>> particular case.
>> 
>> >
>> > I tried the following patch on mainline (that is without the preceding
>> > patches in this series):
>> >
>> > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> > index 757825edb0cd..cfc2f7057345 100644
>> > --- a/drivers/tty/serial/imx.c
>> > +++ b/drivers/tty/serial/imx.c
>> > @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >  	unsigned int rx, flg, ignored = 0;
>> >  	struct tty_port *port = &sport->port.state->port;
>> >  
>> > -	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
>> > +	while (readl(sport->port.membase + USR2) & USR2_RDR) {
>> >  		u32 usr2;
>> >  
>> >  		flg = TTY_NORMAL;
>> >
>> > and the resulting code didn't change at all. For a bigger change (i.e.
>> > adding a variable for sport->port.membase and replacing two
>> > imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
>> > imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
>> > the resulting code is better or not.
>> >
>> > So a change that explicitly doesn't execute the code that the compiler
>> > optimizes away anyhow isn't a win. Together with the fact that your
>> > patch makes register access use different idioms and so makes it harder
>> > to understand for a human I'd say the net benefit of your patch is
>> > negative.
>> 
>> OK, you convinced me to drop it.
>> 
>> >
>> >> > Please either drop the patch from your series or provide the differences
>> >> > the compiler produces and a benchmark.
>> >> 
>> >> If your only objection against this patch is the desire to keep a single
>> >> place to add debug output, I'll be happy to tune the resulting code to
>> >> still have one.
>> >
>> > I don't see the need to optimize it.
>> >
>> >> That said, before we make a decision, could you please tell why register
>> >> shadows that the imx_uart_readl/writel are dealing with are needed in
>> >> the first place? It looks like all the registers that are shadowed are
>> >> readable as well. What's going on here, and if it happens to be a
>> >> speed-up, do we have any benchmarks?
>> >
>> > Not sure I did benchmarks back then, probably not. The main motivation
>> > was really to have that single access function. So I admit being guilty
>> > to have implemented an optimization without hard numbers just assuming
>> > that access to (cached) RAM is quicker than the register space.
>> 
>> Well, even if it is quicker, we still spend time writing to both RAM and
>> register, and then there is no gain for the data Tx/Rx registers that
>> aren't cached, yet are on most critical paths.
>
> Well, assuming we're saving some time for the ctrl registers, it's worth
> keeping it even though there is no gain for RX/TX, right? There is no
> overhead for RX/TX.
>
>> So, if this is just caching and doesn't change behavior, I'd suggest to
>> get rid of the shadowing altogether, making code simpler to follow.
>
> Knowing it's subjective I don't think the shadowing is complicated.
> Functions are using the driver specific readl and writel functions and
> shadowing is limited to these two functions.

It's not complicated indeed, but it's still code, and the less code, --
the better.

>
> in sum today I wouldn't change if the code does shadow the registers
>or not if there isn't at least a strong hint that the one or the other
>variant is better. So if you still want to work on that you're welcome,
>but I invite you to do some benchmarks first and not only assume one or
>the other variant is better.

No code is better hands down, unless proved otherwise. I dunno if a code
that somehow sneaked into the kernel gets any specific significance from
the maintenance POV though. If so, for me it'd be the only sound
argument in favor of keeping the code intact.

>
> My (unproved) assumption is that for console usage there is hardly a
> difference and with a workflow that needs more changing of control
> settings (like half duplex rs485) shadowing is slightly better.

Well, also unproved, I tend to disagree here. I'm afraid you'd rather
have hard time finding a case where it is noticeably better, especially
as on the write path the shadowing is pure overhead, and so you'd have
hard time getting this code into the kernel in the first place, provided
somebody would care to object.

For what it's worth, I've removed the shadowing in my kernel version,
and it gave me no troubles yet. For reference, I use RS232 only with DMA
turned off.

Overall, my feeling about the issue: the shadowing code is confusing and
should not have been put there in the first place. Essentially it looks
like pure code bloat, so removing it would be an improvement. Thus I
figure I'll send a separate patch for this, and you guys decide if the
patch goes in (and the code in question goes out) or not.

Thanks,
-- Sergey Organov