mbox series

[v3,00/13] serial: qcom-geni-serial: implement support for SE DMA

Message ID 20221123110759.1836666-1-brgl@bgdev.pl
Headers show
Series serial: qcom-geni-serial: implement support for SE DMA | expand

Message

Bartosz Golaszewski Nov. 23, 2022, 11:07 a.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

The goal of this series is to update the qcom-geni-serial driver to use
the DMA mode of the QUPv3 serial engine. This is accomplished by the last
patch in the series. The previous ones contain either various tweaks,
reworks and refactoring or prepare the driver for adding DMA support.

More work will follow on the serial engine in order to reduce code
redundancy among its users and add support for SE DMA to the qcom GENI
SPI driver.

v2 -> v3:
- drop devres patches from the series

v1 -> v2:
- turn to_dev_uport() macro into a static inline function
- use CIRC_CNT_TO_END() and uart_xmit_advance() where applicable and don't
  handle xmit->tail directly
- drop sizeof() where BYTES_PER_FIFO_WORD can be used
- further refactor qcom_geni_serial_handle_tx_fifo()
- collect review tags

Bartosz Golaszewski (13):
  tty: serial: qcom-geni-serial: drop unneeded forward definitions
  tty: serial: qcom-geni-serial: remove unused symbols
  tty: serial: qcom-geni-serial: align #define values
  tty: serial: qcom-geni-serial: improve the to_dev_port() macro
  tty: serial: qcom-geni-serial: remove stray newlines
  tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()
  tty: serial: qcom-geni-serial: remove unneeded tabs
  tty: serial: qcom-geni-serial: refactor qcom_geni_serial_handle_tx()
  tty: serial: qcom-geni-serial: drop the return value from handle_rx
  tty: serial: qcom-geni-serial: use of_device_id data
  tty: serial: qcom-geni-serial: stop operations in progress at shutdown
  soc: qcom-geni-se: add more symbol definitions
  tty: serial: qcom-geni-serial: add support for serial engine DMA

 drivers/tty/serial/qcom_geni_serial.c | 606 +++++++++++++++++---------
 include/linux/qcom-geni-se.h          |   3 +
 2 files changed, 409 insertions(+), 200 deletions(-)

Comments

Bartosz Golaszewski Nov. 24, 2022, 2:51 p.m. UTC | #1
On Thu, Nov 24, 2022 at 8:18 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 23. 11. 22, 12:07, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > qcom_geni_serial_handle_tx() is pretty big, let's move the code that
> > handles the actual writing of data to a separate function which makes
> > sense in preparation for introducing a dma variant of handle_tx().
> >
> > Let's also shuffle the code a bit, drop unneeded variables and use
> > uart_xmit_advance() instead of handling tail->xmit manually.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/tty/serial/qcom_geni_serial.c | 54 +++++++++++++--------------
> >   1 file changed, 27 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> > index 68a1402fbe58..658b6d596f58 100644
> > --- a/drivers/tty/serial/qcom_geni_serial.c
> > +++ b/drivers/tty/serial/qcom_geni_serial.c
> > @@ -704,19 +704,42 @@ static void qcom_geni_serial_start_rx(struct uart_port *uport)
> >       writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
> >   }
>
> I know you just shuffle the code, but:
>
> > +static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > +                                          unsigned int chunk)
> > +{
> > +     struct qcom_geni_serial_port *port = to_dev_port(uport);
> > +     struct circ_buf *xmit = &uport->state->xmit;
> > +     u8 buf[BYTES_PER_FIFO_WORD];
> > +     size_t remaining = chunk;
>
> Why size_t when the others are uints? Well, BYTES_PER_FIFO_WORD should
> be defined as 4U.

Good point.

>
> > +     unsigned int tx_bytes;
> > +     int c;
> > +
> > +     while (remaining) {
> > +             memset(buf, 0, sizeof(buf));
> > +             tx_bytes = min_t(size_t, remaining, BYTES_PER_FIFO_WORD);
>
> Then, no need for min_t.
>

Same.

> > +
> > +             for (c = 0; c < tx_bytes ; c++) {
> > +                     buf[c] = xmit->buf[xmit->tail];
> > +                     uart_xmit_advance(uport, 1);
> > +             }
> > +
> > +             iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);
>
> I wonder, why is _rep variant used to transfer a single word? Only to
> hide the cast?
>

Even if - using writel() with a cast doesn't seem to improve the
performance and this one looks prettier IMO.

Bartosz
Srinivas Kandagatla Nov. 25, 2022, 2:37 p.m. UTC | #2
Thanks Bartosz for the patch,

On 23/11/2022 11:07, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> The qcom-geni-serial driver currently only works in SE FIFO mode. This
> limits the UART speed to around 180 kB/s. In order to achieve higher
> speeds we need to use SE DMA mode.
> 
> Keep the console port working in FIFO mode but extend the code to use DMA
> for the high-speed port.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
>   drivers/tty/serial/qcom_geni_serial.c | 289 ++++++++++++++++++++++----
>   1 file changed, 247 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index 82242a40a95a..0f4ba909c792 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -70,6 +70,8 @@
>   #define UART_START_TX			0x1
>   /* UART S_CMD OP codes */
>   #define UART_START_READ			0x1
> +#define UART_PARAM			0x1
> +#define UART_PARAM_RFR_OPEN		BIT(7)
>   
>   #define UART_OVERSAMPLING		32
>   #define STALE_TIMEOUT			16
> @@ -95,9 +97,11 @@
>   /* We always configure 4 bytes per FIFO word */
>   #define BYTES_PER_FIFO_WORD		4
>   
> +#define DMA_RX_BUF_SIZE		2048
> +
>   struct qcom_geni_device_data {
>   	bool console;
> -	void (*handle_rx)(struct uart_port *uport, u32 bytes, bool drop);
> +	enum geni_se_xfer_mode mode;
>   };
>   
>   struct qcom_geni_private_data {
> @@ -118,9 +122,11 @@ struct qcom_geni_serial_port {
>   	u32 tx_fifo_depth;
>   	u32 tx_fifo_width;
>   	u32 rx_fifo_depth;
> +	dma_addr_t tx_dma_addr;
> +	dma_addr_t rx_dma_addr;
>   	bool setup;
>   	unsigned int baud;
> -	void *rx_fifo;
> +	void *rx_buf;
>   	u32 loopback;
>   	bool brk;
>   
> @@ -552,18 +558,11 @@ static void handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
>   
>   static void handle_rx_uart(struct uart_port *uport, u32 bytes, bool drop)
>   {
> -	struct tty_port *tport;
>   	struct qcom_geni_serial_port *port = to_dev_port(uport);
> -	u32 num_bytes_pw = port->tx_fifo_width / BITS_PER_BYTE;
> -	u32 words = ALIGN(bytes, num_bytes_pw) / num_bytes_pw;
> +	struct tty_port *tport = &uport->state->port;
>   	int ret;
>   
> -	tport = &uport->state->port;
> -	ioread32_rep(uport->membase + SE_GENI_RX_FIFOn, port->rx_fifo, words);
> -	if (drop)
> -		return;
> -

Are we removing FIFO support for uart?

It it not a overhead to use dma for small transfers that are less than 
fifo size?


> -	ret = tty_insert_flip_string(tport, port->rx_fifo, bytes);
> +	ret = tty_insert_flip_string(tport, port->rx_buf, bytes);
>   	if (ret != bytes) {
>   		dev_err(uport->dev, "%s:Unable to push data ret %d_bytes %d\n",
>   				__func__, ret, bytes);
> @@ -578,7 +577,70 @@ static unsigned int qcom_geni_serial_tx_empty(struct uart_port *uport)
>   	return !readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
>   }
>   
> -static void qcom_geni_serial_start_tx(struct uart_port *uport)
> +static void qcom_geni_serial_stop_tx_dma(struct uart_port *uport)
> +{
> +	struct qcom_geni_serial_port *port = to_dev_port(uport);
> +	bool done;

-->
> +	u32 status;
...
> +
> +	status = readl(uport->membase + SE_GENI_STATUS);
> +	if (!(status & M_GENI_CMD_ACTIVE))
> +		return;
<---

this code snippet is repeating more than few times in the patches, looks 
like it could be made to a inline helper.


> +
> +	if (port->rx_dma_addr) {
> +		geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr,
> +				      port->tx_remaining);
> +		port->tx_dma_addr = (dma_addr_t)NULL;
> +		port->tx_remaining = 0;
> +	}
> +
> +	m_irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
> +	writel(m_irq_en, uport->membase + SE_GENI_M_IRQ_EN);
> +	geni_se_cancel_m_cmd(&port->se);
> +
> +	done = qcom_geni_serial_poll_bit(uport, SE_GENI_S_IRQ_STATUS,
> +					 S_CMD_CANCEL_EN, true);
> +	if (!done) {
> +		geni_se_abort_m_cmd(&port->se);
> +		qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> +					  M_CMD_ABORT_EN, true);

return is not checked, there are few more such instances in the patch.

> +		writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> +	}
> +
> +	writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> +}
> +
> +static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
> +{
> +	struct qcom_geni_serial_port *port = to_dev_port(uport);
> +	struct circ_buf *xmit = &uport->state->xmit;
> +	unsigned int xmit_size;
> +	int ret;
> +
> +	if (port->tx_dma_addr)
> +		return;
Is this condition actually possible?


> +
> +	xmit_size = uart_circ_chars_pending(xmit);
> +	if (xmit_size < WAKEUP_CHARS)
> +		uart_write_wakeup(uport);
> +
> +	xmit_size = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
> +
> +	qcom_geni_serial_setup_tx(uport, xmit_size);
> +
> +	ret = geni_se_tx_dma_prep(&port->se, &xmit->buf[xmit->tail],
> +				  xmit_size, &port->tx_dma_addr);
> +	if (ret) {
> +		dev_err(uport->dev, "unable to start TX SE DMA: %d\n", ret);
> +		qcom_geni_serial_stop_tx_dma(uport);
> +		return;
> +	}
> +
> +	port->tx_remaining = xmit_size;
> +}
> +

...
Bartosz Golaszewski Nov. 28, 2022, 11:03 a.m. UTC | #3
On Fri, 25 Nov 2022 at 15:37, Srinivas Kandagatla
<srinivas.kandagatla@linaro.org> wrote:
>

<snip>

> >
> > @@ -552,18 +558,11 @@ static void handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
> >
> >   static void handle_rx_uart(struct uart_port *uport, u32 bytes, bool drop)
> >   {
> > -     struct tty_port *tport;
> >       struct qcom_geni_serial_port *port = to_dev_port(uport);
> > -     u32 num_bytes_pw = port->tx_fifo_width / BITS_PER_BYTE;
> > -     u32 words = ALIGN(bytes, num_bytes_pw) / num_bytes_pw;
> > +     struct tty_port *tport = &uport->state->port;
> >       int ret;
> >
> > -     tport = &uport->state->port;
> > -     ioread32_rep(uport->membase + SE_GENI_RX_FIFOn, port->rx_fifo, words);
> > -     if (drop)
> > -             return;
> > -
>
> Are we removing FIFO support for uart?
>
> It it not a overhead to use dma for small transfers that are less than
> fifo size?
>

You made me think about it but no - while I haven't measured it yet, I
don't think it's worth the code complexity behind it. The i2c driver
doesn't do it this way for short transfers either. If you insist I can
test it and come forward with numbers but unless we encounter a
real-life example of the need for lots of very short reads/writes in
short succession, I don't think we should overcomplicate this.

>
> > -     ret = tty_insert_flip_string(tport, port->rx_fifo, bytes);
> > +     ret = tty_insert_flip_string(tport, port->rx_buf, bytes);
> >       if (ret != bytes) {
> >               dev_err(uport->dev, "%s:Unable to push data ret %d_bytes %d\n",
> >                               __func__, ret, bytes);
> > @@ -578,7 +577,70 @@ static unsigned int qcom_geni_serial_tx_empty(struct uart_port *uport)
> >       return !readl(uport->membase + SE_GENI_TX_FIFO_STATUS);
> >   }
> >
> > -static void qcom_geni_serial_start_tx(struct uart_port *uport)
> > +static void qcom_geni_serial_stop_tx_dma(struct uart_port *uport)
> > +{
> > +     struct qcom_geni_serial_port *port = to_dev_port(uport);
> > +     bool done;
>
> -->
> > +     u32 status;
> ...
> > +
> > +     status = readl(uport->membase + SE_GENI_STATUS);
> > +     if (!(status & M_GENI_CMD_ACTIVE))
> > +             return;
> <---
>
> this code snippet is repeating more than few times in the patches, looks
> like it could be made to a inline helper.
>

Makes sense.

>
> > +
> > +     if (port->rx_dma_addr) {
> > +             geni_se_tx_dma_unprep(&port->se, port->tx_dma_addr,
> > +                                   port->tx_remaining);
> > +             port->tx_dma_addr = (dma_addr_t)NULL;
> > +             port->tx_remaining = 0;
> > +     }
> > +
> > +     m_irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
> > +     writel(m_irq_en, uport->membase + SE_GENI_M_IRQ_EN);
> > +     geni_se_cancel_m_cmd(&port->se);
> > +
> > +     done = qcom_geni_serial_poll_bit(uport, SE_GENI_S_IRQ_STATUS,
> > +                                      S_CMD_CANCEL_EN, true);
> > +     if (!done) {
> > +             geni_se_abort_m_cmd(&port->se);
> > +             qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> > +                                       M_CMD_ABORT_EN, true);
>
> return is not checked, there are few more such instances in the patch.
>

Yes, but this is an error path. What would I do if the cancel failed
to go through and then the abort failed as well? I can at best emit an
error message.

> > +             writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> > +     }
> > +
> > +     writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
> > +}
> > +
> > +static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
> > +{
> > +     struct qcom_geni_serial_port *port = to_dev_port(uport);
> > +     struct circ_buf *xmit = &uport->state->xmit;
> > +     unsigned int xmit_size;
> > +     int ret;
> > +
> > +     if (port->tx_dma_addr)
> > +             return;
> Is this condition actually possible?
>

It should never happen but I wanted to be sure. Maybe a BUG_ON() or
WARN_ON() would be better?

>
> > +
> > +     xmit_size = uart_circ_chars_pending(xmit);
> > +     if (xmit_size < WAKEUP_CHARS)
> > +             uart_write_wakeup(uport);
> > +
> > +     xmit_size = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
> > +
> > +     qcom_geni_serial_setup_tx(uport, xmit_size);
> > +
> > +     ret = geni_se_tx_dma_prep(&port->se, &xmit->buf[xmit->tail],
> > +                               xmit_size, &port->tx_dma_addr);
> > +     if (ret) {
> > +             dev_err(uport->dev, "unable to start TX SE DMA: %d\n", ret);
> > +             qcom_geni_serial_stop_tx_dma(uport);
> > +             return;
> > +     }
> > +
> > +     port->tx_remaining = xmit_size;
> > +}
> > +
>
> ...

Bart
Bartosz Golaszewski Nov. 29, 2022, 10:13 a.m. UTC | #4
On Fri, Nov 25, 2022 at 3:37 PM Srinivas Kandagatla
<srinivas.kandagatla@linaro.org> wrote:
>

[snip]

> > +
> > +static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
> > +{
> > +     struct qcom_geni_serial_port *port = to_dev_port(uport);
> > +     struct circ_buf *xmit = &uport->state->xmit;
> > +     unsigned int xmit_size;
> > +     int ret;
> > +
> > +     if (port->tx_dma_addr)
> > +             return;
> Is this condition actually possible?
>

So it turns out, it's possible that the subsystem calls start_tx when
this is already set but the main engine is not yet in active state (so
we can't simply test that bit). This needs to stay then.

Bart