[1/3] spi: spi-geni-qcom: Use the FIFO even more

Message ID	20200912140730.1.Ie67fa32009b94702d56232c064f1d89065ee8836@changeid
State	Accepted
Commit	fc129a43aa2705770dc45b2e9c506d2617fd5863
Headers	show Return-Path: <SRS0=elUy=CV=vger.kernel.org=linux-arm-msm-owner@kernel.org> From: Douglas Anderson <dianders@chromium.org> To: Mark Brown <broonie@kernel.org> Cc: Akash Asthana <akashast@codeaurora.org>, swboyd@chromium.org, Douglas Anderson <dianders@chromium.org>, Andy Gross <agross@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-spi@vger.kernel.org Subject: [PATCH 1/3] spi: spi-geni-qcom: Use the FIFO even more Date: Sat, 12 Sep 2020 14:07:59 -0700 Message-Id: <20200912140730.1.Ie67fa32009b94702d56232c064f1d89065ee8836@changeid> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-arm-msm-owner@vger.kernel.org Precedence: bulk
Series	[1/3] spi: spi-geni-qcom: Use the FIFO even more \| expand [1/3] spi: spi-geni-qcom: Use the FIFO even more [2/3] spi: spi-geni-qcom: Don't program CS_TOGGLE again and again [3/3] spi: spi-geni-qcom: Slightly optimize setup of bidirectional xfters

Doug Anderson Sept. 12, 2020, 9:07 p.m. UTC

In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
explained that the maximum size we could program the FIFO was
"mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
because I was worried about decreased bandwidth.

Since that time:
* All the interconnect patches have landed, making things run at the
  proper speed.
* I've done more measurements.

This lets me confirm that there's really no downside of using the FIFO
more.  Specifically I did "flashrom -p ec -r /tmp/foo.bin" on a
Chromebook and averaged over several runs.

Before: It took 6.66 seconds and 59669 interrupts fired.
After:  It took 6.66 seconds and 47992 interrupts fired.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/spi/spi-geni-qcom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Bjorn Andersson Sept. 12, 2020, 10:53 p.m. UTC | #1

On Sat 12 Sep 16:07 CDT 2020, Douglas Anderson wrote:

> In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
> explained that the maximum size we could program the FIFO was
> "mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
> because I was worried about decreased bandwidth.
> 
> Since that time:
> * All the interconnect patches have landed, making things run at the
>   proper speed.
> * I've done more measurements.
> 
> This lets me confirm that there's really no downside of using the FIFO
> more.  Specifically I did "flashrom -p ec -r /tmp/foo.bin" on a
> Chromebook and averaged over several runs.

Wouldn't there be a downside in the form of setting the watermark that
close to the full FIFO we have less room for being late handling the
interrupt? Or is there some mechanism involved that will prevent
the FIFO from being overrun?

Regards,
Bjorn

> 
> Before: It took 6.66 seconds and 59669 interrupts fired.
> After:  It took 6.66 seconds and 47992 interrupts fired.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  drivers/spi/spi-geni-qcom.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> index 0dc3f4c55b0b..7f0bf0dec466 100644
> --- a/drivers/spi/spi-geni-qcom.c
> +++ b/drivers/spi/spi-geni-qcom.c
> @@ -308,7 +308,7 @@ static int spi_geni_init(struct spi_geni_master *mas)
>  	 * Hardware programming guide suggests to configure
>  	 * RX FIFO RFR level to fifo_depth-2.
>  	 */
> -	geni_se_init(se, mas->tx_fifo_depth / 2, mas->tx_fifo_depth - 2);
> +	geni_se_init(se, mas->tx_fifo_depth - 3, mas->tx_fifo_depth - 2);
>  	/* Transmit an entire FIFO worth of data per IRQ */
>  	mas->tx_wm = 1;
>  	ver = geni_se_get_qup_hw_version(se);
> -- 
> 2.28.0.618.gf4bc123cb7-goog
>

Bjorn Andersson Sept. 12, 2020, 10:54 p.m. UTC | #2

On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote:

> When setting up a bidirectional transfer we need to program both the
> TX and RX lengths.  We don't need a memory barrier between those two
> writes.  Factor out the __iowmb() and use writel_relaxed().  This
> saves a fraction of a microsecond of setup overhead on bidirectional
> transfers.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> index 92d88bf85a90..6c7e12b68bf0 100644
> --- a/drivers/spi/spi-geni-qcom.c
> +++ b/drivers/spi/spi-geni-qcom.c
> @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer,
>  	len &= TRANS_LEN_MSK;
>  
>  	mas->cur_xfer = xfer;
> +
> +	/*
> +	 * Factor out the __iowmb() so that we can use writel_relaxed() for
> +	 * both writes below and thus only incur the overhead once even if
> +	 * we execute both of them.
> +	 */

How many passes through this function do we have to take before saving
the amount of time it took me to read this comment?

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> +	__iowmb();
> +
>  	if (xfer->tx_buf) {
>  		m_cmd |= SPI_TX_ONLY;
>  		mas->tx_rem_bytes = xfer->len;
> -		writel(len, se->base + SE_SPI_TX_TRANS_LEN);
> +		writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN);
>  	}
> -
>  	if (xfer->rx_buf) {
>  		m_cmd |= SPI_RX_ONLY;
> -		writel(len, se->base + SE_SPI_RX_TRANS_LEN);
> +		writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN);
>  		mas->rx_rem_bytes = xfer->len;
>  	}
>  
> -- 
> 2.28.0.618.gf4bc123cb7-goog
>

Doug Anderson Sept. 13, 2020, 1:09 a.m. UTC | #3

Hi,

On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote:
>
> > When setting up a bidirectional transfer we need to program both the
> > TX and RX lengths.  We don't need a memory barrier between those two
> > writes.  Factor out the __iowmb() and use writel_relaxed().  This
> > saves a fraction of a microsecond of setup overhead on bidirectional
> > transfers.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> > index 92d88bf85a90..6c7e12b68bf0 100644
> > --- a/drivers/spi/spi-geni-qcom.c
> > +++ b/drivers/spi/spi-geni-qcom.c
> > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer,
> >       len &= TRANS_LEN_MSK;
> >
> >       mas->cur_xfer = xfer;
> > +
> > +     /*
> > +      * Factor out the __iowmb() so that we can use writel_relaxed() for
> > +      * both writes below and thus only incur the overhead once even if
> > +      * we execute both of them.
> > +      */
>
> How many passes through this function do we have to take before saving
> the amount of time it took me to read this comment?
>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Thanks for the review!  Yeah, in Chrome OS we do a crazy amount of SPI
transfers since our EC and security chip are connected over SPI and we
seem to pile a whole lot of stuff into the EC.  This means we keep
coming back to the SPI controller again and again when profiling
things.  I'm hoping that we'll eventually be able to get DMA enabled
here, but until then at least it's nice to make the FIFO transfers
better...

-Doug

Doug Anderson Sept. 13, 2020, 1:11 a.m. UTC | #4

Hi,

On Sat, Sep 12, 2020 at 3:53 PM Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> On Sat 12 Sep 16:07 CDT 2020, Douglas Anderson wrote:
>
> > In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
> > explained that the maximum size we could program the FIFO was
> > "mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
> > because I was worried about decreased bandwidth.
> >
> > Since that time:
> > * All the interconnect patches have landed, making things run at the
> >   proper speed.
> > * I've done more measurements.
> >
> > This lets me confirm that there's really no downside of using the FIFO
> > more.  Specifically I did "flashrom -p ec -r /tmp/foo.bin" on a
> > Chromebook and averaged over several runs.
>
> Wouldn't there be a downside in the form of setting the watermark that
> close to the full FIFO we have less room for being late handling the
> interrupt? Or is there some mechanism involved that will prevent
> the FIFO from being overrun?

Yeah, I had that worry too, but, as described in 902481a78ee4 ("spi:
spi-geni-qcom: Actually use our FIFO"), it doesn't seem to be a
problem.  From that commit: "We are the SPI master, so it makes sense
that there would be no problems with overruns, the master should just
stop clocking."

-Doug

Bjorn Andersson Sept. 13, 2020, 3:12 a.m. UTC | #5

On Sat 12 Sep 20:11 CDT 2020, Doug Anderson wrote:

> Hi,
> 
> On Sat, Sep 12, 2020 at 3:53 PM Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > On Sat 12 Sep 16:07 CDT 2020, Douglas Anderson wrote:
> >
> > > In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
> > > explained that the maximum size we could program the FIFO was
> > > "mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
> > > because I was worried about decreased bandwidth.
> > >
> > > Since that time:
> > > * All the interconnect patches have landed, making things run at the
> > >   proper speed.
> > > * I've done more measurements.
> > >
> > > This lets me confirm that there's really no downside of using the FIFO
> > > more.  Specifically I did "flashrom -p ec -r /tmp/foo.bin" on a
> > > Chromebook and averaged over several runs.
> >
> > Wouldn't there be a downside in the form of setting the watermark that
> > close to the full FIFO we have less room for being late handling the
> > interrupt? Or is there some mechanism involved that will prevent
> > the FIFO from being overrun?
> 
> Yeah, I had that worry too, but, as described in 902481a78ee4 ("spi:
> spi-geni-qcom: Actually use our FIFO"), it doesn't seem to be a
> problem.  From that commit: "We are the SPI master, so it makes sense
> that there would be no problems with overruns, the master should just
> stop clocking."
> 

Actually read the message of the linked commit now. I share your view
that this indicates that the controller does something wrt the clocking
to handle this case.

Change itself looks good, so:

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

Doug Anderson Sept. 13, 2020, 8:35 p.m. UTC | #6

Hi,

On Sat, Sep 12, 2020 at 6:09 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote:
> >
> > > When setting up a bidirectional transfer we need to program both the
> > > TX and RX lengths.  We don't need a memory barrier between those two
> > > writes.  Factor out the __iowmb() and use writel_relaxed().  This
> > > saves a fraction of a microsecond of setup overhead on bidirectional
> > > transfers.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > >  drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> > > index 92d88bf85a90..6c7e12b68bf0 100644
> > > --- a/drivers/spi/spi-geni-qcom.c
> > > +++ b/drivers/spi/spi-geni-qcom.c
> > > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer,
> > >       len &= TRANS_LEN_MSK;
> > >
> > >       mas->cur_xfer = xfer;
> > > +
> > > +     /*
> > > +      * Factor out the __iowmb() so that we can use writel_relaxed() for
> > > +      * both writes below and thus only incur the overhead once even if
> > > +      * we execute both of them.
> > > +      */
> >
> > How many passes through this function do we have to take before saving
> > the amount of time it took me to read this comment?
> >
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>
> Thanks for the review!  Yeah, in Chrome OS we do a crazy amount of SPI
> transfers since our EC and security chip are connected over SPI and we
> seem to pile a whole lot of stuff into the EC.  This means we keep
> coming back to the SPI controller again and again when profiling
> things.  I'm hoping that we'll eventually be able to get DMA enabled
> here, but until then at least it's nice to make the FIFO transfers
> better...

Ugh.  Given the problem that the kernel test robot found, I'm gonna
say just drop this patch but keep the others I sent.  As per the CL
description, it's a pretty minor optimization and even though we do a
lot of SPI transfers it's probably more worth it to work towards DMA
mode than to try to find a cleaner solution for this one.

-Doug

Mark Brown Sept. 14, 2020, 2:52 p.m. UTC | #7

On Sat, 12 Sep 2020 14:07:59 -0700, Douglas Anderson wrote:
> In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
> explained that the maximum size we could program the FIFO was
> "mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
> because I was worried about decreased bandwidth.
> 
> Since that time:
> * All the interconnect patches have landed, making things run at the
>   proper speed.
> * I've done more measurements.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/2] spi: spi-geni-qcom: Use the FIFO even more
      commit: fc129a43aa2705770dc45b2e9c506d2617fd5863
[2/2] spi: spi-geni-qcom: Don't program CS_TOGGLE again and again
      commit: 14ac4e049dc1183440960f177b60b54357e54d90

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

Akash Asthana Sept. 15, 2020, 7:30 a.m. UTC | #8

On 9/13/2020 2:37 AM, Douglas Anderson wrote:
> In commit 902481a78ee4 ("spi: spi-geni-qcom: Actually use our FIFO") I
> explained that the maximum size we could program the FIFO was
> "mas->tx_fifo_depth - 3" but that I chose "mas->tx_fifo_depth()"
> because I was worried about decreased bandwidth.
>
> Since that time:
> * All the interconnect patches have landed, making things run at the
>    proper speed.
> * I've done more measurements.
>
> This lets me confirm that there's really no downside of using the FIFO
> more.  Specifically I did "flashrom -p ec -r /tmp/foo.bin" on a
> Chromebook and averaged over several runs.
>
> Before: It took 6.66 seconds and 59669 interrupts fired.
> After:  It took 6.66 seconds and 47992 interrupts fired.

Reviewed-by: Akash Asthana <akashast@codeaurora.org>

> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>

[1/3] spi: spi-geni-qcom: Use the FIFO even more

Commit Message

Comments

Patch