Message ID | 20200912140730.3.Ided778fb4cd078e36c6b240d1b279cd7a534a313@changeid |
---|---|
State | New |
Headers | show |
Series | None | expand |
On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote: > When setting up a bidirectional transfer we need to program both the > TX and RX lengths. We don't need a memory barrier between those two > writes. Factor out the __iowmb() and use writel_relaxed(). This > saves a fraction of a microsecond of setup overhead on bidirectional > transfers. > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > index 92d88bf85a90..6c7e12b68bf0 100644 > --- a/drivers/spi/spi-geni-qcom.c > +++ b/drivers/spi/spi-geni-qcom.c > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > len &= TRANS_LEN_MSK; > > mas->cur_xfer = xfer; > + > + /* > + * Factor out the __iowmb() so that we can use writel_relaxed() for > + * both writes below and thus only incur the overhead once even if > + * we execute both of them. > + */ How many passes through this function do we have to take before saving the amount of time it took me to read this comment? Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Regards, Bjorn > + __iowmb(); > + > if (xfer->tx_buf) { > m_cmd |= SPI_TX_ONLY; > mas->tx_rem_bytes = xfer->len; > - writel(len, se->base + SE_SPI_TX_TRANS_LEN); > + writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); > } > - > if (xfer->rx_buf) { > m_cmd |= SPI_RX_ONLY; > - writel(len, se->base + SE_SPI_RX_TRANS_LEN); > + writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); > mas->rx_rem_bytes = xfer->len; > } > > -- > 2.28.0.618.gf4bc123cb7-goog >
Hi Douglas, I love your patch! Yet something to improve: [auto build test ERROR on spi/for-next] [also build test ERROR on v5.9-rc4 next-20200911] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 base: https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next config: x86_64-randconfig-a002-20200913 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3170d54842655d6d936aae32b7d0bc92fce7f22e) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> drivers/spi/spi-geni-qcom.c:385:2: error: implicit declaration of function '__iowmb' [-Werror,-Wimplicit-function-declaration] __iowmb(); ^ 1 error generated. # https://github.com/0day-ci/linux/commit/4458adf4007926cfaaa505b54a83059c9ba813ad git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928 git checkout 4458adf4007926cfaaa505b54a83059c9ba813ad vim +/__iowmb +385 drivers/spi/spi-geni-qcom.c 334 335 static void setup_fifo_xfer(struct spi_transfer *xfer, 336 struct spi_geni_master *mas, 337 u16 mode, struct spi_master *spi) 338 { 339 u32 m_cmd = 0; 340 u32 len; 341 struct geni_se *se = &mas->se; 342 int ret; 343 344 /* 345 * Ensure that our interrupt handler isn't still running from some 346 * prior command before we start messing with the hardware behind 347 * its back. We don't need to _keep_ the lock here since we're only 348 * worried about racing with out interrupt handler. The SPI core 349 * already handles making sure that we're not trying to do two 350 * transfers at once or setting a chip select and doing a transfer 351 * concurrently. 352 * 353 * NOTE: we actually _can't_ hold the lock here because possibly we 354 * might call clk_set_rate() which needs to be able to sleep. 355 */ 356 spin_lock_irq(&mas->lock); 357 spin_unlock_irq(&mas->lock); 358 359 if (xfer->bits_per_word != mas->cur_bits_per_word) { 360 spi_setup_word_len(mas, mode, xfer->bits_per_word); 361 mas->cur_bits_per_word = xfer->bits_per_word; 362 } 363 364 /* Speed and bits per word can be overridden per transfer */ 365 ret = geni_spi_set_clock_and_bw(mas, xfer->speed_hz); 366 if (ret) 367 return; 368 369 mas->tx_rem_bytes = 0; 370 mas->rx_rem_bytes = 0; 371 372 if (!(mas->cur_bits_per_word % MIN_WORD_LEN)) 373 len = xfer->len * BITS_PER_BYTE / mas->cur_bits_per_word; 374 else 375 len = xfer->len / (mas->cur_bits_per_word / BITS_PER_BYTE + 1); 376 len &= TRANS_LEN_MSK; 377 378 mas->cur_xfer = xfer; 379 380 /* 381 * Factor out the __iowmb() so that we can use writel_relaxed() for 382 * both writes below and thus only incur the overhead once even if 383 * we execute both of them. 384 */ > 385 __iowmb(); 386 387 if (xfer->tx_buf) { 388 m_cmd |= SPI_TX_ONLY; 389 mas->tx_rem_bytes = xfer->len; 390 writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); 391 } 392 if (xfer->rx_buf) { 393 m_cmd |= SPI_RX_ONLY; 394 writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); 395 mas->rx_rem_bytes = xfer->len; 396 } 397 398 /* 399 * Lock around right before we start the transfer since our 400 * interrupt could come in at any time now. 401 */ 402 spin_lock_irq(&mas->lock); 403 geni_se_setup_m_cmd(se, m_cmd, FRAGMENTATION); 404 405 /* 406 * TX_WATERMARK_REG should be set after SPI configuration and 407 * setting up GENI SE engine, as driver starts data transfer 408 * for the watermark interrupt. 409 */ 410 if (m_cmd & SPI_TX_ONLY) 411 writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG); 412 spin_unlock_irq(&mas->lock); 413 } 414 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi, On Sat, Sep 12, 2020 at 6:09 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson > <bjorn.andersson@linaro.org> wrote: > > > > On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote: > > > > > When setting up a bidirectional transfer we need to program both the > > > TX and RX lengths. We don't need a memory barrier between those two > > > writes. Factor out the __iowmb() and use writel_relaxed(). This > > > saves a fraction of a microsecond of setup overhead on bidirectional > > > transfers. > > > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > > --- > > > > > > drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > > > index 92d88bf85a90..6c7e12b68bf0 100644 > > > --- a/drivers/spi/spi-geni-qcom.c > > > +++ b/drivers/spi/spi-geni-qcom.c > > > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > > > len &= TRANS_LEN_MSK; > > > > > > mas->cur_xfer = xfer; > > > + > > > + /* > > > + * Factor out the __iowmb() so that we can use writel_relaxed() for > > > + * both writes below and thus only incur the overhead once even if > > > + * we execute both of them. > > > + */ > > > > How many passes through this function do we have to take before saving > > the amount of time it took me to read this comment? > > > > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > Thanks for the review! Yeah, in Chrome OS we do a crazy amount of SPI > transfers since our EC and security chip are connected over SPI and we > seem to pile a whole lot of stuff into the EC. This means we keep > coming back to the SPI controller again and again when profiling > things. I'm hoping that we'll eventually be able to get DMA enabled > here, but until then at least it's nice to make the FIFO transfers > better... Ugh. Given the problem that the kernel test robot found, I'm gonna say just drop this patch but keep the others I sent. As per the CL description, it's a pretty minor optimization and even though we do a lot of SPI transfers it's probably more worth it to work towards DMA mode than to try to find a cleaner solution for this one. -Doug
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index 92d88bf85a90..6c7e12b68bf0 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, len &= TRANS_LEN_MSK; mas->cur_xfer = xfer; + + /* + * Factor out the __iowmb() so that we can use writel_relaxed() for + * both writes below and thus only incur the overhead once even if + * we execute both of them. + */ + __iowmb(); + if (xfer->tx_buf) { m_cmd |= SPI_TX_ONLY; mas->tx_rem_bytes = xfer->len; - writel(len, se->base + SE_SPI_TX_TRANS_LEN); + writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN); } - if (xfer->rx_buf) { m_cmd |= SPI_RX_ONLY; - writel(len, se->base + SE_SPI_RX_TRANS_LEN); + writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN); mas->rx_rem_bytes = xfer->len; }
When setting up a bidirectional transfer we need to program both the TX and RX lengths. We don't need a memory barrier between those two writes. Factor out the __iowmb() and use writel_relaxed(). This saves a fraction of a microsecond of setup overhead on bidirectional transfers. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- drivers/spi/spi-geni-qcom.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)