Message ID | 1608236927-28701-6-git-send-email-skomatineni@nvidia.com |
---|---|
State | Superseded |
Headers | show |
Series | Add Tegra Quad SPI driver | expand |
Hi Sowjanya, On 17/12/20 12:28PM, Sowjanya Komatineni wrote: > This patch marks dummy transfer by setting dummy_data bit to 1. > > Controllers supporting dummy transfer by hardware use this bit field > to skip software transfer of dummy bytes and use hardware dummy bytes > transfer. What is the benefit you get from this change? You add complexity in spi-mem and the controller driver, so that must come with some benefits. Here I don't see any. The transfer will certainly take the same amount of time because the number or period of the dummy cycles has not changed. So why is this needed? > Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> > --- > drivers/spi/spi-mem.c | 1 + > include/linux/spi/spi.h | 2 ++ > 2 files changed, 3 insertions(+) > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c > index f3a3f19..c64371c 100644 > --- a/drivers/spi/spi-mem.c > +++ b/drivers/spi/spi-mem.c > @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1; > xfers[xferpos].len = op->dummy.nbytes; > xfers[xferpos].tx_nbits = op->dummy.buswidth; > + xfers[xferpos].dummy_data = 1; > spi_message_add_tail(&xfers[xferpos], &msg); > xferpos++; > totalxferlen += op->dummy.nbytes; > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > index aa09fdc..708f2f5 100644 > --- a/include/linux/spi/spi.h > +++ b/include/linux/spi/spi.h > @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr, > * transfer. If 0 the default (from @spi_device) is used. > * @bits_per_word: select a bits_per_word other than the device default > * for this transfer. If 0 the default (from @spi_device) is used. > + * @dummy_data: indicates transfer is dummy bytes transfer. > * @cs_change: affects chipselect after this transfer completes > * @cs_change_delay: delay between cs deassert and assert when > * @cs_change is set and @spi_transfer is not the last in @spi_message > @@ -939,6 +940,7 @@ struct spi_transfer { > struct sg_table tx_sg; > struct sg_table rx_sg; > > + unsigned dummy_data:1; > unsigned cs_change:1; > unsigned tx_nbits:3; > unsigned rx_nbits:3; > -- > 2.7.4 >
On Fri, 18 Dec 2020 14:51:08 +0530 Pratyush Yadav <p.yadav@ti.com> wrote: > Hi Sowjanya, > > On 17/12/20 12:28PM, Sowjanya Komatineni wrote: > > This patch marks dummy transfer by setting dummy_data bit to 1. > > > > Controllers supporting dummy transfer by hardware use this bit field > > to skip software transfer of dummy bytes and use hardware dummy bytes > > transfer. > > What is the benefit you get from this change? You add complexity in > spi-mem and the controller driver, so that must come with some benefits. > Here I don't see any. The transfer will certainly take the same amount > of time because the number or period of the dummy cycles has not > changed. So why is this needed? Well, you don't have to queue TX bytes if you use HW-based dummy cycles, but I agree, I'd expect the overhead to be negligible, especially since we're talking about emitting a few bytes, not hundreds. This being said, the complexity added to the core is reasonable IMHO, so if it really helps reducing the CPU overhead (we might need some numbers to prove that), I guess it's okay. > > > Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> > > --- > > drivers/spi/spi-mem.c | 1 + > > include/linux/spi/spi.h | 2 ++ > > 2 files changed, 3 insertions(+) > > > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c > > index f3a3f19..c64371c 100644 > > --- a/drivers/spi/spi-mem.c > > +++ b/drivers/spi/spi-mem.c > > @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > > xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1; > > xfers[xferpos].len = op->dummy.nbytes; > > xfers[xferpos].tx_nbits = op->dummy.buswidth; > > + xfers[xferpos].dummy_data = 1; > > spi_message_add_tail(&xfers[xferpos], &msg); > > xferpos++; > > totalxferlen += op->dummy.nbytes; > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > > index aa09fdc..708f2f5 100644 > > --- a/include/linux/spi/spi.h > > +++ b/include/linux/spi/spi.h > > @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr, > > * transfer. If 0 the default (from @spi_device) is used. > > * @bits_per_word: select a bits_per_word other than the device default > > * for this transfer. If 0 the default (from @spi_device) is used. > > + * @dummy_data: indicates transfer is dummy bytes transfer. > > * @cs_change: affects chipselect after this transfer completes > > * @cs_change_delay: delay between cs deassert and assert when > > * @cs_change is set and @spi_transfer is not the last in @spi_message > > @@ -939,6 +940,7 @@ struct spi_transfer { > > struct sg_table tx_sg; > > struct sg_table rx_sg; > > > > + unsigned dummy_data:1; > > unsigned cs_change:1; > > unsigned tx_nbits:3; > > unsigned rx_nbits:3; > > -- > > 2.7.4 > > >
On 12/18/20 1:57 AM, Boris Brezillon wrote: > On Fri, 18 Dec 2020 14:51:08 +0530 > Pratyush Yadav <p.yadav@ti.com> wrote: > >> Hi Sowjanya, >> >> On 17/12/20 12:28PM, Sowjanya Komatineni wrote: >>> This patch marks dummy transfer by setting dummy_data bit to 1. >>> >>> Controllers supporting dummy transfer by hardware use this bit field >>> to skip software transfer of dummy bytes and use hardware dummy bytes >>> transfer. >> What is the benefit you get from this change? You add complexity in >> spi-mem and the controller driver, so that must come with some benefits. >> Here I don't see any. The transfer will certainly take the same amount >> of time because the number or period of the dummy cycles has not >> changed. So why is this needed? > Well, you don't have to queue TX bytes if you use HW-based dummy > cycles, but I agree, I'd expect the overhead to be negligible, > especially since we're talking about emitting a few bytes, not hundreds. > This being said, the complexity added to the core is reasonable IMHO, > so if it really helps reducing the CPU overhead (we might need some > numbers to prove that), I guess it's okay. Hardware dummy cycles feature of Tegra QSPI is to save SW transfer cycle of dummy bytes by filling FIFO. I don't have numbers as we always use hardware dummy cycles with Tegra QSPI. >> >>> Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> >>> --- >>> drivers/spi/spi-mem.c | 1 + >>> include/linux/spi/spi.h | 2 ++ >>> 2 files changed, 3 insertions(+) >>> >>> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c >>> index f3a3f19..c64371c 100644 >>> --- a/drivers/spi/spi-mem.c >>> +++ b/drivers/spi/spi-mem.c >>> @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) >>> xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1; >>> xfers[xferpos].len = op->dummy.nbytes; >>> xfers[xferpos].tx_nbits = op->dummy.buswidth; >>> + xfers[xferpos].dummy_data = 1; >>> spi_message_add_tail(&xfers[xferpos], &msg); >>> xferpos++; >>> totalxferlen += op->dummy.nbytes; >>> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h >>> index aa09fdc..708f2f5 100644 >>> --- a/include/linux/spi/spi.h >>> +++ b/include/linux/spi/spi.h >>> @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr, >>> * transfer. If 0 the default (from @spi_device) is used. >>> * @bits_per_word: select a bits_per_word other than the device default >>> * for this transfer. If 0 the default (from @spi_device) is used. >>> + * @dummy_data: indicates transfer is dummy bytes transfer. >>> * @cs_change: affects chipselect after this transfer completes >>> * @cs_change_delay: delay between cs deassert and assert when >>> * @cs_change is set and @spi_transfer is not the last in @spi_message >>> @@ -939,6 +940,7 @@ struct spi_transfer { >>> struct sg_table tx_sg; >>> struct sg_table rx_sg; >>> >>> + unsigned dummy_data:1; >>> unsigned cs_change:1; >>> unsigned tx_nbits:3; >>> unsigned rx_nbits:3; >>> -- >>> 2.7.4 >>>
On 18/12/20 10:09AM, Sowjanya Komatineni wrote: > > On 12/18/20 1:57 AM, Boris Brezillon wrote: > > On Fri, 18 Dec 2020 14:51:08 +0530 > > Pratyush Yadav <p.yadav@ti.com> wrote: > > > > > Hi Sowjanya, > > > > > > On 17/12/20 12:28PM, Sowjanya Komatineni wrote: > > > > This patch marks dummy transfer by setting dummy_data bit to 1. > > > > > > > > Controllers supporting dummy transfer by hardware use this bit field > > > > to skip software transfer of dummy bytes and use hardware dummy bytes > > > > transfer. > > > What is the benefit you get from this change? You add complexity in > > > spi-mem and the controller driver, so that must come with some benefits. > > > Here I don't see any. The transfer will certainly take the same amount > > > of time because the number or period of the dummy cycles has not > > > changed. So why is this needed? > > Well, you don't have to queue TX bytes if you use HW-based dummy > > cycles, but I agree, I'd expect the overhead to be negligible, > > especially since we're talking about emitting a few bytes, not hundreds. > > This being said, the complexity added to the core is reasonable IMHO, > > so if it really helps reducing the CPU overhead (we might need some > > numbers to prove that), I guess it's okay. > > Hardware dummy cycles feature of Tegra QSPI is to save SW transfer cycle of > dummy bytes by filling FIFO. > > I don't have numbers as we always use hardware dummy cycles with Tegra QSPI. Most flashes use somewhere around 8-10 dummy cycles. Some of the faster ones working at 166 MHz or above use 20-25. From what I understand, the only time hardware transfer of dummy cycles will reduce CPU load significantly will be when multiple small transfers are made in quick succession. Unless you have such a usecase I don't know how much benefit you will get by it. Anyway, if the SPI maintainers think this is worth it, I won't object. > > > > Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> > > > > --- > > > > drivers/spi/spi-mem.c | 1 + > > > > include/linux/spi/spi.h | 2 ++ > > > > 2 files changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c > > > > index f3a3f19..c64371c 100644 > > > > --- a/drivers/spi/spi-mem.c > > > > +++ b/drivers/spi/spi-mem.c > > > > @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > > > > xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1; > > > > xfers[xferpos].len = op->dummy.nbytes; > > > > xfers[xferpos].tx_nbits = op->dummy.buswidth; > > > > + xfers[xferpos].dummy_data = 1; > > > > spi_message_add_tail(&xfers[xferpos], &msg); > > > > xferpos++; > > > > totalxferlen += op->dummy.nbytes; > > > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > > > > index aa09fdc..708f2f5 100644 > > > > --- a/include/linux/spi/spi.h > > > > +++ b/include/linux/spi/spi.h > > > > @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr, > > > > * transfer. If 0 the default (from @spi_device) is used. > > > > * @bits_per_word: select a bits_per_word other than the device default > > > > * for this transfer. If 0 the default (from @spi_device) is used. > > > > + * @dummy_data: indicates transfer is dummy bytes transfer. > > > > * @cs_change: affects chipselect after this transfer completes > > > > * @cs_change_delay: delay between cs deassert and assert when > > > > * @cs_change is set and @spi_transfer is not the last in @spi_message > > > > @@ -939,6 +940,7 @@ struct spi_transfer { > > > > struct sg_table tx_sg; > > > > struct sg_table rx_sg; > > > > + unsigned dummy_data:1; > > > > unsigned cs_change:1; > > > > unsigned tx_nbits:3; > > > > unsigned rx_nbits:3; > > > > -- > > > > 2.7.4 -- Regards, Pratyush Yadav Texas Instruments India
On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote:
> Anyway, if the SPI maintainers think this is worth it, I won't object.
This gets kind of circular, for me it's a question of if there's some
meaningful benefit from using the feature vs the cost to support it and
from the sounds of it we don't have numbers on the benefits from using
it at present.
On Fri, Dec 18, 2020 at 08:41:02PM +0000, Mark Brown wrote: > On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote: > > Anyway, if the SPI maintainers think this is worth it, I won't object. > This gets kind of circular, for me it's a question of if there's some > meaningful benefit from using the feature vs the cost to support it and > from the sounds of it we don't have numbers on the benefits from using > it at present. ...although I do have to say looking at the implementation that the cost seems low, it's just a flag set on an existing transfer. The only issue is if we'd get more win from coalesing the entire transaction (or entire transmit) into a single transfer that could be DMAed and/or requires fewer trips through the stack which does make it seem like an unclear tradeoff from the point of view of client drivers
On 12/18/20 12:44 PM, Mark Brown wrote: > On Fri, Dec 18, 2020 at 08:41:02PM +0000, Mark Brown wrote: >> On Sat, Dec 19, 2020 at 12:49:38AM +0530, Pratyush Yadav wrote: >>> Anyway, if the SPI maintainers think this is worth it, I won't object. >> This gets kind of circular, for me it's a question of if there's some >> meaningful benefit from using the feature vs the cost to support it and >> from the sounds of it we don't have numbers on the benefits from using >> it at present. > ...although I do have to say looking at the implementation that the cost > seems low, it's just a flag set on an existing transfer. The only issue > is if we'd get more win from coalesing the entire transaction (or entire > transmit) into a single transfer that could be DMAed and/or requires > fewer trips through the stack which does make it seem like an unclear > tradeoff from the point of view of client drivers Using HW dummy cycles save extra software cycle of transfer which involves transfer setup register writes, writing dummy bytes to TX FIFO, interrupt processing. Implementation wise it just a single bit field added to spi_transfer and on Tegra controller driver programming dummy cycles with prior transfer and skipping sw dummy transfer which is actually not complex. From quick check, I see HW dummy cycles transfer of 128KB shows 18 Mb/s while SW transfer of bytes shows 17.3 MB/s on average. When back-to-back read commands are executed using HW dummy cycles will definitely save cycles.
On Fri, Dec 18, 2020 at 02:01:56PM -0800, Sowjanya Komatineni wrote: > From quick check, I see HW dummy cycles transfer of 128KB shows 18 Mb/s > while SW transfer of bytes shows 17.3 MB/s on average. OK, it's not going to revolutionize the world or anything but that's definitely a speedup.
diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c index f3a3f19..c64371c 100644 --- a/drivers/spi/spi-mem.c +++ b/drivers/spi/spi-mem.c @@ -354,6 +354,7 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) xfers[xferpos].tx_buf = tmpbuf + op->addr.nbytes + 1; xfers[xferpos].len = op->dummy.nbytes; xfers[xferpos].tx_nbits = op->dummy.buswidth; + xfers[xferpos].dummy_data = 1; spi_message_add_tail(&xfers[xferpos], &msg); xferpos++; totalxferlen += op->dummy.nbytes; diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index aa09fdc..708f2f5 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -827,6 +827,7 @@ extern void spi_res_release(struct spi_controller *ctlr, * transfer. If 0 the default (from @spi_device) is used. * @bits_per_word: select a bits_per_word other than the device default * for this transfer. If 0 the default (from @spi_device) is used. + * @dummy_data: indicates transfer is dummy bytes transfer. * @cs_change: affects chipselect after this transfer completes * @cs_change_delay: delay between cs deassert and assert when * @cs_change is set and @spi_transfer is not the last in @spi_message @@ -939,6 +940,7 @@ struct spi_transfer { struct sg_table tx_sg; struct sg_table rx_sg; + unsigned dummy_data:1; unsigned cs_change:1; unsigned tx_nbits:3; unsigned rx_nbits:3;
This patch marks dummy transfer by setting dummy_data bit to 1. Controllers supporting dummy transfer by hardware use this bit field to skip software transfer of dummy bytes and use hardware dummy bytes transfer. Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> --- drivers/spi/spi-mem.c | 1 + include/linux/spi/spi.h | 2 ++ 2 files changed, 3 insertions(+)