
[0/6] Avoid odd length/address read/writes in 8D-8D-8D mode.

Message ID 20210506191829.8271-1-p.yadav@ti.com

Message

Pratyush Yadav May 6, 2021, 7:18 p.m. UTC
Hi,

On Octal DTR flashes like the Micron Xcella or Cypress S28 family, reads
or writes cannot start at an odd address in 8D-8D-8D mode, nor can they
be an odd number of bytes long. Upper layers like filesystems don't know
what mode the flash is in, and hence don't know the read/write address
or length limitations. They might issue reads or writes that leave the
flash in an error state. In fact, using UBIFS on top of the flash was
how I first noticed this problem.

This series fixes that problem by padding the read/write with extra
bytes to make sure the final operation has an even address and length.
More info in patches 5 and 6.
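
As a rough illustration of the padding described above, here is a minimal userspace sketch of the arithmetic. It is not code from the series: the type `even_span` and the helper `even_span_for()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the even-aligned window that covers a request. */
struct even_span {
	uint64_t start;   /* request start rounded down to an even address */
	uint64_t end;     /* request end (exclusive) rounded up to an even address */
	size_t head_pad;  /* extra bytes transferred before the requested data */
	size_t tail_pad;  /* extra bytes transferred after the requested data */
};

/* Compute the padded window for a request of len bytes starting at from. */
static struct even_span even_span_for(uint64_t from, size_t len)
{
	struct even_span s;

	assert(len > 0);
	s.start = from & ~(uint64_t)1;
	s.end = (from + len + 1) & ~(uint64_t)1;
	s.head_pad = (size_t)(from - s.start);
	s.tail_pad = (size_t)(s.end - (from + len));
	return s;
}
```

With this sketch, a 4-byte request at address 3 becomes a 6-byte transfer covering addresses 2 to 8, with one padding byte on each side. An already aligned request is left unchanged.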

Patches 1-3 fix some existing odd-byte long reads. Patch 4 adds checks
to disallow odd length command/address/dummy/data phases in 8D-8D-8D
mode. Note that this does not restrict the _value_ of the address from
being odd since this is a restriction on the flash, not the protocol
itself.
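
The kind of check patch 4 describes could look roughly like the sketch below. This is a simplified userspace model under stated assumptions, not the spi-mem code: `struct phase`, `struct op` and both helpers are hypothetical stand-ins for the kernel's operation template. It tests only the byte count of each phase and never the address value, which matches the note above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one phase of a SPI memory operation. */
struct phase {
	uint8_t buswidth;     /* number of I/O lines used by this phase */
	bool dtr;             /* true if the phase is double transfer rate */
	unsigned int nbytes;  /* number of bytes sent in this phase */
};

/* Hypothetical model of a complete operation. */
struct op {
	struct phase cmd, addr, dummy, data;
};

/*
 * On an 8-line DTR bus one clock cycle carries two bytes, so an odd
 * byte count would end in the middle of a cycle.
 */
static bool phase_is_whole_cycle(const struct phase *p)
{
	if (p->dtr && p->buswidth == 8)
		return p->nbytes % 2 == 0;
	return true;
}

/* An operation is acceptable only if every phase ends on a full cycle. */
static bool op_is_whole_cycle(const struct op *o)
{
	return phase_is_whole_cycle(&o->cmd) &&
	       phase_is_whole_cycle(&o->addr) &&
	       phase_is_whole_cycle(&o->dummy) &&
	       phase_is_whole_cycle(&o->data);
}
```

In this model a 1-byte data phase is rejected when the phase is 8-line DTR, and accepted when the same phase is not DTR.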

Patch 4 should go through the SPI tree but I have included it in this
series because if it goes in before patches 1-3, Micron MT35XU and
Cypress S28HS flashes will stop working correctly.

Tested on TI J721E for Micron MT35 and on TI J7200 for Cypress S28.


Pratyush Yadav (6):
  mtd: spi-nor: core: use 2 data bytes for template ops
  mtd: spi-nor: spansion: write 2 bytes when disabling Octal DTR mode
  mtd: spi-nor: micron-st: write 2 bytes when disabling Octal DTR mode
  spi: spi-mem: reject partial cycle transfers in 8D-8D-8D mode
  mtd: spi-nor: core: avoid odd length/address reads on 8D-8D-8D mode
  mtd: spi-nor: core: avoid odd length/address writes in 8D-8D-8D mode

 drivers/mtd/spi-nor/core.c      | 157 +++++++++++++++++++++++++++++++-
 drivers/mtd/spi-nor/micron-st.c |  22 ++++-
 drivers/mtd/spi-nor/spansion.c  |  18 +++-
 drivers/spi/spi-mem.c           |  12 ++-
 4 files changed, 194 insertions(+), 15 deletions(-)

Comments

Mark Brown May 7, 2021, 3:50 p.m. UTC | #1
On Fri, May 07, 2021 at 12:48:23AM +0530, Pratyush Yadav wrote:

> Patch 4 should go through the SPI tree but I have included it in this
> series because if it goes in before patches 1-3, Micron MT35XU and
> Cypress S28HS flashes will stop working correctly.


It probably makes sense to apply these to the MTD tree and then send me
a pull request to avoid any future conflicts with SPI.
Michael Walle May 7, 2021, 3:51 p.m. UTC | #2
On 2021-05-06 21:18, Pratyush Yadav wrote:
> On Octal DTR capable flashes like Micron Xcella reads cannot start or
> end at an odd address in Octal DTR mode. Extra bytes need to be read at
> the start or end to make sure both the start address and length remain
> even.
>
> To avoid allocating too much extra memory, thereby putting unnecessary
> memory pressure on the system, the temporary buffer containing the extra
> padding bytes is capped at PAGE_SIZE bytes. The rest of the 2-byte
> aligned part should be read directly in the main buffer.
>
> Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> ---
>
>  drivers/mtd/spi-nor/core.c | 81 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 80 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> index 5cc206b8bbf3..3d66cc34af4d 100644
> --- a/drivers/mtd/spi-nor/core.c
> +++ b/drivers/mtd/spi-nor/core.c
> @@ -1904,6 +1904,82 @@ static const struct flash_info *spi_nor_read_id(struct spi_nor *nor)
>  	return ERR_PTR(-ENODEV);
>  }
>
> +/*
> + * On Octal DTR capable flashes like Micron Xcella reads cannot start or
> + * end at an odd address in Octal DTR mode. Extra bytes need to be read
> + * at the start or end to make sure both the start address and length
> + * remain even.
> + */
> +static int spi_nor_octal_dtr_read(struct spi_nor *nor, loff_t from, size_t len,
> +				  u_char *buf)
> +{
> +	u_char *tmp_buf;
> +	size_t tmp_len;
> +	loff_t start, end;
> +	int ret, bytes_read;
> +
> +	if (IS_ALIGNED(from, 2) && IS_ALIGNED(len, 2))
> +		return spi_nor_read_data(nor, from, len, buf);
> +	else if (IS_ALIGNED(from, 2) && len > PAGE_SIZE)
> +		return spi_nor_read_data(nor, from, round_down(len, PAGE_SIZE),
> +					 buf);
> +
> +	tmp_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!tmp_buf)
> +		return -ENOMEM;
> +
> +	start = round_down(from, 2);
> +	end = round_up(from + len, 2);
> +
> +	/*
> +	 * Avoid allocating too much memory. The requested read length might be
> +	 * quite large. Allocating a buffer just as large (slightly bigger, in
> +	 * fact) would put unnecessary memory pressure on the system.
> +	 *
> +	 * For example if the read is from 3 to 1M, then this will read from 2
> +	 * to 4098. The reads from 4098 to 1M will then not need a temporary
> +	 * buffer so they can proceed as normal.
> +	 */
> +	tmp_len = min_t(size_t, end - start, PAGE_SIZE);
> +
> +	ret = spi_nor_read_data(nor, start, tmp_len, tmp_buf);
> +	if (ret == 0) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +	if (ret < 0)
> +		goto out;
> +
> +	/*
> +	 * More bytes are read than actually requested, but that number can't be
> +	 * reported to the calling function or it will confuse its calculations.
> +	 * Calculate how many of the _requested_ bytes were read.
> +	 */
> +	bytes_read = ret;
> +
> +	if (from != start)
> +		ret -= from - start;
> +
> +	/*
> +	 * Only account for extra bytes at the end if they were actually read.
> +	 * For example, if the total length was truncated because of temporary
> +	 * buffer size limit then the adjustment for the extra bytes at the end
> +	 * is not needed.
> +	 */
> +	if (start + bytes_read == end)
> +		ret -= end - (from + len);
> +
> +	if (ret < 0) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	memcpy(buf, tmp_buf + (from - start), ret);
> +out:
> +	kfree(tmp_buf);
> +	return ret;
> +}
> +
>  static int spi_nor_read(struct mtd_info *mtd, loff_t from, size_t len,
>  			size_t *retlen, u_char *buf)
>  {
> @@ -1921,7 +1997,10 @@ static int spi_nor_read(struct mtd_info *mtd, loff_t from, size_t len,
>
>  		addr = spi_nor_convert_addr(nor, addr);
>
> -		ret = spi_nor_read_data(nor, addr, len, buf);
> +		if (nor->read_proto == SNOR_PROTO_8_8_8_DTR)
> +			ret = spi_nor_octal_dtr_read(nor, addr, len, buf);
> +		else
> +			ret = spi_nor_read_data(nor, addr, len, buf);
>  		if (ret == 0) {
>  			/* We shouldn't see 0-length reads */
>  			ret = -EIO;


Reviewed-by: Michael Walle <michael@walle.cc>


I wonder how much performance is lost if this would just split
one transfer into up to three ones: 2 byte, size - 2, 2 bytes.

-michael
Pratyush Yadav May 7, 2021, 6:04 p.m. UTC | #3
On 07/05/21 05:51PM, Michael Walle wrote:
> Reviewed-by: Michael Walle <michael@walle.cc>

Thanks.

> I wonder how much performance is lost if this would just split
> one transfer into up to three ones: 2 byte, size - 2, 2 bytes.

This case is not really possible since it would try to read PAGE_SIZE
whenever it can. But there is a situation possible where one transfer is
split into three. It would look something like: 4096 bytes, size - 4096
bytes, 2 bytes.

I am trying to find a balance between minimizing number of reads while 
keeping the size of the temporary buffer to a reasonable limit. This is 
the best I could come up with. It optimizes for smaller transfers so 
while the absolute amount of overhead remains roughly the same, the 
ratio of it relative to read size is smaller.

You can optimize for read performance if you are willing to waste memory
by simply allocating a size + 2 bytes long buffer. Then the read can
proceed in one transaction. But IMO memory is much more important
compared to read throughput.
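
To make the splitting concrete, here is a minimal userspace model of the length accounting in the quoted spi_nor_octal_dtr_read(). It is a sketch under stated assumptions, not the driver code: every underlying flash read is assumed to return the full length asked for, the caller is assumed to loop and advance by the returned byte count, and `MODEL_PAGE_SIZE` is fixed at 4096. The names `model_octal_dtr_read_len()` and `model_count_transfers()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* How many of the requested bytes a single call would report as read. */
static size_t model_octal_dtr_read_len(uint64_t from, size_t len)
{
	uint64_t start, end, span;
	size_t tmp_len, got;

	/* Fully aligned request: read straight into the caller's buffer. */
	if (from % 2 == 0 && len % 2 == 0)
		return len;

	/* Aligned start, odd and large length: read whole pages directly. */
	if (from % 2 == 0 && len > MODEL_PAGE_SIZE)
		return len - (len % MODEL_PAGE_SIZE);

	/* Otherwise go through a bounce buffer capped at one page. */
	start = from - (from % 2);
	end = from + len;
	end += end % 2;
	span = end - start;
	tmp_len = span < MODEL_PAGE_SIZE ? (size_t)span : MODEL_PAGE_SIZE;

	got = tmp_len - (size_t)(from - start);	/* drop the head padding */
	if (start + tmp_len == end)		/* tail padding was read too */
		got -= (size_t)(end - (from + len));
	return got;
}

/* Number of flash transfers needed to satisfy the whole request. */
static unsigned int model_count_transfers(uint64_t from, size_t len)
{
	unsigned int transfers = 0;

	while (len) {
		size_t got = model_octal_dtr_read_len(from, len);

		from += got;
		len -= got;
		transfers++;
	}
	return transfers;
}
```

Under these assumptions, reading from 3 up to 1M takes two transfers (4096 bytes through the bounce buffer, then the aligned remainder). Reading 1M bytes starting at 3 takes three, because the odd end address needs a final 2-byte transfer.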

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.
Michael Walle May 7, 2021, 6:14 p.m. UTC | #4
On 2021-05-07 20:04, Pratyush Yadav wrote:
> On 07/05/21 05:51PM, Michael Walle wrote:

>> Reviewed-by: Michael Walle <michael@walle.cc>
>
> Thanks.
>
>> I wonder how much performance is lost if this would just split
>> one transfer into up to three ones: 2 byte, size - 2, 2 bytes.
>
> This case is not really possible since it would try to read PAGE_SIZE
> whenever it can. But there is a situation possible where one transfer is
> split into three. It would look something like: 4096 bytes, size - 4096
> bytes, 2 bytes.

Ah no, I wasn't talking about your implementation, but just having a naive
one where you don't move around up to PAGE_SIZE of data but just read
2 bytes in the beginning (if unaligned) and 2 bytes at the end (if unaligned)
and reading the part in between just as usual because it's then aligned.

> I am trying to find a balance between minimizing number of reads while
> keeping the size of the temporary buffer to a reasonable limit. This is
> the best I could come up with. It optimizes for smaller transfers so
> while the absolute amount of overhead remains roughly the same, the
> ratio of it relative to read size is smaller.

Yes, with this you will have that memcpy() and one transfer for transfers
up to PAGE_SIZE; the "naive" one above would have up to three depending on
the alignment.
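
For comparison, the naive split can be sketched as below. This is a hypothetical userspace model and not code from the thread: `struct naive_plan` and `naive_split()` are made-up names. It only plans how many requested bytes each transfer would serve, and it assumes an unaligned edge byte is fetched with a 2-byte read into a small bounce buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plan: requested bytes served by each of up to three reads. */
struct naive_plan {
	size_t head;	/* 1 if the odd first byte needs its own 2-byte read */
	size_t middle;	/* bytes read directly, at an even address and length */
	size_t tail;	/* 1 if the odd last byte needs its own 2-byte read */
};

/* Fill in the plan and return the number of flash transfers it needs. */
static unsigned int naive_split(uint64_t from, size_t len,
				struct naive_plan *plan)
{
	/* An odd start address costs one 2-byte read at from - 1. */
	plan->head = (from % 2 && len) ? 1 : 0;
	len -= plan->head;

	/* The start is now even; take the largest even-length middle part. */
	plan->middle = len - (len % 2);

	/* A leftover byte costs one more 2-byte read at the end. */
	plan->tail = len % 2;

	return (plan->head != 0) + (plan->middle != 0) + (plan->tail != 0);
}
```

In this model a 1M read starting at address 3 needs three transfers, while a read with an even start and even length still needs only one.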

> You can optimize for read performance if you are willing to waste memory
> by simply allocating a size + 2 bytes long buffer. Then the read can
> proceed in one transaction. But IMO memory is much more important
> compared to read throughput.


-michael
Pratyush Yadav May 7, 2021, 6:23 p.m. UTC | #5
On 07/05/21 08:14PM, Michael Walle wrote:
> On 2021-05-07 20:04, Pratyush Yadav wrote:

> > On 07/05/21 05:51PM, Michael Walle wrote:

> > > Reviewed-by: Michael Walle <michael@walle.cc>
> >
> > Thanks.
> >
> > > I wonder how much performance is lost if this would just split
> > > one transfer into up to three ones: 2 byte, size - 2, 2 bytes.
> >
> > This case is not really possible since it would try to read PAGE_SIZE
> > whenever it can. But there is a situation possible where one transfer is
> > split into three. It would look something like: 4096 bytes, size - 4096
> > bytes, 2 bytes.
>
> Ah no, I wasn't talking about your implementation, but just having a naive
> one where you don't move around up to PAGE_SIZE of data but just read
> 2 bytes in the beginning (if unaligned) and 2 bytes at the end (if unaligned)
> and reading the part in between just as usual because it's then aligned.
>
> > I am trying to find a balance between minimizing number of reads while
> > keeping the size of the temporary buffer to a reasonable limit. This is
> > the best I could come up with. It optimizes for smaller transfers so
> > while the absolute amount of overhead remains roughly the same, the
> > ratio of it relative to read size is smaller.
>
> Yes, with this you will have that memcpy() and one transfer for transfers
> up to PAGE_SIZE; the "naive" one above would have up to three depending on
> the alignment.


Right. Smaller transfers lose much more performance to the overhead than 
the larger ones do. So I think the optimization is worth the extra code 
complexity.

> > You can optimize for read performance if you are willing to waste memory
> > by simply allocating a size + 2 bytes long buffer. Then the read can
> > proceed in one transaction. But IMO memory is much more important
> > compared to read throughput.
>
> -michael


-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.