Message ID | 1407233482-11642-2-git-send-email-rogerq@ti.com |
---|---|
State | Accepted |
Commit | 7d5929c1f34304ca5a970cfde8044053e56aa8c9 |
On Tue, Aug 5, 2014 at 1:11 PM, Roger Quadros <rogerq@ti.com> wrote:
> For v3.12 and prior, 1-bit Hamming code ECC via software was the
> default choice. Commit c66d039197e4 in v3.13 changed the behaviour
> to use 1-bit Hamming code via Hardware using a different ECC layout
> i.e. (ROM code layout) than what is used by software ECC.
>
> This ECC layout change causes NAND filesystems created in v3.12
> and prior to be unusable in v3.13 and later. So revert back to
> using software ECC by default if an ECC scheme is not explicitly
> specified.
>
> This defect can be observed on the following boards during legacy boot
>
> -omap3beagle
> -omap3touchbook
> -overo
> -am3517crane
> -devkit8000
> -ldp
> -3430sdp

omap3pandora is also using sw ecc, with ubifs. Some time ago I tried
booting mainline (I think it was 3.14) with rootfs on NAND, and while
it did boot and reached a shell, there were lots of ubifs errors, fs
got corrupted and I lost all my data. I used to be able to boot
mainline this way fine sometime ~3.8 release. It's interesting that
3.14 was able to read the data, even with wrong ecc setup.

Do you think it's safe again to boot ubifs created on 3.2 after
applying this series?
Hi Gražvydas,

On 08/05/2014 07:15 PM, Grazvydas Ignotas wrote:
> On Tue, Aug 5, 2014 at 1:11 PM, Roger Quadros <rogerq@ti.com> wrote:
>> For v3.12 and prior, 1-bit Hamming code ECC via software was the
>> default choice. Commit c66d039197e4 in v3.13 changed the behaviour
>> to use 1-bit Hamming code via Hardware using a different ECC layout
>> i.e. (ROM code layout) than what is used by software ECC.
>>
>> This ECC layout change causes NAND filesystems created in v3.12
>> and prior to be unusable in v3.13 and later. So revert back to
>> using software ECC by default if an ECC scheme is not explicitly
>> specified.
>>
>> This defect can be observed on the following boards during legacy boot
>>
>> -omap3beagle
>> -omap3touchbook
>> -overo
>> -am3517crane
>> -devkit8000
>> -ldp
>> -3430sdp
>
> omap3pandora is also using sw ecc, with ubifs. Some time ago I tried
> booting mainline (I think it was 3.14) with rootfs on NAND, and while
> it did boot and reached a shell, there were lots of ubifs errors, fs
> got corrupted and I lost all my data. I used to be able to boot
> mainline this way fine sometime ~3.8 release. It's interesting that
> 3.14 was able to read the data, even with wrong ecc setup.

This is due to another bug introduced in 3.7 by commit
65b97cf6b8deca3ad7a3e00e8316bb89617190fb. Because of that bug (i.e.
inverted CS_MASK in omap_calculate_ecc), omap_calculate_ecc() always
fails with -EINVAL and calculated ECC bytes are always 0. I'll be
sending a patch to fix that as well. But that will only affect the
cases where OMAP_ECC_HAM1_CODE_HW is used which happened for pandora
from 3.13 onwards.

>
> Do you think it's safe again to boot ubifs created on 3.2 after
> applying this series?
>

Yes. If you boot pandora using legacy boot (non DT method), it passes 0
for .ecc_opt in pandora_nand_data. This used to mean
OMAP_ECC_HAMMING_CODE_DEFAULT which is software ecc. i.e. NAND_ECC_SOFT
with default ECC layout. Until the above mentioned commits changed the
meaning. We now call that option OMAP_ECC_HAM1_CODE_SW.

Please let me know if it works for you. Thanks.

cheers,
-roger
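The point about a zero `.ecc_opt` changing meaning can be illustrated with a small standalone C sketch. The enum, struct and function names below are hypothetical stand-ins rather than the kernel's definitions. The only thing taken from the thread is the idea that a field omitted from an initializer is zero, so it selects whichever enumerator currently has the value 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enums that only mirror the two orderings described above. */
enum ecc_v313 {			/* ordering in which HW Hamming is value 0 */
	V313_HAM1_CODE_HW = 0,
	V313_BCH4_CODE_HW_DETECTION_SW
};

enum ecc_fixed {		/* ordering in which SW Hamming is value 0 */
	FIXED_HAM1_CODE_SW = 0,
	FIXED_HAM1_CODE_HW,
	FIXED_BCH4_CODE_HW_DETECTION_SW
};

/* Hypothetical, reduced platform data structure. */
struct legacy_nand_data {
	int cs;
	int ecc_opt;		/* not set by the board file below, so it is 0 */
};

/* A legacy board file that never mentions .ecc_opt. */
const struct legacy_nand_data legacy_board = { .cs = 0 };

/* Returns 1 if the first ordering would select hardware Hamming ECC. */
int picks_hw_in_v313(const struct legacy_nand_data *d)
{
	assert(d != NULL);
	return d->ecc_opt == V313_HAM1_CODE_HW;
}

/* Returns 1 if the second ordering selects software Hamming ECC. */
int picks_sw_after_fix(const struct legacy_nand_data *d)
{
	assert(d != NULL);
	return d->ecc_opt == FIXED_HAM1_CODE_SW;
}
```

With the same untouched board data, `picks_hw_in_v313(&legacy_board)` and `picks_sw_after_fix(&legacy_board)` both return 1, so the board's behaviour depends entirely on which enumerator owns the value 0.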
On Wed, Aug 6, 2014 at 11:02 AM, Roger Quadros <rogerq@ti.com> wrote:
> Hi Gražvydas,
>
> On 08/05/2014 07:15 PM, Grazvydas Ignotas wrote:
>> On Tue, Aug 5, 2014 at 1:11 PM, Roger Quadros <rogerq@ti.com> wrote:
>>> For v3.12 and prior, 1-bit Hamming code ECC via software was the
>>> default choice. Commit c66d039197e4 in v3.13 changed the behaviour
>>> to use 1-bit Hamming code via Hardware using a different ECC layout
>>> i.e. (ROM code layout) than what is used by software ECC.
>>>
>>> This ECC layout change causes NAND filesystems created in v3.12
>>> and prior to be unusable in v3.13 and later. So revert back to
>>> using software ECC by default if an ECC scheme is not explicitly
>>> specified.
>>>
>>> This defect can be observed on the following boards during legacy boot
>>>
>>> -omap3beagle
>>> -omap3touchbook
>>> -overo
>>> -am3517crane
>>> -devkit8000
>>> -ldp
>>> -3430sdp
>>
>> omap3pandora is also using sw ecc, with ubifs. Some time ago I tried
>> booting mainline (I think it was 3.14) with rootfs on NAND, and while
>> it did boot and reached a shell, there were lots of ubifs errors, fs
>> got corrupted and I lost all my data. I used to be able to boot
>> mainline this way fine sometime ~3.8 release. It's interesting that
>> 3.14 was able to read the data, even with wrong ecc setup.
>
> This is due to another bug introduced in 3.7 by commit 65b97cf6b8deca3ad7a3e00e8316bb89617190fb.
> Because of that bug (i.e. inverted CS_MASK in omap_calculate_ecc), omap_calculate_ecc() always fails with -EINVAL and calculated ECC bytes are always 0. I'll be sending a patch to fix that as well. But that will only affect the cases where OMAP_ECC_HAM1_CODE_HW is used which happened for pandora from 3.13 onwards.
>
>>
>> Do you think it's safe again to boot ubifs created on 3.2 after
>> applying this series?
>>
>
> Yes. If you boot pandora using legacy boot (non DT method), it passes 0 for .ecc_opt in pandora_nand_data. This used to mean OMAP_ECC_HAMMING_CODE_DEFAULT which is software ecc. i.e. NAND_ECC_SOFT with default ECC layout. Until the above mentioned commits changed the meaning. We now call that option OMAP_ECC_HAM1_CODE_SW.
>
> Please let me know if it works for you. Thanks.

Yes it does, thank you.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>

Found something new in dmesg though:
[ 1.542755] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
[ 1.549621] nand: Micron MT29F4G16ABBDA3W
[ 1.553894] nand: 512MiB, SLC, page size: 2048, OOB size: 64
[ 1.560058] nand: WARNING: omap2-nand.0: the ECC used on your system is too weak compared to the one required by the NAND chip

Do you think it's best to migrate to different ECC scheme? It would be
better to avoid that so that users can freely change kernels and the
bootloader wouldn't have to be changed..
On 08/07/2014 01:55 AM, Grazvydas Ignotas wrote:
> On Wed, Aug 6, 2014 at 11:02 AM, Roger Quadros <rogerq@ti.com> wrote:
>> Hi Gražvydas,
>>
>> On 08/05/2014 07:15 PM, Grazvydas Ignotas wrote:
>>> On Tue, Aug 5, 2014 at 1:11 PM, Roger Quadros <rogerq@ti.com> wrote:
>>>> For v3.12 and prior, 1-bit Hamming code ECC via software was the
>>>> default choice. Commit c66d039197e4 in v3.13 changed the behaviour
>>>> to use 1-bit Hamming code via Hardware using a different ECC layout
>>>> i.e. (ROM code layout) than what is used by software ECC.
>>>>
>>>> This ECC layout change causes NAND filesystems created in v3.12
>>>> and prior to be unusable in v3.13 and later. So revert back to
>>>> using software ECC by default if an ECC scheme is not explicitly
>>>> specified.
>>>>
>>>> This defect can be observed on the following boards during legacy boot
>>>>
>>>> -omap3beagle
>>>> -omap3touchbook
>>>> -overo
>>>> -am3517crane
>>>> -devkit8000
>>>> -ldp
>>>> -3430sdp
>>>
>>> omap3pandora is also using sw ecc, with ubifs. Some time ago I tried
>>> booting mainline (I think it was 3.14) with rootfs on NAND, and while
>>> it did boot and reached a shell, there were lots of ubifs errors, fs
>>> got corrupted and I lost all my data. I used to be able to boot
>>> mainline this way fine sometime ~3.8 release. It's interesting that
>>> 3.14 was able to read the data, even with wrong ecc setup.
>>
>> This is due to another bug introduced in 3.7 by commit 65b97cf6b8deca3ad7a3e00e8316bb89617190fb.
>> Because of that bug (i.e. inverted CS_MASK in omap_calculate_ecc), omap_calculate_ecc() always fails with -EINVAL and calculated ECC bytes are always 0. I'll be sending a patch to fix that as well. But that will only affect the cases where OMAP_ECC_HAM1_CODE_HW is used which happened for pandora from 3.13 onwards.
>>
>>>
>>> Do you think it's safe again to boot ubifs created on 3.2 after
>>> applying this series?
>>>
>>
>> Yes. If you boot pandora using legacy boot (non DT method), it passes 0 for .ecc_opt in pandora_nand_data. This used to mean OMAP_ECC_HAMMING_CODE_DEFAULT which is software ecc. i.e. NAND_ECC_SOFT with default ECC layout. Until the above mentioned commits changed the meaning. We now call that option OMAP_ECC_HAM1_CODE_SW.
>>
>> Please let me know if it works for you. Thanks.
>
> Yes it does, thank you.
> Tested-by: Grazvydas Ignotas <notasas@gmail.com>
>
> Found something new in dmesg though:
> [ 1.542755] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [ 1.549621] nand: Micron MT29F4G16ABBDA3W
> [ 1.553894] nand: 512MiB, SLC, page size: 2048, OOB size: 64
> [ 1.560058] nand: WARNING: omap2-nand.0: the ECC used on your
> system is too weak compared to the one required by the NAND chip
>
> Do you think it's best to migrate to different ECC scheme? It would be
> better to avoid that so that users can freely change kernels and the
> bootloader wouldn't have to be changed..
>

I'm not sure why these boards were using Software ECC scheme in the
first place. So moving to a better ECC scheme should be considered with
a warning that backward compatibility will be broken.

There is a limitation with the OMAP3 ROM code loader. So if you want
uniform ECC scheme for MLO, u-boot and kernel partitions then we are
limited to Hamming code for SLC NAND with 512B, 2KB and 4KB pages.

For MLC NAND, the ROM code uses a proprietary layout using checksum and
BCH and I'm not very sure if this is compatible with the newer OMAP
platforms and AM33xx platforms.

For details see OMAP35x TRM. (spruf98y.pdf)
http://www.ti.com/lit/ug/spruf98y/spruf98y.pdf

sections
25.4.7.4.2 SLC NAND Read Sector Procedure
25.4.7.4.3 MLC NAND Read Sector Procedure

cheers,
-roger
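As a rough illustration of what a "too weak" ECC warning like the one quoted above is comparing, here is a minimal C sketch. It is an assumption about the general idea (correctable bits per page offered by the configured scheme versus what the chip asks for), not a copy of the NAND core's check. The function name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns 1 if the configured scheme corrects at
 * least as many bit errors per page as the chip requires, else 0.
 *
 * used_strength / used_step: correctable bits per ECC step, and the
 *                            number of data bytes one step covers
 * req_strength / req_step:   the same two figures, as required by the chip
 */
int ecc_strong_enough(unsigned int page_size,
		      unsigned int used_strength, unsigned int used_step,
		      unsigned int req_strength, unsigned int req_step)
{
	unsigned int used_bits, req_bits;

	assert(used_step != 0 && req_step != 0);

	/* Total correctable bits across one page for each scheme. */
	used_bits = page_size / used_step * used_strength;
	req_bits = page_size / req_step * req_strength;

	/* Also require the per-step strength not to be lower. */
	return used_bits >= req_bits && used_strength >= req_strength;
}
```

For a 2048-byte page, a 1-bit code over 256-byte steps checked against an assumed requirement of 4 bits per 512 bytes gives `ecc_strong_enough(2048, 1, 256, 4, 512) == 0`. The requirement figure here is illustrative only and is not taken from the chip's datasheet.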
* Grazvydas Ignotas <notasas@gmail.com> [140806 15:57]:
> On Wed, Aug 6, 2014 at 11:02 AM, Roger Quadros <rogerq@ti.com> wrote:
> > Hi Gražvydas,
> >
> > On 08/05/2014 07:15 PM, Grazvydas Ignotas wrote:
> >> On Tue, Aug 5, 2014 at 1:11 PM, Roger Quadros <rogerq@ti.com> wrote:
> >>> For v3.12 and prior, 1-bit Hamming code ECC via software was the
> >>> default choice. Commit c66d039197e4 in v3.13 changed the behaviour
> >>> to use 1-bit Hamming code via Hardware using a different ECC layout
> >>> i.e. (ROM code layout) than what is used by software ECC.
> >>>
> >>> This ECC layout change causes NAND filesystems created in v3.12
> >>> and prior to be unusable in v3.13 and later. So revert back to
> >>> using software ECC by default if an ECC scheme is not explicitly
> >>> specified.
> >>>
> >>> This defect can be observed on the following boards during legacy boot
> >>>
> >>> -omap3beagle
> >>> -omap3touchbook
> >>> -overo
> >>> -am3517crane
> >>> -devkit8000
> >>> -ldp
> >>> -3430sdp
> >>
> >> omap3pandora is also using sw ecc, with ubifs. Some time ago I tried
> >> booting mainline (I think it was 3.14) with rootfs on NAND, and while
> >> it did boot and reached a shell, there were lots of ubifs errors, fs
> >> got corrupted and I lost all my data. I used to be able to boot
> >> mainline this way fine sometime ~3.8 release. It's interesting that
> >> 3.14 was able to read the data, even with wrong ecc setup.
> >
> > This is due to another bug introduced in 3.7 by commit 65b97cf6b8deca3ad7a3e00e8316bb89617190fb.
> > Because of that bug (i.e. inverted CS_MASK in omap_calculate_ecc), omap_calculate_ecc() always fails with -EINVAL and calculated ECC bytes are always 0. I'll be sending a patch to fix that as well. But that will only affect the cases where OMAP_ECC_HAM1_CODE_HW is used which happened for pandora from 3.13 onwards.
> >
> >>
> >> Do you think it's safe again to boot ubifs created on 3.2 after
> >> applying this series?
> >>
> >
> > Yes. If you boot pandora using legacy boot (non DT method), it passes 0 for .ecc_opt in pandora_nand_data. This used to mean OMAP_ECC_HAMMING_CODE_DEFAULT which is software ecc. i.e. NAND_ECC_SOFT with default ECC layout. Until the above mentioned commits changed the meaning. We now call that option OMAP_ECC_HAM1_CODE_SW.
> >
> > Please let me know if it works for you. Thanks.
>
> Yes it does, thank you.
> Tested-by: Grazvydas Ignotas <notasas@gmail.com>

OK thanks applying the whole series into omap-for-v3.17/fixes.

Tony
diff --git a/arch/arm/mach-omap2/board-flash.c b/arch/arm/mach-omap2/board-flash.c
index e87f2a8..2d245c2 100644
--- a/arch/arm/mach-omap2/board-flash.c
+++ b/arch/arm/mach-omap2/board-flash.c
@@ -142,7 +142,7 @@ __init board_nand_init(struct mtd_partition *nand_parts, u8 nr_parts, u8 cs,
 	board_nand_data.nr_parts = nr_parts;
 	board_nand_data.devsize = nand_type;
 
-	board_nand_data.ecc_opt = OMAP_ECC_HAM1_CODE_HW;
+	board_nand_data.ecc_opt = OMAP_ECC_HAM1_CODE_SW;
 	gpmc_nand_init(&board_nand_data, gpmc_t);
 }
 #endif /* CONFIG_MTD_NAND_OMAP2 || CONFIG_MTD_NAND_OMAP2_MODULE */
diff --git a/arch/arm/mach-omap2/gpmc-nand.c b/arch/arm/mach-omap2/gpmc-nand.c
index 93914d2..03b6f95 100644
--- a/arch/arm/mach-omap2/gpmc-nand.c
+++ b/arch/arm/mach-omap2/gpmc-nand.c
@@ -68,7 +68,8 @@ static bool gpmc_hwecc_bch_capable(enum omap_ecc ecc_opt)
 		return 0;
 
 	/* legacy platforms support only HAM1 (1-bit Hamming) ECC scheme */
-	if (ecc_opt == OMAP_ECC_HAM1_CODE_HW)
+	if (ecc_opt == OMAP_ECC_HAM1_CODE_HW ||
+	    ecc_opt == OMAP_ECC_HAM1_CODE_SW)
 		return 1;
 	else
 		return 0;
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index f0ed92e..4dd6178 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1794,9 +1794,12 @@ static int omap_nand_probe(struct platform_device *pdev)
 	}
 
 	/* populate MTD interface based on ECC scheme */
-	nand_chip->ecc.layout = &omap_oobinfo;
 	ecclayout = &omap_oobinfo;
 	switch (info->ecc_opt) {
+	case OMAP_ECC_HAM1_CODE_SW:
+		nand_chip->ecc.mode = NAND_ECC_SOFT;
+		break;
+
 	case OMAP_ECC_HAM1_CODE_HW:
 		pr_info("nand: using OMAP_ECC_HAM1_CODE_HW\n");
 		nand_chip->ecc.mode = NAND_ECC_HW;
@@ -1848,7 +1851,7 @@ static int omap_nand_probe(struct platform_device *pdev)
 		nand_chip->ecc.priv = nand_bch_init(mtd,
 					nand_chip->ecc.size,
 					nand_chip->ecc.bytes,
-					&nand_chip->ecc.layout);
+					&ecclayout);
 		if (!nand_chip->ecc.priv) {
 			pr_err("nand: error: unable to use s/w BCH library\n");
 			err = -EINVAL;
@@ -1923,7 +1926,7 @@ static int omap_nand_probe(struct platform_device *pdev)
 		nand_chip->ecc.priv = nand_bch_init(mtd,
 					nand_chip->ecc.size,
 					nand_chip->ecc.bytes,
-					&nand_chip->ecc.layout);
+					&ecclayout);
 		if (!nand_chip->ecc.priv) {
 			pr_err("nand: error: unable to use s/w BCH library\n");
 			err = -EINVAL;
@@ -2012,6 +2015,9 @@ static int omap_nand_probe(struct platform_device *pdev)
 		goto return_error;
 	}
 
+	if (info->ecc_opt == OMAP_ECC_HAM1_CODE_SW)
+		goto scan_tail;
+
 	/* all OOB bytes from oobfree->offset till end off OOB are free */
 	ecclayout->oobfree->length = mtd->oobsize - ecclayout->oobfree->offset;
 	/* check if NAND device's OOB is enough to store ECC signatures */
@@ -2021,7 +2027,9 @@ static int omap_nand_probe(struct platform_device *pdev)
 		err = -EINVAL;
 		goto return_error;
 	}
+	nand_chip->ecc.layout = ecclayout;
 
+scan_tail:
 	/* second phase scan */
 	if (nand_scan_tail(mtd)) {
 		err = -ENXIO;
diff --git a/include/linux/platform_data/mtd-nand-omap2.h b/include/linux/platform_data/mtd-nand-omap2.h
index 660c029..16ec262 100644
--- a/include/linux/platform_data/mtd-nand-omap2.h
+++ b/include/linux/platform_data/mtd-nand-omap2.h
@@ -21,8 +21,17 @@ enum nand_io {
 };
 
 enum omap_ecc {
-	/* 1-bit ECC calculation by GPMC, Error detection by Software */
-	OMAP_ECC_HAM1_CODE_HW = 0,
+	/*
+	 * 1-bit ECC: calculation and correction by SW
+	 * ECC stored at end of spare area
+	 */
+	OMAP_ECC_HAM1_CODE_SW = 0,
+
+	/*
+	 * 1-bit ECC: calculation by GPMC, Error detection by Software
+	 * ECC layout compatible with ROM code layout
+	 */
+	OMAP_ECC_HAM1_CODE_HW,
 	/* 4-bit ECC calculation by GPMC, Error detection by Software */
 	OMAP_ECC_BCH4_CODE_HW_DETECTION_SW,
 	/* 4-bit ECC calculation by GPMC, Error detection by ELM */
For v3.12 and prior, 1-bit Hamming code ECC via software was the
default choice. Commit c66d039197e4 in v3.13 changed the behaviour
to use 1-bit Hamming code via Hardware using a different ECC layout
i.e. (ROM code layout) than what is used by software ECC.

This ECC layout change causes NAND filesystems created in v3.12
and prior to be unusable in v3.13 and later. So revert back to
using software ECC by default if an ECC scheme is not explicitly
specified.

This defect can be observed on the following boards during legacy boot

-omap3beagle
-omap3touchbook
-overo
-am3517crane
-devkit8000
-ldp
-3430sdp

Signed-off-by: Roger Quadros <rogerq@ti.com>
---
 arch/arm/mach-omap2/board-flash.c            |  2 +-
 arch/arm/mach-omap2/gpmc-nand.c              |  3 ++-
 drivers/mtd/nand/omap2.c                     | 14 +++++++++++---
 include/linux/platform_data/mtd-nand-omap2.h | 13 +++++++++++--
 4 files changed, 25 insertions(+), 7 deletions(-)
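To make the layout incompatibility described in the commit message more concrete, here is a minimal C sketch that computes where ECC bytes would sit in the spare (OOB) area under two placements. One is packed at the end of the spare area, as the new enum comment says for software ECC. The other sits directly after the bad-block marker, which is only an assumption about what a ROM-code-style layout looks like. All function names are hypothetical, and the byte counts in the usage note are illustrative, not taken from the driver.

```c
#include <assert.h>

/* First OOB offset used when the ECC bytes are packed at the end of OOB. */
unsigned int ecc_start_at_end(unsigned int oob_size, unsigned int ecc_bytes)
{
	assert(ecc_bytes <= oob_size);
	return oob_size - ecc_bytes;
}

/* First OOB offset used when the ECC bytes follow the bad-block marker. */
unsigned int ecc_start_after_marker(unsigned int marker_len)
{
	return marker_len;
}

/* Returns 1 if the ranges [a, a + alen) and [b, b + blen) overlap. */
int ranges_overlap(unsigned int a, unsigned int alen,
		   unsigned int b, unsigned int blen)
{
	return a < b + blen && b < a + alen;
}
```

With a 64-byte OOB, an assumed 24 ECC bytes at the end start at offset 40, while an assumed 12 ECC bytes after a 2-byte marker start at offset 2. `ranges_overlap(40, 24, 2, 12)` returns 0, so a reader expecting one placement looks at bytes the other placement never wrote ECC into, which is consistent with old filesystems becoming unreadable after the default changed.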