Message ID | 20240117110646.1317843-1-claudiu.beznea.uj@bp.renesas.com |
---|---|
State | New |
Series | [v2] mmc: renesas_sdhi: Fix change point of data handling |
On Wed, Jan 17, 2024 at 01:06:46PM +0200, Claudiu wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> On latest kernel revisions it has been noticed (on a RZ/G3S system) that
> when booting Linux and root file system is on eMMC, at some point in
> the booting process, when the systemd applications are started, the
> "mmc0: tuning execution failed: -5" message is displayed on console.
> On kernel v6.7-rc5 this is reproducible in 90% of the boots. This was
> missing on the same system with kernel v6.5.0-rc1. It was also noticed on
> kernel revisions v6.6-rcX on a RZ/G2UL based system but not on the kernel
> this fix is based on (v6.7-rc5).
>
> Investigating it on RZ/G3S lead to the conclusion that every time the issue
> is reproduced all the probed TAPs are OK. According to datasheet, when this
> happens the change point of data need to be considered for tuning.
>
> Previous code considered the change point of data happens when the content
> of the SMPCMP register is zero. According to RZ/V2M hardware manual,
> chapter "Change Point of the Input Data" (as this is the most clear
> description that I've found about change point of the input data and all
> RZ hardware manual are similar on this chapter), at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> If there is a mismatch b/w the previous and the next TAPs, it indicates
> that there is a change point of the input data.
>
> To comply with this, the code checks if this mismatch is present and
> updates the priv->smpcmp mask.
>
> This change has been checked on the devices with the following DTSes by
> doing 50 consecutive reboots and checking for the tuning failure message:
> - r9a08g045s33-smarc.dts
> - r8a7742-iwg21d-q7.dts
> - r8a7743-iwg20d-q7.dts
> - r8a7744-iwg20d-q7.dts
> - r8a7745-iwg22d-sodimm.dts
> - r8a77470-iwg23s-sbc.dts
> - r8a774a1-hihope-rzg2m-ex.dts
> - r8a774b1-hihope-rzg2n-ex.dts
> - r8a774c0-ek874.dts
> - r8a774e1-hihope-rzg2h-ex.dts
> - r9a07g043u11-smarc-rzg2ul.dts
> - r9a07g044c2-smarc-rzg2lc.dts
> - r9a07g044l2-smarc-rzg2l.dts
> - r9a07g054l2-smarc-rzv2l.dts
>
> On r8a774a1-hihope-rzg2m-ex, even though the hardware manual doesn't say
> anything special about it in the "Change Point of the Input Data" chapter
> or SMPCMP register description, it has been noticed that although all TAPs
> probed in the tuning process are OK the SMPCMP is zero. For this updated
> the renesas_sdhi_select_tuning() function to use priv->taps in case all
> TAPs are OK.
>
> Fixes: 5fb6bf51f6d1 ("mmc: renesas_sdhi: improve TAP selection if all TAPs are good")
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

Very interesting patch! Please give me a few days to review/test it.
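The register layout described in the commit message can be sketched in plain C as follows. This is only an illustration outside the kernel: the helper names are hypothetical, and the field positions follow the masks used in the patch (GENMASK(23, 16) and GENMASK(7, 0)), which is an assumption since the commit message itself mentions bits 22..16. The helper only reports whether the two captured fields differ; how the driver should act on that is a separate question.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel's GENMASK()/FIELD_GET() usage.
 * Field positions are assumed from the masks in the patch.
 */
#define SMPCMP_CMPNGU_DATA_MASK		UINT32_C(0x00ff0000)	/* bits 23..16 */
#define SMPCMP_CMPNGU_DATA_SHIFT	16
#define SMPCMP_CMPNGD_DATA_MASK		UINT32_C(0x000000ff)	/* bits 7..0 */

/* Data comparison result captured with the previous TAP. */
static inline uint32_t smpcmp_cmpngu(uint32_t val)
{
	return (val & SMPCMP_CMPNGU_DATA_MASK) >> SMPCMP_CMPNGU_DATA_SHIFT;
}

/* Data comparison result captured with the next TAP. */
static inline uint32_t smpcmp_cmpngd(uint32_t val)
{
	return val & SMPCMP_CMPNGD_DATA_MASK;
}

/* True when the two data fields differ; command bits (8, 24) are ignored. */
static inline bool smpcmp_has_mismatch(uint32_t val)
{
	return smpcmp_cmpngu(val) != smpcmp_cmpngd(val);
}
```

With these helpers, a raw value of 0x00010000 reports a mismatch, while 0 and 0x00ff00ff do not.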
> Very interesting patch! Please give me a few days to review/test it.
I am still at it. I got some objections from Renesas and am trying to
figure out more details.
Hi Claudiu,

but one thing I can ask already:

> Investigating it on RZ/G3S lead to the conclusion that every time the issue
> is reproduced all the probed TAPs are OK. According to datasheet, when this
> happens the change point of data need to be considered for tuning.

Yes, "considered" means here it should be *avoided*.

> Previous code considered the change point of data happens when the content
> of the SMPCMP register is zero. According to RZ/V2M hardware manual,

When SMPCMP is zero, there is *no* change point. Which is good.

> chapter "Change Point of the Input Data" (as this is the most clear
> description that I've found about change point of the input data and all
> RZ hardware manual are similar on this chapter),

I also have a chapter named like this. If you check the diagram, change
point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
far away as possible from the change point.

> at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> If there is a mismatch b/w the previous and the next TAPs, it indicates
> that there is a change point of the input data.

This is correct.

> To comply with this, the code checks if this mismatch is present and
> updates the priv->smpcmp mask.

That means you select the "change point" instead of avoiding it?

> This change has been checked on the devices with the following DTSes by
> doing 50 consecutive reboots and checking for the tuning failure message:

Okay, you might not have a failure message, but you might have selected
the worst TAP. Or?

> +		if (cmpngu_data != cmpngd_data)
> +			set_bit(i, priv->smpcmp);

Really looks like you select the change point instead of avoiding it.

However, with some SD cards, I also see the EIO error you see. So, there
might be room to improve TAP selection when all TAPs are good. I need to
check if this is really is the same case for the SD cards in question.

Happy hacking,

   Wolfram
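The "as far away as possible from the change point" rule from the mail above can be illustrated with a small standalone sketch. It is not the driver's selection code: the function names are hypothetical, and it assumes the TAPs wrap around (TAP 7 is adjacent to TAP 0) and that a set bit in change_mask marks a TAP at which a change point was observed.

```c
#include <stdint.h>

/* Circular distance between two TAP indices on a ring of n TAPs. */
static unsigned int tap_circ_dist(unsigned int a, unsigned int b, unsigned int n)
{
	unsigned int d = a > b ? a - b : b - a;

	return d < n - d ? d : n - d;
}

/*
 * Hypothetical helper: return the TAP whose nearest change point is
 * farthest away. Bit i of change_mask set means "change point seen at
 * TAP i". Returns -1 for an unsupported TAP count; with an empty mask
 * every TAP is equally good and TAP 0 is returned.
 */
static int farthest_tap_from_change_point(uint32_t change_mask, unsigned int num_taps)
{
	int best_tap = -1;
	unsigned int best_dist = 0;

	if (num_taps == 0 || num_taps > 32)
		return -1;

	for (unsigned int tap = 0; tap < num_taps; tap++) {
		unsigned int nearest = num_taps; /* larger than any real distance */

		for (unsigned int cp = 0; cp < num_taps; cp++) {
			if (change_mask & (UINT32_C(1) << cp)) {
				unsigned int d = tap_circ_dist(tap, cp, num_taps);

				if (d < nearest)
					nearest = d;
			}
		}

		/* Keep the first TAP that achieves the largest distance. */
		if (best_tap < 0 || nearest > best_dist) {
			best_tap = (int)tap;
			best_dist = nearest;
		}
	}

	return best_tap;
}
```

For eight TAPs with the change point between TAP2 and TAP3 (mask 0x0c), TAP6 and TAP7 tie at distance 3 and the sketch returns TAP6, which matches the "6 or 7" suggestion quoted from the manual.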
Hi, Wolfram,

On 29.01.2024 12:55, Wolfram Sang wrote:
> Hi Claudiu,
>
> but one thing I can ask already:
>
>> Investigating it on RZ/G3S lead to the conclusion that every time the issue
>> is reproduced all the probed TAPs are OK. According to datasheet, when this
>> happens the change point of data need to be considered for tuning.
>
> Yes, "considered" means here it should be *avoided*.

My understanding was the other way around from this statement found in
RZ/G3S hw manual:

"If all of the TAP [i] is OK, the sampling clock position is selected by
identifying the change point of data. Change point of the data can be found
in the value of SCC_SMPCMP register. Usage example is Section 33.8.3,
Change point of the input data."

>
>> Previous code considered the change point of data happens when the content
>> of the SMPCMP register is zero. According to RZ/V2M hardware manual,
>
> When SMPCMP is zero, there is *no* change point. Which is good.

That was my understanding, too.

>
>> chapter "Change Point of the Input Data" (as this is the most clear
>> description that I've found about change point of the input data and all
>> RZ hardware manual are similar on this chapter),
>
> I also have a chapter named like this. If you check the diagram, change
> point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
> far away as possible from the change point.

My understanding was different here as of the following hw manual statement:

"As the width of the input data is 1 (UI), select TAP6 or TAP7 which is
*the median* of next TAP3 from TAP3"

I understand from this that the median value should be considered here.

>
>> at the time of tuning,
>> data is captured by the previous and next TAPs and the result is stored in
>> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
>> If there is a mismatch b/w the previous and the next TAPs, it indicates
>> that there is a change point of the input data.
>
> This is correct.
>
>> To comply with this, the code checks if this mismatch is present and
>> updates the priv->smpcmp mask.
>
> That means you select the "change point" instead of avoiding it?
>
>> This change has been checked on the devices with the following DTSes by
>> doing 50 consecutive reboots and checking for the tuning failure message:
>
> Okay, you might not have a failure message, but you might have selected
> the worst TAP. Or?
>
>> +		if (cmpngu_data != cmpngd_data)
>> +			set_bit(i, priv->smpcmp);
>
> Really looks like you select the change point instead of avoiding it.

Looking again at it and digesting what you said about the tuning here, yes
it seems I did it this way.

>
> However, with some SD cards, I also see the EIO error you see. So, there
> might be room to improve TAP selection when all TAPs are good. I need to
> check if this is really is the same case for the SD cards in question.

Maybe better would be to change this condition:

if (cmpngu_data != cmpngd_data)
	set_bit(i, priv->smpcmp);

like this:
if (cmpngu_data == cmpngd_data)
	set_bit(i, priv->smpcmp);

?

I need to check it, though.

Thanks for your input,
Claudiu Beznea

>
> Happy hacking,
>
>    Wolfram
>
Hi Claudiu,

> My understanding was the other way around from this statement found in
> RZ/G3S hw manual:
>
> "If all of the TAP [i] is OK, the sampling clock position is selected by
> identifying the change point of data.

Yes, it is easy to misunderstand. It should add "and avoid it" or
something. I got an internal diagram which makes it more clear. I just
asked if I can share it with you.

> > I also have a chapter named like this. If you check the diagram, change
> > point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
> > far away as possible from the change point.
>
> My understanding was different here as of the following hw manual statement:
>
> "As the width of the input data is 1 (UI), select TAP6 or TAP7 which is
>
> *the median* of next TAP3 from TAP3"
>
> I understand from this that the median value should be considered here.

Sorry, can't follow you here. "Select TAP6 or TAP7" is clear to me. But
it doesn't really matter why it was misleading...

> > However, with some SD cards, I also see the EIO error you see. So, there
> > might be room to improve TAP selection when all TAPs are good. I need to
> > check if this is really is the same case for the SD cards in question.
>
> Maybe better would be to change this condition:
>
> if (cmpngu_data != cmpngd_data)
> 	set_bit(i, priv->smpcmp);
>
> like this:
> if (cmpngu_data == cmpngd_data)
> 	set_bit(i, priv->smpcmp);
>
> ?
>
> I need to check it, though.

But isn't it equal to the current code then? (Except for one thing: the
smpcmp bit is only set when there is no cmd error. I need to double
check but I think I like that.)

Happy hacking,

   Wolfram
> But isn't it equal to the current code then? (Except for one thing: the
> smpcmp bit is only set when there is no cmd error. I need to double
> check but I think I like that.)

I double checked, I really like it. I'd just invert the logic. Pseudo
code:

	if (!cmd_error)
		if (SMPCMP == 0)
			set_bit
	else
		mmc_abort_tuning()
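One way to read the pseudo code above, written out as a standalone C sketch. The struct and function names are hypothetical, the bitmaps are simplified to 32-bit masks, and it assumes the else pairs with the outer if (abort on a command error, otherwise record the TAP when SMPCMP is zero). The actual driver uses set_bit() on priv->taps and priv->smpcmp and calls mmc_send_abort_tuning().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's per-host tuning state. */
struct tuning_state {
	uint32_t taps;		/* bit i set: tuning command succeeded on TAP i */
	uint32_t smpcmp;	/* bit i set: no change point reported on TAP i */
};

/*
 * Record the outcome of probing one TAP (tap must be below 32).
 * Returns true when the caller should abort the tuning command,
 * i.e. when a command error was seen.
 */
static bool record_tap_result(struct tuning_state *st, unsigned int tap,
			      bool tuning_ok, bool cmd_error, uint32_t smpcmp_reg)
{
	if (tuning_ok)
		st->taps |= UINT32_C(1) << tap;

	if (!cmd_error) {
		/* SMPCMP == 0 means no change point at this TAP. */
		if (smpcmp_reg == 0)
			st->smpcmp |= UINT32_C(1) << tap;
		return false;
	}

	/* Command error: skip the SMPCMP bookkeeping and request an abort. */
	return true;
}
```

In this reading, the smpcmp mask is only updated for TAPs that completed without a command error, which is the property highlighted in the mail.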
On 30.01.2024 09:26, Wolfram Sang wrote:
> Hi Claudiu,
>
>> My understanding was the other way around from this statement found in
>> RZ/G3S hw manual:
>>
>> "If all of the TAP [i] is OK, the sampling clock position is selected by
>> identifying the change point of data.
>
> Yes, it is easy to misunderstand. It should add "and avoid it" or
> something. I got an internal diagram which makes it more clear. I just
> asked if I can share it with you.
>
>>> I also have a chapter named like this. If you check the diagram, change
>>> point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
>>> far away as possible from the change point.
>>
>> My understanding was different here as of the following hw manual statement:
>>
>> "As the width of the input data is 1 (UI), select TAP6 or TAP7 which is
>>
>> *the median* of next TAP3 from TAP3"
>>
>> I understand from this that the median value should be considered here.
>
> Sorry, can't follow you here. "Select TAP6 or TAP7" is clear to me. But
> it doesn't really matter why it was misleading...
>
>>> However, with some SD cards, I also see the EIO error you see. So, there
>>> might be room to improve TAP selection when all TAPs are good. I need to
>>> check if this is really is the same case for the SD cards in question.
>>
>> Maybe better would be to change this condition:
>>
>> if (cmpngu_data != cmpngd_data)
>> 	set_bit(i, priv->smpcmp);
>>
>> like this:
>> if (cmpngu_data == cmpngd_data)
>> 	set_bit(i, priv->smpcmp);
>>
>> ?
>>
>> I need to check it, though.
>
> But isn't it equal to the current code then? (Except for one thing: the
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index c675dec587ef..0090228a5e8f 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,6 +18,7 @@
  *
  */
 
+#include <linux/bitfield.h>
 #include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/iopoll.h>
@@ -312,6 +313,8 @@ static int renesas_sdhi_start_signal_voltage_switch(struct mmc_host *mmc,
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQDOWN	BIT(8)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQUP	BIT(24)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_ERR	(BIT(8) | BIT(24))
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA	GENMASK(23, 16)
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA	GENMASK(7, 0)
 
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL	BIT(4)
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400EN	BIT(31)
@@ -641,7 +644,14 @@ static int renesas_sdhi_select_tuning(struct tmio_mmc_host *host)
 	 * identifying the change point of data.
 	 */
 	if (bitmap_full(priv->taps, taps_size)) {
-		bitmap = priv->smpcmp;
+		/*
+		 * On some setups it happens that all TAPS are OK but
+		 * no change point of data. Any tap should be OK for this.
+		 */
+		if (bitmap_empty(priv->smpcmp, taps_size))
+			bitmap = priv->taps;
+		else
+			bitmap = priv->smpcmp;
 		min_tap_row = 1;
 	} else {
 		bitmap = priv->taps;
@@ -706,11 +716,18 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
 		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
 			set_bit(i, priv->taps);
 
-		if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
-			set_bit(i, priv->smpcmp);
-
-		if (cmd_error)
+		if (cmd_error) {
 			mmc_send_abort_tuning(mmc, opcode);
+		} else {
+			u32 val, cmpngu_data, cmpngd_data;
+
+			val = sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP);
+			cmpngu_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA, val);
+			cmpngd_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA, val);
+
+			if (cmpngu_data != cmpngd_data)
+				set_bit(i, priv->smpcmp);
+		}
 	}
 
 	ret = renesas_sdhi_select_tuning(host);
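The fallback added to renesas_sdhi_select_tuning() in the diff above can be modelled outside the kernel as a small decision function. This is a sketch under assumptions: the names are hypothetical, the kernel bitmaps are replaced by 32-bit masks, and only the choice of candidate bitmap is shown, not the subsequent search for the best TAP within it.

```c
#include <stdint.h>

/* Mask with the low num_taps bits set (all bits for 32 or more TAPs). */
static uint32_t tap_mask(unsigned int num_taps)
{
	return num_taps >= 32 ? UINT32_MAX : (UINT32_C(1) << num_taps) - 1;
}

/*
 * Hypothetical model of the bitmap choice in the patch:
 * - not all TAPs OK         -> search among the good TAPs
 * - all TAPs OK, smpcmp set -> search among the TAPs recorded in smpcmp
 * - all TAPs OK, smpcmp 0   -> fall back to the good TAPs (any will do)
 */
static uint32_t pick_candidate_bitmap(uint32_t taps, uint32_t smpcmp,
				      unsigned int num_taps)
{
	uint32_t mask = tap_mask(num_taps);

	taps &= mask;
	smpcmp &= mask;

	if (taps == mask)	/* counterpart of bitmap_full(priv->taps, ...) */
		return smpcmp ? smpcmp : taps;

	return taps;
}
```

The third case is the one the patch adds for setups such as r8a774a1-hihope-rzg2m-ex, where all TAPs probe OK but the smpcmp bitmap stays empty.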