diff mbox series

ufs: Unrecoverable UFSHCD_UIC_DL_TCx_REPLAY_ERROR after some write operations

Message ID Ax8WZP5mKeS4D-elLrXw3AUz9iT7Dgfh-VVWbSOiypPXlS9KRr6GFXhrlf2v6QrhWeFtheX0gYhM-zGawalUidf1noc1gE-TvinLrKryibc=@protonmail.com
State New
Headers show
Series ufs: Unrecoverable UFSHCD_UIC_DL_TCx_REPLAY_ERROR after some write operations | expand

Commit Message

Yassine Oudjana Oct. 29, 2020, 11:46 a.m. UTC
I'm trying to get the mainline kernel working on a Xiaomi Mi Note 2 with MSM8996Pro and
Samsung KLUCG4J1CB-B0B1, but I'm getting UFSHCD_UIC_DL_TCx_REPLAY_ERROR after some write operations.

I'm not certain what kind of write is causing it, but it never happens when root is set to be
mounted read-only, so I don't think it's reading that causes it. However, I was able to write dmesg
without any errors by using an initramfs hook, except that is before I get the error.

As soon as it happens, it never recovers from it. It tries to reset, but it just gets the error
again after resetting.

UFS reset isn't defined currently in the msm8996 device tree, so I defined it:
diff mbox series

Patch

--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -1100,6 +1100,8 @@  ufshc: ufshc@624000 {

                        lanes-per-direction = <1>;
                        #reset-cells = <1>;
+                       resets = <&gcc GCC_UFS_BCR>;
+                       reset-names = "rst";
                        status = "disabled";

                        ufs_variant {

When I did that, PHY poweron started to fail (qcom_qmp_phy_power_on in the PHY driver was
timing out). I looked into the downstream kernel for this device, and found out that it calibrates
the PHY while the reset is asserted. So I did this and was able to "fix"(?) it:

--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -292,11 +294,9 @@  static int ufs_qcom_power_up_sequence(struct ufs_hba *hba)
        bool is_rate_B = (UFS_QCOM_LIMIT_HS_RATE == PA_HS_MODE_B)
                                                        ? true : false;

-       /* Reset UFS Host Controller and PHY */
-       ret = ufs_qcom_host_reset(hba);
-       if (ret)
-               dev_warn(hba->dev, "%s: host reset returned %d\n",
-                                 __func__, ret);
+       ufs_qcom_assert_reset(hba);
+       /* provide 1ms delay to let the reset pulse propagate. */
+       usleep_range(1000, 1100);

        if (is_rate_B)
                phy_set_mode(phy, PHY_MODE_UFS_HS_B);
@@ -309,11 +309,19 @@  static int ufs_qcom_power_up_sequence(struct ufs_hba *hba)
                goto out;
        }

+       ufs_qcom_deassert_reset(hba);
+       /*
+        * after reset deassertion, phy will need all ref clocks,
+        * voltage, current to settle down before starting serdes.
+        */
+       usleep_range(1000, 1100);
+
        /* power on phy - start serdes and phy's power and clocks */
        ret = phy_power_on(phy);

Note that using reset_control_de/assert(host->core_reset) instead of ufs_qcom_de/assert_reset(hba)
didn't work. I thought it was only about the order of operations (PHY calibration before deasserting
the reset), but it seems like there's more to it.

Now phy poweron succeeds again, but it still isn't able to recover from
UFSHCD_UIC_DL_TCx_REPLAY_ERROR when it happens.

I also found some differences in the PHY calibration tables in the downstream kernel, so I changed
them here:

--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -461,22 +461,22 @@  static const struct qmp_phy_init_tbl msm8998_pcie_pcs_tbl[] = {
 static const struct qmp_phy_init_tbl msm8996_ufs_serdes_tbl[] = {
        QMP_PHY_INIT_CFG(QPHY_POWER_DOWN_CONTROL, 0x01),
        QMP_PHY_INIT_CFG(QSERDES_COM_CMN_CONFIG, 0x0e),
-       QMP_PHY_INIT_CFG(QSERDES_COM_SYSCLK_EN_SEL, 0xd7),
+       QMP_PHY_INIT_CFG(QSERDES_COM_SYSCLK_EN_SEL, 0x14),
        QMP_PHY_INIT_CFG(QSERDES_COM_CLK_SELECT, 0x30),
-       QMP_PHY_INIT_CFG(QSERDES_COM_SYS_CLK_CTRL, 0x06),
+       QMP_PHY_INIT_CFG(QSERDES_COM_SYS_CLK_CTRL, 0x02),
        QMP_PHY_INIT_CFG(QSERDES_COM_BIAS_EN_CLKBUFLR_EN, 0x08),
        QMP_PHY_INIT_CFG(QSERDES_COM_BG_TIMER, 0x0a),
-       QMP_PHY_INIT_CFG(QSERDES_COM_HSCLK_SEL, 0x05),
+       QMP_PHY_INIT_CFG(QSERDES_COM_HSCLK_SEL, 0x00),
        QMP_PHY_INIT_CFG(QSERDES_COM_CORECLK_DIV, 0x0a),
        QMP_PHY_INIT_CFG(QSERDES_COM_CORECLK_DIV_MODE1, 0x0a),
        QMP_PHY_INIT_CFG(QSERDES_COM_LOCK_CMP_EN, 0x01),
-       QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_CTRL, 0x10),
+       QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_CTRL, 0x00),
        QMP_PHY_INIT_CFG(QSERDES_COM_RESETSM_CNTRL, 0x20),
        QMP_PHY_INIT_CFG(QSERDES_COM_CORE_CLK_EN, 0x00),
        QMP_PHY_INIT_CFG(QSERDES_COM_LOCK_CMP_CFG, 0x00),
        QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_TIMER1, 0xff),
        QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_TIMER2, 0x3f),
-       QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_MAP, 0x54),
+       QMP_PHY_INIT_CFG(QSERDES_COM_VCO_TUNE_MAP, 0x44),
        QMP_PHY_INIT_CFG(QSERDES_COM_SVS_MODE_CLK_SEL, 0x05),
        QMP_PHY_INIT_CFG(QSERDES_COM_DEC_START_MODE0, 0x82),
        QMP_PHY_INIT_CFG(QSERDES_COM_DIV_FRAC_START1_MODE0, 0x00),
@@ -510,21 +510,40 @@  static const struct qmp_phy_init_tbl msm8996_ufs_serdes_tbl[] = {

 static const struct qmp_phy_init_tbl msm8996_ufs_tx_tbl[] = {
        QMP_PHY_INIT_CFG(QSERDES_TX_HIGHZ_TRANSCEIVEREN_BIAS_DRVR_EN, 0x45),
-       QMP_PHY_INIT_CFG(QSERDES_TX_LANE_MODE, 0x02),
+       QMP_PHY_INIT_CFG(QSERDES_TX_LANE_MODE, 0x06),
 };

 static const struct qmp_phy_init_tbl msm8996_ufs_rx_tbl[] = {
        QMP_PHY_INIT_CFG(QSERDES_RX_SIGDET_LVL, 0x24),
-       QMP_PHY_INIT_CFG(QSERDES_RX_SIGDET_CNTRL, 0x02),
-       QMP_PHY_INIT_CFG(QSERDES_RX_RX_INTERFACE_MODE, 0x00),
-       QMP_PHY_INIT_CFG(QSERDES_RX_SIGDET_DEGLITCH_CNTRL, 0x18),
-       QMP_PHY_INIT_CFG(QSERDES_RX_UCDR_FASTLOCK_FO_GAIN, 0x0B),
+       QMP_PHY_INIT_CFG(QSERDES_RX_SIGDET_CNTRL, 0x0f),
+       QMP_PHY_INIT_CFG(QSERDES_RX_RX_INTERFACE_MODE, 0x40),
+       QMP_PHY_INIT_CFG(QSERDES_RX_SIGDET_DEGLITCH_CNTRL, 0x1e),
+       QMP_PHY_INIT_CFG(QSERDES_RX_UCDR_FASTLOCK_FO_GAIN, 0x0b),
        QMP_PHY_INIT_CFG(QSERDES_RX_RX_TERM_BW, 0x5b),
        QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQ_GAIN1_LSB, 0xff),
        QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQ_GAIN1_MSB, 0x3f),
        QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQ_GAIN2_LSB, 0xff),
-       QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQ_GAIN2_MSB, 0x0f),
-       QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQU_ADAPTOR_CNTRL2, 0x0E),
+       QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQ_GAIN2_MSB, 0x3f),
+       QMP_PHY_INIT_CFG(QSERDES_RX_RX_EQU_ADAPTOR_CNTRL2, 0x0d),
};

That didn't change anything either.

I noticed that in downstream there are 3 slightly different calibration tables for host versions
2.0.0, 2.1.0, and 2.2.0 which this device has.
Also, I noticed that QSERDES_COM_VCO_TUNE_MAP differs between rate A and B, where it's set to 0x14
for rate A, and to 0x54 for rate B. This was done on mainline too until the 14nm-specific QMP PHY
driver and others were combined into one driver (phy-qcom-qmp.c), and after then only the rate B
value 0x54 is used always.
Those aren't necessarily related to this issue, but I thought I'd mention them since I noticed them
on my way.

What else could be causing this?

Using the mainline tree at 4525c8781ec0701ce824e8bd379ae1b129e26568