Message ID | 20220929232712.12202-2-Sergey.Semin@baikalelectronics.ru |
---|---|
State | New |
Series | EDAC/mc/synopsys: Various fixes and cleanups |
On Fri, Sep 30, 2022 at 02:26:56AM +0300, Serge Semin wrote:
> The generic DW uMCTL2 DDRC v3.x support was added in commit f7824ded4149
> ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR"). It
> hasn't been done quiet well there with respect to the IRQs handling
> procedure. An attempt to fix that was introduced in the recent commit
> 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw").
> Alas again it didn't provide quite complete solution.

Because?

Btw, for the future, you should make sure you add those commit authors
to Cc so that they can get a chance to comment. Adding the folks from
that commit to Cc.

> First of all the commit f7824ded4149 ("EDAC/synopsys: Add support for
> version 3 of the Synopsys EDAC DDR") log says that v3.80a "has UE/CE auto
> cleared". They aren't in none of the IP-core versions.
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What does that sentence mean exactly? The UE/CE auto clearing
functionality is not in that silicon?

> The IRQ status can be cleared by means of setting the ECCCLR/ECCCTL
> register self-cleared flags 0-3.

I'm guessing that's this reg:

/* ECC control register */
#define ECC_CTRL_OFST 0xC4

?

> The pending IRQ clearance is done in the respective get_error_info()
> method of the driver. Thus defining a quirk flag with the
> "DDR_ECC_INTR_SELF_CLEAR" name was at least very inaccurate if not to
> say misleading.
>
> So was adding the comments about the "ce/ue bits automatically
> cleared".

Aah, you mean that the ->get_error_info() functions are doing the
clearing even if something should be doing self clearing.

And that DDR_ECC_INTR_SELF_CLEAR thing is queried when enabling the
error interrupt which is just bad naming because it looks like that
quirk controls what register to write/read.

> Second, disabling the being handled IRQ in the handler doesn't make sense
> in Linux since the IC line is masked during that procedure anyway. So
> disabling the IRQ in one part of the handler and enabling it at the end of
> the method is simply redundant. (See, the ZynqMP-specific code with the
> QoS IRQ CSR didn't do that originally.)

So what is this commit message of

  4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw")

then talking about:

"Then the interrupt handler will be called only once."

How is that interrupt supposed to be reenabled?

> Finally calling the zynqmp_get_error_info() method concurrently with the
> enable_irq()/disable_irq() functions causes the IRQs mask state race
> condition. Starting from DW uMCTL2 DDRC IP-core v3.10a [1] the ECCCLR
> register has been renamed to ECCCTL and has been equipped with CE/UE IRQs
> enable/disable flags [2].

Aha, that sounds like the right thing to toggle.

> So the CSR now serves for the IRQ status and control functions used
> concurrently during the IRQ handling and the IRQ disabling/enabling.
> Thus the corresponding critical section must be protected with the
> IRQ-safe spin-lock.
>
> So let's fix all the problems noted above. First the
> DDR_ECC_INTR_SELF_CLEAR flag is renamed to SYNPS_ZYNQMP_IRQ_REGS.

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

In this case, pls formulate it something like this:

"So fix all these problems noted above: rename DDR_ECC_INTR_SELF_CLEAR to
SYNPS_ZYNQMP_IRQ_REGS to denote that, ..."

And so on.

> Its semantic is now the opposite: the quirk means having the ZynqMP
> IRQ CSRs available on the platform.

Yes, that makes more sense.

> Second the DDR_UE_MASK and DDR_CE_MASK macros
> are renamed to imply being used in the framework of the ECCCLR/ECCCTL CSRs
> accesses. Third all the misleading comments are removed. Finally the
> ECC_CLR_OFST register IOs are now protected with the IRQ-safe spin-lock
> taken in order to prevent the IRQ status clearance and IRQ enable/disable
> race condition.
>
> [1] DesignWare Cores Enhanced Universal DDR Memory and Protocol
> Controllers (uMCTL2/uPCTL2), Release Notes, Version 3.91a, October 2020,
> p. 27.
> [2] DesignWare® Cores Enhanced Universal DDR Memory Controller (uMCTL2),
> Databook Version 3.91a, October 2020, p.818-819.

If those are not publicly accessible, then there's no point to put them
in here.

> Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>

Does this need to go to stable@ and thus older kernels?

> ---
> drivers/edac/synopsys_edac.c | 76 +++++++++++++++++++++++-------------
> 1 file changed, 48 insertions(+), 28 deletions(-)

> @@ -300,6 +299,7 @@ struct synps_ecc_status {
>  /**
>   * struct synps_edac_priv - DDR memory controller private instance data.
>   * @baseaddr: Base address of the DDR controller.
> + * @lock: Concurrent CSRs access lock.
>   * @message: Buffer for framing the event specific info.
>   * @stat: ECC status information.
>   * @p_data: Platform data.
> @@ -314,6 +314,7 @@ struct synps_ecc_status {
>   */
>  struct synps_edac_priv {
>  	void __iomem *baseaddr;
> +	spinlock_t lock;

That lock needs to be named properly and have a comment above it what it
protects.

>  	char message[SYNPS_EDAC_MSG_SIZE];
>  	struct synps_ecc_status stat;
>  	const struct synps_platform_data *p_data;

...

> @@ -516,24 +523,42 @@ static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p)
>
>  static void enable_intr(struct synps_edac_priv *priv)
>  {
> +	unsigned long flags;
> +
>  	/* Enable UE/CE Interrupts */
> -	if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
> -		writel(DDR_UE_MASK | DDR_CE_MASK,
> -		       priv->baseaddr + ECC_CLR_OFST);
> -	else
> +	if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS) {
>  		writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
>  		       priv->baseaddr + DDR_QOS_IRQ_EN_OFST);
>
> +		return;
> +	}
> +
> +	/* IRQs Enable/Disable feature has been available since v3.10a */

How does this comment help here?

If it is available since a version number, why doesn't the below check a
version number?

IOW, what is the relevance of that comment here?

In any case, I need to hear from this driver's maintainer too.

Thx.
diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c
index f7d37c282819..c78fb5781ff9 100644
--- a/drivers/edac/synopsys_edac.c
+++ b/drivers/edac/synopsys_edac.c
@@ -9,6 +9,7 @@
 #include <linux/edac.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
@@ -88,7 +89,7 @@
 /* DDR ECC Quirks */
 #define DDR_ECC_INTR_SUPPORT BIT(0)
 #define DDR_ECC_DATA_POISON_SUPPORT BIT(1)
-#define DDR_ECC_INTR_SELF_CLEAR BIT(2)
+#define SYNPS_ZYNQMP_IRQ_REGS BIT(2)
 
 /* ZynqMP Enhanced DDR memory controller registers that are relevant to ECC */
 /* ECC Configuration Registers */
@@ -133,11 +134,13 @@
 
 #define ECC_ADDRMAP0_OFFSET 0x200
 
-/* Control register bitfield definitions */
+/* ECC control/clear register definitions */
 #define ECC_CTRL_BUSWIDTH_MASK 0x3000
 #define ECC_CTRL_BUSWIDTH_SHIFT 12
 #define ECC_CTRL_CLR_CE_ERRCNT BIT(2)
 #define ECC_CTRL_CLR_UE_ERRCNT BIT(3)
+#define ECC_CTRL_EN_CE_IRQ BIT(8)
+#define ECC_CTRL_EN_UE_IRQ BIT(9)
 
 /* DDR Control Register width definitions */
 #define DDRCTL_EWDTH_16 2
@@ -164,10 +167,6 @@
 #define DDR_QOS_IRQ_EN_OFST 0x20208
 #define DDR_QOS_IRQ_DB_OFST 0x2020C
 
-/* DDR QOS Interrupt register definitions */
-#define DDR_UE_MASK BIT(9)
-#define DDR_CE_MASK BIT(8)
-
 /* ECC Corrected Error Register Mask and Shifts*/
 #define ECC_CEADDR0_RW_MASK 0x3FFFF
 #define ECC_CEADDR0_RNK_MASK BIT(24)
@@ -300,6 +299,7 @@ struct synps_ecc_status {
 /**
  * struct synps_edac_priv - DDR memory controller private instance data.
  * @baseaddr: Base address of the DDR controller.
+ * @lock: Concurrent CSRs access lock.
  * @message: Buffer for framing the event specific info.
  * @stat: ECC status information.
  * @p_data: Platform data.
@@ -314,6 +314,7 @@ struct synps_ecc_status {
  */
 struct synps_edac_priv {
 	void __iomem *baseaddr;
+	spinlock_t lock;
 	char message[SYNPS_EDAC_MSG_SIZE];
 	struct synps_ecc_status stat;
 	const struct synps_platform_data *p_data;
@@ -409,7 +410,8 @@ static int zynq_get_error_info(struct synps_edac_priv *priv)
 static int zynqmp_get_error_info(struct synps_edac_priv *priv)
 {
 	struct synps_ecc_status *p;
-	u32 regval, clearval = 0;
+	u32 regval, clearval;
+	unsigned long flags;
 	void __iomem *base;
 
 	base = priv->baseaddr;
@@ -452,11 +454,16 @@ static int zynqmp_get_error_info(struct synps_edac_priv *priv)
 			ECC_CEADDR1_BNKNR_SHIFT;
 	p->ueinfo.blknr = (regval & ECC_CEADDR1_BLKNR_MASK);
 	p->ueinfo.data = readl(base + ECC_UESYND0_OFST);
+
 out:
-	clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT;
-	clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
+	spin_lock_irqsave(&priv->lock, flags);
+
+	clearval = readl(base + ECC_CLR_OFST) |
+		   ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT |
+		   ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
 	writel(clearval, base + ECC_CLR_OFST);
-	writel(0x0, base + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return 0;
 }
@@ -516,24 +523,42 @@ static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p)
 
 static void enable_intr(struct synps_edac_priv *priv)
 {
+	unsigned long flags;
+
 	/* Enable UE/CE Interrupts */
-	if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
-		writel(DDR_UE_MASK | DDR_CE_MASK,
-		       priv->baseaddr + ECC_CLR_OFST);
-	else
+	if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS) {
 		writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
 		       priv->baseaddr + DDR_QOS_IRQ_EN_OFST);
 
+		return;
+	}
+
+	/* IRQs Enable/Disable feature has been available since v3.10a */
+	spin_lock_irqsave(&priv->lock, flags);
+
+	writel(ECC_CTRL_EN_CE_IRQ | ECC_CTRL_EN_UE_IRQ,
+	       priv->baseaddr + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
 static void disable_intr(struct synps_edac_priv *priv)
 {
+	unsigned long flags;
+
 	/* Disable UE/CE Interrupts */
-	if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
-		writel(0x0, priv->baseaddr + ECC_CLR_OFST);
-	else
+	if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS) {
 		writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
 		       priv->baseaddr + DDR_QOS_IRQ_DB_OFST);
+
+		return;
+	}
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	writel(0, priv->baseaddr + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
 /**
@@ -553,11 +578,7 @@ static irqreturn_t intr_handler(int irq, void *dev_id)
 	priv = mci->pvt_info;
 	p_data = priv->p_data;
 
-	/*
-	 * v3.0 of the controller has the ce/ue bits cleared automatically,
-	 * so this condition does not apply.
-	 */
-	if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) {
+	if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS) {
 		regval = readl(priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
 		regval &= (DDR_QOSCE_MASK | DDR_QOSUE_MASK);
 		if (!(regval & ECC_CE_UE_INTR_MASK))
@@ -574,11 +595,9 @@ static irqreturn_t intr_handler(int irq, void *dev_id)
 
 	edac_dbg(3, "Total error count CE %d UE %d\n",
 		 priv->ce_cnt, priv->ue_cnt);
-	/* v3.0 of the controller does not have this register */
-	if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR))
+
+	if (priv->p_data->quirks & SYNPS_ZYNQMP_IRQ_REGS)
 		writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
-	else
-		enable_intr(priv);
 
 	return IRQ_HANDLED;
 }
@@ -900,7 +919,7 @@ static const struct synps_platform_data zynqmp_edac_def = {
 	.get_mtype = zynqmp_get_mtype,
 	.get_dtype = zynqmp_get_dtype,
 	.get_ecc_state = zynqmp_get_ecc_state,
-	.quirks = (DDR_ECC_INTR_SUPPORT
+	.quirks = (DDR_ECC_INTR_SUPPORT | SYNPS_ZYNQMP_IRQ_REGS
 #ifdef CONFIG_EDAC_DEBUG
 		   | DDR_ECC_DATA_POISON_SUPPORT
 #endif
@@ -912,7 +931,7 @@ static const struct synps_platform_data synopsys_edac_def = {
 	.get_mtype = zynqmp_get_mtype,
 	.get_dtype = zynqmp_get_dtype,
 	.get_ecc_state = zynqmp_get_ecc_state,
-	.quirks = (DDR_ECC_INTR_SUPPORT | DDR_ECC_INTR_SELF_CLEAR
+	.quirks = (DDR_ECC_INTR_SUPPORT
 #ifdef CONFIG_EDAC_DEBUG
 		   | DDR_ECC_DATA_POISON_SUPPORT
 #endif
@@ -1360,6 +1379,7 @@ static int mc_probe(struct platform_device *pdev)
 	priv = mci->pvt_info;
 	priv->baseaddr = baseaddr;
 	priv->p_data = p_data;
+	spin_lock_init(&priv->lock);
 
 	mc_init(mci, pdev);
The generic DW uMCTL2 DDRC v3.x support was added in commit f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR"). The IRQ handling procedure was not implemented quite right there. An attempt to fix that was made in the recent commit 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw"). Alas, it did not provide a complete solution either.

First of all, the log of commit f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") says that v3.80a "has UE/CE auto cleared". No IP-core version actually auto-clears the UE/CE status. The IRQ status is cleared by setting the self-clearing flags 0-3 of the ECCCLR/ECCCTL register, and the driver does that in the respective get_error_info() method. Thus naming the quirk flag "DDR_ECC_INTR_SELF_CLEAR" was inaccurate at best and misleading at worst. The same goes for the comments about the "ce/ue bits automatically cleared".

Second, disabling the IRQ currently being handled from within its own handler makes no sense in Linux, since the interrupt controller line is masked during that procedure anyway. So disabling the IRQ in one part of the handler and re-enabling it at the end of the method is simply redundant. (Note that the ZynqMP-specific code using the QoS IRQ CSR did not do that originally.)

Finally, calling the zynqmp_get_error_info() method concurrently with the enable_intr()/disable_intr() functions causes a race condition on the IRQ mask state. Starting from DW uMCTL2 DDRC IP-core v3.10a [1], the ECCCLR register was renamed to ECCCTL and gained CE/UE IRQ enable/disable flags [2]. The CSR therefore now serves both the IRQ status and the IRQ control functions, which are used concurrently by the IRQ handler and by the IRQ enable/disable paths. Hence the corresponding critical sections must be protected with an IRQ-safe spinlock.

Fix all the problems noted above:
- Rename the DDR_ECC_INTR_SELF_CLEAR flag to SYNPS_ZYNQMP_IRQ_REGS and invert its semantics, so the quirk now means that the ZynqMP IRQ CSRs are available on the platform.
- Rename the DDR_UE_MASK and DDR_CE_MASK macros to ECC_CTRL_EN_UE_IRQ and ECC_CTRL_EN_CE_IRQ, to show that they belong to the ECCCLR/ECCCTL CSR.
- Remove all the misleading comments.
- Protect the ECC_CLR_OFST register accesses with an IRQ-safe spinlock, to prevent the race between the IRQ status clearing and the IRQ enable/disable.

[1] DesignWare Cores Enhanced Universal DDR Memory and Protocol Controllers (uMCTL2/uPCTL2), Release Notes, Version 3.91a, October 2020, p. 27.
[2] DesignWare® Cores Enhanced Universal DDR Memory Controller (uMCTL2), Databook, Version 3.91a, October 2020, pp. 818-819.

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
 drivers/edac/synopsys_edac.c | 76 +++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 28 deletions(-)