Message ID | 20220929232712.12202-1-Sergey.Semin@baikalelectronics.ru |
---|---|
Headers | show |
Series | EDAC/mc/synopsys: Various fixes and cleanups | expand |
On Fri, Sep 30, 2022 at 02:26:55AM +0300, Serge Semin wrote: > This patchset is a first one in the series created in the framework of > my Baikal-T1 DDRC-related work: > > [1: In-progress] EDAC/mc/synopsys: Various fixes and cleanups > Link: ---you are looking at it--- > [2: In-progress] EDAC/synopsys: Add generic DDRC info and address mapping > Link: https://lore.kernel.org/linux-edac/20220910195007.11027-1-Sergey.Semin@baikalelectronics.ru > [3: In-progress] EDAC/synopsys: Add generic resources and Baikal-T1 support > Link: https://lore.kernel.org/linux-edac/20220910195659.11843-1-Sergey.Semin@baikalelectronics.ru > > Note the patchsets above must be merged in the same order as they are > placed in the list in order to prevent conflicts. Nothing prevents them > from being reviewed synchronously though. Any tests are very welcome. > Thanks in advance. So I'd take a look slowly but I'd like for this driver's maintainer - Michal Simek - to have a look first. Thx.
On Fri, Sep 30, 2022 at 02:27:09AM +0300, Serge Semin wrote: > It was a bad idea in the first place to combine two absolutely different > controllers support in a single driver [1]. It caused having an additional > level of abstraction, which obviously have needlessly overcomplicated the > driver and as such caused many problems in the new main controller > features support implementation. The solution looks even more unreasonable > now seeing the justification of having both controllers support in a > single driver hasn't been implemented by the original code author [2]. Yeah, no, you need to give more concrete details here. Why exactly is this a problem? Are you saying that if synopsys puts out 10 incompatible memory controllers, we should have 10 separate EDAC drivers? Hell no. synopsys_edac.c is not a huge file and the probe logic which matches which synps_platform_data to load is pretty straight-forward to me. But maybe I'm missing something so please explain in detail what the actual problems with this are? Thx.
On Thu, Oct 06, 2022 at 03:25:42PM +0200, Borislav Petkov wrote: > On Thu, Oct 06, 2022 at 03:17:40PM +0300, Serge Semin wrote: > > In general because it needlessly overcomplicates the driver, worsen > > it scalability, maintainability and readability, makes it much harder > > to add new controller features. Moreover even if you still able to add > > some more features support the driver will get to be more and more messy > > (as Michal correctly said in the original thread [1]). > > Did you read that thread until the end? Of course I did. Here is a short summary: 1. @Punnaiah described that he got a Zynq system with two different DDR controllers. One of them was Synopsys DW uMCTL2 DDRC (ARM cortex A9 PS) and another one - Zynq A05 DDRC (Xilinx PL). All of these DDR controllers could be probed in Linux. It was required to detect the ECC errors for both of them. 2. @Punnaiah suggested that both of these controllers should have been handled by two different drivers since the controllers were different. But he was reasonably afraid that there could be a race condition for the mc_num assignment since it was the drivers responsibility to allocate one. 3. @Punnaiah suggested two solutions: 1) Keep two drivers in single file and use static global variable for tracking the mc_num. 2) Let the framework assign the mc_num to the edac driver. 4. @Punnaiah preferred the solution 2, but you said that the solution 1 would do it. 5. @Michal was against mixing up the support of two different devices in a single driver. He reasonably assumed that the driver would get to look messy. 6. @Boris disagreed saying that it won't be messy and any communication between the two memory controllers could be done internally in that driver. 7. So you all decided to keep both controllers support in a single file, but would get back to a discussion of having separate drivers for them later. But after all these years of the Synopsys EDAC driver living in kernel we can conclude that combining the two different devices support in a single driver was pointless. First by the driver design nobody ever used the driver to handle both of them at the same time (MC index is fixed to zero). Second there is no even a glimpse of communications between two devices in the system. Third there are two different set of macros and multiple platform-specific methods and if-flag-else statements fully abstracting out the devices implementation from the EDAC core. All of that has made the driver overcomplicated in-vain seeing none of the reasons of these complications have been realised. Meanwhile extending further the Synopsys uMCTL2 DDR controller functionality support would mean adding new macros, functions, platform-specific flags, parameters and device-specific data. That would have made the driver even more complicated. Dropping the abstraction layer out and splitting up the drivers was the best choice in the current circumstances especially seeing the driver author and @Michal preferred that solution in the first place. Moreover in order to cover a still possible case of having both Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in future I have implemented the solution 2. > > > It will get to be messy since you'll need to add more if-flag-else > > conditionals or define new device-specific callbacks to select the > > right code path for different controllers; you'll need to define > > some private data specific to one controller and unused in another. > > All of that you already have in the driver and all of that would be > > unneeded should the driver author have followed the standard kernel > > bus-device-driver model. > > So lemme ask this again because the last time it all sounded weird and I > don't think it got clarified fully: you cannot have more than one memory > controller type at the same time on those systems, can you? To be clear I don't work with the Xilinx systems. So I can only say based on your discussion with @Punnaiah on the initial driver review. @Punnaiah said they could have more than one memory controller type on their systems. That's why he described the problem with the MC indexes allocation and suggested to combine the devices support in a single driver. > > Because if you can and you want to support that, the current EDAC > "design" allows to have only a single EDAC driver per system. So you > can't do two drivers now. I do understand that. > > If your answer to that is > > Subject: [PATCH RESEND v3 13/17] EDAC/mc: Add MC unique index allocation procedure > > then I'm sceptical but I'd need to look at that first. Em, if there is something else which makes the EDAC drivers to be impossible to co-exist on the same system, then it greatly violates the bus-device-driver model. > > And reading your commit messages, you're talking a lot about what you're > doing. But that should be visible from the patch. What I wanna read is > *why* you're doing it. Each patchlog has a thorough enough description of "why". > > Like in this patch above, what's that "unique index allocation > procedure" for? Have you read the patchlog? > > edac_mc_alloc() already gets a mc_num as the MC number. > > And yes, if you want to do multiple driver instances like x86 does per > NUMA node instances, then that is done with edac_mc_alloc() which gives > you a memory controller descriptor and then you can deal with those. > See the "EDAC/mc: Add MC unique index allocation procedure" patch. It also provides a way to assign the indexes based on the OF-aliases. > From all the text it sounds to me you want to add a separate driver for > that Zynq A05 thing but I might still be missing something... I do want to detach the Zynq A05 device support from out of the Synopsys uMCTL2 driver. BTW have you read the cover letter? It contains a short summary of the changes and their justification. It seems as if you started your review straight from this patch. -Sergey > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote: > In case of the unique index allocation it's not that optimal to always > rely on the low-level device drivers (platform drivers), because they get > to start to implement either the same design pattern (for instance global > static MC counter) or may end-up with having non-unique index eventually > at runtime. Needless to say that having a generic unique index > allocation/tracking procedure will make code more readable and safer. I guess this is trying to say that the current memory controller index thing doesn't work. But why doesn't it work? It works just fine with the x86 drivers - there the memory controller number is the same as the node number where it is located so that works just fine. If that scheme cannot work on other systems, then I need to see an explanation why it cannot work first. > The suggested implementation is based on the kernel IDA infrastructure > exposed by the lib/idr.c driver with API described in linux/idr.h header > file. It's used to create an ID resource descriptor "mc_idr", which then > is utilized either to track the custom MC idx specified by EDAC LLDDs or > to allocate the next-free MC idx. This is talking about the "what" and not the "why". > A new special MC index is introduced here. It's defined by the > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > index is specified by the EDAC LLDD, the MC index will be either retrieved > from the MC device OF-node alias index ("mc[:number:]") or automatically > generated as the next-free MC index found by the ID allocation procedure. This is also talking about the "what" and not the "why". At best, what you're doing should be visible from the patch itself. Here's a longer explanation of how a commit message should look like: https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Thx.
On Wed, Oct 12, 2022 at 07:28:48PM +0200, Borislav Petkov wrote: > On Sat, Oct 08, 2022 at 03:42:24AM +0300, Serge Semin wrote: > > Of course I did. Here is a short summary: > > You didn't have to do a short summary but sure, if you prefer... It seemed to me as I should have so to make sure we are on the page of the original discussion understanding. > > > 7. So you all decided to keep both controllers support in a single file, > > but would get back to a discussion of having separate drivers for them > > later. > > Yes, pretty much. > > > But after all these years of the Synopsys EDAC driver living in kernel > > we can conclude that combining the two different devices support in a > > single driver was pointless. First by the driver design nobody ever > > used the driver to handle both of them at the same time (MC index is > > fixed to zero). > > So how was this supposed to work on his system? How am I supposed to know? You should have asked him at the time of his patches review before accepting them. He (Punnaiah Choudary Kalluri) said they had the system with two different DDR controllers. Since the MC idx was supposed to be provided by the controller driver he suggested to have both devices support implemented in the same driver. You agreed. If no interaction intended between the two devices why did he need to combine their support in the same driver then? He could have as well just split the drivers up and didn't bother with all the discussions. > > If you have a system with two different memory controllers I don't. The Synopsys EDAC driver author (Punnaiah) did judging by what he said in your discussion. > and you want > to have two different EDAC drivers for each, then the whole EDAC core > code needs to be audited wrt concurrency and synchronizing access to > its facilities because I don't think it has ever supported more than a > single EDAC driver per system. Once again. What is driver-depended do you have in the EDAC core? Does it follow the bus-device-drivers model? I failed to find any inter-driver dependency (what could be seen for instance in the tty/serial subsystem). But I am not the subsystem maintainer after all. I found the possible races in the MC index registration and fixed it in the corresponding patch. But that was the just registration-specifics behavior which could be easily fixed in the same way as the most of the kernel subsystem do. You are worried about the concurrencies. Does the EDAC core have problems if there are several DDR devices of the same type probed? AFAICS it doesn't as long as the indexes are properly allocated by the driver. What is the problem with registering devices of different types? The error counters are the MC-data-specific, so are the sysfs-nodes. The EDAC MC error handler function doesn't touch any inter-device parts of the core. The registration procedure is protected by the mutex and RCU. So it seems as the EDAC core developer thought about having the devices being registered concurrently. > > And it has never needed to, at least not on x86 land. Which is > currently changing because of CXL, because of accelerators needing > RAS, GPUs needing RAS and so on and so on. So eventually we'd have to > either put the new RAS functionality in the existing chipset-specific > driver or have to go the multiple EDAC drivers route. But that's only > tangential... If it has never needed to, then please explain why did you let the Synopsys EDAC driver being accepted like that then? > > So first I'd like to hear what your use case is: single EDAC driver for > your particular Baical-T1 device or you need to support multiple EDAC > drivers. In my case it's a single EDAC driver per-chip. There can be several DDR-controllers installed on the same SoC, but all of them of the same type (Synopsys DW uMCTL2 v2.61a). > > If so, why? I don't have a system with different DDR controllers detected on the same SoC but Punnaiah Choudary Kalluri, original Synopsys EDAC driver developer, did. > > > Moreover in order to cover a still possible case of having both > > Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in > > future I have implemented the solution 2. > > See above. > > > Em, if there is something else which makes the EDAC drivers to be > > impossible to co-exist on the same system, then it greatly violates > > the bus-device-driver model. > > If by that you mean the aspect of a driver associating with a device and > performing operations with it then why do you assume that EDAC drivers > have to adhere to that model? > > > Have you read the patchlog? > > Lemme reply to it directly. > > > BTW have you read the cover letter? It contains a short summary of the > > changes and their justification. > > Yes, I have read it and it contains a lot of unnecessary detail which > should be in the respective patches themselves. And I still don't know > exactly what *you* are trying to do, as I said above. > > A cover letter should contain a short executive summary explaining only > the goal of the patchset and then you can go into details if you prefer. > A reviewer should not have to dig into patch management details to know > what this patchset is trying to do. The main part of the letter starts exactly with the goals and then has text with more details of what is done and why. It is enough to get a notion regarding the patchset aim and content. Each patchlog starts with the problem description, the suggested solution and some details of the implementation I thought was required to add. Everything in accordance with the kernel patches standards. -Sergey > > A possible structure could be: > > Problem is A. > > It happens because of B. > > Fix it by doing C. > > (Potentially do D). > > Btw, when you're writing your commit messages, please use passive voice > in your commit message: no "we" or "I", etc, and describe your changes > in imperative mood. > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote: > ... inter-device parts of the core. The registration procedure is > protected by the mutex and RCU. So it seems as the EDAC core developer Seems, schmeems. As I said already, EDAC has always had a single chipset-specific driver. Period. So if one needs to run more than one chipset-specific driver concurrently, then the whole code needs to be audited because this hasn't been done before. > If it has never needed to, then please explain why did you let the > Synopsys EDAC driver being accepted like that then? I think I already did. > In my case it's a single EDAC driver per-chip. There can be several > DDR-controllers installed on the same SoC, but all of them of the same > type (Synopsys DW uMCTL2 v2.61a). Good. I'll look at your patches as time allows.
On Wed, Oct 12, 2022 at 07:29:22PM +0200, Borislav Petkov wrote: > On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote: > > In case of the unique index allocation it's not that optimal to always > > rely on the low-level device drivers (platform drivers), because they get > > to start to implement either the same design pattern (for instance global > > static MC counter) or may end-up with having non-unique index eventually > > at runtime. Needless to say that having a generic unique index > > allocation/tracking procedure will make code more readable and safer. > > I guess this is trying to say that the current memory controller index > thing doesn't work. But why doesn't it work?
On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote: > The unified approach makes code indeed more readable in the platform > drivers and safer since they didn't have to bother with more coding. > See for instance the drivers with the static variable-based IDs > allocation. Which drivers? Concrete examples please. > Have you read it yourself? Yes. I even have improved it over the years. > Here is a short excerpt from there: > "Once the problem is established, describe what you are actually doing > about it in technical detail. It's important to describe the change > in plain English for the reviewer to verify that the code is behaving > as you intend it to." Maybe that part can be misunderstood: "describe what you're doing about it". That doesn't mean the text should explain what you're adding and how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I can see that from the diff. So let me try to explain to you what I'm expecting from commit messages in the EDAC tree: The commit message should explain *why* a change is being done so that, months, years from now, when you've gone on to do something else, people doing git archeology can actually figure out *why* this change was done. And the explanation in that commit message should be *complete* and should contain *all* necessary information to understand why this change was done. Your commit message is not explaining the problem. "In case of the unique index allocation it's not that optimal to always rely on the low-level device drivers (platform drivers)" That's your statement. That needs to have exact details so that people can look at that commit message, look at the code which *you* point them to in it and go, aha, that is the problem. "because they get to start to implement either the same design pattern (for instance global static MC counter) or may end-up with having non-unique index eventually at runtime." Who are they, exact pointers please. "The suggested implementation is based on the kernel IDA infrastructure exposed by the lib/idr.c driver with API described in linux/idr.h header file." That doesn't matter one bit for the change you're doing. You could have added it under the "---" line. "A new special MC index is introduced here. It's defined by the EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM index is specified by the EDAC LLDD, the MC index will be either retrieved from the MC device OF-node alias index ("mc[:number:]") or automatically generated as the next-free MC index found by the ID allocation procedure." Some of that paragraph should go over the function as a comment - not in the commit message as it pertains to what the function does and it would make a *lot* more sense there when someone tries to figure out what the function does instead of in the commit message. So, I'm still not convinced why do some EDAC drivers need unique MC identifiers, why the current scheme doesn't work and where it doesn't work.
On Wed, Oct 12, 2022 at 09:44:00PM +0200, Borislav Petkov wrote: > On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote: > > ... inter-device parts of the core. The registration procedure is > > protected by the mutex and RCU. So it seems as the EDAC core developer > > Seems, schmeems. As I said already, EDAC has always had a single > chipset-specific driver. Period. So if one needs to run more than one > chipset-specific driver concurrently, then the whole code needs to be > audited because this hasn't been done before. > > > If it has never needed to, then please explain why did you let the > > Synopsys EDAC driver being accepted like that then? > > I think I already did. Kind of. What you didn't explain was the driver-specific problem in the edac_mc core. What is the difference in the EDAC core handling two devices (including of difference types) on the same platform and handling the same devices each probed by two different drivers? (Consider the drivers are designed thread-safe and we are talking about the EDAC MC core.) > > > In my case it's a single EDAC driver per-chip. There can be several > > DDR-controllers installed on the same SoC, but all of them of the same > > type (Synopsys DW uMCTL2 v2.61a). > > Good. > > I'll look at your patches as time allows. Ok. Thanks in advance. -Sergey > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
This patchset is a first one in the series created in the framework of my Baikal-T1 DDRC-related work: [1: In-progress] EDAC/mc/synopsys: Various fixes and cleanups Link: ---you are looking at it--- [2: In-progress] EDAC/synopsys: Add generic DDRC info and address mapping Link: https://lore.kernel.org/linux-edac/20220910195007.11027-1-Sergey.Semin@baikalelectronics.ru [3: In-progress] EDAC/synopsys: Add generic resources and Baikal-T1 support Link: https://lore.kernel.org/linux-edac/20220910195659.11843-1-Sergey.Semin@baikalelectronics.ru Note the patchsets above must be merged in the same order as they are placed in the list in order to prevent conflicts. Nothing prevents them from being reviewed synchronously though. Any tests are very welcome. Thanks in advance. Regarding this series content. It's an initial patchset which traditionally provides various fixes, cleanups and modifications required for the more comfortable further features development. The main goal of it though is to detach the Xilinx Zynq A05 DDRC related code into the dedicated driver since first it has nothing to do with the Synopsys DW uMCTL2 DDR controller and second it will be a great deal obstacle on the way of extending the Synopsys-part functionality. The series starts with fixes patches, which in short concern the next aspects: touching the ZynqMP-specific CSRs on the Xilinx ZinqMP platform only, serializing an access to the ECCCLR register, adding correct memory devices type detection, setting a correct value to the mem_ctl_info.scrub_cap field, dropping an erroneous ADDRMAP[4] parsing and getting back a correct order of the ECC errors info detection procedure. Afterwards the patchset provides several cleanup patches required for the more coherent code splitting up (Xilinx Zynq A05 and Synopsys DW uMCTL2) so the provided modifications would be useful in both drivers. First we get to replace the platform resource manual IO-remapping with the devm_platform_ioremap_resource() method call. Secondly we suggest to drop: internal CE/UE errors counters, local to_mci() macros definition, some redundant ecc_error_info structure fields and redundant info from the error message, duplicated dimm->nr_pages debug printout and spaces from the MEM_TYPE flags declarations. (The later two updates concern the MCI core part.) Thirdly before splitting up the driver we need to add an unique MC index allocation infrastructure to the MCI core. It's required since after splitting the driver up we'll need to make sure both device types could be correctly probed on the same platform. Finally the Xilinx Zynq A05 part of the driver is moved out to a dedicated driver where it should been originally placed. After that the platform-specific setups API is removed from the Synopsys DW uMCTL2 DDRC driver since it's no longer required. Finally as the cherry on the cake we suggest to unify the DW uMCTL2 DDRC driver entities naming and replace the open-coded "shift/mask" patter with the kernel helpers like BIT/GENMASK/FIELD_x in there. It shall significantly improve the code readability. Link: https://lore.kernel.org/linux-edac/20220822190730.27277-1-Sergey.Semin@baikalelectronics.ru/ Changelog v2: - Move Synopsys DW uMCTL2 DDRC bindings file renaming to a separate patch. (@Krzysztof) - Introduce a new compatible string "snps,dw-umctl2-ddrc" matching the new DT-schema name. - Forgot to fix some of the prefix of the SYNPS_ZYNQMP_IRQ_REGS macro in several places. (@tbot) - Drop the no longer used "priv" pointer from the mc_init() function. (@tbot) - Include "linux/bitfield.h" header file to get the FIELD_GET macro definition. (@tbot) - Drop the already merged in patches: [PATCH 12/20] EDAC/mc: Replace spaces with tabs in memtype flags definition [PATCH 13/20] EDAC/mc: Drop duplicated dimm->nr_pages debug printout Link: https://lore.kernel.org/linux-edac/20220910194237.10142-1-Sergey.Semin@baikalelectronics.ru Changelog v3: - Drop the no longer used "priv" pointer from the mc_init() function. (@tbot) - Drop the merged in patches: [PATCH v2 14/19] dt-bindings: memory: snps: Detach Zynq DDRC controller support [PATCH v2 15/19] dt-bindings: memory: snps: Use more descriptive device name (@Krzysztof) Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru> Cc: Michail Ivanov <Michail.Ivanov@baikalelectronics.ru> Cc: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru> Cc: Punnaiah Choudary Kalluri <punnaiah.choudary.kalluri@xilinx.com> Cc: Manish Narani <manish.narani@xilinx.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Robert Richter <rric@kernel.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> Cc: Rob Herring <robh@kernel.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (17): EDAC/synopsys: Fix native uMCTL2 IRQs handling procedure EDAC/synopsys: Fix generic device type detection procedure EDAC/synopsys: Fix mci->scrub_cap field setting EDAC/synopsys: Drop erroneous ADDRMAP4.addrmap_col_b10 parse EDAC/synopsys: Fix reading errors count before ECC status EDAC/synopsys: Use platform device devm ioremap method EDAC/synopsys: Drop internal CE and UE counters EDAC/synopsys: Drop local to_mci macro implementation EDAC/synopsys: Drop struct ecc_error_info.blknr field EDAC/synopsys: Shorten out struct ecc_error_info.bankgrpnr field name EDAC/synopsys: Drop redundant info from error message EDAC/mc: Init DIMM labels in MC registration method EDAC/mc: Add MC unique index allocation procedure EDAC/synopsys: Detach Zynq DDRC controller support EDAC/synopsys: Drop unused platform-specific setup API EDAC/synopsys: Unify the driver entities naming EDAC/synopsys: Convert to using BIT/GENMASK/FIELD_x macros MAINTAINERS | 1 + drivers/edac/Kconfig | 9 +- drivers/edac/Makefile | 1 + drivers/edac/edac_mc.c | 135 +++++- drivers/edac/edac_mc.h | 4 + drivers/edac/synopsys_edac.c | 903 ++++++++++++----------------------- drivers/edac/zynq_edac.c | 501 +++++++++++++++++++ 7 files changed, 927 insertions(+), 627 deletions(-) create mode 100644 drivers/edac/zynq_edac.c