Message ID | 20210205222644.2357303-1-saravanak@google.com |
---|---|
Headers | show |
Series | Make fw_devlink=on more forgiving | expand |
On Fri, Feb 5, 2021 at 2:26 PM Saravana Kannan <saravanak@google.com> wrote: > > There are a lot of devices/drivers where they never have a struct device > created for them or the driver initializes the hardware without ever > binding to the struct device. > > This series is intended to avoid any boot regressions due to such > devices/drivers when fw_devlink=on and also address the handling of > optional suppliers. > > Patch 1 and 2 addresses the issue of firmware nodes that look like > they'll have struct devices created for them, but will never actually > have struct devices added for them. For example, DT nodes with a > compatible property that don't have devices added for them. > > Patch 3 and 4 allow for handling optional DT bindings. > > Patch 5 sets up a generic API to handle drivers that never bind with > their devices. > > Patch 6 through 8 update different frameworks to use the new API. > > Thanks, > Saravana > Forgot to add version history: v1 -> v2: Patch 1: Added a flag to fwnodes that aren't devices. Patch 3: New patch to ise the flag set in patch 1 to not create bad links. v2 -> v3: - Patch 1: Added Rafael's Ack - New patches 3 and 4 v3 -> v4: - No changes to patches 1-4. - New patches 5-8. -Saravana > Saravana Kannan (8): > driver core: fw_devlink: Detect supplier devices that will never be > added > of: property: Don't add links to absent suppliers > driver core: Add fw_devlink.strict kernel param > of: property: Add fw_devlink support for optional properties > driver core: fw_devlink: Handle suppliers that don't use driver core > irqdomain: Mark fwnodes when their irqdomain is added/removed > PM: domains: Mark fwnodes when their powerdomain is added/removed > clk: Mark fwnodes when their clock provider is added/removed > > .../admin-guide/kernel-parameters.txt | 5 ++ > drivers/base/core.c | 58 ++++++++++++++++++- > drivers/base/power/domain.c | 2 + > drivers/clk/clk.c | 3 + > drivers/of/property.c | 16 +++-- > include/linux/fwnode.h | 20 ++++++- > kernel/irq/irqdomain.c | 2 + > 7 files changed, 98 insertions(+), 8 deletions(-) > > -- > 2.30.0.478.g8a0d178c01-goog >
On Sat, Feb 6, 2021 at 11:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <saravanak@google.com> wrote: > > There are a lot of devices/drivers where they never have a struct device > > created for them or the driver initializes the hardware without ever > > binding to the struct device. > > > > This series is intended to avoid any boot regressions due to such > > devices/drivers when fw_devlink=on and also address the handling of > > optional suppliers. > > Thanks for your series! > > > Patch 5 sets up a generic API to handle drivers that never bind with > > their devices. > > > > Patch 6 through 8 update different frameworks to use the new API. > > > driver core: fw_devlink: Handle suppliers that don't use driver core > > irqdomain: Mark fwnodes when their irqdomain is added/removed > > PM: domains: Mark fwnodes when their powerdomain is added/removed > > clk: Mark fwnodes when their clock provider is added/removed > > I take it this is an automatic alternative for letting drivers set the > OF_POPULATED flag manually? The frameworks can still continue setting it to avoid creating dead "struct devices" that'll never be used. This new flag handles cases where the device is already created, but will never bind to a driver. So, they are meant to do slightly different things, but the end result is removing the need for individual drivers to set OF_POPULATED (and Rob hates that too). > Is this actually safe? It's not uncommon for a driver to register > multiple providers, sometimes even of different types (clock, genpd, > irq, reset[1], ...). This flag is just an indication that the fwnode has been initialized by a driver. It's okay if the flag gets set multiple times when a driver is registering with multiple frameworks. It's also okay if the flag is cleared multiple times as the driver is uninitializing the hardware (although, this is very unlikely for drivers that don't use device-driver model). When we actually try to create device links, we just check if this happened without a driver actually binding to this device. There's no "probing" race because the "status" I check goes through NO_DRIVER -> PROBING -(registering happens)-> BOUND -> UNBINDING -(deregistering happens) -> NO_DRIVER. So if the fwnode flag is getting set as part of the driver's probe function, the "status" value will never be NO_DRIVER. > Can you be sure consumer drivers do not start probing while their > dependency is still busy registering providers? The code only acts on that flag when trying to create device links from the consumer to the supplier. This is just a way to tell "hey, don't bother creating a device link, this supplier will never bind". So it just avoids blocking the consumer. Doesn't really make the consumers probe earlier than they would have. > [1] Which brings my attention to the fact that devlink does not consider > "resets" properties yet. > Yeah, we can add that and other bindings as we go. -Saravana
On Fri, Feb 5, 2021 at 4:27 PM Saravana Kannan <saravanak@google.com> wrote: > > This allows fw_devlink to recognize clock provider drivers that don't > use the device-driver model to initialize the device. fw_devlink will > use this information to make sure consumers of such clock providers > aren't indefinitely blocked from probing, waiting for the power domain > device to appear and bind to a driver. Don't we have cases that are a mixture? IOW, a subset of the clock provider is initialized early, then the full driver takes over. You'd want consumers that are not a driver to succeed, but drivers to defer until the full driver is up. > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/clk/clk.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index 8c1d04db990d..27ff90eacb1f 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -4555,6 +4555,8 @@ int of_clk_add_provider(struct device_node *np, > if (ret < 0) > of_clk_del_provider(np); > > + fwnode_dev_initialized(&np->fwnode, true); > + > return ret; > } > EXPORT_SYMBOL_GPL(of_clk_add_provider); > @@ -4672,6 +4674,7 @@ void of_clk_del_provider(struct device_node *np) > list_for_each_entry(cp, &of_clk_providers, link) { > if (cp->node == np) { > list_del(&cp->link); > + fwnode_dev_initialized(&np->fwnode, false); > of_node_put(cp->node); > kfree(cp); > break; > -- > 2.30.0.478.g8a0d178c01-goog >
On Mon, Feb 8, 2021 at 7:39 AM Rob Herring <robh+dt@kernel.org> wrote: > > On Fri, Feb 5, 2021 at 4:27 PM Saravana Kannan <saravanak@google.com> wrote: > > > > This allows fw_devlink to recognize clock provider drivers that don't > > use the device-driver model to initialize the device. fw_devlink will > > use this information to make sure consumers of such clock providers > > aren't indefinitely blocked from probing, waiting for the power domain > > device to appear and bind to a driver. > > Don't we have cases that are a mixture? IOW, a subset of the clock > provider is initialized early, then the full driver takes over. You'd > want consumers that are not a driver to succeed, but drivers to defer > until the full driver is up. You probably just made a typo, but to clarify, this is about ignoring suppliers that never bind. So, in your case the clock device is the supplier. To answer your question, consumer devices added after the full supplier driver takes over will still have device links created to the supplier clock device. But consumers added before the full driver takes over won't. So, nothing is worse off with fw_devlink=on and we get way more dependency tracking (device links) created than what we have today. -Saravana
On Mon, Feb 8, 2021 at 12:40 AM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > Hi Saravana, > > On 05.02.2021 23:26, Saravana Kannan wrote: > > There are a lot of devices/drivers where they never have a struct device > > created for them or the driver initializes the hardware without ever > > binding to the struct device. > > > > This series is intended to avoid any boot regressions due to such > > devices/drivers when fw_devlink=on and also address the handling of > > optional suppliers. > > > > Patch 1 and 2 addresses the issue of firmware nodes that look like > > they'll have struct devices created for them, but will never actually > > have struct devices added for them. For example, DT nodes with a > > compatible property that don't have devices added for them. > > > > Patch 3 and 4 allow for handling optional DT bindings. > > > > Patch 5 sets up a generic API to handle drivers that never bind with > > their devices. > > > > Patch 6 through 8 update different frameworks to use the new API. > > This patchset fixes probing issue observed on various Exynos based > boards even with commit c09a3e6c97f0 ("soc: samsung: pm_domains: Convert > to regular platform driver") reverted. Thanks! > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Thanks for testing! -Saravana
On Fri, 05 Feb 2021 14:26:38 -0800, Saravana Kannan wrote: > If driver core marks a firmware node as not a device, don't add fwnode > links where it's a supplier. > > Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default") > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/of/property.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > Acked-by: Rob Herring <robh@kernel.org>
Hi, Saravana, On 2/6/21 12:26 AM, Saravana Kannan wrote: > There are a lot of devices/drivers where they never have a struct device > created for them or the driver initializes the hardware without ever > binding to the struct device. > > This series is intended to avoid any boot regressions due to such > devices/drivers when fw_devlink=on and also address the handling of > optional suppliers. > > Patch 1 and 2 addresses the issue of firmware nodes that look like > they'll have struct devices created for them, but will never actually > have struct devices added for them. For example, DT nodes with a > compatible property that don't have devices added for them. > > Patch 3 and 4 allow for handling optional DT bindings. > > Patch 5 sets up a generic API to handle drivers that never bind with > their devices. > > Patch 6 through 8 update different frameworks to use the new API. > > Thanks, > Saravana > > Saravana Kannan (8): > driver core: fw_devlink: Detect supplier devices that will never be > added > of: property: Don't add links to absent suppliers > driver core: Add fw_devlink.strict kernel param > of: property: Add fw_devlink support for optional properties > driver core: fw_devlink: Handle suppliers that don't use driver core > irqdomain: Mark fwnodes when their irqdomain is added/removed > PM: domains: Mark fwnodes when their powerdomain is added/removed > clk: Mark fwnodes when their clock provider is added/removed > > .../admin-guide/kernel-parameters.txt | 5 ++ > drivers/base/core.c | 58 ++++++++++++++++++- > drivers/base/power/domain.c | 2 + > drivers/clk/clk.c | 3 + > drivers/of/property.c | 16 +++-- > include/linux/fwnode.h | 20 ++++++- > kernel/irq/irqdomain.c | 2 + > 7 files changed, 98 insertions(+), 8 deletions(-) > Even with this patch set applied, sama5d2_xplained can not boot. Patch at [1] makes sama5d2_xplained boot again. Stephen applied it to clk-next. Cheers, ta [1] https://lore.kernel.org/lkml/20210203154332.470587-1-tudor.ambarus@microchip.com/
On Wed, Feb 10, 2021 at 12:19 AM <Tudor.Ambarus@microchip.com> wrote: > > Hi, Saravana, > > On 2/6/21 12:26 AM, Saravana Kannan wrote: > > There are a lot of devices/drivers where they never have a struct device > > created for them or the driver initializes the hardware without ever > > binding to the struct device. > > > > This series is intended to avoid any boot regressions due to such > > devices/drivers when fw_devlink=on and also address the handling of > > optional suppliers. > > > > Patch 1 and 2 addresses the issue of firmware nodes that look like > > they'll have struct devices created for them, but will never actually > > have struct devices added for them. For example, DT nodes with a > > compatible property that don't have devices added for them. > > > > Patch 3 and 4 allow for handling optional DT bindings. > > > > Patch 5 sets up a generic API to handle drivers that never bind with > > their devices. > > > > Patch 6 through 8 update different frameworks to use the new API. > > > > Thanks, > > Saravana > > > > Saravana Kannan (8): > > driver core: fw_devlink: Detect supplier devices that will never be > > added > > of: property: Don't add links to absent suppliers > > driver core: Add fw_devlink.strict kernel param > > of: property: Add fw_devlink support for optional properties > > driver core: fw_devlink: Handle suppliers that don't use driver core > > irqdomain: Mark fwnodes when their irqdomain is added/removed > > PM: domains: Mark fwnodes when their powerdomain is added/removed > > clk: Mark fwnodes when their clock provider is added/removed > > > > .../admin-guide/kernel-parameters.txt | 5 ++ > > drivers/base/core.c | 58 ++++++++++++++++++- > > drivers/base/power/domain.c | 2 + > > drivers/clk/clk.c | 3 + > > drivers/of/property.c | 16 +++-- > > include/linux/fwnode.h | 20 ++++++- > > kernel/irq/irqdomain.c | 2 + > > 7 files changed, 98 insertions(+), 8 deletions(-) > > > > Even with this patch set applied, sama5d2_xplained can not boot. > Patch at [1] makes sama5d2_xplained boot again. Stephen applied it > to clk-next. I'm glad you won't actually have any boot issues in 5.12, but the fact you need [1] with this series doesn't make a lot of sense to me because: 1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode in question way before any consumer devices are added. 2. Any consumer device added after (1) will stop trying to link to the clock device. Are you somehow adding a consumer to the clock fwnode before (1)? Can you try this patch without your clk fix? I was trying to avoid looping through a list, but looks like your case might somehow need it? -Saravana +++ b/drivers/base/core.c @@ -943,6 +943,31 @@ static void device_links_missing_supplier(struct device *dev) } } +static int fw_devlink_check_suppliers(struct device *dev) +{ + struct fwnode_link *link; + int ret = 0; + + if (!dev->fwnode ||fw_devlink_is_permissive()) + return 0; + + /* + * Device waiting for supplier to become available is not allowed to + * probe. + */ + mutex_lock(&fwnode_link_lock); + list_for_each_entry(link, &dev->fwnode->suppliers, c_hook) { + if (link->supplier->flags & FWNODE_FLAG_INITIALIZED) + continue; + + ret = -EPROBE_DEFER; + break; + } + mutex_unlock(&fwnode_link_lock); + + return ret; +} + /** * device_links_check_suppliers - Check presence of supplier drivers. * @dev: Consumer device. @@ -964,21 +989,13 @@ int device_links_check_suppliers(struct device *dev) struct device_link *link; int ret = 0; - /* - * Device waiting for supplier to become available is not allowed to - * probe. - */ - mutex_lock(&fwnode_link_lock); - if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) && - !fw_devlink_is_permissive()) { + if (fw_devlink_check_suppliers(dev)) { dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n", list_first_entry(&dev->fwnode->suppliers, struct fwnode_link, c_hook)->supplier); - mutex_unlock(&fwnode_link_lock); return -EPROBE_DEFER; } - mutex_unlock(&fwnode_link_lock); device_links_write_lock(); > > Cheers, > ta > > [1] https://lore.kernel.org/lkml/20210203154332.470587-1-tudor.ambarus@microchip.com/
On 2/10/21 10:54 AM, Saravana Kannan wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On Wed, Feb 10, 2021 at 12:19 AM <Tudor.Ambarus@microchip.com> wrote: >> >> Hi, Saravana, >> >> On 2/6/21 12:26 AM, Saravana Kannan wrote: >>> There are a lot of devices/drivers where they never have a struct device >>> created for them or the driver initializes the hardware without ever >>> binding to the struct device. >>> >>> This series is intended to avoid any boot regressions due to such >>> devices/drivers when fw_devlink=on and also address the handling of >>> optional suppliers. >>> >>> Patch 1 and 2 addresses the issue of firmware nodes that look like >>> they'll have struct devices created for them, but will never actually >>> have struct devices added for them. For example, DT nodes with a >>> compatible property that don't have devices added for them. >>> >>> Patch 3 and 4 allow for handling optional DT bindings. >>> >>> Patch 5 sets up a generic API to handle drivers that never bind with >>> their devices. >>> >>> Patch 6 through 8 update different frameworks to use the new API. >>> >>> Thanks, >>> Saravana >>> >>> Saravana Kannan (8): >>> driver core: fw_devlink: Detect supplier devices that will never be >>> added >>> of: property: Don't add links to absent suppliers >>> driver core: Add fw_devlink.strict kernel param >>> of: property: Add fw_devlink support for optional properties >>> driver core: fw_devlink: Handle suppliers that don't use driver core >>> irqdomain: Mark fwnodes when their irqdomain is added/removed >>> PM: domains: Mark fwnodes when their powerdomain is added/removed >>> clk: Mark fwnodes when their clock provider is added/removed >>> >>> .../admin-guide/kernel-parameters.txt | 5 ++ >>> drivers/base/core.c | 58 ++++++++++++++++++- >>> drivers/base/power/domain.c | 2 + >>> drivers/clk/clk.c | 3 + >>> drivers/of/property.c | 16 +++-- >>> include/linux/fwnode.h | 20 ++++++- >>> kernel/irq/irqdomain.c | 2 + >>> 7 files changed, 98 insertions(+), 8 deletions(-) >>> >> >> Even with this patch set applied, sama5d2_xplained can not boot. >> Patch at [1] makes sama5d2_xplained boot again. Stephen applied it >> to clk-next. > > I'm glad you won't actually have any boot issues in 5.12, but the fact > you need [1] with this series doesn't make a lot of sense to me > because: > > 1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode > in question way before any consumer devices are added. Looks like in my case FWNODE_FLAG_INITIALIZED is not set, because drivers/clk/at91/sama5d2.c uses of_clk_add_hw_provider(). > 2. Any consumer device added after (1) will stop trying to link to the > clock device. > > Are you somehow adding a consumer to the clock fwnode before (1)? > > Can you try this patch without your clk fix? I was trying to avoid > looping through a list, but looks like your case might somehow need > it? > I tried it, didn't solve my boot problem. The following patch makes the sama5d2_xplained boot again, even without the patch from [1]: diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 27ff90eacb1f..9370e4dfecae 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -4594,6 +4594,8 @@ int of_clk_add_hw_provider(struct device_node *np, if (ret < 0) of_clk_del_provider(np); + fwnode_dev_initialized(&np->fwnode, true); + return ret; } EXPORT_SYMBOL_GPL(of_clk_add_hw_provider); Cheers, ta > -Saravana > > +++ b/drivers/base/core.c > @@ -943,6 +943,31 @@ static void device_links_missing_supplier(struct > device *dev) > } > } > > +static int fw_devlink_check_suppliers(struct device *dev) > +{ > + struct fwnode_link *link; > + int ret = 0; > + > + if (!dev->fwnode ||fw_devlink_is_permissive()) > + return 0; > + > + /* > + * Device waiting for supplier to become available is not allowed to > + * probe. > + */ > + mutex_lock(&fwnode_link_lock); > + list_for_each_entry(link, &dev->fwnode->suppliers, c_hook) { > + if (link->supplier->flags & FWNODE_FLAG_INITIALIZED) > + continue; > + > + ret = -EPROBE_DEFER; > + break; > + } > + mutex_unlock(&fwnode_link_lock); > + > + return ret; > +} > + > /** > * device_links_check_suppliers - Check presence of supplier drivers. > * @dev: Consumer device. > @@ -964,21 +989,13 @@ int device_links_check_suppliers(struct device *dev) > struct device_link *link; > int ret = 0; > > - /* > - * Device waiting for supplier to become available is not allowed to > - * probe. > - */ > - mutex_lock(&fwnode_link_lock); > - if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) && > - !fw_devlink_is_permissive()) { > + if (fw_devlink_check_suppliers(dev)) { > dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n", > list_first_entry(&dev->fwnode->suppliers, > struct fwnode_link, > c_hook)->supplier); > - mutex_unlock(&fwnode_link_lock); > return -EPROBE_DEFER; > } > - mutex_unlock(&fwnode_link_lock); > > device_links_write_lock(); > > > >> >> Cheers, >> ta >> >> [1] https://lore.kernel.org/lkml/20210203154332.470587-1-tudor.ambarus@microchip.com/
On Wed, Feb 10, 2021 at 2:02 AM <Tudor.Ambarus@microchip.com> wrote: > > On 2/10/21 10:54 AM, Saravana Kannan wrote: > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > > > On Wed, Feb 10, 2021 at 12:19 AM <Tudor.Ambarus@microchip.com> wrote: > >> > >> Hi, Saravana, > >> > >> On 2/6/21 12:26 AM, Saravana Kannan wrote: > >>> There are a lot of devices/drivers where they never have a struct device > >>> created for them or the driver initializes the hardware without ever > >>> binding to the struct device. > >>> > >>> This series is intended to avoid any boot regressions due to such > >>> devices/drivers when fw_devlink=on and also address the handling of > >>> optional suppliers. > >>> > >>> Patch 1 and 2 addresses the issue of firmware nodes that look like > >>> they'll have struct devices created for them, but will never actually > >>> have struct devices added for them. For example, DT nodes with a > >>> compatible property that don't have devices added for them. > >>> > >>> Patch 3 and 4 allow for handling optional DT bindings. > >>> > >>> Patch 5 sets up a generic API to handle drivers that never bind with > >>> their devices. > >>> > >>> Patch 6 through 8 update different frameworks to use the new API. > >>> > >>> Thanks, > >>> Saravana > >>> > >>> Saravana Kannan (8): > >>> driver core: fw_devlink: Detect supplier devices that will never be > >>> added > >>> of: property: Don't add links to absent suppliers > >>> driver core: Add fw_devlink.strict kernel param > >>> of: property: Add fw_devlink support for optional properties > >>> driver core: fw_devlink: Handle suppliers that don't use driver core > >>> irqdomain: Mark fwnodes when their irqdomain is added/removed > >>> PM: domains: Mark fwnodes when their powerdomain is added/removed > >>> clk: Mark fwnodes when their clock provider is added/removed > >>> > >>> .../admin-guide/kernel-parameters.txt | 5 ++ > >>> drivers/base/core.c | 58 ++++++++++++++++++- > >>> drivers/base/power/domain.c | 2 + > >>> drivers/clk/clk.c | 3 + > >>> drivers/of/property.c | 16 +++-- > >>> include/linux/fwnode.h | 20 ++++++- > >>> kernel/irq/irqdomain.c | 2 + > >>> 7 files changed, 98 insertions(+), 8 deletions(-) > >>> > >> > >> Even with this patch set applied, sama5d2_xplained can not boot. > >> Patch at [1] makes sama5d2_xplained boot again. Stephen applied it > >> to clk-next. > > > > I'm glad you won't actually have any boot issues in 5.12, but the fact > > you need [1] with this series doesn't make a lot of sense to me > > because: > > > > 1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode > > in question way before any consumer devices are added. > > Looks like in my case FWNODE_FLAG_INITIALIZED is not set, because > drivers/clk/at91/sama5d2.c uses of_clk_add_hw_provider(). Ah, that explains it. > > 2. Any consumer device added after (1) will stop trying to link to the > > clock device. > > > > Are you somehow adding a consumer to the clock fwnode before (1)? > > > > Can you try this patch without your clk fix? I was trying to avoid > > looping through a list, but looks like your case might somehow need > > it? > > > > I tried it, didn't solve my boot problem. Thanks! I should stop coding past midnight! > The following patch makes the > sama5d2_xplained boot again, even without the patch from [1]: Great! I gave a reviewed-by. -Saravana
Hi Saravana, On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <saravanak@google.com> wrote: > There are a lot of devices/drivers where they never have a struct device > created for them or the driver initializes the hardware without ever > binding to the struct device. > > This series is intended to avoid any boot regressions due to such > devices/drivers when fw_devlink=on and also address the handling of > optional suppliers. > > Patch 1 and 2 addresses the issue of firmware nodes that look like > they'll have struct devices created for them, but will never actually > have struct devices added for them. For example, DT nodes with a > compatible property that don't have devices added for them. > > Patch 3 and 4 allow for handling optional DT bindings. > > Patch 5 sets up a generic API to handle drivers that never bind with > their devices. > > Patch 6 through 8 update different frameworks to use the new API. > > Thanks, > Saravana > > Saravana Kannan (8): > driver core: fw_devlink: Detect supplier devices that will never be > added > of: property: Don't add links to absent suppliers > driver core: Add fw_devlink.strict kernel param > of: property: Add fw_devlink support for optional properties > driver core: fw_devlink: Handle suppliers that don't use driver core > irqdomain: Mark fwnodes when their irqdomain is added/removed > PM: domains: Mark fwnodes when their powerdomain is added/removed > clk: Mark fwnodes when their clock provider is added/removed Thanks for your series, which is now part of driver-core-next. I gave driver-core-next + [1] a try on various Renesas boards. Test results are below. In general, the result looks much better than before. [1] - https://lore.kernel.org/lkml/20210210114435.122242-1-tudor.ambarus@microchip.com/ 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device node OF_POPULATED after init") is no longer needed (but already queued for v5.12 anyway) - Some devices are reprobed, despite their drivers returning a real error code, and not -EPROBE_DEFER: renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.* (rwdt_probe() returns -ENODEV) sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by ee090000.pci; cannot claim for e6590000.usb sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22 sh-pfc e6060000.pinctrl: could not request pin 247 (GP_7_23) from group usb0 on device sh-pfc renesas_usbhs e6590000.usb: Error applying setting, reverse things back renesas_usbhs: probe of e6590000.usb failed with error -22 rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: rcar-pcie fe000000.pcie: IO 0x00fe100000..0x00fe1fffff -> 0x0000000000 rcar-pcie fe000000.pcie: MEM 0x00fe200000..0x00fe3fffff -> 0x00fe200000 rcar-pcie fe000000.pcie: MEM 0x0030000000..0x0037ffffff -> 0x0030000000 rcar-pcie fe000000.pcie: MEM 0x0038000000..0x003fffffff -> 0x0038000000 rcar-pcie fe000000.pcie: IB MEM 0x0040000000..0x00bfffffff -> 0x0040000000 rcar-pcie fe000000.pcie: IB MEM 0x0200000000..0x02ffffffff -> 0x0200000000 rcar-pcie fe000000.pcie: PCIe link down (rcar_pcie_probe() returns -ENODEV) xhci-hcd ee000000.usb: xHCI Host Controller xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7 xhci-hcd ee000000.usb: Direct firmware load for r8a779x_usb3_v3.dlmem failed with error -2 xhci-hcd ee000000.usb: can't setup: -2 xhci-hcd ee000000.usb: USB bus 7 deregistered xhci-hcd: probe of ee000000.usb failed with error -2 - The PCI reprobing leads to a memory leak, for which I've sent a fix "[PATCH] PCI: Fix memory leak in pci_register_io_range()" https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/ - I2C on R-Car Gen3 does not seem to use DMA, according to /sys/kernel/debug/dmaengine/summary: -dma4chan0 | e66d8000.i2c:tx -dma4chan1 | e66d8000.i2c:rx -dma5chan0 | e6510000.i2c:tx - Disabling CONFIG_IPMMU_VMSA (IOMMU) now works, good! ignoring dependency for device, assuming no driver - Disabling CONFIG_RCAR_DMAC works for most devices, except for sound: -rcar_sound ec500000.sound: probed ALSA device list: - #0: rcar-sound + No soundcards found. # cat /sys/kernel/debug/devices_deferred 2-0010 sound ec500000.sound platform e6510000.i2c: Linked as a sync state only consumer to ec500000.sound platform ec500000.sound: Linked as a consumer to e6060000.pinctrl platform ec500000.sound: Linked as a consumer to e6150000.clock-controller i2c 2-0010: Linked as a consumer to ec500000.sound platform ec500000.sound: Linked as a consumer to 2-004f cs2000-cp 2-004f: revision - C1 i2c-rcar e6510000.i2c: probed i2c-rcar e6510000.i2c: Dropping the link to ec500000.sound i2c 2-0010: probe deferral - supplier ec500000.sound not ready With CONFIG_RCAR_DMAC=y, ec500000.sound is probed quite early. arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts ak4613: codec@10 { clocks = <&rcar_sound 3>; port { ak4613_endpoint: endpoint { remote-endpoint = <&rsnd_endpoint0>; }; }; }; sound_card: sound { dais = <&rsnd_port0 /* ak4613 */ &rsnd_port1 /* HDMI0 */ &rsnd_port2>; /* HDMI1 */ }; rcar_sound: sound@ec500000 { ports { rsnd_port0: port@0 { rsnd_endpoint0: endpoint { remote-endpoint = <&ak4613_endpoint>; } } } }; 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva) - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb reset handling" is no longer needed https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+renesas@glider.be/ - On R-Mobile A1, I get a BUG and a memory leak: BUG: spinlock bad magic on CPU#0, swapper/1 lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 0 PID: 1 Comm: swapper Not tainted 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266 Hardware name: Generic R8A7740 (Flattened Device Tree) [<c010c3c8>] (unwind_backtrace) from [<c010a49c>] (show_stack+0x10/0x14) [<c010a49c>] (show_stack) from [<c0159534>] (do_raw_spin_lock+0x20/0x94) [<c0159534>] (do_raw_spin_lock) from [<c04089d8>] (dev_pm_get_subsys_data+0x30/0xa0) [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>] (genpd_add_device+0x34/0x1c0) [<c0413698>] (genpd_add_device) from [<c041389c>] (of_genpd_add_device+0x34/0x4c) [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>] (board_staging_register_device+0xf8/0x118) [<c0a1e9bc>] (board_staging_register_device) from [<c0a1ea00>] (board_staging_register_devices+0x24/0x28) [<c0a1ea00>] (board_staging_register_devices) from [<c0a1ea30>] (runtime_board_check+0x2c/0x40) [<c0a1ea30>] (runtime_board_check) from [<c0101fac>] (do_one_initcall+0xe0/0x278) [<c0101fac>] (do_one_initcall) from [<c0a01034>] (kernel_init_freeable+0x174/0x1c0) [<c0a01034>] (kernel_init_freeable) from [<c05fd568>] (kernel_init+0x8/0x118) [<c05fd568>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38) Exception stack(0xc19c9fb0 to 0xc19c9ff8) 9fa0: 00000000 00000000 00000000 00000000 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 unreferenced object 0xc4134e00 (size 512): comm "swapper", pid 1, jiffies 4294937296 (age 3541.930s) hex dump (first 32 bytes): 00 4e 13 c4 00 4e 13 c4 ff ff ff 7f ff ff ff 7f .N...N.......... ff ff ff 7f 02 00 00 00 00 5f 13 c4 1c 4e 13 c4 ........._...N.. backtrace: [<de1a3c34>] dev_pm_qos_constraints_allocate+0x10/0xcc [<d21cf6e4>] dev_pm_qos_add_notifier+0x6c/0xd0 [<e04bbc90>] genpd_add_device+0x178/0x1c0 [<95067303>] of_genpd_add_device+0x34/0x4c [<c334b97a>] board_staging_register_device+0xf8/0x118 [<01bd495a>] board_staging_register_devices+0x24/0x28 [<fb25a5d8>] runtime_board_check+0x2c/0x40 [<65aed679>] do_one_initcall+0xe0/0x278 [<97e3f4f7>] kernel_init_freeable+0x174/0x1c0 [<63c8fed0>] kernel_init+0x8/0x118 [<f704d96c>] ret_from_fork+0x14/0x38 [<00000000>] 0x0 3. RZ/A1 and RZ/A2: No issues. Gr{oetje,eeting}s, Geert
Hi Saravana, On Thu, Feb 11, 2021 at 2:00 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > - Disabling CONFIG_RCAR_DMAC works for most devices, except for > sound: Please ignore. DMA is mandatory for sound, and thus fails in the same way on v5.11-rc5. Gr{oetje,eeting}s, Geert
Hi Saravana, On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). > > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device > > node OF_POPULATED after init") is no longer needed (but already > > queued for v5.12 anyway) > > Rob doesn't like the proliferation of OF_POPULATED and we don't need > it anymore, so maybe work it out with him? It's a balance between some > wasted memory (struct device(s)) vs not proliferating OF_POPULATED. Rob: should it be reverted? For v5.13? I guess other similar "fixes" went in in the mean time. > > - Some devices are reprobed, despite their drivers returning > > a real error code, and not -EPROBE_DEFER: > > Sorry, it's not obvious from the logs below where "reprobing" is > happening. Can you give more pointers please? My log was indeed not a full log, but just the reprobes happening. I'll send you a full log by private email. > Also, thinking more about this, the only way I could see this happen is: > 1. Device fails with error that's not -EPROBE_DEFER > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER > flag) where it's a consumer. > 3. The supplier probes and the device gets added to the deferred probe > list again. > > But I can't see how this sequence can happen. Device links are created > only when a device is added. And is the supplier isn't added yet, the > consumer wouldn't have probed in the first place. The full log doesn't show any evidence of the device being added to a list in between the two probes. > Other than "annoying waste of time" is this causing any other problems? Probably not. But see below. > > - The PCI reprobing leads to a memory leak, for which I've sent a fix > > "[PATCH] PCI: Fix memory leak in pci_register_io_range()" > > https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/ > > Wrt PCI reprobing, > 1. Is this PCI never expected to probe, but it's being reattempted > despite the NOT EPROBE_DEFER error? Or There is no PCIe card present, so the failure is expected. Later it is reprobed, which of course fails again. > 2. The PCI was deferred probe when it should have probed and then when > it's finally reattemped and it could succeed, we are hitting this mem > leak issue? I think the leak has always been there, but it was just exposed by this unneeded reprobe. I don't think a reprobe after that specific error path had ever happened before. > I'm basically trying to distinguish between "this stuff should never > be retried" vs "this/it's suppliers got probe deferred with > fw_devlink=on vs but didn't get probe deferred with > fw_devlink=permissive and that's causing issues" There should not be a probe deferral, as no -EPROBE_DEFER was returned. > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > /sys/kernel/debug/dmaengine/summary: > > > > -dma4chan0 | e66d8000.i2c:tx > > -dma4chan1 | e66d8000.i2c:rx > > -dma5chan0 | e6510000.i2c:tx > > I think I need more context on the problem before I can try to fix it. > I'm also very unfamiliar with that file. With fw_devlink=permissive, > I2C was using DMA? If so, the next step is to see if the I2C relative > probe order with DMA is getting changed and if so, why. Yes, I plan to dig deeper to see what really happens... > > - On R-Mobile A1, I get a BUG and a memory leak: > > > > BUG: spinlock bad magic on CPU#0, swapper/1 > > Hmm... I looked at this in bits and pieces throughout the day. At > least spent an hour looking at this. This doesn't make a lot of sense > to me. I don't even touch anything in this code path AFAICT. Are > modules/kernel mixed up somehow? I need more info before I can help. > Does reverting my pm domain change make any difference (assume it > boots this far without it). I plan to dig deeper to see what really happens... Gr{oetje,eeting}s, Geert
On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). > > > > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device > > > node OF_POPULATED after init") is no longer needed (but already > > > queued for v5.12 anyway) > > > > Rob doesn't like the proliferation of OF_POPULATED and we don't need > > it anymore, so maybe work it out with him? It's a balance between some > > wasted memory (struct device(s)) vs not proliferating OF_POPULATED. > > Rob: should it be reverted? For v5.13? > I guess other similar "fixes" went in in the mean time. > > > > - Some devices are reprobed, despite their drivers returning > > > a real error code, and not -EPROBE_DEFER: > > > > Sorry, it's not obvious from the logs below where "reprobing" is > > happening. Can you give more pointers please? > > My log was indeed not a full log, but just the reprobes happening. > I'll send you a full log by private email. > > > Also, thinking more about this, the only way I could see this happen is: > > 1. Device fails with error that's not -EPROBE_DEFER > > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER > > flag) where it's a consumer. > > 3. The supplier probes and the device gets added to the deferred probe > > list again. > > > > But I can't see how this sequence can happen. Device links are created > > only when a device is added. And is the supplier isn't added yet, the > > consumer wouldn't have probed in the first place. > > The full log doesn't show any evidence of the device being added > to a list in between the two probes. > > > Other than "annoying waste of time" is this causing any other problems? > > Probably not. But see below. > > > > - The PCI reprobing leads to a memory leak, for which I've sent a fix > > > "[PATCH] PCI: Fix memory leak in pci_register_io_range()" > > > https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@glider.be/ > > > > Wrt PCI reprobing, > > 1. Is this PCI never expected to probe, but it's being reattempted > > despite the NOT EPROBE_DEFER error? Or > > There is no PCIe card present, so the failure is expected. > Later it is reprobed, which of course fails again. > > > 2. The PCI was deferred probe when it should have probed and then when > > it's finally reattemped and it could succeed, we are hitting this mem > > leak issue? > > I think the leak has always been there, but it was just exposed by > this unneeded reprobe. I don't think a reprobe after that specific > error path had ever happened before. > > > I'm basically trying to distinguish between "this stuff should never > > be retried" vs "this/it's suppliers got probe deferred with > > fw_devlink=on vs but didn't get probe deferred with > > fw_devlink=permissive and that's causing issues" > > There should not be a probe deferral, as no -EPROBE_DEFER was > returned. > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > /sys/kernel/debug/dmaengine/summary: > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > -dma4chan1 | e66d8000.i2c:rx > > > -dma5chan0 | e6510000.i2c:tx > > > > I think I need more context on the problem before I can try to fix it. > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > I2C was using DMA? If so, the next step is to see if the I2C relative > > probe order with DMA is getting changed and if so, why. > > Yes, I plan to dig deeper to see what really happens... Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes it and you also don't see this issue with fw_devlink=permissive, then it means there's probably some unnecessary probe deferral that we should try to avoid. At least, that's my hunch right now. Thanks, Saravana > > > > - On R-Mobile A1, I get a BUG and a memory leak: > > > > > > BUG: spinlock bad magic on CPU#0, swapper/1 > > > > > Hmm... I looked at this in bits and pieces throughout the day. At > > least spent an hour looking at this. This doesn't make a lot of sense > > to me. I don't even touch anything in this code path AFAICT. Are > > modules/kernel mixed up somehow? I need more info before I can help. > > Does reverting my pm domain change make any difference (assume it > > boots this far without it). > > I plan to dig deeper to see what really happens... > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
On Thu, Feb 11, 2021 at 2:00 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <saravanak@google.com> wrote: > > There are a lot of devices/drivers where they never have a struct device > > created for them or the driver initializes the hardware without ever > > binding to the struct device. > > > > This series is intended to avoid any boot regressions due to such > > devices/drivers when fw_devlink=on and also address the handling of > > optional suppliers. > - Some devices are reprobed, despite their drivers returning > a real error code, and not -EPROBE_DEFER: > > renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.* > (rwdt_probe() returns -ENODEV) > > sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by > ee090000.pci; cannot claim for e6590000.usb > sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22 > sh-pfc e6060000.pinctrl: could not request pin 247 > (GP_7_23) from group usb0 on device sh-pfc > renesas_usbhs e6590000.usb: Error applying setting, > reverse things back > renesas_usbhs: probe of e6590000.usb failed with error -22 > > rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: > rcar-pcie fe000000.pcie: IO > 0x00fe100000..0x00fe1fffff -> 0x0000000000 > rcar-pcie fe000000.pcie: MEM > 0x00fe200000..0x00fe3fffff -> 0x00fe200000 > rcar-pcie fe000000.pcie: MEM > 0x0030000000..0x0037ffffff -> 0x0030000000 > rcar-pcie fe000000.pcie: MEM > 0x0038000000..0x003fffffff -> 0x0038000000 > rcar-pcie fe000000.pcie: IB MEM > 0x0040000000..0x00bfffffff -> 0x0040000000 > rcar-pcie fe000000.pcie: IB MEM > 0x0200000000..0x02ffffffff -> 0x0200000000 > rcar-pcie fe000000.pcie: PCIe link down > (rcar_pcie_probe() returns -ENODEV) > > xhci-hcd ee000000.usb: xHCI Host Controller > xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7 > xhci-hcd ee000000.usb: Direct firmware load for > r8a779x_usb3_v3.dlmem failed with error -2 > xhci-hcd ee000000.usb: can't setup: -2 > xhci-hcd ee000000.usb: USB bus 7 deregistered > xhci-hcd: probe of ee000000.usb failed with error -2 Consumers are added to the deferred probe pending list before they are probed, but not removed on probe failure. Patch sent "[PATCH] driver core: Fix double failed probing with fw_devlink=on" https://lore.kernel.org/linux-renesas-soc/20210215111619.2385030-1-geert+renesas@glider.be/ Gr{oetje,eeting}s, Geert
Hi Saravana, On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > /sys/kernel/debug/dmaengine/summary: > > > > -dma4chan0 | e66d8000.i2c:tx > > -dma4chan1 | e66d8000.i2c:rx > > -dma5chan0 | e6510000.i2c:tx > > I think I need more context on the problem before I can try to fix it. > I'm also very unfamiliar with that file. With fw_devlink=permissive, > I2C was using DMA? If so, the next step is to see if the I2C relative > probe order with DMA is getting changed and if so, why. More detailed log: platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? platform e6700000.dma-controller: Linked as a consumer to e6150000.clock-controller platform e66d8000.i2c: Added to deferred list platform e6700000.dma-controller: Added to deferred list bus: 'platform': driver_probe_device: matched device e6700000.dma-controller with driver rcar-dmac bus: 'platform': really_probe: probing driver rcar-dmac with device e6700000.dma-controller platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral bus: 'platform': driver_probe_device: matched device e66d8000.i2c with driver i2c-rcar bus: 'platform': really_probe: probing driver i2c-rcar with device e66d8000.i2c I2C becomes available... i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) [...] but DMA is not available yet, so the driver falls back to PIO. driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar platform e6700000.dma-controller: Retrying from deferred list bus: 'platform': driver_probe_device: matched device e6700000.dma-controller with driver rcar-dmac bus: 'platform': really_probe: probing driver rcar-dmac with device e6700000.dma-controller platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral platform e6700000.dma-controller: Added to deferred list platform e6700000.dma-controller: Retrying from deferred list bus: 'platform': driver_probe_device: matched device e6700000.dma-controller with driver rcar-dmac bus: 'platform': really_probe: probing driver rcar-dmac with device e6700000.dma-controller driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' bus: 'platform': really_probe: bound device e6700000.dma-controller to driver rcar-dmac DMA becomes available. Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows that the I2C controllers do not have DMA channels allocated, as the kernel has performed no more I2C transfers after DMA became available. Using i2cdetect shows that DMA is used, which is good: i2c-rcar e66d8000.i2c: got DMA channel for rx With permissive devlinks, the clock controller consumers are not added to the deferred probing list, and probe order is slightly different. The I2C controllers are still probed before the DMA controllers. But DMA becomes available a bit earlier, before the probing of the last I2C slave driver. Hence /sys/kernel/debug/dmaengine/summary shows that some I2C transfers did use DMA. So the real issue is that e66d8000.i2c not linked as a consumer to e6700000.dma-controller. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Saravana, On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). > > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device > > node OF_POPULATED after init") is no longer needed (but already > > queued for v5.12 anyway) > > Rob doesn't like the proliferation of OF_POPULATED and we don't need > it anymore, so maybe work it out with him? It's a balance between some > wasted memory (struct device(s)) vs not proliferating OF_POPULATED. > > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva) > > > > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb > > reset handling" is no longer needed > > https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+renesas@glider.be/ > > Good to see more evidence that this series is fixing things at a more > generic level. I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n, booting fails again, as everything is waiting on the system controller, which never becomes available. Rcar-sysc doesn't suffer from this problem, cfr. above. Perhaps because the rmobile-sysc bindings use a hierarchical instead of a linear PM domain description, and thus consumers point to the children of the system controller node? Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi. > > - On R-Mobile A1, I get a BUG and a memory leak: > > > > BUG: spinlock bad magic on CPU#0, swapper/1 > > lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner: > > <none>/-1, .owner_cpu: 0 > > CPU: 0 PID: 1 Comm: swapper Not tainted > > 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266 > > Hardware name: Generic R8A7740 (Flattened Device Tree) > > [<c010c3c8>] (unwind_backtrace) from [<c010a49c>] > > (show_stack+0x10/0x14) > > [<c010a49c>] (show_stack) from [<c0159534>] > > (do_raw_spin_lock+0x20/0x94) > > [<c0159534>] (do_raw_spin_lock) from [<c04089d8>] > > (dev_pm_get_subsys_data+0x30/0xa0) > > [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>] > > (genpd_add_device+0x34/0x1c0) > > [<c0413698>] (genpd_add_device) from [<c041389c>] > > (of_genpd_add_device+0x34/0x4c) > > [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>] > > (board_staging_register_device+0xf8/0x118) > > [<c0a1e9bc>] (board_staging_register_device) from This is indeed a pre-existing problem. of_genpd_add_device() is called before platform_device_register(), as it needs to attach the genpd before the device is probed. But the spinlock is only initialized when the device is registered. This was masked before due to an unrelated wait context check failure, which disabled any further spinlock checks, and exposed by fw_devlinks changing probe order. Patch sent. "[PATCH] staging: board: Fix uninitialized spinlock when attaching genpd" https://lore.kernel.org/r/20210215151405.2551143-1-geert+renesas@glider.be Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > /sys/kernel/debug/dmaengine/summary: > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > -dma4chan1 | e66d8000.i2c:rx > > > -dma5chan0 | e6510000.i2c:tx > > > > I think I need more context on the problem before I can try to fix it. > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > I2C was using DMA? If so, the next step is to see if the I2C relative > > probe order with DMA is getting changed and if so, why. > > More detailed log: > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? Because fw_devlink.strict=1 is not set and dma/iommu is considered an "optional"/"driver decides" dependency. > platform e6700000.dma-controller: Linked as a consumer to > e6150000.clock-controller Is this the only supplier of dma-controller? > platform e66d8000.i2c: Added to deferred list > platform e6700000.dma-controller: Added to deferred list > > bus: 'platform': driver_probe_device: matched device > e6700000.dma-controller with driver rcar-dmac > bus: 'platform': really_probe: probing driver rcar-dmac with > device e6700000.dma-controller > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > with driver i2c-rcar > bus: 'platform': really_probe: probing driver i2c-rcar with device > e66d8000.i2c > > I2C becomes available... > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > [...] > > but DMA is not available yet, so the driver falls back to PIO. > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > platform e6700000.dma-controller: Retrying from deferred list > bus: 'platform': driver_probe_device: matched device > e6700000.dma-controller with driver rcar-dmac > bus: 'platform': really_probe: probing driver rcar-dmac with > device e6700000.dma-controller > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > platform e6700000.dma-controller: Added to deferred list > platform e6700000.dma-controller: Retrying from deferred list > bus: 'platform': driver_probe_device: matched device > e6700000.dma-controller with driver rcar-dmac > bus: 'platform': really_probe: probing driver rcar-dmac with > device e6700000.dma-controller > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > bus: 'platform': really_probe: bound device > e6700000.dma-controller to driver rcar-dmac > > DMA becomes available. > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > that the I2C controllers do not have DMA channels allocated, as the > kernel has performed no more I2C transfers after DMA became available. > > Using i2cdetect shows that DMA is used, which is good: > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > With permissive devlinks, the clock controller consumers are not added > to the deferred probing list, and probe order is slightly different. > The I2C controllers are still probed before the DMA controllers. > But DMA becomes available a bit earlier, before the probing of the last > I2C slave driver. This seems like a race? I'm guessing it's two different threads probing those two devices? And it just happens to work for "permissive" assuming the boot timing doesn't change? > Hence /sys/kernel/debug/dmaengine/summary shows that > some I2C transfers did use DMA. > > So the real issue is that e66d8000.i2c not linked as a consumer to > e6700000.dma-controller. That's because fw_devlink.strict=1 isn't set. If you need DMA to be treated as a mandatory supplier, you'll need to set the flag. Is fw_devlink=on really breaking anything here? It just seems like "permissive" got lucky with the timing and it could break at any point in the future. Thought? -Saravana
Hi Geert, On Mon, Feb 15, 2021 at 7:16 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). > > > > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device > > > node OF_POPULATED after init") is no longer needed (but already > > > queued for v5.12 anyway) > > > > Rob doesn't like the proliferation of OF_POPULATED and we don't need > > it anymore, so maybe work it out with him? It's a balance between some > > wasted memory (struct device(s)) vs not proliferating OF_POPULATED. > > > > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva) > > > > > > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb > > > reset handling" is no longer needed > > > https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+renesas@glider.be/ > > > > Good to see more evidence that this series is fixing things at a more > > generic level. > > I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n, > booting fails again, as everything is waiting on the system controller, > which never becomes available. > Rcar-sysc doesn't suffer from this problem, cfr. above. > Perhaps because the rmobile-sysc bindings use a hierarchical instead > of a linear PM domain description, and thus consumers point to the > children of the system controller node? > Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi. Ok, I see what's going on. The problem is that the "power domain" fwnode being registered is not the node that contains the "compatible" property and becomes a device. So this patch[1] is not helping here. Fix is to do something like this (to avoid using OF_POPULATED flag and breaking reset): diff --git a/drivers/soc/renesas/rmobile-sysc.c b/drivers/soc/renesas/rmobile-sysc.c index 9046b8c933cb..b7e66139ef7d 100644 --- a/drivers/soc/renesas/rmobile-sysc.c +++ b/drivers/soc/renesas/rmobile-sysc.c @@ -344,6 +344,7 @@ static int __init rmobile_init_pm_domains(void) of_node_put(np); break; } + fwnode_dev_initialized(&np->fwnode, true); } put_special_pds(); Can you give it a shot? [1] - https://lore.kernel.org/lkml/20210205222644.2357303-8-saravanak@google.com/ > > > - On R-Mobile A1, I get a BUG and a memory leak: > > > > > > BUG: spinlock bad magic on CPU#0, swapper/1 > > > lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner: > > > <none>/-1, .owner_cpu: 0 > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266 > > > Hardware name: Generic R8A7740 (Flattened Device Tree) > > > [<c010c3c8>] (unwind_backtrace) from [<c010a49c>] > > > (show_stack+0x10/0x14) > > > [<c010a49c>] (show_stack) from [<c0159534>] > > > (do_raw_spin_lock+0x20/0x94) > > > [<c0159534>] (do_raw_spin_lock) from [<c04089d8>] > > > (dev_pm_get_subsys_data+0x30/0xa0) > > > [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>] > > > (genpd_add_device+0x34/0x1c0) > > > [<c0413698>] (genpd_add_device) from [<c041389c>] > > > (of_genpd_add_device+0x34/0x4c) > > > [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>] > > > (board_staging_register_device+0xf8/0x118) > > > [<c0a1e9bc>] (board_staging_register_device) from > > This is indeed a pre-existing problem. > of_genpd_add_device() is called before platform_device_register(), > as it needs to attach the genpd before the device is probed. > But the spinlock is only initialized when the device is registered. > This was masked before due to an unrelated wait context check failure, > which disabled any further spinlock checks, and exposed by fw_devlinks > changing probe order. > Patch sent. > "[PATCH] staging: board: Fix uninitialized spinlock when attaching genpd" > https://lore.kernel.org/r/20210215151405.2551143-1-geert+renesas@glider.be > Great! Thanks, Saravana
Hi Saravana, On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <saravanak@google.com> wrote: > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > > /sys/kernel/debug/dmaengine/summary: > > > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > > -dma4chan1 | e66d8000.i2c:rx > > > > -dma5chan0 | e6510000.i2c:tx > > > > > > I think I need more context on the problem before I can try to fix it. > > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > > I2C was using DMA? If so, the next step is to see if the I2C relative > > > probe order with DMA is getting changed and if so, why. > > > > More detailed log: > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an > "optional"/"driver decides" dependency. Oh, I thought dma/iommu were considered mandatory initially, but dropped as dependencies in the late boot process? > > > platform e6700000.dma-controller: Linked as a consumer to > > e6150000.clock-controller > > Is this the only supplier of dma-controller? No, e6180000.system-controller is also a supplier. > > platform e66d8000.i2c: Added to deferred list > > platform e6700000.dma-controller: Added to deferred list > > > > bus: 'platform': driver_probe_device: matched device > > e6700000.dma-controller with driver rcar-dmac > > bus: 'platform': really_probe: probing driver rcar-dmac with > > device e6700000.dma-controller > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > > with driver i2c-rcar > > bus: 'platform': really_probe: probing driver i2c-rcar with device > > e66d8000.i2c > > > > I2C becomes available... > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > > [...] > > > > but DMA is not available yet, so the driver falls back to PIO. > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > > > platform e6700000.dma-controller: Retrying from deferred list > > bus: 'platform': driver_probe_device: matched device > > e6700000.dma-controller with driver rcar-dmac > > bus: 'platform': really_probe: probing driver rcar-dmac with > > device e6700000.dma-controller > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > platform e6700000.dma-controller: Added to deferred list > > platform e6700000.dma-controller: Retrying from deferred list > > bus: 'platform': driver_probe_device: matched device > > e6700000.dma-controller with driver rcar-dmac > > bus: 'platform': really_probe: probing driver rcar-dmac with > > device e6700000.dma-controller > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > > bus: 'platform': really_probe: bound device > > e6700000.dma-controller to driver rcar-dmac > > > > DMA becomes available. > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > > that the I2C controllers do not have DMA channels allocated, as the > > kernel has performed no more I2C transfers after DMA became available. > > > > Using i2cdetect shows that DMA is used, which is good: > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > > > With permissive devlinks, the clock controller consumers are not added > > to the deferred probing list, and probe order is slightly different. > > The I2C controllers are still probed before the DMA controllers. > > But DMA becomes available a bit earlier, before the probing of the last > > I2C slave driver. > > This seems like a race? I'm guessing it's two different threads > probing those two devices? And it just happens to work for > "permissive" assuming the boot timing doesn't change? > > > Hence /sys/kernel/debug/dmaengine/summary shows that > > some I2C transfers did use DMA. > > > > So the real issue is that e66d8000.i2c not linked as a consumer to > > e6700000.dma-controller. > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be > treated as a mandatory supplier, you'll need to set the flag. > > Is fw_devlink=on really breaking anything here? It just seems like > "permissive" got lucky with the timing and it could break at any point > in the future. Thought? I don't think there is a race. fw_devlinks calling driver_deferred_probe_add() on all consumers has a big impact on probe order. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Saravana, On Mon, Feb 15, 2021 at 10:57 PM Saravana Kannan <saravanak@google.com> wrote: > On Mon, Feb 15, 2021 at 7:16 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu). > > > > > > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device > > > > node OF_POPULATED after init") is no longer needed (but already > > > > queued for v5.12 anyway) > > > > > > Rob doesn't like the proliferation of OF_POPULATED and we don't need > > > it anymore, so maybe work it out with him? It's a balance between some > > > wasted memory (struct device(s)) vs not proliferating OF_POPULATED. > > > > > > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva) > > > > > > > > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb > > > > reset handling" is no longer needed > > > > https://lore.kernel.org/linux-arm-kernel/20210205133319.1921108-1-geert+renesas@glider.be/ > > > > > > Good to see more evidence that this series is fixing things at a more > > > generic level. > > > > I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n, > > booting fails again, as everything is waiting on the system controller, > > which never becomes available. > > Rcar-sysc doesn't suffer from this problem, cfr. above. > > Perhaps because the rmobile-sysc bindings use a hierarchical instead > > of a linear PM domain description, and thus consumers point to the > > children of the system controller node? > > Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi. > > Ok, I see what's going on. The problem is that the "power domain" > fwnode being registered is not the node that contains the "compatible" > property and becomes a device. So this patch[1] is not helping here. > Fix is to do something like this (to avoid using OF_POPULATED flag and > breaking reset): > > diff --git a/drivers/soc/renesas/rmobile-sysc.c > b/drivers/soc/renesas/rmobile-sysc.c > index 9046b8c933cb..b7e66139ef7d 100644 > --- a/drivers/soc/renesas/rmobile-sysc.c > +++ b/drivers/soc/renesas/rmobile-sysc.c > @@ -344,6 +344,7 @@ static int __init rmobile_init_pm_domains(void) > of_node_put(np); > break; > } > + fwnode_dev_initialized(&np->fwnode, true); > } > > put_special_pds(); > > Can you give it a shot? Thanks, works. Patch sent "[PATCH v2] soc: renesas: rmobile-sysc: Mark fwnode when PM domain is added" https://lore.kernel.org/linux-arm-kernel/20210216123958.3180014-1-geert+renesas@glider.be/ Gr{oetje,eeting}s, Geert
On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <saravanak@google.com> wrote: > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > > > /sys/kernel/debug/dmaengine/summary: > > > > > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > > > -dma4chan1 | e66d8000.i2c:rx > > > > > -dma5chan0 | e6510000.i2c:tx > > > > > > > > I think I need more context on the problem before I can try to fix it. > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > > > I2C was using DMA? If so, the next step is to see if the I2C relative > > > > probe order with DMA is getting changed and if so, why. > > > > > > More detailed log: > > > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? > > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an > > "optional"/"driver decides" dependency. > > Oh, I thought dma/iommu were considered mandatory initially, > but dropped as dependencies in the late boot process? No, I didn't do that in case the drivers that didn't need the IOMMU/DMA were sensitive to probe order. My goal was for fw_devlink=on to not affect probe order for devices that currently don't need to defer probe. But see below... > > > > > > platform e6700000.dma-controller: Linked as a consumer to > > > e6150000.clock-controller > > > > Is this the only supplier of dma-controller? > > No, e6180000.system-controller is also a supplier. > > > > platform e66d8000.i2c: Added to deferred list > > > platform e6700000.dma-controller: Added to deferred list > > > > > > bus: 'platform': driver_probe_device: matched device > > > e6700000.dma-controller with driver rcar-dmac > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > device e6700000.dma-controller > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > > > with driver i2c-rcar > > > bus: 'platform': really_probe: probing driver i2c-rcar with device > > > e66d8000.i2c > > > > > > I2C becomes available... > > > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > > > [...] > > > > > > but DMA is not available yet, so the driver falls back to PIO. > > > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > bus: 'platform': driver_probe_device: matched device > > > e6700000.dma-controller with driver rcar-dmac > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > device e6700000.dma-controller > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > platform e6700000.dma-controller: Added to deferred list > > > platform e6700000.dma-controller: Retrying from deferred list > > > bus: 'platform': driver_probe_device: matched device > > > e6700000.dma-controller with driver rcar-dmac > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > device e6700000.dma-controller > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > > > bus: 'platform': really_probe: bound device > > > e6700000.dma-controller to driver rcar-dmac > > > > > > DMA becomes available. > > > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > > > that the I2C controllers do not have DMA channels allocated, as the > > > kernel has performed no more I2C transfers after DMA became available. > > > > > > Using i2cdetect shows that DMA is used, which is good: > > > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > > > > > With permissive devlinks, the clock controller consumers are not added > > > to the deferred probing list, and probe order is slightly different. > > > The I2C controllers are still probed before the DMA controllers. > > > But DMA becomes available a bit earlier, before the probing of the last > > > I2C slave driver. > > > > This seems like a race? I'm guessing it's two different threads > > probing those two devices? And it just happens to work for > > "permissive" assuming the boot timing doesn't change? > > > > > Hence /sys/kernel/debug/dmaengine/summary shows that > > > some I2C transfers did use DMA. > > > > > > So the real issue is that e66d8000.i2c not linked as a consumer to > > > e6700000.dma-controller. > > > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be > > treated as a mandatory supplier, you'll need to set the flag. > > > > Is fw_devlink=on really breaking anything here? It just seems like > > "permissive" got lucky with the timing and it could break at any point > > in the future. Thought? > > I don't think there is a race. Can you explain more please? This below makes it sound like DMA just sneaks in at the last minute. > > > The I2C controllers are still probed before the DMA controllers. > > > But DMA becomes available a bit earlier, before the probing of the last > > > I2C slave driver. > fw_devlinks calling driver_deferred_probe_add() > on all consumers has a big impact on probe order. Ugh... yeah. That's the real issue. This is really a device links issue that fw_devlink is exposing. I already have a bunch of things in my TODO list to improve deferred probing and probe ordering. Since this is not causing boot issues (only DMA issue) with fw_devlink=on, can we treat this as not a blocking item for fw_devlink=on? Once I go through my TODO list, it should be fixed (by not changing probe ordering unnecessarily). And if not, I can help find out a different solution at that point. Also, if you have IOMMU drivers, then fw_devlink.strict is also another solution that's available. On a separate note (not a final fix), I was wondering if we should have a config for fw_devlink.strict default value and then have it selected when IOMMU drivers configs are enabled. -Saravana
Hi Saravana, On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan <saravanak@google.com> wrote: > On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven > <geert@linux-m68k.org> wrote: > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <saravanak@google.com> wrote: > > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > > > > /sys/kernel/debug/dmaengine/summary: > > > > > > > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > > > > -dma4chan1 | e66d8000.i2c:rx > > > > > > -dma5chan0 | e6510000.i2c:tx > > > > > > > > > > I think I need more context on the problem before I can try to fix it. > > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > > > > I2C was using DMA? If so, the next step is to see if the I2C relative > > > > > probe order with DMA is getting changed and if so, why. > > > > > > > > More detailed log: > > > > > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > > > > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? > > > > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an > > > "optional"/"driver decides" dependency. > > > > Oh, I thought dma/iommu were considered mandatory initially, > > but dropped as dependencies in the late boot process? > > No, I didn't do that in case the drivers that didn't need the > IOMMU/DMA were sensitive to probe order. > > My goal was for fw_devlink=on to not affect probe order for devices > that currently don't need to defer probe. But see below... > > > > > > > > > > platform e6700000.dma-controller: Linked as a consumer to > > > > e6150000.clock-controller > > > > > > Is this the only supplier of dma-controller? > > > > No, e6180000.system-controller is also a supplier. > > > > > > platform e66d8000.i2c: Added to deferred list > > > > platform e6700000.dma-controller: Added to deferred list > > > > > > > > bus: 'platform': driver_probe_device: matched device > > > > e6700000.dma-controller with driver rcar-dmac > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > device e6700000.dma-controller > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > > > > with driver i2c-rcar > > > > bus: 'platform': really_probe: probing driver i2c-rcar with device > > > > e66d8000.i2c > > > > > > > > I2C becomes available... > > > > > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > > > > [...] > > > > > > > > but DMA is not available yet, so the driver falls back to PIO. > > > > > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > > > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > bus: 'platform': driver_probe_device: matched device > > > > e6700000.dma-controller with driver rcar-dmac > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > device e6700000.dma-controller > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > platform e6700000.dma-controller: Added to deferred list > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > bus: 'platform': driver_probe_device: matched device > > > > e6700000.dma-controller with driver rcar-dmac > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > device e6700000.dma-controller > > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > > > > bus: 'platform': really_probe: bound device > > > > e6700000.dma-controller to driver rcar-dmac > > > > > > > > DMA becomes available. > > > > > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > > > > that the I2C controllers do not have DMA channels allocated, as the > > > > kernel has performed no more I2C transfers after DMA became available. > > > > > > > > Using i2cdetect shows that DMA is used, which is good: > > > > > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > > > > > > > With permissive devlinks, the clock controller consumers are not added > > > > to the deferred probing list, and probe order is slightly different. > > > > The I2C controllers are still probed before the DMA controllers. > > > > But DMA becomes available a bit earlier, before the probing of the last > > > > I2C slave driver. > > > > > > This seems like a race? I'm guessing it's two different threads > > > probing those two devices? And it just happens to work for > > > "permissive" assuming the boot timing doesn't change? > > > > > > > Hence /sys/kernel/debug/dmaengine/summary shows that > > > > some I2C transfers did use DMA. > > > > > > > > So the real issue is that e66d8000.i2c not linked as a consumer to > > > > e6700000.dma-controller. > > > > > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be > > > treated as a mandatory supplier, you'll need to set the flag. > > > > > > Is fw_devlink=on really breaking anything here? It just seems like > > > "permissive" got lucky with the timing and it could break at any point > > > in the future. Thought? > > > > I don't think there is a race. > > Can you explain more please? This below makes it sound like DMA just > sneaks in at the last minute. Yes it does, as the DMAC also has a consumer link to the IOMMU. If you ignore the consumer link from I2C to DMAC, the I2C device has less dependencies than the DMAC, so the I2C device, and the devices on the I2C bus, are probed much earlier than the DMAC. Gr{oetje,eeting}s, Geert
On Tue, Feb 16, 2021 at 12:31 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Saravana, > > On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan <saravanak@google.com> wrote: > > On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven > > <geert@linux-m68k.org> wrote: > > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <saravanak@google.com> wrote: > > > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > > > > > /sys/kernel/debug/dmaengine/summary: > > > > > > > > > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > > > > > -dma4chan1 | e66d8000.i2c:rx > > > > > > > -dma5chan0 | e6510000.i2c:tx > > > > > > > > > > > > I think I need more context on the problem before I can try to fix it. > > > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > > > > > I2C was using DMA? If so, the next step is to see if the I2C relative > > > > > > probe order with DMA is getting changed and if so, why. > > > > > > > > > > More detailed log: > > > > > > > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > > > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > > > > > > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? > > > > > > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an > > > > "optional"/"driver decides" dependency. > > > > > > Oh, I thought dma/iommu were considered mandatory initially, > > > but dropped as dependencies in the late boot process? > > > > No, I didn't do that in case the drivers that didn't need the > > IOMMU/DMA were sensitive to probe order. > > > > My goal was for fw_devlink=on to not affect probe order for devices > > that currently don't need to defer probe. But see below... > > > > > > > > > > > > > > platform e6700000.dma-controller: Linked as a consumer to > > > > > e6150000.clock-controller > > > > > > > > Is this the only supplier of dma-controller? > > > > > > No, e6180000.system-controller is also a supplier. > > > > > > > > platform e66d8000.i2c: Added to deferred list > > > > > platform e6700000.dma-controller: Added to deferred list > > > > > > > > > > bus: 'platform': driver_probe_device: matched device > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > device e6700000.dma-controller > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > > > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > > > > > with driver i2c-rcar > > > > > bus: 'platform': really_probe: probing driver i2c-rcar with device > > > > > e66d8000.i2c > > > > > > > > > > I2C becomes available... > > > > > > > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > > > > > [...] > > > > > > > > > > but DMA is not available yet, so the driver falls back to PIO. > > > > > > > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > > > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > > > > > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > > bus: 'platform': driver_probe_device: matched device > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > device e6700000.dma-controller > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > platform e6700000.dma-controller: Added to deferred list > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > > bus: 'platform': driver_probe_device: matched device > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > device e6700000.dma-controller > > > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > > > > > bus: 'platform': really_probe: bound device > > > > > e6700000.dma-controller to driver rcar-dmac > > > > > > > > > > DMA becomes available. > > > > > > > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > > > > > that the I2C controllers do not have DMA channels allocated, as the > > > > > kernel has performed no more I2C transfers after DMA became available. > > > > > > > > > > Using i2cdetect shows that DMA is used, which is good: > > > > > > > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > > > > > > > > > With permissive devlinks, the clock controller consumers are not added > > > > > to the deferred probing list, and probe order is slightly different. > > > > > The I2C controllers are still probed before the DMA controllers. > > > > > But DMA becomes available a bit earlier, before the probing of the last > > > > > I2C slave driver. > > > > > > > > This seems like a race? I'm guessing it's two different threads > > > > probing those two devices? And it just happens to work for > > > > "permissive" assuming the boot timing doesn't change? > > > > > > > > > Hence /sys/kernel/debug/dmaengine/summary shows that > > > > > some I2C transfers did use DMA. > > > > > > > > > > So the real issue is that e66d8000.i2c not linked as a consumer to > > > > > e6700000.dma-controller. > > > > > > > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be > > > > treated as a mandatory supplier, you'll need to set the flag. > > > > > > > > Is fw_devlink=on really breaking anything here? It just seems like > > > > "permissive" got lucky with the timing and it could break at any point > > > > in the future. Thought? > > > > > > I don't think there is a race. > > > > Can you explain more please? This below makes it sound like DMA just > > sneaks in at the last minute. > > Yes it does, as the DMAC also has a consumer link to the IOMMU. > If you ignore the consumer link from I2C to DMAC, the I2C device has > less dependencies than the DMAC, so the I2C device, and the > devices on the I2C bus, are probed much earlier than the DMAC. Can you give this a shot? https://lore.kernel.org/lkml/20210217235130.1744843-1-saravanak@google.com/T/#u It should make sure fw_devlink doesn't add a device to the deferred probe list too soon and change the probe ordering unnecessarily. -Saravana
Hi Saravana, On Thu, Feb 18, 2021 at 12:57 AM Saravana Kannan <saravanak@google.com> wrote: > On Tue, Feb 16, 2021 at 12:31 PM Geert Uytterhoeven > <geert@linux-m68k.org> wrote: > > On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan <saravanak@google.com> wrote: > > > On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven > > > <geert@linux-m68k.org> wrote: > > > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to > > > > > > > > /sys/kernel/debug/dmaengine/summary: > > > > > > > > > > > > > > > > -dma4chan0 | e66d8000.i2c:tx > > > > > > > > -dma4chan1 | e66d8000.i2c:rx > > > > > > > > -dma5chan0 | e6510000.i2c:tx > > > > > > > > > > > > > > I think I need more context on the problem before I can try to fix it. > > > > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive, > > > > > > > I2C was using DMA? If so, the next step is to see if the I2C relative > > > > > > > probe order with DMA is getting changed and if so, why. > > > > > > > > > > > > More detailed log: > > > > > > > > > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller > > > > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio > > > > > > > > > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller? > > > > > > > > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an > > > > > "optional"/"driver decides" dependency. > > > > > > > > Oh, I thought dma/iommu were considered mandatory initially, > > > > but dropped as dependencies in the late boot process? > > > > > > No, I didn't do that in case the drivers that didn't need the > > > IOMMU/DMA were sensitive to probe order. > > > > > > My goal was for fw_devlink=on to not affect probe order for devices > > > that currently don't need to defer probe. But see below... > > > > > > > > > > > > > > > > > > platform e6700000.dma-controller: Linked as a consumer to > > > > > > e6150000.clock-controller > > > > > > > > > > Is this the only supplier of dma-controller? > > > > > > > > No, e6180000.system-controller is also a supplier. > > > > > > > > > > platform e66d8000.i2c: Added to deferred list > > > > > > platform e6700000.dma-controller: Added to deferred list > > > > > > > > > > > > bus: 'platform': driver_probe_device: matched device > > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > > device e6700000.dma-controller > > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > > > > > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c > > > > > > with driver i2c-rcar > > > > > > bus: 'platform': really_probe: probing driver i2c-rcar with device > > > > > > e66d8000.i2c > > > > > > > > > > > > I2C becomes available... > > > > > > > > > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517) > > > > > > [...] > > > > > > > > > > > > but DMA is not available yet, so the driver falls back to PIO. > > > > > > > > > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c' > > > > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar > > > > > > > > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > > > bus: 'platform': driver_probe_device: matched device > > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > > device e6700000.dma-controller > > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral > > > > > > platform e6700000.dma-controller: Added to deferred list > > > > > > platform e6700000.dma-controller: Retrying from deferred list > > > > > > bus: 'platform': driver_probe_device: matched device > > > > > > e6700000.dma-controller with driver rcar-dmac > > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with > > > > > > device e6700000.dma-controller > > > > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller' > > > > > > bus: 'platform': really_probe: bound device > > > > > > e6700000.dma-controller to driver rcar-dmac > > > > > > > > > > > > DMA becomes available. > > > > > > > > > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows > > > > > > that the I2C controllers do not have DMA channels allocated, as the > > > > > > kernel has performed no more I2C transfers after DMA became available. > > > > > > > > > > > > Using i2cdetect shows that DMA is used, which is good: > > > > > > > > > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx > > > > > > > > > > > > With permissive devlinks, the clock controller consumers are not added > > > > > > to the deferred probing list, and probe order is slightly different. > > > > > > The I2C controllers are still probed before the DMA controllers. > > > > > > But DMA becomes available a bit earlier, before the probing of the last > > > > > > I2C slave driver. > > > > > > > > > > This seems like a race? I'm guessing it's two different threads > > > > > probing those two devices? And it just happens to work for > > > > > "permissive" assuming the boot timing doesn't change? > > > > > > > > > > > Hence /sys/kernel/debug/dmaengine/summary shows that > > > > > > some I2C transfers did use DMA. > > > > > > > > > > > > So the real issue is that e66d8000.i2c not linked as a consumer to > > > > > > e6700000.dma-controller. > > > > > > > > > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be > > > > > treated as a mandatory supplier, you'll need to set the flag. > > > > > > > > > > Is fw_devlink=on really breaking anything here? It just seems like > > > > > "permissive" got lucky with the timing and it could break at any point > > > > > in the future. Thought? > > > > > > > > I don't think there is a race. > > > > > > Can you explain more please? This below makes it sound like DMA just > > > sneaks in at the last minute. > > > > Yes it does, as the DMAC also has a consumer link to the IOMMU. > > If you ignore the consumer link from I2C to DMAC, the I2C device has > > less dependencies than the DMAC, so the I2C device, and the > > devices on the I2C bus, are probed much earlier than the DMAC. > > Can you give this a shot? > https://lore.kernel.org/lkml/20210217235130.1744843-1-saravanak@google.com/T/#u > > It should make sure fw_devlink doesn't add a device to the deferred > probe list too soon and change the probe ordering unnecessarily. (FTR, to keep all info in this thread) Yes, this makes I2C use DMA again on Salvator-XS during kernel boot-up. I haven't run any more elaborate tests on other platforms. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds