mbox series

[v2,0/3] Make fw_devlink=on more forgiving

Message ID 20210202043345.3778765-1-saravanak@google.com
Headers show
Series Make fw_devlink=on more forgiving | expand

Message

Saravana Kannan Feb. 2, 2021, 4:33 a.m. UTC
This patch series solves two general issues with fw_devlink=on

Patch 1/3 and 3/3 addresses the issue of firmware nodes that look like
they'll have struct devices created for them, but will never actually
have struct devices added for them. For example, DT nodes with a
compatible property that don't have devices added for them.

Patch 2/2 address (for static kernels) the issue of optional suppliers
that'll never have a driver registered for them. So, if the device could
have probed with fw_devlink=permissive with a static kernel, this patch
should allow those devices to probe with a fw_devlink=on. This doesn't
solve it for the case where modules are enabled because there's no way
to tell if a driver will never be registered or it's just about to be
registered. I have some other ideas for that, but it'll have to come
later thinking about it a bit.

Marek, Geert,

I don't expect v2 to do any better for your cases.

This series not making any difference for Marek is still a mystery to
me. I guess one of the consumers doesn't take too well to its probe (and
it's consumers' probe) being delayed till late_initcall(). I'll continue
looking into it.

Marc,

This v2 should do better than v1 with gpiolib stub driver reverted. I
forgot to take care of the case where more suppliers could link after I
went and deleted some of the links. v2 handles that now.

Tudor,

You should still make the clock driver fix (because it's a bug), but I
think this series will fix your issue too (even without the clock driver
fix). Can you please give this a shot?

Martin,

If you tested this series, can you please give a Tested-by?

Thanks,
Saravana

v1 -> v2:
Patch 1: Added a flag to fwnodes that aren't devices.
Patch 3: New patch to ise the flag set in patch 1 to not create bad links.

Saravana Kannan (3):
  driver core: fw_devlink: Detect supplier devices that will never be
    added
  driver core: fw_devlink: Handle missing drivers for optional suppliers
  of: property: Don't add links to absent suppliers

 drivers/base/base.h    |   2 +
 drivers/base/core.c    | 135 +++++++++++++++++++++++++++++++++++------
 drivers/base/dd.c      |   5 ++
 drivers/of/property.c  |   4 +-
 include/linux/fwnode.h |   2 +
 5 files changed, 127 insertions(+), 21 deletions(-)

Comments

Rafael J. Wysocki Feb. 2, 2021, 2:12 p.m. UTC | #1
On Tue, Feb 2, 2021 at 5:33 AM Saravana Kannan <saravanak@google.com> wrote:
>
> During the initial parsing of firmware by fw_devlink, fw_devlink might
> infer that some supplier firmware nodes would get populated as devices.
> But the inference is not always correct. This patch tries to logically
> detect and fix such mistakes as boot progresses or more devices probe.
>
> fw_devlink makes a fundamental assumption that once a device binds to a
> driver, it will populate (i.e: add as struct devices) all the child
> firmware nodes that could be populated as devices (if they aren't
> populated already).
>
> So, whenever a device probes, we check all its child firmware nodes. If
> a child firmware node has a corresponding device populated, we don't
> modify the child node or its descendants. However, if a child firmware
> node has not been populated as a device, we delete all the fwnode links
> where the child node or its descendants are suppliers. This ensures that
> no other device is blocked on a firmware node that will never be
> populated as a device. We also mark such fwnodes as NOT_DEVICE, so that
> no new fwnode links are created with these nodes as suppliers.
>
> Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Still ACKed.

> ---
>  drivers/base/core.c    | 31 ++++++++++++++++++++++++++++---
>  include/linux/fwnode.h |  2 ++
>  2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 484a942884ba..c95b1daabac7 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -148,6 +148,21 @@ void fwnode_links_purge(struct fwnode_handle *fwnode)
>         fwnode_links_purge_consumers(fwnode);
>  }
>
> +static void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode)
> +{
> +       struct fwnode_handle *child;
> +
> +       /* Don't purge consumer links of an added child */
> +       if (fwnode->dev)
> +               return;
> +
> +       fwnode->flags |= FWNODE_FLAG_NOT_DEVICE;
> +       fwnode_links_purge_consumers(fwnode);
> +
> +       fwnode_for_each_available_child_node(fwnode, child)
> +               fw_devlink_purge_absent_suppliers(child);
> +}
> +
>  #ifdef CONFIG_SRCU
>  static DEFINE_MUTEX(device_links_lock);
>  DEFINE_STATIC_SRCU(device_links_srcu);
> @@ -1154,12 +1169,22 @@ void device_links_driver_bound(struct device *dev)
>         LIST_HEAD(sync_list);
>
>         /*
> -        * If a device probes successfully, it's expected to have created all
> +        * If a device binds successfully, it's expected to have created all
>          * the device links it needs to or make new device links as it needs
> -        * them. So, it no longer needs to wait on any suppliers.
> +        * them. So, fw_devlink no longer needs to create device links to any
> +        * of the device's suppliers.
> +        *
> +        * Also, if a child firmware node of this bound device is not added as
> +        * a device by now, assume it is never going to be added and make sure
> +        * other devices don't defer probe indefinitely by waiting for such a
> +        * child device.
>          */
> -       if (dev->fwnode && dev->fwnode->dev == dev)
> +       if (dev->fwnode && dev->fwnode->dev == dev) {
> +               struct fwnode_handle *child;
>                 fwnode_links_purge_suppliers(dev->fwnode);
> +               fwnode_for_each_available_child_node(dev->fwnode, child)
> +                       fw_devlink_purge_absent_suppliers(child);
> +       }
>         device_remove_file(dev, &dev_attr_waiting_for_supplier);
>
>         device_links_write_lock();
> diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> index fde4ad97564c..21082f11473f 100644
> --- a/include/linux/fwnode.h
> +++ b/include/linux/fwnode.h
> @@ -19,8 +19,10 @@ struct device;
>   * fwnode link flags
>   *
>   * LINKS_ADDED: The fwnode has already be parsed to add fwnode links.
> + * NOT_DEVICE: The fwnode will never be populated as a struct device.
>   */
>  #define FWNODE_FLAG_LINKS_ADDED                BIT(0)
> +#define FWNODE_FLAG_NOT_DEVICE         BIT(1)
>
>  struct fwnode_handle {
>         struct fwnode_handle *secondary;
> --
> 2.30.0.365.g02bc693789-goog
>
Tudor Ambarus Feb. 2, 2021, 4:52 p.m. UTC | #2
Hi, Saravana,

On 2/2/21 6:33 AM, Saravana Kannan wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe

> 

> This patch series solves two general issues with fw_devlink=on

> 

> Patch 1/3 and 3/3 addresses the issue of firmware nodes that look like

> they'll have struct devices created for them, but will never actually

> have struct devices added for them. For example, DT nodes with a

> compatible property that don't have devices added for them.

> 

> Patch 2/2 address (for static kernels) the issue of optional suppliers

> that'll never have a driver registered for them. So, if the device could

> have probed with fw_devlink=permissive with a static kernel, this patch

> should allow those devices to probe with a fw_devlink=on. This doesn't

> solve it for the case where modules are enabled because there's no way

> to tell if a driver will never be registered or it's just about to be

> registered. I have some other ideas for that, but it'll have to come

> later thinking about it a bit.

> 

> Marek, Geert,

> 

> I don't expect v2 to do any better for your cases.

> 

> This series not making any difference for Marek is still a mystery to

> me. I guess one of the consumers doesn't take too well to its probe (and

> it's consumers' probe) being delayed till late_initcall(). I'll continue

> looking into it.

> 

> Marc,

> 

> This v2 should do better than v1 with gpiolib stub driver reverted. I

> forgot to take care of the case where more suppliers could link after I

> went and deleted some of the links. v2 handles that now.

> 

> Tudor,

> 

> You should still make the clock driver fix (because it's a bug), but I

> think this series will fix your issue too (even without the clock driver

> fix). Can you please give this a shot?

> 


Did the following tests (using sama5_defconfig and at91-sama5d2_xplained.dts):
1/ modular kernel with your v2 on top of next-20210202, and without the clock
driver fix: the problem persists.

2/ static kernel with your v2 on top of next-20210202, and without the clock
driver fix: the problem persists. Comparing to the previous test, I see that
the links to pmc are dropped. I can see the following only with early printk
enabled:
platform fc008000.serial: Dropping the link to f0014000.pmc
But later on, the serial still gets deferred waiting for the dma controller
this time:
platform f8020000.serial: probe deferral - supplier f0010000.dma-controller not ready
I'll check what happens in the dma-controller.

3/ modular kernel with your v2 and the clock driver fix on top of next-20210202:
I can boot the board and I have access to the console. I still have some
probe deferrals that I need to check:
root@sama5d2-xplained-sd:~# dmesg | grep -i defer
[    4.335625] platform b0000000.sdio-host: probe deferral - wait for supplier pmic@5b
[    4.474559] platform fc030000.adc: probe deferral - wait for supplier pmic@5b
[    4.884315] calling  deferred_probe_initcall+0x0/0xd8 @ 1
[    4.889206] platform fc030000.adc: probe deferral - wait for supplier pmic@5b
[    5.139447] initcall deferred_probe_initcall+0x0/0xd8 returned 0 after 245132 usecs
root@sama5d2-xplained-sd:~# dmesg | grep -i linked
[    4.916342] platform fc030000.adc: Linked as a consumer to 0-005b
[    4.921298] platform b0000000.sdio-host: Linked as a consumer to 0-005b
[    4.926817] i2c 0-005b: Linked as a sync state only consumer to fc038000.pinctrl
[    4.990246] platform act8945a-charger: Linked as a consumer to fc038000.pinctrl
[    5.078488] at24 1-0054: Linked as a consumer to regulator.0
[    5.111533] at91-sama5d2_adc fc030000.adc: Linked as a consumer to regulator.5
[    5.118969] sdhci-at91 b0000000.sdio-host: Linked as a consumer to regulator.3
root@sama5d2-xplained-sd:~# dmesg | grep -i drop
[    5.014026] act8945a 0-005b: Dropping the link to fc038000.pinctrl

4/ static kernel with your v2 and the clock driver fix on top of next-20210202:
I can boot the board and I have access to the console. The probe deferrals are the
same as in test 3/.

Cheers,
ta
Rob Herring Feb. 2, 2021, 5:41 p.m. UTC | #3
On Tue, Feb 2, 2021 at 10:52 AM <Tudor.Ambarus@microchip.com> wrote:
>
> Hi, Saravana,
>
> On 2/2/21 6:33 AM, Saravana Kannan wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> >
> > This patch series solves two general issues with fw_devlink=on
> >
> > Patch 1/3 and 3/3 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 2/2 address (for static kernels) the issue of optional suppliers
> > that'll never have a driver registered for them. So, if the device could
> > have probed with fw_devlink=permissive with a static kernel, this patch
> > should allow those devices to probe with a fw_devlink=on. This doesn't
> > solve it for the case where modules are enabled because there's no way
> > to tell if a driver will never be registered or it's just about to be
> > registered. I have some other ideas for that, but it'll have to come
> > later thinking about it a bit.
> >
> > Marek, Geert,
> >
> > I don't expect v2 to do any better for your cases.
> >
> > This series not making any difference for Marek is still a mystery to
> > me. I guess one of the consumers doesn't take too well to its probe (and
> > it's consumers' probe) being delayed till late_initcall(). I'll continue
> > looking into it.
> >
> > Marc,
> >
> > This v2 should do better than v1 with gpiolib stub driver reverted. I
> > forgot to take care of the case where more suppliers could link after I
> > went and deleted some of the links. v2 handles that now.
> >
> > Tudor,
> >
> > You should still make the clock driver fix (because it's a bug), but I
> > think this series will fix your issue too (even without the clock driver
> > fix). Can you please give this a shot?
> >
>
> Did the following tests (using sama5_defconfig and at91-sama5d2_xplained.dts):
> 1/ modular kernel with your v2 on top of next-20210202, and without the clock
> driver fix: the problem persists.
>
> 2/ static kernel with your v2 on top of next-20210202, and without the clock
> driver fix: the problem persists. Comparing to the previous test, I see that
> the links to pmc are dropped. I can see the following only with early printk
> enabled:
> platform fc008000.serial: Dropping the link to f0014000.pmc
> But later on, the serial still gets deferred waiting for the dma controller
> this time:
> platform f8020000.serial: probe deferral - supplier f0010000.dma-controller not ready
> I'll check what happens in the dma-controller.

Not sure if it's the case here, but some serial drivers use DMA only
when available and decide that on open() rather than probe. How is
devlinks going to deal with that?

Rob
Martin Kaiser Feb. 2, 2021, 9:22 p.m. UTC | #4
Hi Saravana,

Thus wrote Saravana Kannan (saravanak@google.com):

> Martin,

> If you tested this series, can you please give a Tested-by?

I tested this v2 series on top of next-20210202 (without the fsl,avic
patch).

If modules are enabled, the kernel doesn't boot on my imx25 board. This
is expected, I guess.

With modules disabled, the kernel boots but probe fails for some
(non-mainline) drivers in my tree. All of those drivers have a gpio in
their device-tree node, such as

my_driver {
   gpio_test1 = <&gpio1 0 0>;
   ...
};

with gpio1 from arch/arm/boot/dts/imx25.dtsi.

The probe function calls

of_get_named_gpio(np, "gpio_test1", 0);

to get the gpio. This fails with -EINVAL.

Best regards,
Martin
Saravana Kannan Feb. 2, 2021, 10:43 p.m. UTC | #5
On Tue, Feb 2, 2021 at 1:22 PM Martin Kaiser <martin@kaiser.cx> wrote:
>
> Hi Saravana,
>
> Thus wrote Saravana Kannan (saravanak@google.com):
>
> > Martin,
>
> > If you tested this series, can you please give a Tested-by?
>
> I tested this v2 series on top of next-20210202 (without the fsl,avic
> patch).
>
> If modules are enabled, the kernel doesn't boot on my imx25 board. This
> is expected, I guess.
>
> With modules disabled, the kernel boots but probe fails for some
> (non-mainline) drivers in my tree.

Thanks Martin!

> All of those drivers have a gpio in
> their device-tree node, such as
>
> my_driver {
>    gpio_test1 = <&gpio1 0 0>;
>    ...
> };
>
> with gpio1 from arch/arm/boot/dts/imx25.dtsi.
>
> The probe function calls
>
> of_get_named_gpio(np, "gpio_test1", 0);
>
> to get the gpio. This fails with -EINVAL.

And you didn't see this issue with the fsl,avic patch?

The property you are using is not a standard GPIO binding (-gpios,
gpio, gpios) and I'm not surprised it's not working. The gpio1 is
probably getting probe deferred and ends up running after "my_driver".

-Saravana
Geert Uytterhoeven Feb. 3, 2021, 7:54 a.m. UTC | #6
On Tue, Feb 2, 2021 at 11:44 PM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 2, 2021 at 1:22 PM Martin Kaiser <martin@kaiser.cx> wrote:

> > Thus wrote Saravana Kannan (saravanak@google.com):

> > All of those drivers have a gpio in

> > their device-tree node, such as

> >

> > my_driver {

> >    gpio_test1 = <&gpio1 0 0>;

> >    ...

> > };

> >

> > with gpio1 from arch/arm/boot/dts/imx25.dtsi.

> >

> > The probe function calls

> >

> > of_get_named_gpio(np, "gpio_test1", 0);

> >

> > to get the gpio. This fails with -EINVAL.

>

> And you didn't see this issue with the fsl,avic patch?

>

> The property you are using is not a standard GPIO binding (-gpios,

> gpio, gpios) and I'm not surprised it's not working. The gpio1 is

> probably getting probe deferred and ends up running after "my_driver".


So my_driver doesn't support deferred probe, as of_get_named_gpio()
returns -EINVAL instead of -EPROBE_DEFER?
Converting my_driver from of_get_named_gpio() to the gpiod_*() API
should at least make the driver support probe deferral, after which I
expect it to start working again on reprobe?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Geert Uytterhoeven Feb. 3, 2021, 8:15 a.m. UTC | #7
Hi Saravana,

On Wed, Feb 3, 2021 at 9:11 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 2, 2021 at 11:55 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> > On Tue, Feb 2, 2021 at 11:44 PM Saravana Kannan <saravanak@google.com> wrote:

> > > On Tue, Feb 2, 2021 at 1:22 PM Martin Kaiser <martin@kaiser.cx> wrote:

> > > > Thus wrote Saravana Kannan (saravanak@google.com):

> > > > All of those drivers have a gpio in

> > > > their device-tree node, such as

> > > >

> > > > my_driver {

> > > >    gpio_test1 = <&gpio1 0 0>;

> > > >    ...

> > > > };

> > > >

> > > > with gpio1 from arch/arm/boot/dts/imx25.dtsi.

> > > >

> > > > The probe function calls

> > > >

> > > > of_get_named_gpio(np, "gpio_test1", 0);

> > > >

> > > > to get the gpio. This fails with -EINVAL.

> > >

> > > And you didn't see this issue with the fsl,avic patch?

> > >

> > > The property you are using is not a standard GPIO binding (-gpios,

> > > gpio, gpios) and I'm not surprised it's not working. The gpio1 is

> > > probably getting probe deferred and ends up running after "my_driver".

> >

> > So my_driver doesn't support deferred probe, as of_get_named_gpio()

> > returns -EINVAL instead of -EPROBE_DEFER?

> > Converting my_driver from of_get_named_gpio() to the gpiod_*() API

> > should at least make the driver support probe deferral, after which I

> > expect it to start working again on reprobe?

>

> The way I understood the API/example, you can't just change the code

> and have it work. The DT itself isn't using standard bindings. And we


Oh, right.

> can't make kernel changes that assume the DT has been changed to match

> the code. So, the best we could do is have of_get_named_gpio() return

> -EPROBE_DEFER if it doesn't find the GPIO -- assuming that doesn't

> break other users. Or have this specific driver remap the -EINVAL to

> -EPROBE_DEFER.


The latter would hide real errors, too, and would cause futile reprobes.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Martin Kaiser Feb. 3, 2021, 10:02 p.m. UTC | #8
Thus wrote Geert Uytterhoeven (geert@linux-m68k.org):

> > The property you are using is not a standard GPIO binding (-gpios,
> > gpio, gpios) and I'm not surprised it's not working. The gpio1 is
> > probably getting probe deferred and ends up running after "my_driver".

> So my_driver doesn't support deferred probe,

I know that the gpio definition in the device-tree is non-standard (and
should have been done differently).

Apart from this, the driver uses module_platform_driver_probe. My
understanding is that this prevents probe deferral.

Does this mean that from now on, a driver which requests a gpio must not
use module_platform_driver_probe?

Thanks,
Martin
Saravana Kannan Feb. 3, 2021, 10:03 p.m. UTC | #9
On Wed, Feb 3, 2021 at 1:58 PM Martin Kaiser <martin@kaiser.cx> wrote:
>

> Thus wrote Saravana Kannan (saravanak@google.com):

>

> > > With modules disabled, the kernel boots but probe fails for some

> > > (non-mainline) drivers in my tree.

>

> > Thanks Martin!

>

> > > All of those drivers have a gpio in

> > > their device-tree node, such as

>

> > > my_driver {

> > >    gpio_test1 = <&gpio1 0 0>;

> > >    ...

> > > };

>

> > > with gpio1 from arch/arm/boot/dts/imx25.dtsi.

>

> > > The probe function calls

>

> > > of_get_named_gpio(np, "gpio_test1", 0);

>

> > > to get the gpio. This fails with -EINVAL.

>

> > And you didn't see this issue with the fsl,avic patch?

>

> No. With the fsl,avic patch in place, all drivers are probed correctly.

>

> > The property you are using is not a standard GPIO binding (-gpios,

> > gpio, gpios) and I'm not surprised it's not working.

>

> I know that I should be using the gpiod API as suggested by Geert.

>

> BTW is this definition ok? Could its driver be converted to using the

> gpiod api?

>

> rtc: rtc {

>    compatible = "moxa,moxart-rtc";

>    gpio-rtc-sclk = <&gpio 5 0>;

> ...


The correct non-deprecated binding AFAIK is something-gpios. Not
gpio-something. And then you can use different APIs to get the GPIO (I
forget what it's called).

-Saravana

>

>

> > The gpio1 is probably getting probe deferred and ends up running after

> > "my_driver".

>

> I added a debug print in the probe function. It turned out that the

> driver for gpio1 is probed for the first time after my_driver.

>

> I removed the interrupt-controller property for gpio2 for testing. gpio2

> was then probed much earlier.
Saravana Kannan Feb. 4, 2021, 7:17 p.m. UTC | #10
On Mon, Feb 1, 2021 at 8:33 PM Saravana Kannan <saravanak@google.com> wrote:
>

> If driver core marks a firmware node as not a device, don't add fwnode

> links where it's a supplier.

>

> Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")

> Signed-off-by: Saravana Kannan <saravanak@google.com>

> ---

>  drivers/of/property.c | 4 +++-

>  1 file changed, 3 insertions(+), 1 deletion(-)

>

> diff --git a/drivers/of/property.c b/drivers/of/property.c

> index 6287c6d60bb7..53d163c8d39b 100644

> --- a/drivers/of/property.c

> +++ b/drivers/of/property.c

> @@ -1103,7 +1103,9 @@ static int of_link_to_phandle(struct device_node *con_np,

>          * created for them.

>          */

>         sup_dev = get_dev_from_fwnode(&sup_np->fwnode);

> -       if (!sup_dev && of_node_check_flag(sup_np, OF_POPULATED)) {

> +       if (!sup_dev &&

> +           (of_node_check_flag(sup_np, OF_POPULATED) ||

> +            sup_np->fwnode.flags & FWNODE_FLAG_NOT_DEVICE)) {

>                 pr_debug("Not linking %pOFP to %pOFP - No struct device\n",

>                          con_np, sup_np);

>                 of_node_put(sup_np);

> --

> 2.30.0.365.g02bc693789-goog

>


Rob, Can I get an Ack or reviewed-by for this? I want to land this
patch and another one on driver-core.

Thanks,
Saravana