
[v1] of: platform: Batch fwnode parsing in the init_machine() path

Message ID 20201001225952.3676755-1-saravanak@google.com
State New

Commit Message

Saravana Kannan Oct. 1, 2020, 10:59 p.m. UTC
When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
adding all top level devices") optimized the fwnode parsing when all top
level devices are added, it missed optimizing this for platforms
where the top level devices are added through the init_machine() path.

This commit does the optimization for all paths by simply moving the
fw_devlink_pause/resume() inside of_platform_default_populate().

Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Laurent Pinchart Oct. 1, 2020, 11:19 p.m. UTC | #1
Hi Saravana,

Thank you for the patch.

On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> adding all top level devices") optimized the fwnode parsing when all top
> level devices are added, it missed out optimizing this for platform
> where the top level devices are added through the init_machine() path.
> 
> This commit does the optimization for all paths by simply moving the
> fw_devlink_pause/resume() inside of_platform_default_populate().

Based on v5.9-rc5, before the patch:

[    0.652887] cpuidle: using governor menu
[   12.349476] No ATAGs?

After the patch:

[    0.650460] cpuidle: using governor menu
[   12.262101] No ATAGs?

:-(

> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/of/platform.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 071f04da32c8..79972e49b539 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
>  				 const struct of_dev_auxdata *lookup,
>  				 struct device *parent)
>  {
> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> -				    parent);
> +	int ret;
> +
> +	/*
> +	 * fw_devlink_pause/resume() are only safe to be called around top
> +	 * level device addition due to locking constraints.
> +	 */
> +	if (!root)
> +		fw_devlink_pause();
> +
> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> +				   parent);
> +
> +	if (!root)
> +		fw_devlink_resume();
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(of_platform_default_populate);
>  
> @@ -538,9 +551,7 @@ static int __init of_platform_default_populate_init(void)
>  	}
>  
>  	/* Populate everything else. */
> -	fw_devlink_pause();
>  	of_platform_default_populate(NULL, NULL, NULL);
> -	fw_devlink_resume();
>  
>  	return 0;
>  }
Grygorii Strashko Oct. 2, 2020, 11:40 a.m. UTC | #2
On 02/10/2020 02:19, Laurent Pinchart wrote:
> Hi Saravana,
> 
> Thank you for the patch.
> 
> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>> adding all top level devices") optimized the fwnode parsing when all top
>> level devices are added, it missed out optimizing this for platform
>> where the top level devices are added through the init_machine() path.
>>
>> This commit does the optimization for all paths by simply moving the
>> fw_devlink_pause/resume() inside of_platform_default_populate().
> 
> Based on v5.9-rc5, before the patch:
> 
> [    0.652887] cpuidle: using governor menu
> [   12.349476] No ATAGs?
> 
> After the patch:
> 
> [    0.650460] cpuidle: using governor menu
> [   12.262101] No ATAGs?
> 
> :-(

This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate()

Call path:
board-generic.c
  DT_MACHINE_START()
    .init_machine	= omap_generic_init,

  omap_generic_init()
    pdata_quirks_init(omap_dt_match_table);
		of_platform_populate(NULL, omap_dt_match_table,
			     omap_auxdata_lookup, NULL);

Other affected platforms
arm: mach-ux500
some mips
some powerpc

There are also cases where a lot of devices are placed under a bus node; in such cases,
  of_platform_populate() calls from bus drivers will also suffer from this issue.
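
To make the call-path point concrete, here is a small userspace model, not kernel code. The function names mirror the kernel symbols mentioned above, but every body is a stand-in and the two counters are assumptions added purely for illustration. It shows that a pause/resume pair placed inside of_platform_default_populate() is never reached when a platform calls of_platform_populate() directly, as omap_generic_init() does:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state only: none of these counters exist in the kernel. */
static int pause_depth;     /* > 0 while link parsing would be batched */
static int batched_calls;   /* populate calls that ran while paused */
static int unbatched_calls; /* populate calls that ran unpaused */

/* Stand-ins for fw_devlink_pause()/fw_devlink_resume(). */
void fw_devlink_pause(void)
{
	pause_depth++;
}

void fw_devlink_resume(void)
{
	assert(pause_depth > 0); /* resume must follow a pause */
	pause_depth--;
}

/* Stand-in for of_platform_populate(): records only whether it ran paused. */
int of_platform_populate(const void *root, const char *match_table)
{
	(void)root;
	(void)match_table;
	if (pause_depth > 0)
		batched_calls++;
	else
		unbatched_calls++;
	return 0;
}

/* Stand-in for of_platform_default_populate() as changed by the v1 patch. */
int of_platform_default_populate(const void *root)
{
	int ret;

	if (!root)
		fw_devlink_pause();
	ret = of_platform_populate(root, "of_default_bus_match_table");
	if (!root)
		fw_devlink_resume();
	return ret;
}

/* Stand-in for omap_generic_init(): bypasses the default helper. */
int omap_generic_init(void)
{
	return of_platform_populate(NULL, "omap_dt_match_table");
}
```

Calling of_platform_default_populate(NULL) bumps batched_calls, while omap_generic_init() bumps unbatched_calls, which is consistent with the unchanged timings reported on omap2.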

I think one option could be to add some parameter to _populate() or introduce a new API.

By the way, is there an option to disable this feature altogether?
Is there a Kconfig option?
Is there any reason why such complex and time-consuming code was added to the kernel rather than implemented at the DTC level?


Also, I've come up with another diff, please check.

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0
[    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: TI AM5718 IDK
...
[    0.053443] cpuidle: using governor ladder
[    0.053470] cpuidle: using governor menu
[    0.089304] No ATAGs?
...
[    3.092291] devtmpfs: mounted
[    3.095804] Freeing unused kernel memory: 1024K
[    3.100483] Run /sbin/init as init process



------ >< ---
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 071f04da32c8..4521b26e7745 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -514,6 +514,13 @@ static const struct of_device_id reserved_mem_matches[] = {
         {}
  };
  
+static int __init of_platform_fw_devlink_pause(void)
+{
+       fw_devlink_pause();
+       return 0;
+}
+core_initcall(of_platform_fw_devlink_pause);
+
  static int __init of_platform_default_populate_init(void)
  {
         struct device_node *node;
@@ -538,9 +545,7 @@ static int __init of_platform_default_populate_init(void)
         }
  
         /* Populate everything else. */
-       fw_devlink_pause();
         of_platform_default_populate(NULL, NULL, NULL);
-       fw_devlink_resume();
  
         return 0;
  }
@@ -548,6 +553,7 @@ arch_initcall_sync(of_platform_default_populate_init);
  
  static int __init of_platform_sync_state_init(void)
  {
+       fw_devlink_resume();
         device_links_supplier_sync_state_resume();
         return 0;
  }
Rob Herring Oct. 2, 2020, 2:07 p.m. UTC | #3
On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
>
> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> adding all top level devices") optimized the fwnode parsing when all top
> level devices are added, it missed out optimizing this for platform
> where the top level devices are added through the init_machine() path.
>
> This commit does the optimization for all paths by simply moving the
> fw_devlink_pause/resume() inside of_platform_default_populate().
>
> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/of/platform.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 071f04da32c8..79972e49b539 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
>                                  const struct of_dev_auxdata *lookup,
>                                  struct device *parent)
>  {
> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> -                                   parent);
> +       int ret;
> +
> +       /*
> +        * fw_devlink_pause/resume() are only safe to be called around top
> +        * level device addition due to locking constraints.
> +        */
> +       if (!root)
> +               fw_devlink_pause();
> +
> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> +                                  parent);

of_platform_default_populate() vs. of_platform_populate() is just a
different match table. I don't think the behavior should otherwise be
different.

There's also of_platform_probe() which has slightly different matching
behavior. It should not behave differently either with respect to
devlinks.

Rob
Grygorii Strashko Oct. 2, 2020, 3:03 p.m. UTC | #4
On 02/10/2020 14:40, Grygorii Strashko wrote:
> 
> 
> On 02/10/2020 02:19, Laurent Pinchart wrote:
>> Hi Saravana,
>>
>> Thank you for the patch.
>>
>> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>>> adding all top level devices") optimized the fwnode parsing when all top
>>> level devices are added, it missed out optimizing this for platform
>>> where the top level devices are added through the init_machine() path.
>>>
>>> This commit does the optimization for all paths by simply moving the
>>> fw_devlink_pause/resume() inside of_platform_default_populate().
>>
>> Based on v5.9-rc5, before the patch:
>>
>> [    0.652887] cpuidle: using governor menu
>> [   12.349476] No ATAGs?
>>
>> After the patch:
>>
>> [    0.650460] cpuidle: using governor menu
>> [   12.262101] No ATAGs?
>>
>> :-(
> 
> This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate()
> 
> Call path:
> board-generic.c
>   DT_MACHINE_START()
>     .init_machine    = omap_generic_init,
> 
>   omap_generic_init()
>     pdata_quirks_init(omap_dt_match_table);
>          of_platform_populate(NULL, omap_dt_match_table,
>                   omap_auxdata_lookup, NULL);
> 
> Other affected platforms
> arm: mach-ux500
> some mips
> some powerpc
> 
> there are also case when a lot of devices placed under bus node, in such case
>   of_platform_populate() calls from bus drivers will also suffer from this issue.
> 
> I think one option could be to add some parameter to _populate() or introduce new api.
> 
> By the way, is there option to disable this feature at all?
> Is there Kconfig option?
> Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level?
> 
> 
> Also, I've came with another diff, pls check.
> 
> [    0.000000] Booting Linux on physical CPU 0x0
> [    0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0
> [    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
> [    0.000000] CPU: div instructions available: patching division code
> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
> [    0.000000] OF: fdt: Machine model: TI AM5718 IDK
> ...
> [    0.053443] cpuidle: using governor ladder
> [    0.053470] cpuidle: using governor menu
> [    0.089304] No ATAGs?
> ...
> [    3.092291] devtmpfs: mounted
> [    3.095804] Freeing unused kernel memory: 1024K
> [    3.100483] Run /sbin/init as init process
> 
> 
> 
> ------ >< ---
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 071f04da32c8..4521b26e7745 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = {
>          {}
>   };
> 
> +static int __init of_platform_fw_devlink_pause(void)
> +{
> +       fw_devlink_pause();
> +}
> +core_initcall(of_platform_fw_devlink_pause);
> +
>   static int __init of_platform_default_populate_init(void)
>   {
>          struct device_node *node;
> @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void)
>          }
> 
>          /* Populate everything else. */
> -       fw_devlink_pause();
>          of_platform_default_populate(NULL, NULL, NULL);
> -       fw_devlink_resume();
> 
>          return 0;
>   }
> @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init);
> 
>   static int __init of_platform_sync_state_init(void)
>   {
> +       fw_devlink_resume();

^ it seems this has to be done earlier, like
+static int __init of_platform_fw_devlink_resume(void)
+{
+       fw_devlink_resume();
+       return 0;
+}
+device_initcall_sync(of_platform_fw_devlink_resume);


>          device_links_supplier_sync_state_resume();
>          return 0;
>   }
> 
> 
>
Saravana Kannan Oct. 2, 2020, 5:48 p.m. UTC | #5
On Fri, Oct 2, 2020 at 8:03 AM 'Grygorii Strashko' via kernel-team
<kernel-team@android.com> wrote:
>
>
>
> On 02/10/2020 14:40, Grygorii Strashko wrote:
> >
> >
> > On 02/10/2020 02:19, Laurent Pinchart wrote:
> >> Hi Saravana,
> >>
> >> Thank you for the patch.
> >>
> >> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
> >>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> >>> adding all top level devices") optimized the fwnode parsing when all top
> >>> level devices are added, it missed out optimizing this for platform
> >>> where the top level devices are added through the init_machine() path.
> >>>
> >>> This commit does the optimization for all paths by simply moving the
> >>> fw_devlink_pause/resume() inside of_platform_default_populate().
> >>
> >> Based on v5.9-rc5, before the patch:
> >>
> >> [    0.652887] cpuidle: using governor menu
> >> [   12.349476] No ATAGs?
> >>
> >> After the patch:
> >>
> >> [    0.650460] cpuidle: using governor menu
> >> [   12.262101] No ATAGs?
> >>
> >> :-(
> >
> > This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate()
> >
> > Call path:
> > board-generic.c
> >   DT_MACHINE_START()
> >     .init_machine    = omap_generic_init,
> >
> >   omap_generic_init()
> >     pdata_quirks_init(omap_dt_match_table);
> >          of_platform_populate(NULL, omap_dt_match_table,
> >                   omap_auxdata_lookup, NULL);
> >
> > Other affected platforms
> > arm: mach-ux500
> > some mips
> > some powerpc
> >
> > there are also case when a lot of devices placed under bus node, in such case
> >   of_platform_populate() calls from bus drivers will also suffer from this issue.
> >
> > I think one option could be to add some parameter to _populate() or introduce new api.
> >
> > By the way, is there option to disable this feature at all?
> > Is there Kconfig option?
> > Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level?
> >
> >
> > Also, I've came with another diff, pls check.
> >
> > [    0.000000] Booting Linux on physical CPU 0x0
> > [    0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0
> > [    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
> > [    0.000000] CPU: div instructions available: patching division code
> > [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
> > [    0.000000] OF: fdt: Machine model: TI AM5718 IDK
> > ...
> > [    0.053443] cpuidle: using governor ladder
> > [    0.053470] cpuidle: using governor menu
> > [    0.089304] No ATAGs?
> > ...
> > [    3.092291] devtmpfs: mounted
> > [    3.095804] Freeing unused kernel memory: 1024K
> > [    3.100483] Run /sbin/init as init process
> >
> >
> >
> > ------ >< ---
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 071f04da32c8..4521b26e7745 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = {
> >          {}
> >   };
> >
> > +static int __init of_platform_fw_devlink_pause(void)
> > +{
> > +       fw_devlink_pause();
> > +}
> > +core_initcall(of_platform_fw_devlink_pause);
> > +
> >   static int __init of_platform_default_populate_init(void)
> >   {
> >          struct device_node *node;
> > @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void)
> >          }
> >
> >          /* Populate everything else. */
> > -       fw_devlink_pause();
> >          of_platform_default_populate(NULL, NULL, NULL);
> > -       fw_devlink_resume();
> >
> >          return 0;
> >   }
> > @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init);
> >
> >   static int __init of_platform_sync_state_init(void)
> >   {
> > +       fw_devlink_resume();
>
> ^ it seems has to be done earlier, like
> +static int __init of_platform_fw_devlink_resume(void)
> +{
> +       fw_devlink_resume();
> +       return 0;
> +}
> +device_initcall_sync(of_platform_fw_devlink_resume);

This will mean no device will probe until device_initcall_sync().
Unfortunately, I don't think we can make such a sweeping assumption.
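
To picture how wide that window would be, here is a rough userspace sketch, not kernel code. It lists the initcall levels in boot order (simplified: pure_initcall and rootfs_initcall are left out) and counts how many whole levels would run between a pause at core_initcall and a resume at a later level. The enum and both helper functions are hypothetical names made up for this illustration:

```c
#include <assert.h>

/*
 * Initcall levels in boot order. Each *_SYNC entry runs right after its
 * base level. Simplified: pure_initcall and rootfs_initcall are omitted.
 */
enum initcall_level {
	LVL_CORE, LVL_CORE_SYNC,
	LVL_POSTCORE, LVL_POSTCORE_SYNC,
	LVL_ARCH, LVL_ARCH_SYNC,
	LVL_SUBSYS, LVL_SUBSYS_SYNC,
	LVL_FS, LVL_FS_SYNC,
	LVL_DEVICE, LVL_DEVICE_SYNC,
	LVL_LATE, LVL_LATE_SYNC,
	LVL_COUNT
};

/* Number of whole levels that run strictly between the pause and the resume. */
int levels_run_while_paused(enum initcall_level pause_at,
			    enum initcall_level resume_at)
{
	assert(pause_at < resume_at && resume_at < LVL_COUNT);
	return (int)resume_at - (int)pause_at - 1;
}

/* Returns 1 if every initcall of 'level' runs inside the paused window. */
int level_is_paused(enum initcall_level level,
		    enum initcall_level pause_at,
		    enum initcall_level resume_at)
{
	return level > pause_at && level < resume_at;
}
```

With a pause at LVL_CORE and a resume at LVL_DEVICE_SYNC, the whole LVL_DEVICE level falls inside the window, and that is the level at which built-in drivers using module_init() are registered.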

-Saravana
Saravana Kannan Oct. 2, 2020, 5:51 p.m. UTC | #6
On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > adding all top level devices") optimized the fwnode parsing when all top
> > level devices are added, it missed out optimizing this for platform
> > where the top level devices are added through the init_machine() path.
> >
> > This commit does the optimization for all paths by simply moving the
> > fw_devlink_pause/resume() inside of_platform_default_populate().
> >
> > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/of/platform.c | 19 +++++++++++++++----
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 071f04da32c8..79972e49b539 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> >                                  const struct of_dev_auxdata *lookup,
> >                                  struct device *parent)
> >  {
> > -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > -                                   parent);
> > +       int ret;
> > +
> > +       /*
> > +        * fw_devlink_pause/resume() are only safe to be called around top
> > +        * level device addition due to locking constraints.
> > +        */
> > +       if (!root)
> > +               fw_devlink_pause();
> > +
> > +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > +                                  parent);
>
> of_platform_default_populate() vs. of_platform_populate() is just a
> different match table. I don't think the behavior should otherwise be
> different.
>
> There's also of_platform_probe() which has slightly different matching
> behavior. It should not behave differently either with respect to
> devlinks.

So I'm trying to do this only when the top level devices are added for
the first time. of_platform_default_populate() seems to be the most
common path. For other cases, I think we just need to call
fw_devlink_pause/resume() wherever the top level devices are added for
the first time. As I said in the other email, we can't add
fw_devlink_pause/resume() by default to of_platform_populate().

Do you have other ideas for achieving "call fw_devlink_pause/resume()
only when top level devices are added for the first time"?
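
For the sake of discussion, the "first time only" constraint could be modelled roughly as below. This is a userspace sketch under loud assumptions: populate_nodes() and populate_batch_first_root() are hypothetical names that do not exist in the kernel, the pause/resume functions are counting stand-ins, and concurrency is ignored entirely. It only illustrates the intended behaviour, not a proposed implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counting stand-ins for fw_devlink_pause()/fw_devlink_resume(). */
static int pause_calls;
static int resume_calls;

void fw_devlink_pause(void)
{
	pause_calls++;
}

void fw_devlink_resume(void)
{
	assert(pause_calls > resume_calls); /* resume must follow a pause */
	resume_calls++;
}

/* Hypothetical stand-in for the real population work. */
int populate_nodes(const void *root)
{
	(void)root;
	return 0;
}

/*
 * Hypothetical wrapper: batch only the first root-level population
 * (root == NULL). Later calls, and calls on a sub-node, run unbatched.
 * Not thread-safe; this sketch assumes a single caller at a time.
 */
int populate_batch_first_root(const void *root)
{
	static bool root_populated;
	bool batch = !root && !root_populated;
	int ret;

	if (batch)
		fw_devlink_pause();
	ret = populate_nodes(root);
	if (batch) {
		fw_devlink_resume();
		root_populated = true;
	}
	return ret;
}
```

The first call with a NULL root is wrapped in one pause/resume pair; a second NULL-root call or a call with a non-NULL root leaves both counters unchanged.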

-Saravana
Laurent Pinchart Oct. 2, 2020, 5:54 p.m. UTC | #7
Hi Saravana,

On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > adding all top level devices") optimized the fwnode parsing when all top
> > > level devices are added, it missed out optimizing this for platform
> > > where the top level devices are added through the init_machine() path.
> > >
> > > This commit does the optimization for all paths by simply moving the
> > > fw_devlink_pause/resume() inside of_platform_default_populate().
> > >
> > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/of/platform.c | 19 +++++++++++++++----
> > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 071f04da32c8..79972e49b539 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > >                                  const struct of_dev_auxdata *lookup,
> > >                                  struct device *parent)
> > >  {
> > > -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > -                                   parent);
> > > +       int ret;
> > > +
> > > +       /*
> > > +        * fw_devlink_pause/resume() are only safe to be called around top
> > > +        * level device addition due to locking constraints.
> > > +        */
> > > +       if (!root)
> > > +               fw_devlink_pause();
> > > +
> > > +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > +                                  parent);
> >
> > of_platform_default_populate() vs. of_platform_populate() is just a
> > different match table. I don't think the behavior should otherwise be
> > different.
> >
> > There's also of_platform_probe() which has slightly different matching
> > behavior. It should not behave differently either with respect to
> > devlinks.
> 
> So I'm trying to do this only when the top level devices are added for
> the first time. of_platform_default_populate() seems to be the most
> common path. For other cases, I think we just need to call
> fw_devlink_pause/resume() wherever the top level devices are added for
> the first time. As I said in the other email, we can't add
> fw_devlink_pause/resume() by default to of_platform_populate().
> 
> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> only when top level devices are added for the first time"?

I'm not an expert in this domain, but before investigating it, would you
be able to share a hack patch that implements this (in the most simple
way) to check if it actually fixes the delays I experience on my system?
Saravana Kannan Oct. 2, 2020, 5:58 p.m. UTC | #8
On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Saravana,
>
> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > > adding all top level devices") optimized the fwnode parsing when all top
> > > > level devices are added, it missed out optimizing this for platform
> > > > where the top level devices are added through the init_machine() path.
> > > >
> > > > This commit does the optimization for all paths by simply moving the
> > > > fw_devlink_pause/resume() inside of_platform_default_populate().
> > > >
> > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/of/platform.c | 19 +++++++++++++++----
> > > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > index 071f04da32c8..79972e49b539 100644
> > > > --- a/drivers/of/platform.c
> > > > +++ b/drivers/of/platform.c
> > > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > > >                                  const struct of_dev_auxdata *lookup,
> > > >                                  struct device *parent)
> > > >  {
> > > > -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > > -                                   parent);
> > > > +       int ret;
> > > > +
> > > > +       /*
> > > > +        * fw_devlink_pause/resume() are only safe to be called around top
> > > > +        * level device addition due to locking constraints.
> > > > +        */
> > > > +       if (!root)
> > > > +               fw_devlink_pause();
> > > > +
> > > > +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > > +                                  parent);
> > >
> > > of_platform_default_populate() vs. of_platform_populate() is just a
> > > different match table. I don't think the behavior should otherwise be
> > > different.
> > >
> > > There's also of_platform_probe() which has slightly different matching
> > > behavior. It should not behave differently either with respect to
> > > devlinks.
> >
> > So I'm trying to do this only when the top level devices are added for
> > the first time. of_platform_default_populate() seems to be the most
> > common path. For other cases, I think we just need to call
> > fw_devlink_pause/resume() wherever the top level devices are added for
> > the first time. As I said in the other email, we can't add
> > fw_devlink_pause/resume() by default to of_platform_populate().
> >
> > Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > only when top level devices are added for the first time"?
>
> I'm not an expert in this domain, but before investigating it, would you
> be able to share a hack patch that implements this (in the most simple
> way) to check if it actually fixes the delays I experience on my system
> ?

So I take it the patch I sent out didn't work for you? Can you tell me
what machine/DT you are using?

-Saravana
Grygorii Strashko Oct. 2, 2020, 6:11 p.m. UTC | #9
On 02/10/2020 20:48, Saravana Kannan wrote:
> On Fri, Oct 2, 2020 at 8:03 AM 'Grygorii Strashko' via kernel-team
> <kernel-team@android.com> wrote:
>>
>>
>>
>> On 02/10/2020 14:40, Grygorii Strashko wrote:
>>>
>>>
>>> On 02/10/2020 02:19, Laurent Pinchart wrote:
>>>> Hi Saravana,
>>>>
>>>> Thank you for the patch.
>>>>
>>>> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>>>>> adding all top level devices") optimized the fwnode parsing when all top
>>>>> level devices are added, it missed out optimizing this for platform
>>>>> where the top level devices are added through the init_machine() path.
>>>>>
>>>>> This commit does the optimization for all paths by simply moving the
>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
>>>>
>>>> Based on v5.9-rc5, before the patch:
>>>>
>>>> [    0.652887] cpuidle: using governor menu
>>>> [   12.349476] No ATAGs?
>>>>
>>>> After the patch:
>>>>
>>>> [    0.650460] cpuidle: using governor menu
>>>> [   12.262101] No ATAGs?
>>>>
>>>> :-(
>>>
>>> This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate()
>>>
>>> Call path:
>>> board-generic.c
>>>    DT_MACHINE_START()
>>>      .init_machine    = omap_generic_init,
>>>
>>>    omap_generic_init()
>>>      pdata_quirks_init(omap_dt_match_table);
>>>           of_platform_populate(NULL, omap_dt_match_table,
>>>                    omap_auxdata_lookup, NULL);
>>>
>>> Other affected platforms
>>> arm: mach-ux500
>>> some mips
>>> some powerpc
>>>
>>> there are also case when a lot of devices placed under bus node, in such case
>>>    of_platform_populate() calls from bus drivers will also suffer from this issue.
>>>
>>> I think one option could be to add some parameter to _populate() or introduce new api.
>>>
>>> By the way, is there option to disable this feature at all?
>>> Is there Kconfig option?
>>> Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level?
>>>
>>>
>>> Also, I've came with another diff, pls check.
>>>
>>> [    0.000000] Booting Linux on physical CPU 0x0
>>> [    0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0
>>> [    0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
>>> [    0.000000] CPU: div instructions available: patching division code
>>> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
>>> [    0.000000] OF: fdt: Machine model: TI AM5718 IDK
>>> ...
>>> [    0.053443] cpuidle: using governor ladder
>>> [    0.053470] cpuidle: using governor menu
>>> [    0.089304] No ATAGs?
>>> ...
>>> [    3.092291] devtmpfs: mounted
>>> [    3.095804] Freeing unused kernel memory: 1024K
>>> [    3.100483] Run /sbin/init as init process
>>>
>>>
>>>
>>> ------ >< ---
>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
>>> index 071f04da32c8..4521b26e7745 100644
>>> --- a/drivers/of/platform.c
>>> +++ b/drivers/of/platform.c
>>> @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = {
>>>           {}
>>>    };
>>>
>>> +static int __init of_platform_fw_devlink_pause(void)
>>> +{
>>> +       fw_devlink_pause();
>>> +}
>>> +core_initcall(of_platform_fw_devlink_pause);
>>> +
>>>    static int __init of_platform_default_populate_init(void)
>>>    {
>>>           struct device_node *node;
>>> @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void)
>>>           }
>>>
>>>           /* Populate everything else. */
>>> -       fw_devlink_pause();
>>>           of_platform_default_populate(NULL, NULL, NULL);
>>> -       fw_devlink_resume();
>>>
>>>           return 0;
>>>    }
>>> @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init);
>>>
>>>    static int __init of_platform_sync_state_init(void)
>>>    {
>>> +       fw_devlink_resume();
>>
>> ^ it seems has to be done earlier, like
>> +static int __init of_platform_fw_devlink_resume(void)
>> +{
>> +       fw_devlink_resume();
>> +       return 0;
>> +}
>> +device_initcall_sync(of_platform_fw_devlink_resume);
> 
> This will mean no device will probe until device_initcall_sync().
> Unfortunately, I don't think we can make such a sweeping assumption.

Could you answer the questions below, please?
>>> By the way, is there option to disable this feature at all?
>>> Is there Kconfig option?
Laurent Pinchart Oct. 2, 2020, 6:27 p.m. UTC | #10
Hi Saravana,

On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > > On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > > > adding all top level devices") optimized the fwnode parsing when all top
> > > > > level devices are added, it missed out optimizing this for platform
> > > > > where the top level devices are added through the init_machine() path.
> > > > >
> > > > > This commit does the optimization for all paths by simply moving the
> > > > > fw_devlink_pause/resume() inside of_platform_default_populate().
> > > > >
> > > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > ---
> > > > >  drivers/of/platform.c | 19 +++++++++++++++----
> > > > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > > index 071f04da32c8..79972e49b539 100644
> > > > > --- a/drivers/of/platform.c
> > > > > +++ b/drivers/of/platform.c
> > > > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > > > >                                  const struct of_dev_auxdata *lookup,
> > > > >                                  struct device *parent)
> > > > >  {
> > > > > -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > > > -                                   parent);
> > > > > +       int ret;
> > > > > +
> > > > > +       /*
> > > > > +        * fw_devlink_pause/resume() are only safe to be called around top
> > > > > +        * level device addition due to locking constraints.
> > > > > +        */
> > > > > +       if (!root)
> > > > > +               fw_devlink_pause();
> > > > > +
> > > > > +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > > > +                                  parent);
> > > >
> > > > of_platform_default_populate() vs. of_platform_populate() is just a
> > > > different match table. I don't think the behavior should otherwise be
> > > > different.
> > > >
> > > > There's also of_platform_probe() which has slightly different matching
> > > > behavior. It should not behave differently either with respect to
> > > > devlinks.
> > >
> > > So I'm trying to do this only when the top level devices are added for
> > > the first time. of_platform_default_populate() seems to be the most
> > > common path. For other cases, I think we just need to call
> > > fw_devlink_pause/resume() wherever the top level devices are added for
> > > the first time. As I said in the other email, we can't add
> > > fw_devlink_pause/resume() by default to of_platform_populate().
> > >
> > > Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > > only when top level devices are added for the first time"?
> >
> > I'm not an expert in this domain, but before investigating it, would you
> > be able to share a hack patch that implements this (in the most simple
> > way) to check if it actually fixes the delays I experience on my system
> > ?
> 
> So I take it the patch I sent out didn't work for you? Can you tell me
> what machine/DT you are using?

I've replied to the patch:

Based on v5.9-rc5, before the patch:

[    0.652887] cpuidle: using governor menu
[   12.349476] No ATAGs?

After the patch:

[    0.650460] cpuidle: using governor menu
[   12.262101] No ATAGs?

I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
Grygorii Strashko Oct. 2, 2020, 6:35 p.m. UTC | #11
hi Saravana,

On 02/10/2020 21:27, Laurent Pinchart wrote:
> Hi Saravana,
> 
> On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
>> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
>>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
>>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
>>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
>>>>>>
>>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>>>>>> adding all top level devices") optimized the fwnode parsing when all top
>>>>>> level devices are added, it missed out optimizing this for platform
>>>>>> where the top level devices are added through the init_machine() path.
>>>>>>
>>>>>> This commit does the optimization for all paths by simply moving the
>>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
>>>>>>
>>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>>> ---
>>>>>>   drivers/of/platform.c | 19 +++++++++++++++----
>>>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
>>>>>> index 071f04da32c8..79972e49b539 100644
>>>>>> --- a/drivers/of/platform.c
>>>>>> +++ b/drivers/of/platform.c
>>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
>>>>>>                                   const struct of_dev_auxdata *lookup,
>>>>>>                                   struct device *parent)
>>>>>>   {
>>>>>> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
>>>>>> -                                   parent);
>>>>>> +       int ret;
>>>>>> +
>>>>>> +       /*
>>>>>> +        * fw_devlink_pause/resume() are only safe to be called around top
>>>>>> +        * level device addition due to locking constraints.
>>>>>> +        */
>>>>>> +       if (!root)
>>>>>> +               fw_devlink_pause();
>>>>>> +
>>>>>> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
>>>>>> +                                  parent);
>>>>>
>>>>> of_platform_default_populate() vs. of_platform_populate() is just a
>>>>> different match table. I don't think the behavior should otherwise be
>>>>> different.
>>>>>
>>>>> There's also of_platform_probe() which has slightly different matching
>>>>> behavior. It should not behave differently either with respect to
>>>>> devlinks.
>>>>
>>>> So I'm trying to do this only when the top level devices are added for
>>>> the first time. of_platform_default_populate() seems to be the most
>>>> common path. For other cases, I think we just need to call
>>>> fw_devlink_pause/resume() wherever the top level devices are added for
>>>> the first time. As I said in the other email, we can't add
>>>> fw_devlink_pause/resume() by default to of_platform_populate().
>>>>
>>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
>>>> only when top level devices are added for the first time"?
>>>
>>> I'm not an expert in this domain, but before investigating it, would you
>>> be able to share a hack patch that implements this (in the most simple
>>> way) to check if it actually fixes the delays I experience on my system
>>> ?
>>
>> So I take it the patch I sent out didn't work for you? Can you tell me
>> what machine/DT you are using?
> 
> I've replied to the patch:
> 
> Based on v5.9-rc5, before the patch:
> 
> [    0.652887] cpuidle: using governor menu
> [   12.349476] No ATAGs?
> 
> After the patch:
> 
> [    0.650460] cpuidle: using governor menu
> [   12.262101] No ATAGs?
> 
> I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> 

I hope you are receiving my mails, as I've already provided all the required information [1].

With the diff below:
[    4.177231] Freeing unused kernel memory: 1024K
[    4.181892] Run /sbin/init as init process

The best time with [2] is
[    3.100483] Run /sbin/init as init process

Still a loss of about 1 second.

Please understand the issue: the requirements here are something like a 500 ms boot with CAN, Ethernet, camera and display on ;(

[1] https://lore.kernel.org/patchwork/patch/1316134/#1511276
[2] https://lore.kernel.org/patchwork/patch/1316134/#1511435

diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
index 2a4fe3e68b82..ac1ab8928190 100644
--- a/arch/arm/mach-omap2/pdata-quirks.c
+++ b/arch/arm/mach-omap2/pdata-quirks.c
@@ -591,7 +591,9 @@ void __init pdata_quirks_init(const struct of_device_id *omap_dt_match_table)
         if (of_machine_is_compatible("ti,omap3"))
                 omap3_mcbsp_init();
         pdata_quirks_check(auxdata_quirks);
+       fw_devlink_pause();
         of_platform_populate(NULL, omap_dt_match_table,
                              omap_auxdata_lookup, NULL);
+       fw_devlink_resume();
         pdata_quirks_check(pdata_quirks);
  }
Saravana Kannan Oct. 2, 2020, 7:56 p.m. UTC | #12
On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team
<kernel-team@android.com> wrote:
>
> hi Saravana,
>
> On 02/10/2020 21:27, Laurent Pinchart wrote:
> > Hi Saravana,
> >
> > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> >>>>>>
> >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> >>>>>> level devices are added, it missed out optimizing this for platform
> >>>>>> where the top level devices are added through the init_machine() path.
> >>>>>>
> >>>>>> This commit does the optimization for all paths by simply moving the
> >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> >>>>>>
> >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>>> ---
> >>>>>>   drivers/of/platform.c | 19 +++++++++++++++----
> >>>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> >>>>>> index 071f04da32c8..79972e49b539 100644
> >>>>>> --- a/drivers/of/platform.c
> >>>>>> +++ b/drivers/of/platform.c
> >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> >>>>>>                                   const struct of_dev_auxdata *lookup,
> >>>>>>                                   struct device *parent)
> >>>>>>   {
> >>>>>> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> >>>>>> -                                   parent);
> >>>>>> +       int ret;
> >>>>>> +
> >>>>>> +       /*
> >>>>>> +        * fw_devlink_pause/resume() are only safe to be called around top
> >>>>>> +        * level device addition due to locking constraints.
> >>>>>> +        */
> >>>>>> +       if (!root)
> >>>>>> +               fw_devlink_pause();
> >>>>>> +
> >>>>>> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> >>>>>> +                                  parent);
> >>>>>
> >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> >>>>> different match table. I don't think the behavior should otherwise be
> >>>>> different.
> >>>>>
> >>>>> There's also of_platform_probe() which has slightly different matching
> >>>>> behavior. It should not behave differently either with respect to
> >>>>> devlinks.
> >>>>
> >>>> So I'm trying to do this only when the top level devices are added for
> >>>> the first time. of_platform_default_populate() seems to be the most
> >>>> common path. For other cases, I think we just need to call
> >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> >>>> the first time. As I said in the other email, we can't add
> >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> >>>>
> >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> >>>> only when top level devices are added for the first time"?
> >>>
> >>> I'm not an expert in this domain, but before investigating it, would you
> >>> be able to share a hack patch that implements this (in the most simple
> >>> way) to check if it actually fixes the delays I experience on my system
> >>> ?
> >>
> >> So I take it the patch I sent out didn't work for you? Can you tell me
> >> what machine/DT you are using?
> >
> > I've replied to the patch:
> >
> > Based on v5.9-rc5, before the patch:
> >
> > [    0.652887] cpuidle: using governor menu
> > [   12.349476] No ATAGs?
> >
> > After the patch:
> >
> > [    0.650460] cpuidle: using governor menu
> > [   12.262101] No ATAGs?
> >
> > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> >
>
> hope you are receiving my mails as I've provided you with all required information already [1]

Laurent/Grygorii,

Looks like I'm definitely missing emails. Sorry about the confusion.

I have some other urgent things on my plate right now. Is it okay if I
get to this in a day or two? In the end, we'll find a solution that
addresses most/all of the delay.

Thanks,
Saravana
Rob Herring Oct. 2, 2020, 8:29 p.m. UTC | #13
On Fri, Oct 2, 2020 at 12:52 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > adding all top level devices") optimized the fwnode parsing when all top
> > > level devices are added, it missed out optimizing this for platform
> > > where the top level devices are added through the init_machine() path.
> > >
> > > This commit does the optimization for all paths by simply moving the
> > > fw_devlink_pause/resume() inside of_platform_default_populate().
> > >
> > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/of/platform.c | 19 +++++++++++++++----
> > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 071f04da32c8..79972e49b539 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > >                                  const struct of_dev_auxdata *lookup,
> > >                                  struct device *parent)
> > >  {
> > > -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > -                                   parent);
> > > +       int ret;
> > > +
> > > +       /*
> > > +        * fw_devlink_pause/resume() are only safe to be called around top
> > > +        * level device addition due to locking constraints.
> > > +        */
> > > +       if (!root)
> > > +               fw_devlink_pause();
> > > +
> > > +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > +                                  parent);
> >
> > of_platform_default_populate() vs. of_platform_populate() is just a
> > different match table. I don't think the behavior should otherwise be
> > different.
> >
> > There's also of_platform_probe() which has slightly different matching
> > behavior. It should not behave differently either with respect to
> > devlinks.
>
> So I'm trying to do this only when the top level devices are added for
> the first time. of_platform_default_populate() seems to be the most
> common path. For other cases, I think we just need to call
> fw_devlink_pause/resume() wherever the top level devices are added for
> the first time.

> As I said in the other email, we can't add
> fw_devlink_pause/resume() by default to of_platform_populate().

If you detect it's the first time, you could?
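
A rough sketch of what such first-time detection could look like, written as a
userspace model rather than kernel code. Everything here is an assumption made
for illustration: the `*_stub` functions stand in for fw_devlink_pause/resume()
and the real population work, and the `top_level_populated` flag and
`populate_model()` are hypothetical names, not existing kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stubs standing in for the kernel's fw_devlink_pause()/fw_devlink_resume(). */
static int pause_depth;	/* > 0 while link parsing is deferred */
static int pause_calls;	/* how many times batching was used */

static void fw_devlink_pause_stub(void)
{
	pause_depth++;
	pause_calls++;
}

static void fw_devlink_resume_stub(void)
{
	assert(pause_depth > 0);	/* resume must pair with a pause */
	pause_depth--;
}

/* Hypothetical flag: set once the top level devices have been populated. */
static bool top_level_populated;

/* Stand-in for the real population work; root == NULL means "top level". */
static int do_populate_stub(const void *root)
{
	(void)root;
	return 0;
}

/* Model of a populate call that batches only on the first top level call. */
static int populate_model(const void *root)
{
	bool batch = !root && !top_level_populated;
	int ret;

	if (batch)
		fw_devlink_pause_stub();

	ret = do_populate_stub(root);

	if (batch) {
		fw_devlink_resume_stub();
		top_level_populated = true;
	}
	return ret;
}
```

In this model a second top level call, or any call with a non-NULL root, goes
through without pause/resume. Whether a plain flag like this is safe with
respect to the locking constraints mentioned in the patch is not something the
sketch tries to answer.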

>
> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> only when top level devices are added for the first time"?

Eliminate the cases not using of_platform_default_populate().

There are 2 main reasons for the non default cases. The first is
auxdata. Really, for any modern platform that people care about (and
care about the boot time), they should not be using auxdata. That's
just for the DT transition. You know, a temporary thing from 9 years
ago.

The 2nd is having some parent device. This is typically an soc_device.
I really think this is kind of dumb. We should either have the parent
device always or never. After all, everything's an SoC right? Of
course changing that will break some Android systems since they like
to use non-ABI sysfs device paths.

There could also be some initcall ordering issues. IIRC, in the last
round of cleanups in this area, at91 gpio/pinctrl had an issue with
that. I think I have a half done fix for that I started.

Rob
Laurent Pinchart Oct. 3, 2020, 12:13 a.m. UTC | #14
Hi Saravana,

On Fri, Oct 02, 2020 at 12:56:30PM -0700, Saravana Kannan wrote:
> On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team wrote:
> > On 02/10/2020 21:27, Laurent Pinchart wrote:
> > > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> > >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > >>>>>>
> > >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> > >>>>>> level devices are added, it missed out optimizing this for platform
> > >>>>>> where the top level devices are added through the init_machine() path.
> > >>>>>>
> > >>>>>> This commit does the optimization for all paths by simply moving the
> > >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> > >>>>>>
> > >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >>>>>> ---
> > >>>>>>   drivers/of/platform.c | 19 +++++++++++++++----
> > >>>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > >>>>>> index 071f04da32c8..79972e49b539 100644
> > >>>>>> --- a/drivers/of/platform.c
> > >>>>>> +++ b/drivers/of/platform.c
> > >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > >>>>>>                                   const struct of_dev_auxdata *lookup,
> > >>>>>>                                   struct device *parent)
> > >>>>>>   {
> > >>>>>> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > >>>>>> -                                   parent);
> > >>>>>> +       int ret;
> > >>>>>> +
> > >>>>>> +       /*
> > >>>>>> +        * fw_devlink_pause/resume() are only safe to be called around top
> > >>>>>> +        * level device addition due to locking constraints.
> > >>>>>> +        */
> > >>>>>> +       if (!root)
> > >>>>>> +               fw_devlink_pause();
> > >>>>>> +
> > >>>>>> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > >>>>>> +                                  parent);
> > >>>>>
> > >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> > >>>>> different match table. I don't think the behavior should otherwise be
> > >>>>> different.
> > >>>>>
> > >>>>> There's also of_platform_probe() which has slightly different matching
> > >>>>> behavior. It should not behave differently either with respect to
> > >>>>> devlinks.
> > >>>>
> > >>>> So I'm trying to do this only when the top level devices are added for
> > >>>> the first time. of_platform_default_populate() seems to be the most
> > >>>> common path. For other cases, I think we just need to call
> > >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> > >>>> the first time. As I said in the other email, we can't add
> > >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> > >>>>
> > >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > >>>> only when top level devices are added for the first time"?
> > >>>
> > >>> I'm not an expert in this domain, but before investigating it, would you
> > >>> be able to share a hack patch that implements this (in the most simple
> > >>> way) to check if it actually fixes the delays I experience on my system
> > >>> ?
> > >>
> > >> So I take it the patch I sent out didn't work for you? Can you tell me
> > >> what machine/DT you are using?
> > >
> > > I've replied to the patch:
> > >
> > > Based on v5.9-rc5, before the patch:
> > >
> > > [    0.652887] cpuidle: using governor menu
> > > [   12.349476] No ATAGs?
> > >
> > > After the patch:
> > >
> > > [    0.650460] cpuidle: using governor menu
> > > [   12.262101] No ATAGs?
> > >
> > > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> > >
> >
> > hope you are receiving my mails as I've provided you with all required information already [1]
> 
> Laurent/Grygorii,
> 
> Looks like I'm definitely missing emails. Sorry about the confusion.
> 
> I have some other urgent things on my plate right now. Is it okay if I
> get to this in a day or two? In the end, we'll find a solution that
> addresses most/all of the delay.

No issue on my side.

By the way, during initial investigations, I traced code paths to
figure out whether a particular step consumed a large amount of time,
and found that of_platform_populate() ends up executing devlink-related
code that seems to have O(n^3) complexity in the number of devices,
with a few dozen milliseconds for each iteration. That's a very bad
complexity.
Saravana Kannan Oct. 27, 2020, 3:29 a.m. UTC | #15
On Fri, Oct 2, 2020 at 5:14 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Saravana,
>
> On Fri, Oct 02, 2020 at 12:56:30PM -0700, Saravana Kannan wrote:
> > On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team wrote:
> > > On 02/10/2020 21:27, Laurent Pinchart wrote:
> > > > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> > > >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > > >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > > >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > > >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >>>>>>
> > > >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> > > >>>>>> level devices are added, it missed out optimizing this for platform
> > > >>>>>> where the top level devices are added through the init_machine() path.
> > > >>>>>>
> > > >>>>>> This commit does the optimization for all paths by simply moving the
> > > >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> > > >>>>>>
> > > >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >>>>>> ---
> > > >>>>>>   drivers/of/platform.c | 19 +++++++++++++++----
> > > >>>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > >>>>>> index 071f04da32c8..79972e49b539 100644
> > > >>>>>> --- a/drivers/of/platform.c
> > > >>>>>> +++ b/drivers/of/platform.c
> > > >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > > >>>>>>                                   const struct of_dev_auxdata *lookup,
> > > >>>>>>                                   struct device *parent)
> > > >>>>>>   {
> > > >>>>>> -       return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> -                                   parent);
> > > >>>>>> +       int ret;
> > > >>>>>> +
> > > >>>>>> +       /*
> > > >>>>>> +        * fw_devlink_pause/resume() are only safe to be called around top
> > > >>>>>> +        * level device addition due to locking constraints.
> > > >>>>>> +        */
> > > >>>>>> +       if (!root)
> > > >>>>>> +               fw_devlink_pause();
> > > >>>>>> +
> > > >>>>>> +       ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> +                                  parent);
> > > >>>>>
> > > >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> > > >>>>> different match table. I don't think the behavior should otherwise be
> > > >>>>> different.
> > > >>>>>
> > > >>>>> There's also of_platform_probe() which has slightly different matching
> > > >>>>> behavior. It should not behave differently either with respect to
> > > >>>>> devlinks.
> > > >>>>
> > > >>>> So I'm trying to do this only when the top level devices are added for
> > > >>>> the first time. of_platform_default_populate() seems to be the most
> > > >>>> common path. For other cases, I think we just need to call
> > > >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> > > >>>> the first time. As I said in the other email, we can't add
> > > >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> > > >>>>
> > > >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > > >>>> only when top level devices are added for the first time"?
> > > >>>
> > > >>> I'm not an expert in this domain, but before investigating it, would you
> > > >>> be able to share a hack patch that implements this (in the most simple
> > > >>> way) to check if it actually fixes the delays I experience on my system
> > > >>> ?
> > > >>
> > > >> So I take it the patch I sent out didn't work for you? Can you tell me
> > > >> what machine/DT you are using?
> > > >
> > > > I've replied to the patch:
> > > >
> > > > Based on v5.9-rc5, before the patch:
> > > >
> > > > [    0.652887] cpuidle: using governor menu
> > > > [   12.349476] No ATAGs?
> > > >
> > > > After the patch:
> > > >
> > > > [    0.650460] cpuidle: using governor menu
> > > > [   12.262101] No ATAGs?
> > > >
> > > > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > > > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > > > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> > > >
> > >
> > > hope you are receiving my mails as I've provided you with all required information already [1]
> >
> > Laurent/Grygorii,
> >
> > Looks like I'm definitely missing emails. Sorry about the confusion.
> >
> > I have some other urgent things on my plate right now. Is it okay if I
> > get to this in a day or two? In the end, we'll find a solution that
> > addresses most/all of the delay.
>
> No issue on my side.

Hi Laurent,

Sorry it took a while for me to get back to this.

Can you try adding fw_devlink_pause/resume() around the
of_platform_populate() call in arch/arm/mach-omap2/pdata-quirks.c?
Just trying to verify the cause/fix.

If it fixes the issue, then considering Rob's comments [1], a good
short term solution might be to have the suggestion above and some way
to do pause/resume only when the top level devices are added.

> By the way, during initial investigations, I've traced code paths to
> figure out if there was a particular step that would consume a large
> amount of time, and found out that of_platform_populate() ends up
> executing devlink-related code that seems to have an O(n^3) complexity
> on the number of devices, with a few dozens of milliseconds for each
> iteration. That's a very bad complexity.

As you said, the complexity of fw_devlink parsing can be O(N^2). There
are ways to bring it down to O(N), but they come with a bunch of
additional complexity and increased memory use. When I first considered
doing it that way, I questioned whether O(N^2) would actually translate
to a measurable difference. Looks like it does now :) I have something
in mind for how to do it with O(N) complexity, but I expect it to take a
while to get in. So in the meantime, I'm thinking of using
fw_devlink_pause/resume() as a short term optimization.
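
To illustrate why batching helps, here is a toy cost model in plain C. It is
not the fw_devlink code and makes a simplifying assumption: without batching,
each device add re-walks every device added so far, while with pause/resume
the adds are only queued and a single walk happens at resume time. The function
names are made up for this sketch.

```c
#include <assert.h>

/*
 * Toy cost model: count how many "try to link" steps happen while
 * adding n devices. This is an illustration, not a measurement.
 */

/* Unbatched: every device add re-walks the devices added so far. */
static unsigned long steps_unbatched(unsigned long n)
{
	unsigned long steps = 0;
	unsigned long added;

	assert(n < 65536);	/* keep the sum well inside unsigned long */
	for (added = 1; added <= n; added++)
		steps += added;	/* re-scan all devices added so far */
	return steps;		/* n * (n + 1) / 2, i.e. O(n^2) */
}

/* Batched: adds only queue the device; one walk happens at resume. */
static unsigned long steps_batched(unsigned long n)
{
	unsigned long queued = 0;
	unsigned long added;

	for (added = 1; added <= n; added++)
		queued++;	/* pause: defer the parsing */
	return queued;		/* resume: one pass over n devices, O(n) */
}
```

Under this model, 100 devices cost 5050 steps unbatched and 100 steps batched.
The real ratio depends on how many devices are still waiting for suppliers at
each add.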

-Saravana

[1] - https://lore.kernel.org/linux-omap/CAL_Jsq+6mxtFei3+1ic4c5XCftJ8nZK6_S5_d15yEXQ02BTNKw@mail.gmail.com/

Patch

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 071f04da32c8..79972e49b539 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -501,8 +501,21 @@  int of_platform_default_populate(struct device_node *root,
 				 const struct of_dev_auxdata *lookup,
 				 struct device *parent)
 {
-	return of_platform_populate(root, of_default_bus_match_table, lookup,
-				    parent);
+	int ret;
+
+	/*
+	 * fw_devlink_pause/resume() are only safe to be called around top
+	 * level device addition due to locking constraints.
+	 */
+	if (!root)
+		fw_devlink_pause();
+
+	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
+				   parent);
+
+	if (!root)
+		fw_devlink_resume();
+	return ret;
 }
 EXPORT_SYMBOL_GPL(of_platform_default_populate);
 
@@ -538,9 +551,7 @@  static int __init of_platform_default_populate_init(void)
 	}
 
 	/* Populate everything else. */
-	fw_devlink_pause();
 	of_platform_default_populate(NULL, NULL, NULL);
-	fw_devlink_resume();
 
 	return 0;
 }