Message ID | 20201001225952.3676755-1-saravanak@google.com |
---|---|
State | New |
Series | [v1] of: platform: Batch fwnode parsing in the init_machine() path |
Hi Saravana,

Thank you for the patch.

On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> adding all top level devices") optimized the fwnode parsing when all top
> level devices are added, it missed out optimizing this for platform
> where the top level devices are added through the init_machine() path.
>
> This commit does the optimization for all paths by simply moving the
> fw_devlink_pause/resume() inside of_platform_default_populate().

Based on v5.9-rc5, before the patch:

[ 0.652887] cpuidle: using governor menu
[ 12.349476] No ATAGs?

After the patch:

[ 0.650460] cpuidle: using governor menu
[ 12.262101] No ATAGs?

:-(

> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/of/platform.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 071f04da32c8..79972e49b539 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
>  				 const struct of_dev_auxdata *lookup,
>  				 struct device *parent)
>  {
> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> -				    parent);
> +	int ret;
> +
> +	/*
> +	 * fw_devlink_pause/resume() are only safe to be called around top
> +	 * level device addition due to locking constraints.
> +	 */
> +	if (!root)
> +		fw_devlink_pause();
> +
> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> +				   parent);
> +
> +	if (!root)
> +		fw_devlink_resume();
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(of_platform_default_populate);
>
> @@ -538,9 +551,7 @@ static int __init of_platform_default_populate_init(void)
>  	}
>
>  	/* Populate everything else. */
> -	fw_devlink_pause();
>  	of_platform_default_populate(NULL, NULL, NULL);
> -	fw_devlink_resume();
>
>  	return 0;
>  }
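For readers following the thread, here is a minimal user-space sketch of why wrapping a bulk device addition in a pause/resume pair can help. It is a toy model, not kernel code: every `toy_*` name is hypothetical, and the cost model (each unbatched device addition revisits every device still waiting for suppliers) is an assumption made for illustration rather than a description of the actual fw_devlink implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_devlink {
	bool paused;          /* true between toy_pause() and toy_resume() */
	size_t waiting;       /* devices still waiting for their suppliers */
	size_t deferred;      /* devices added while paused */
	unsigned long visits; /* waiting-list entries visited so far */
};

static void toy_pause(struct toy_devlink *d)
{
	assert(!d->paused);
	d->paused = true;
}

static void toy_add_device(struct toy_devlink *d)
{
	if (d->paused) {
		/* Batched path: just remember the device for later. */
		d->deferred++;
		return;
	}
	/* Assumed unbatched cost: revisit every waiting device on each add. */
	d->waiting++;
	d->visits += d->waiting;
}

static void toy_resume(struct toy_devlink *d)
{
	assert(d->paused);
	d->paused = false;
	/* One pass over everything that was deferred while paused. */
	d->waiting += d->deferred;
	d->deferred = 0;
	d->visits += d->waiting;
}

/* Total list visits for adding ndev devices, with or without batching. */
unsigned long toy_populate_cost(size_t ndev, bool batch)
{
	struct toy_devlink d = { 0 };
	size_t i;

	if (batch)
		toy_pause(&d);
	for (i = 0; i < ndev; i++)
		toy_add_device(&d);
	if (batch)
		toy_resume(&d);
	return d.visits;
}
```

Under these assumptions the unbatched cost grows quadratically with the number of devices while the batched cost grows linearly, which is the kind of gap the timings in this thread are probing.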
On 02/10/2020 02:19, Laurent Pinchart wrote:
> Hi Saravana,
>
> Thank you for the patch.
>
> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote:
>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>> adding all top level devices") optimized the fwnode parsing when all top
>> level devices are added, it missed out optimizing this for platform
>> where the top level devices are added through the init_machine() path.
>>
>> This commit does the optimization for all paths by simply moving the
>> fw_devlink_pause/resume() inside of_platform_default_populate().
>
> Based on v5.9-rc5, before the patch:
>
> [ 0.652887] cpuidle: using governor menu
> [ 12.349476] No ATAGs?
>
> After the patch:
>
> [ 0.650460] cpuidle: using governor menu
> [ 12.262101] No ATAGs?
>
> :-(

This is kind of expected :( because the omap2 arch doesn't call of_platform_default_populate().

Call path:
board-generic.c
  DT_MACHINE_START()
    .init_machine = omap_generic_init,

  omap_generic_init()
    pdata_quirks_init(omap_dt_match_table);
      of_platform_populate(NULL, omap_dt_match_table,
                           omap_auxdata_lookup, NULL);

Other affected platforms:
  arm: mach-ux500
  some mips
  some powerpc

There are also cases where a lot of devices are placed under a bus node; in such cases
of_platform_populate() calls from bus drivers will also suffer from this issue.

I think one option could be to add some parameter to _populate() or to introduce a new API.

By the way, is there an option to disable this feature entirely?
Is there a Kconfig option?
Is there any reason why such complex and time-consuming code was added to the kernel rather than implemented at the DTC level?

Also, I've come up with another diff, please check.
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0
[ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[ 0.000000] OF: fdt: Machine model: TI AM5718 IDK
...
[ 0.053443] cpuidle: using governor ladder
[ 0.053470] cpuidle: using governor menu
[ 0.089304] No ATAGs?
...
[ 3.092291] devtmpfs: mounted
[ 3.095804] Freeing unused kernel memory: 1024K
[ 3.100483] Run /sbin/init as init process

------ >< ---
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 071f04da32c8..4521b26e7745 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = {
 	{}
 };
 
+static int __init of_platform_fw_devlink_pause(void)
+{
+	fw_devlink_pause();
+}
+core_initcall(of_platform_fw_devlink_pause);
+
 static int __init of_platform_default_populate_init(void)
 {
 	struct device_node *node;
@@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void)
 	}
 
 	/* Populate everything else. */
-	fw_devlink_pause();
 	of_platform_default_populate(NULL, NULL, NULL);
-	fw_devlink_resume();
 
 	return 0;
 }
@@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init);
 
 static int __init of_platform_sync_state_init(void)
 {
+	fw_devlink_resume();
 	device_links_supplier_sync_state_resume();
 	return 0;
 }
On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote: > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > adding all top level devices") optimized the fwnode parsing when all top > level devices are added, it missed out optimizing this for platform > where the top level devices are added through the init_machine() path. > > This commit does the optimization for all paths by simply moving the > fw_devlink_pause/resume() inside of_platform_default_populate(). > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/of/platform.c | 19 +++++++++++++++---- > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > index 071f04da32c8..79972e49b539 100644 > --- a/drivers/of/platform.c > +++ b/drivers/of/platform.c > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root, > const struct of_dev_auxdata *lookup, > struct device *parent) > { > - return of_platform_populate(root, of_default_bus_match_table, lookup, > - parent); > + int ret; > + > + /* > + * fw_devlink_pause/resume() are only safe to be called around top > + * level device addition due to locking constraints. > + */ > + if (!root) > + fw_devlink_pause(); > + > + ret = of_platform_populate(root, of_default_bus_match_table, lookup, > + parent); of_platform_default_populate() vs. of_platform_populate() is just a different match table. I don't think the behavior should otherwise be different. There's also of_platform_probe() which has slightly different matching behavior. It should not behave differently either with respect to devlinks. Rob
On 02/10/2020 14:40, Grygorii Strashko wrote: > > > On 02/10/2020 02:19, Laurent Pinchart wrote: >> Hi Saravana, >> >> Thank you for the patch. >> >> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote: >>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when >>> adding all top level devices") optimized the fwnode parsing when all top >>> level devices are added, it missed out optimizing this for platform >>> where the top level devices are added through the init_machine() path. >>> >>> This commit does the optimization for all paths by simply moving the >>> fw_devlink_pause/resume() inside of_platform_default_populate(). >> >> Based on v5.9-rc5, before the patch: >> >> [ 0.652887] cpuidle: using governor menu >> [ 12.349476] No ATAGs? >> >> After the patch: >> >> [ 0.650460] cpuidle: using governor menu >> [ 12.262101] No ATAGs? >> >> :-( > > This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate() > > Call path: > board-generic.c > DT_MACHINE_START() > .init_machine = omap_generic_init, > > omap_generic_init() > pdata_quirks_init(omap_dt_match_table); > of_platform_populate(NULL, omap_dt_match_table, > omap_auxdata_lookup, NULL); > > Other affected platforms > arm: mach-ux500 > some mips > some powerpc > > there are also case when a lot of devices placed under bus node, in such case > of_platform_populate() calls from bus drivers will also suffer from this issue. > > I think one option could be to add some parameter to _populate() or introduce new api. > > By the way, is there option to disable this feature at all? > Is there Kconfig option? > Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level? > > > Also, I've came with another diff, pls check. 
> > [ 0.000000] Booting Linux on physical CPU 0x0 > [ 0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0 > [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d > [ 0.000000] CPU: div instructions available: patching division code > [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache > [ 0.000000] OF: fdt: Machine model: TI AM5718 IDK > ... > [ 0.053443] cpuidle: using governor ladder > [ 0.053470] cpuidle: using governor menu > [ 0.089304] No ATAGs? > ... > [ 3.092291] devtmpfs: mounted > [ 3.095804] Freeing unused kernel memory: 1024K > [ 3.100483] Run /sbin/init as init process > > > > ------ >< --- > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > index 071f04da32c8..4521b26e7745 100644 > --- a/drivers/of/platform.c > +++ b/drivers/of/platform.c > @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = { > {} > }; > > +static int __init of_platform_fw_devlink_pause(void) > +{ > + fw_devlink_pause(); > +} > +core_initcall(of_platform_fw_devlink_pause); > + > static int __init of_platform_default_populate_init(void) > { > struct device_node *node; > @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void) > } > > /* Populate everything else. */ > - fw_devlink_pause(); > of_platform_default_populate(NULL, NULL, NULL); > - fw_devlink_resume(); > > return 0; > } > @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init); > > static int __init of_platform_sync_state_init(void) > { > + fw_devlink_resume(); ^ it seems has to be done earlier, like +static int __init of_platform_fw_devlink_resume(void) +{ + fw_devlink_resume(); + return 0; +} +device_initcall_sync(of_platform_fw_devlink_resume); > device_links_supplier_sync_state_resume(); > return 0; > } > > >
On Fri, Oct 2, 2020 at 8:03 AM 'Grygorii Strashko' via kernel-team <kernel-team@android.com> wrote: > > > > On 02/10/2020 14:40, Grygorii Strashko wrote: > > > > > > On 02/10/2020 02:19, Laurent Pinchart wrote: > >> Hi Saravana, > >> > >> Thank you for the patch. > >> > >> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote: > >>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > >>> adding all top level devices") optimized the fwnode parsing when all top > >>> level devices are added, it missed out optimizing this for platform > >>> where the top level devices are added through the init_machine() path. > >>> > >>> This commit does the optimization for all paths by simply moving the > >>> fw_devlink_pause/resume() inside of_platform_default_populate(). > >> > >> Based on v5.9-rc5, before the patch: > >> > >> [ 0.652887] cpuidle: using governor menu > >> [ 12.349476] No ATAGs? > >> > >> After the patch: > >> > >> [ 0.650460] cpuidle: using governor menu > >> [ 12.262101] No ATAGs? > >> > >> :-( > > > > This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate() > > > > Call path: > > board-generic.c > > DT_MACHINE_START() > > .init_machine = omap_generic_init, > > > > omap_generic_init() > > pdata_quirks_init(omap_dt_match_table); > > of_platform_populate(NULL, omap_dt_match_table, > > omap_auxdata_lookup, NULL); > > > > Other affected platforms > > arm: mach-ux500 > > some mips > > some powerpc > > > > there are also case when a lot of devices placed under bus node, in such case > > of_platform_populate() calls from bus drivers will also suffer from this issue. > > > > I think one option could be to add some parameter to _populate() or introduce new api. > > > > By the way, is there option to disable this feature at all? > > Is there Kconfig option? > > Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level? 
> > > > > > Also, I've came with another diff, pls check. > > > > [ 0.000000] Booting Linux on physical CPU 0x0 > > [ 0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0 > > [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d > > [ 0.000000] CPU: div instructions available: patching division code > > [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache > > [ 0.000000] OF: fdt: Machine model: TI AM5718 IDK > > ... > > [ 0.053443] cpuidle: using governor ladder > > [ 0.053470] cpuidle: using governor menu > > [ 0.089304] No ATAGs? > > ... > > [ 3.092291] devtmpfs: mounted > > [ 3.095804] Freeing unused kernel memory: 1024K > > [ 3.100483] Run /sbin/init as init process > > > > > > > > ------ >< --- > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > index 071f04da32c8..4521b26e7745 100644 > > --- a/drivers/of/platform.c > > +++ b/drivers/of/platform.c > > @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = { > > {} > > }; > > > > +static int __init of_platform_fw_devlink_pause(void) > > +{ > > + fw_devlink_pause(); > > +} > > +core_initcall(of_platform_fw_devlink_pause); > > + > > static int __init of_platform_default_populate_init(void) > > { > > struct device_node *node; > > @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void) > > } > > > > /* Populate everything else. 
*/ > > - fw_devlink_pause(); > > of_platform_default_populate(NULL, NULL, NULL); > > - fw_devlink_resume(); > > > > return 0; > > } > > @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init); > > > > static int __init of_platform_sync_state_init(void) > > { > > + fw_devlink_resume(); > > ^ it seems has to be done earlier, like > +static int __init of_platform_fw_devlink_resume(void) > +{ > + fw_devlink_resume(); > + return 0; > +} > +device_initcall_sync(of_platform_fw_devlink_resume); This will mean no device will probe until device_initcall_sync(). Unfortunately, I don't think we can make such a sweeping assumption. -Saravana
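To make the objection above concrete, here is a small user-space sketch of how wide the paused window becomes when the pause and the resume sit in different initcalls. The level ordering is written down from include/linux/init.h as I understand it and should be checked against the tree in use; the `toy_*` helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Initcall levels in the order the kernel runs them (assumed, see above). */
static const char *const toy_levels[] = {
	"pure",
	"core", "core_sync",
	"postcore", "postcore_sync",
	"arch", "arch_sync",
	"subsys", "subsys_sync",
	"fs", "fs_sync",
	"rootfs",
	"device", "device_sync",
	"late", "late_sync",
};

#define TOY_NLEVELS (sizeof(toy_levels) / sizeof(toy_levels[0]))

/* Position of a level in the run order, or -1 if the name is unknown. */
int toy_level_index(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < TOY_NLEVELS; i++)
		if (strcmp(toy_levels[i], name) == 0)
			return (int)i;
	return -1;
}

/*
 * Number of level transitions that happen while paused when the pause is
 * issued at pause_at and the resume at resume_at. Returns -1 for unknown
 * names or when the resume would run before the pause.
 */
int toy_paused_levels(const char *pause_at, const char *resume_at)
{
	int p = toy_level_index(pause_at);
	int r = toy_level_index(resume_at);

	if (p < 0 || r < p)
		return -1;
	return r - p;
}
```

In this model, pausing in a core_initcall and resuming in a device_initcall_sync spans most of the boot sequence, whereas the code being replaced paused and resumed inside a single arch_initcall_sync function.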
On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote: > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote: > > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > > adding all top level devices") optimized the fwnode parsing when all top > > level devices are added, it missed out optimizing this for platform > > where the top level devices are added through the init_machine() path. > > > > This commit does the optimization for all paths by simply moving the > > fw_devlink_pause/resume() inside of_platform_default_populate(). > > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > drivers/of/platform.c | 19 +++++++++++++++---- > > 1 file changed, 15 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > index 071f04da32c8..79972e49b539 100644 > > --- a/drivers/of/platform.c > > +++ b/drivers/of/platform.c > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root, > > const struct of_dev_auxdata *lookup, > > struct device *parent) > > { > > - return of_platform_populate(root, of_default_bus_match_table, lookup, > > - parent); > > + int ret; > > + > > + /* > > + * fw_devlink_pause/resume() are only safe to be called around top > > + * level device addition due to locking constraints. > > + */ > > + if (!root) > > + fw_devlink_pause(); > > + > > + ret = of_platform_populate(root, of_default_bus_match_table, lookup, > > + parent); > > of_platform_default_populate() vs. of_platform_populate() is just a > different match table. I don't think the behavior should otherwise be > different. > > There's also of_platform_probe() which has slightly different matching > behavior. It should not behave differently either with respect to > devlinks. So I'm trying to do this only when the top level devices are added for the first time. 
of_platform_default_populate() seems to be the most common path. For other cases, I think we just need to call fw_devlink_pause/resume() wherever the top level devices are added for the first time. As I said in the other email, we can't add fw_devlink_pause/resume() by default to of_platform_populate(). Do you have other ideas for achieving "call fw_devlink_pause/resume() only when top level devices are added for the first time"? -Saravana
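The condition quoted above ("only when top level devices are added for the first time") can be illustrated with a one-shot guard. The sketch below is a user-space toy, not a proposed kernel change: all `toy_*` names are hypothetical stand-ins, the populate step is a stub, and the guard deliberately ignores locking and concurrency, which the patch comment calls out as the real constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct device_node; NULL means "the root of the tree". */
struct toy_node {
	const char *name;
};

static bool toy_paused;         /* true between pause and resume */
static bool toy_top_level_done; /* set once the first root populate ran */
static int toy_pause_calls;     /* how many times batching was enabled */

static void toy_fw_devlink_pause(void)
{
	assert(!toy_paused);
	toy_paused = true;
	toy_pause_calls++;
}

static void toy_fw_devlink_resume(void)
{
	assert(toy_paused);
	toy_paused = false;
}

/* Stub for the real work of creating devices under root. */
static int toy_do_populate(const struct toy_node *root)
{
	(void)root;
	return 0;
}

/* Batch only the first populate call that starts at the root. */
int toy_populate(const struct toy_node *root)
{
	bool batch = !root && !toy_top_level_done;
	int ret;

	if (batch)
		toy_fw_devlink_pause();

	ret = toy_do_populate(root);

	if (batch) {
		toy_fw_devlink_resume();
		toy_top_level_done = true;
	}
	return ret;
}
```

A call with a non-NULL root (for example from a bus driver) and any later call with a NULL root both skip the pause, so the pause/resume pair runs at most once in this model.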
Hi Saravana, On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote: > On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote: > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > > > adding all top level devices") optimized the fwnode parsing when all top > > > level devices are added, it missed out optimizing this for platform > > > where the top level devices are added through the init_machine() path. > > > > > > This commit does the optimization for all paths by simply moving the > > > fw_devlink_pause/resume() inside of_platform_default_populate(). > > > > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > --- > > > drivers/of/platform.c | 19 +++++++++++++++---- > > > 1 file changed, 15 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > index 071f04da32c8..79972e49b539 100644 > > > --- a/drivers/of/platform.c > > > +++ b/drivers/of/platform.c > > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root, > > > const struct of_dev_auxdata *lookup, > > > struct device *parent) > > > { > > > - return of_platform_populate(root, of_default_bus_match_table, lookup, > > > - parent); > > > + int ret; > > > + > > > + /* > > > + * fw_devlink_pause/resume() are only safe to be called around top > > > + * level device addition due to locking constraints. > > > + */ > > > + if (!root) > > > + fw_devlink_pause(); > > > + > > > + ret = of_platform_populate(root, of_default_bus_match_table, lookup, > > > + parent); > > > > of_platform_default_populate() vs. of_platform_populate() is just a > > different match table. I don't think the behavior should otherwise be > > different. > > > > There's also of_platform_probe() which has slightly different matching > > behavior. 
It should not behave differently either with respect to > > devlinks. > > So I'm trying to do this only when the top level devices are added for > the first time. of_platform_default_populate() seems to be the most > common path. For other cases, I think we just need to call > fw_devlink_pause/resume() wherever the top level devices are added for > the first time. As I said in the other email, we can't add > fw_devlink_pause/resume() by default to of_platform_populate(). > > Do you have other ideas for achieving "call fw_devlink_pause/resume() > only when top level devices are added for the first time"? I'm not an expert in this domain, but before investigating it, would you be able to share a hack patch that implements this (in the most simple way) to check if it actually fixes the delays I experience on my system ?
On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Saravana, > > On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote: > > On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote: > > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > > > > adding all top level devices") optimized the fwnode parsing when all top > > > > level devices are added, it missed out optimizing this for platform > > > > where the top level devices are added through the init_machine() path. > > > > > > > > This commit does the optimization for all paths by simply moving the > > > > fw_devlink_pause/resume() inside of_platform_default_populate(). > > > > > > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > > --- > > > > drivers/of/platform.c | 19 +++++++++++++++---- > > > > 1 file changed, 15 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > > index 071f04da32c8..79972e49b539 100644 > > > > --- a/drivers/of/platform.c > > > > +++ b/drivers/of/platform.c > > > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root, > > > > const struct of_dev_auxdata *lookup, > > > > struct device *parent) > > > > { > > > > - return of_platform_populate(root, of_default_bus_match_table, lookup, > > > > - parent); > > > > + int ret; > > > > + > > > > + /* > > > > + * fw_devlink_pause/resume() are only safe to be called around top > > > > + * level device addition due to locking constraints. > > > > + */ > > > > + if (!root) > > > > + fw_devlink_pause(); > > > > + > > > > + ret = of_platform_populate(root, of_default_bus_match_table, lookup, > > > > + parent); > > > > > > of_platform_default_populate() vs. of_platform_populate() is just a > > > different match table. 
I don't think the behavior should otherwise be > > > different. > > > > > > There's also of_platform_probe() which has slightly different matching > > > behavior. It should not behave differently either with respect to > > > devlinks. > > > > So I'm trying to do this only when the top level devices are added for > > the first time. of_platform_default_populate() seems to be the most > > common path. For other cases, I think we just need to call > > fw_devlink_pause/resume() wherever the top level devices are added for > > the first time. As I said in the other email, we can't add > > fw_devlink_pause/resume() by default to of_platform_populate(). > > > > Do you have other ideas for achieving "call fw_devlink_pause/resume() > > only when top level devices are added for the first time"? > > I'm not an expert in this domain, but before investigating it, would you > be able to share a hack patch that implements this (in the most simple > way) to check if it actually fixes the delays I experience on my system > ? So I take it the patch I sent out didn't work for you? Can you tell me what machine/DT you are using? -Saravana
On 02/10/2020 20:48, Saravana Kannan wrote: > On Fri, Oct 2, 2020 at 8:03 AM 'Grygorii Strashko' via kernel-team > <kernel-team@android.com> wrote: >> >> >> >> On 02/10/2020 14:40, Grygorii Strashko wrote: >>> >>> >>> On 02/10/2020 02:19, Laurent Pinchart wrote: >>>> Hi Saravana, >>>> >>>> Thank you for the patch. >>>> >>>> On Thu, Oct 01, 2020 at 03:59:51PM -0700, Saravana Kannan wrote: >>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when >>>>> adding all top level devices") optimized the fwnode parsing when all top >>>>> level devices are added, it missed out optimizing this for platform >>>>> where the top level devices are added through the init_machine() path. >>>>> >>>>> This commit does the optimization for all paths by simply moving the >>>>> fw_devlink_pause/resume() inside of_platform_default_populate(). >>>> >>>> Based on v5.9-rc5, before the patch: >>>> >>>> [ 0.652887] cpuidle: using governor menu >>>> [ 12.349476] No ATAGs? >>>> >>>> After the patch: >>>> >>>> [ 0.650460] cpuidle: using governor menu >>>> [ 12.262101] No ATAGs? >>>> >>>> :-( >>> >>> This is kinda expected :( because omap2 arch doesn't call of_platform_default_populate() >>> >>> Call path: >>> board-generic.c >>> DT_MACHINE_START() >>> .init_machine = omap_generic_init, >>> >>> omap_generic_init() >>> pdata_quirks_init(omap_dt_match_table); >>> of_platform_populate(NULL, omap_dt_match_table, >>> omap_auxdata_lookup, NULL); >>> >>> Other affected platforms >>> arm: mach-ux500 >>> some mips >>> some powerpc >>> >>> there are also case when a lot of devices placed under bus node, in such case >>> of_platform_populate() calls from bus drivers will also suffer from this issue. >>> >>> I think one option could be to add some parameter to _populate() or introduce new api. >>> >>> By the way, is there option to disable this feature at all? >>> Is there Kconfig option? 
>>> Is there any reasons why such complex and time consuming code added to the kernel and not implemented on DTC level? >>> >>> >>> Also, I've came with another diff, pls check. >>> >>> [ 0.000000] Booting Linux on physical CPU 0x0 >>> [ 0.000000] Linux version 5.9.0-rc6-01791-g9acba6b38757-dirty (grygorii@grygorii-XPS-13-9370) (arm-linux-gnueabihf-gcc (GNU Toolcha0 >>> [ 0.000000] CPU: ARMv7 Processor [412fc0f2] revision 2 (ARMv7), cr=10c5387d >>> [ 0.000000] CPU: div instructions available: patching division code >>> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache >>> [ 0.000000] OF: fdt: Machine model: TI AM5718 IDK >>> ... >>> [ 0.053443] cpuidle: using governor ladder >>> [ 0.053470] cpuidle: using governor menu >>> [ 0.089304] No ATAGs? >>> ... >>> [ 3.092291] devtmpfs: mounted >>> [ 3.095804] Freeing unused kernel memory: 1024K >>> [ 3.100483] Run /sbin/init as init process >>> >>> >>> >>> ------ >< --- >>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c >>> index 071f04da32c8..4521b26e7745 100644 >>> --- a/drivers/of/platform.c >>> +++ b/drivers/of/platform.c >>> @@ -514,6 +514,12 @@ static const struct of_device_id reserved_mem_matches[] = { >>> {} >>> }; >>> >>> +static int __init of_platform_fw_devlink_pause(void) >>> +{ >>> + fw_devlink_pause(); >>> +} >>> +core_initcall(of_platform_fw_devlink_pause); >>> + >>> static int __init of_platform_default_populate_init(void) >>> { >>> struct device_node *node; >>> @@ -538,9 +544,7 @@ static int __init of_platform_default_populate_init(void) >>> } >>> >>> /* Populate everything else. 
*/ >>> - fw_devlink_pause(); >>> of_platform_default_populate(NULL, NULL, NULL); >>> - fw_devlink_resume(); >>> >>> return 0; >>> } >>> @@ -548,6 +552,7 @@ arch_initcall_sync(of_platform_default_populate_init); >>> >>> static int __init of_platform_sync_state_init(void) >>> { >>> + fw_devlink_resume(); >> >> ^ it seems has to be done earlier, like >> +static int __init of_platform_fw_devlink_resume(void) >> +{ >> + fw_devlink_resume(); >> + return 0; >> +} >> +device_initcall_sync(of_platform_fw_devlink_resume); > > This will mean no device will probe until device_initcall_sync(). > Unfortunately, I don't think we can make such a sweeping assumption. Could you answer below questions, pls? >>> By the way, is there option to disable this feature at all? >>> Is there Kconfig option?
Hi Saravana, On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote: > On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote: > > On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote: > > > On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote: > > > > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when > > > > > adding all top level devices") optimized the fwnode parsing when all top > > > > > level devices are added, it missed out optimizing this for platform > > > > > where the top level devices are added through the init_machine() path. > > > > > > > > > > This commit does the optimization for all paths by simply moving the > > > > > fw_devlink_pause/resume() inside of_platform_default_populate(). > > > > > > > > > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > > > --- > > > > > drivers/of/platform.c | 19 +++++++++++++++---- > > > > > 1 file changed, 15 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > > > index 071f04da32c8..79972e49b539 100644 > > > > > --- a/drivers/of/platform.c > > > > > +++ b/drivers/of/platform.c > > > > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root, > > > > > const struct of_dev_auxdata *lookup, > > > > > struct device *parent) > > > > > { > > > > > - return of_platform_populate(root, of_default_bus_match_table, lookup, > > > > > - parent); > > > > > + int ret; > > > > > + > > > > > + /* > > > > > + * fw_devlink_pause/resume() are only safe to be called around top > > > > > + * level device addition due to locking constraints. 
> > > > > + */ > > > > > + if (!root) > > > > > + fw_devlink_pause(); > > > > > + > > > > > + ret = of_platform_populate(root, of_default_bus_match_table, lookup, > > > > > + parent); > > > > > > > > of_platform_default_populate() vs. of_platform_populate() is just a > > > > different match table. I don't think the behavior should otherwise be > > > > different. > > > > > > > > There's also of_platform_probe() which has slightly different matching > > > > behavior. It should not behave differently either with respect to > > > > devlinks. > > > > > > So I'm trying to do this only when the top level devices are added for > > > the first time. of_platform_default_populate() seems to be the most > > > common path. For other cases, I think we just need to call > > > fw_devlink_pause/resume() wherever the top level devices are added for > > > the first time. As I said in the other email, we can't add > > > fw_devlink_pause/resume() by default to of_platform_populate(). > > > > > > Do you have other ideas for achieving "call fw_devlink_pause/resume() > > > only when top level devices are added for the first time"? > > > > I'm not an expert in this domain, but before investigating it, would you > > be able to share a hack patch that implements this (in the most simple > > way) to check if it actually fixes the delays I experience on my system > > ? > > So I take it the patch I sent out didn't work for you? Can you tell me > what machine/DT you are using? I've replied to the patch: Based on v5.9-rc5, before the patch: [ 0.652887] cpuidle: using governor menu [ 12.349476] No ATAGs? After the patch: [ 0.650460] cpuidle: using governor menu [ 12.262101] No ATAGs? I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
hi Saravana,

On 02/10/2020 21:27, Laurent Pinchart wrote:
> Hi Saravana,
>
> On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
>> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
>>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
>>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
>>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
>>>>>>
>>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
>>>>>> adding all top level devices") optimized the fwnode parsing when all top
>>>>>> level devices are added, it missed out optimizing this for platform
>>>>>> where the top level devices are added through the init_machine() path.
>>>>>>
>>>>>> This commit does the optimization for all paths by simply moving the
>>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
>>>>>>
>>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>>> ---
>>>>>>  drivers/of/platform.c | 19 +++++++++++++++----
>>>>>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
>>>>>> index 071f04da32c8..79972e49b539 100644
>>>>>> --- a/drivers/of/platform.c
>>>>>> +++ b/drivers/of/platform.c
>>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
>>>>>>  				 const struct of_dev_auxdata *lookup,
>>>>>>  				 struct device *parent)
>>>>>>  {
>>>>>> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
>>>>>> -				    parent);
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * fw_devlink_pause/resume() are only safe to be called around top
>>>>>> +	 * level device addition due to locking constraints.
>>>>>> +	 */
>>>>>> +	if (!root)
>>>>>> +		fw_devlink_pause();
>>>>>> +
>>>>>> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
>>>>>> +				   parent);
>>>>>
>>>>> of_platform_default_populate() vs. of_platform_populate() is just a
>>>>> different match table. I don't think the behavior should otherwise be
>>>>> different.
>>>>>
>>>>> There's also of_platform_probe() which has slightly different matching
>>>>> behavior. It should not behave differently either with respect to
>>>>> devlinks.
>>>>
>>>> So I'm trying to do this only when the top level devices are added for
>>>> the first time. of_platform_default_populate() seems to be the most
>>>> common path. For other cases, I think we just need to call
>>>> fw_devlink_pause/resume() wherever the top level devices are added for
>>>> the first time. As I said in the other email, we can't add
>>>> fw_devlink_pause/resume() by default to of_platform_populate().
>>>>
>>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
>>>> only when top level devices are added for the first time"?
>>>
>>> I'm not an expert in this domain, but before investigating it, would you
>>> be able to share a hack patch that implements this (in the most simple
>>> way) to check if it actually fixes the delays I experience on my system
>>> ?
>>
>> So I take it the patch I sent out didn't work for you? Can you tell me
>> what machine/DT you are using?
>
> I've replied to the patch:
>
> Based on v5.9-rc5, before the patch:
>
> [ 0.652887] cpuidle: using governor menu
> [ 12.349476] No ATAGs?
>
> After the patch:
>
> [ 0.650460] cpuidle: using governor menu
> [ 12.262101] No ATAGs?
>
> I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
>

hope you are receiving my mails as I've provided you with all required information already [1]

with below diff:

[ 4.177231] Freeing unused kernel memory: 1024K
[ 4.181892] Run /sbin/init as init process

The best time with [2] is

[ 3.100483] Run /sbin/init as init process

That is still a 1 sec loss. Please understand the issue: the requirements here
are around 500 ms boot with CAN, Ethernet, camera and display on ;(

[1] https://lore.kernel.org/patchwork/patch/1316134/#1511276
[2] https://lore.kernel.org/patchwork/patch/1316134/#1511435

diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
index 2a4fe3e68b82..ac1ab8928190 100644
--- a/arch/arm/mach-omap2/pdata-quirks.c
+++ b/arch/arm/mach-omap2/pdata-quirks.c
@@ -591,7 +591,9 @@ void __init pdata_quirks_init(const struct of_device_id *omap_dt_match_table)
 	if (of_machine_is_compatible("ti,omap3"))
 		omap3_mcbsp_init();
 	pdata_quirks_check(auxdata_quirks);
+	fw_devlink_pause();
 	of_platform_populate(NULL, omap_dt_match_table,
 			     omap_auxdata_lookup, NULL);
+	fw_devlink_resume();
 	pdata_quirks_check(pdata_quirks);
 }
On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team <kernel-team@android.com> wrote:
>
> hi Saravana,
>
> On 02/10/2020 21:27, Laurent Pinchart wrote:
> > Hi Saravana,
> >
> > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> >>>>>>
> >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> >>>>>> level devices are added, it missed out optimizing this for platform
> >>>>>> where the top level devices are added through the init_machine() path.
> >>>>>>
> >>>>>> This commit does the optimization for all paths by simply moving the
> >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> >>>>>>
> >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>>> ---
> >>>>>>  drivers/of/platform.c | 19 +++++++++++++++----
> >>>>>>  1 file changed, 15 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> >>>>>> index 071f04da32c8..79972e49b539 100644
> >>>>>> --- a/drivers/of/platform.c
> >>>>>> +++ b/drivers/of/platform.c
> >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> >>>>>>  				 const struct of_dev_auxdata *lookup,
> >>>>>>  				 struct device *parent)
> >>>>>>  {
> >>>>>> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> >>>>>> -				    parent);
> >>>>>> +	int ret;
> >>>>>> +
> >>>>>> +	/*
> >>>>>> +	 * fw_devlink_pause/resume() are only safe to be called around top
> >>>>>> +	 * level device addition due to locking constraints.
> >>>>>> +	 */
> >>>>>> +	if (!root)
> >>>>>> +		fw_devlink_pause();
> >>>>>> +
> >>>>>> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> >>>>>> +				   parent);
> >>>>>
> >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> >>>>> different match table. I don't think the behavior should otherwise be
> >>>>> different.
> >>>>>
> >>>>> There's also of_platform_probe() which has slightly different matching
> >>>>> behavior. It should not behave differently either with respect to
> >>>>> devlinks.
> >>>>
> >>>> So I'm trying to do this only when the top level devices are added for
> >>>> the first time. of_platform_default_populate() seems to be the most
> >>>> common path. For other cases, I think we just need to call
> >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> >>>> the first time. As I said in the other email, we can't add
> >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> >>>>
> >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> >>>> only when top level devices are added for the first time"?
> >>>
> >>> I'm not an expert in this domain, but before investigating it, would you
> >>> be able to share a hack patch that implements this (in the most simple
> >>> way) to check if it actually fixes the delays I experience on my system
> >>> ?
> >>
> >> So I take it the patch I sent out didn't work for you? Can you tell me
> >> what machine/DT you are using?
> >
> > I've replied to the patch:
> >
> > Based on v5.9-rc5, before the patch:
> >
> > [ 0.652887] cpuidle: using governor menu
> > [ 12.349476] No ATAGs?
> >
> > After the patch:
> >
> > [ 0.650460] cpuidle: using governor menu
> > [ 12.262101] No ATAGs?
> >
> > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> >
>
> hope you are receiving my mails as I've provided you with all required information already [1]

Laurent/Grygorii,

Looks like I'm definitely missing emails. Sorry about the confusion.

I have some other urgent things on my plate right now. Is it okay if I
get to this in a day or two? In the end, we'll find a solution that
addresses most/all of the delay.

Thanks,
Saravana
On Fri, Oct 2, 2020 at 12:52 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > adding all top level devices") optimized the fwnode parsing when all top
> > > level devices are added, it missed out optimizing this for platform
> > > where the top level devices are added through the init_machine() path.
> > >
> > > This commit does the optimization for all paths by simply moving the
> > > fw_devlink_pause/resume() inside of_platform_default_populate().
> > >
> > > Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/of/platform.c | 19 +++++++++++++++----
> > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 071f04da32c8..79972e49b539 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > >  				 const struct of_dev_auxdata *lookup,
> > >  				 struct device *parent)
> > >  {
> > > -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > -				    parent);
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * fw_devlink_pause/resume() are only safe to be called around top
> > > +	 * level device addition due to locking constraints.
> > > +	 */
> > > +	if (!root)
> > > +		fw_devlink_pause();
> > > +
> > > +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > +				   parent);
> >
> > of_platform_default_populate() vs. of_platform_populate() is just a
> > different match table. I don't think the behavior should otherwise be
> > different.
> >
> > There's also of_platform_probe() which has slightly different matching
> > behavior. It should not behave differently either with respect to
> > devlinks.
>
> So I'm trying to do this only when the top level devices are added for
> the first time. of_platform_default_populate() seems to be the most
> common path. For other cases, I think we just need to call
> fw_devlink_pause/resume() wherever the top level devices are added for
> the first time.
> As I said in the other email, we can't add
> fw_devlink_pause/resume() by default to of_platform_populate().

If you detect it's the first time, you could?

>
> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> only when top level devices are added for the first time"?

Eliminate the cases not using of_platform_default_populate().

There are 2 main reasons for the non-default cases. The first is auxdata.
Really, for any modern platform that people care about (and care about
the boot time), they should not be using auxdata. That's just for the
DT transition. You know, a temporary thing from 9 years ago.

The 2nd is having some parent device. This is typically an soc_device.
I really think this is kind of dumb. We should either have the parent
device always or never. After all, everything's an SoC, right? Of course
changing that will break some Android systems since they like to use
non-ABI sysfs device paths.

There could also be some initcall ordering issues. IIRC, in the last
round of cleanups in this area, at91 gpio/pinctrl had an issue with
that. I think I have a half-done fix for that I started.

Rob
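
Rob's "detect it's the first time" idea can be sketched in user space as follows. This is only a toy model under stated assumptions: model_populate(), model_fw_devlink_pause(), model_fw_devlink_resume() and the top_level_populated flag are hypothetical stand-ins and not kernel APIs, and the sketch ignores the locking constraints mentioned in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node; not the kernel's struct. */
struct model_node {
	const char *name;
};

static bool paused;              /* true while link parsing is deferred */
static int pause_calls;          /* how often the pause path was taken */
static bool top_level_populated; /* set once the whole tree was populated */

/* Hypothetical stand-ins for fw_devlink_pause()/fw_devlink_resume(). */
static void model_fw_devlink_pause(void)
{
	assert(!paused);	/* this model does not allow nested pauses */
	paused = true;
	pause_calls++;
}

static void model_fw_devlink_resume(void)
{
	assert(paused);
	paused = false;
}

/*
 * Model of a populate call: root == NULL means "all top level devices".
 * Batching is only enabled for the first such call.
 */
static int model_populate(const struct model_node *root)
{
	bool batch = (root == NULL) && !top_level_populated;

	if (batch)
		model_fw_devlink_pause();

	/* ... devices would be created here ... */

	if (batch) {
		model_fw_devlink_resume();
		top_level_populated = true;
	}
	return 0;
}
```

In this model a second model_populate(NULL) call, or any call with a non-NULL root, skips the pause/resume pair entirely, which is the behavior the quoted question asks for. Whether a plain flag is sufficient in the kernel depends on the locking rules around fw_devlink, which the sketch does not attempt to model.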
Hi Saravana,

On Fri, Oct 02, 2020 at 12:56:30PM -0700, Saravana Kannan wrote:
> On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team wrote:
> > On 02/10/2020 21:27, Laurent Pinchart wrote:
> > > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> > >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > >>>>>>
> > >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> > >>>>>> level devices are added, it missed out optimizing this for platform
> > >>>>>> where the top level devices are added through the init_machine() path.
> > >>>>>>
> > >>>>>> This commit does the optimization for all paths by simply moving the
> > >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> > >>>>>>
> > >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >>>>>> ---
> > >>>>>>  drivers/of/platform.c | 19 +++++++++++++++----
> > >>>>>>  1 file changed, 15 insertions(+), 4 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > >>>>>> index 071f04da32c8..79972e49b539 100644
> > >>>>>> --- a/drivers/of/platform.c
> > >>>>>> +++ b/drivers/of/platform.c
> > >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > >>>>>>  				 const struct of_dev_auxdata *lookup,
> > >>>>>>  				 struct device *parent)
> > >>>>>>  {
> > >>>>>> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> > >>>>>> -				    parent);
> > >>>>>> +	int ret;
> > >>>>>> +
> > >>>>>> +	/*
> > >>>>>> +	 * fw_devlink_pause/resume() are only safe to be called around top
> > >>>>>> +	 * level device addition due to locking constraints.
> > >>>>>> +	 */
> > >>>>>> +	if (!root)
> > >>>>>> +		fw_devlink_pause();
> > >>>>>> +
> > >>>>>> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > >>>>>> +				   parent);
> > >>>>>
> > >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> > >>>>> different match table. I don't think the behavior should otherwise be
> > >>>>> different.
> > >>>>>
> > >>>>> There's also of_platform_probe() which has slightly different matching
> > >>>>> behavior. It should not behave differently either with respect to
> > >>>>> devlinks.
> > >>>>
> > >>>> So I'm trying to do this only when the top level devices are added for
> > >>>> the first time. of_platform_default_populate() seems to be the most
> > >>>> common path. For other cases, I think we just need to call
> > >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> > >>>> the first time. As I said in the other email, we can't add
> > >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> > >>>>
> > >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > >>>> only when top level devices are added for the first time"?
> > >>>
> > >>> I'm not an expert in this domain, but before investigating it, would you
> > >>> be able to share a hack patch that implements this (in the most simple
> > >>> way) to check if it actually fixes the delays I experience on my system
> > >>> ?
> > >>
> > >> So I take it the patch I sent out didn't work for you? Can you tell me
> > >> what machine/DT you are using?
> > >
> > > I've replied to the patch:
> > >
> > > Based on v5.9-rc5, before the patch:
> > >
> > > [ 0.652887] cpuidle: using governor menu
> > > [ 12.349476] No ATAGs?
> > >
> > > After the patch:
> > >
> > > [ 0.650460] cpuidle: using governor menu
> > > [ 12.262101] No ATAGs?
> > >
> > > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> > >
> >
> > hope you are receiving my mails as I've provided you with all required information already [1]
>
> Laurent/Grygorii,
>
> Looks like I'm definitely missing emails. Sorry about the confusion.
>
> I have some other urgent things on my plate right now. Is it okay if I
> get to this in a day or two? In the end, we'll find a solution that
> addresses most/all of the delay.

No issue on my side.

By the way, during initial investigations, I've traced code paths to
figure out if there was a particular step that would consume a large
amount of time, and found out that of_platform_populate() ends up
executing devlink-related code that seems to have an O(n^3) complexity
on the number of devices, with a few dozens of milliseconds for each
iteration. That's a very bad complexity.
On Fri, Oct 2, 2020 at 5:14 PM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Saravana,
>
> On Fri, Oct 02, 2020 at 12:56:30PM -0700, Saravana Kannan wrote:
> > On Fri, Oct 2, 2020 at 11:35 AM 'Grygorii Strashko' via kernel-team wrote:
> > > On 02/10/2020 21:27, Laurent Pinchart wrote:
> > > > On Fri, Oct 02, 2020 at 10:58:55AM -0700, Saravana Kannan wrote:
> > > >> On Fri, Oct 2, 2020 at 10:55 AM Laurent Pinchart wrote:
> > > >>> On Fri, Oct 02, 2020 at 10:51:51AM -0700, Saravana Kannan wrote:
> > > >>>> On Fri, Oct 2, 2020 at 7:08 AM Rob Herring <robh+dt@kernel.org> wrote:
> > > >>>>> On Thu, Oct 1, 2020 at 5:59 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >>>>>>
> > > >>>>>> When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
> > > >>>>>> adding all top level devices") optimized the fwnode parsing when all top
> > > >>>>>> level devices are added, it missed out optimizing this for platform
> > > >>>>>> where the top level devices are added through the init_machine() path.
> > > >>>>>>
> > > >>>>>> This commit does the optimization for all paths by simply moving the
> > > >>>>>> fw_devlink_pause/resume() inside of_platform_default_populate().
> > > >>>>>>
> > > >>>>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > > >>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >>>>>> ---
> > > >>>>>>  drivers/of/platform.c | 19 +++++++++++++++----
> > > >>>>>>  1 file changed, 15 insertions(+), 4 deletions(-)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > >>>>>> index 071f04da32c8..79972e49b539 100644
> > > >>>>>> --- a/drivers/of/platform.c
> > > >>>>>> +++ b/drivers/of/platform.c
> > > >>>>>> @@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
> > > >>>>>>  				 const struct of_dev_auxdata *lookup,
> > > >>>>>>  				 struct device *parent)
> > > >>>>>>  {
> > > >>>>>> -	return of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> -				    parent);
> > > >>>>>> +	int ret;
> > > >>>>>> +
> > > >>>>>> +	/*
> > > >>>>>> +	 * fw_devlink_pause/resume() are only safe to be called around top
> > > >>>>>> +	 * level device addition due to locking constraints.
> > > >>>>>> +	 */
> > > >>>>>> +	if (!root)
> > > >>>>>> +		fw_devlink_pause();
> > > >>>>>> +
> > > >>>>>> +	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
> > > >>>>>> +				   parent);
> > > >>>>>
> > > >>>>> of_platform_default_populate() vs. of_platform_populate() is just a
> > > >>>>> different match table. I don't think the behavior should otherwise be
> > > >>>>> different.
> > > >>>>>
> > > >>>>> There's also of_platform_probe() which has slightly different matching
> > > >>>>> behavior. It should not behave differently either with respect to
> > > >>>>> devlinks.
> > > >>>>
> > > >>>> So I'm trying to do this only when the top level devices are added for
> > > >>>> the first time. of_platform_default_populate() seems to be the most
> > > >>>> common path. For other cases, I think we just need to call
> > > >>>> fw_devlink_pause/resume() wherever the top level devices are added for
> > > >>>> the first time. As I said in the other email, we can't add
> > > >>>> fw_devlink_pause/resume() by default to of_platform_populate().
> > > >>>>
> > > >>>> Do you have other ideas for achieving "call fw_devlink_pause/resume()
> > > >>>> only when top level devices are added for the first time"?
> > > >>>
> > > >>> I'm not an expert in this domain, but before investigating it, would you
> > > >>> be able to share a hack patch that implements this (in the most simple
> > > >>> way) to check if it actually fixes the delays I experience on my system
> > > >>> ?
> > > >>
> > > >> So I take it the patch I sent out didn't work for you? Can you tell me
> > > >> what machine/DT you are using?
> > > >
> > > > I've replied to the patch:
> > > >
> > > > Based on v5.9-rc5, before the patch:
> > > >
> > > > [ 0.652887] cpuidle: using governor menu
> > > > [ 12.349476] No ATAGs?
> > > >
> > > > After the patch:
> > > >
> > > > [ 0.650460] cpuidle: using governor menu
> > > > [ 12.262101] No ATAGs?
> > > >
> > > > I'm using an AM57xx EVM, whose DT is not upstream, but it's essentially
> > > > a am57xx-beagle-x15-revb1.dts (it includes that DTS) with a few
> > > > additional nodes for GPIO keys, LCD panel, backlight and touchscreen.
> > > >
> > >
> > > hope you are receiving my mails as I've provided you with all required information already [1]
> >
> > Laurent/Grygorii,
> >
> > Looks like I'm definitely missing emails. Sorry about the confusion.
> >
> > I have some other urgent things on my plate right now. Is it okay if I
> > get to this in a day or two? In the end, we'll find a solution that
> > addresses most/all of the delay.
>
> No issue on my side.

Hi Laurent,

Sorry it took a while for me to get back to this.

Can you try putting fw_devlink_pause/resume() around the
of_platform_populate() call in arch/arm/mach-omap2/pdata-quirks.c?
Just trying to verify the cause/fix. If it fixes the issue, then
considering Rob's comments [1], a good short term solution might be to
have the suggestion above and some way to do pause/resume only when
the top level devices are added.

> By the way, during initial investigations, I've traced code paths to
> figure out if there was a particular step that would consume a large
> amount of time, and found out that of_platform_populate() ends up
> executing devlink-related code that seems to have an O(n^3) complexity
> on the number of devices, with a few dozens of milliseconds for each
> iteration. That's a very bad complexity.

As you said, the complexity of fw_devlink parsing can be O(N^2). There
are other ways to improve it to make it O(N), but they come with a bunch
of additional complexity and increased memory use. When I tried to do it
that way the first time, I was questioning whether O(N^2) actually
translated to a measurable difference. Looks like it does now :)

I have something in mind for how to do it with O(N) complexity, but I
expect it to take a while to get in. So in the meantime, I'm thinking
of using fw_devlink_pause/resume() as a short term optimization.

-Saravana

[1] - https://lore.kernel.org/linux-omap/CAL_Jsq+6mxtFei3+1ic4c5XCftJ8nZK6_S5_d15yEXQ02BTNKw@mail.gmail.com/
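
Why batching can help is illustrated by the toy cost model below. It rests on a simplifying assumption that is not taken from the kernel source: without batching, every device addition re-examines all devices added so far, while with pause/resume devices are only queued and each one is examined once on resume. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Toy cost model, not kernel code. Assumption: without batching, adding
 * the k-th device re-examines all k devices added so far.
 */
static unsigned long model_steps_unbatched(unsigned long n_devices)
{
	unsigned long steps = 0;

	for (unsigned long k = 1; k <= n_devices; k++)
		steps += k;	/* rescan of everything added so far */

	return steps;		/* n * (n + 1) / 2, i.e. O(n^2) */
}

/*
 * Assumption: while paused, each added device is only queued; resume then
 * examines every queued device exactly once.
 */
static unsigned long model_steps_batched(unsigned long n_devices)
{
	unsigned long queued = 0;

	for (unsigned long k = 0; k < n_devices; k++)
		queued++;	/* pause: defer parsing, remember the device */

	assert(queued == n_devices);
	return queued;		/* resume: one pass, i.e. O(n) */
}
```

For 100 devices this model gives 5050 steps unbatched against 100 batched. The real savings depend on what the kernel actually does per device, so the numbers only show the shape of the growth, not measured boot time.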
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 071f04da32c8..79972e49b539 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -501,8 +501,21 @@ int of_platform_default_populate(struct device_node *root,
 				 const struct of_dev_auxdata *lookup,
 				 struct device *parent)
 {
-	return of_platform_populate(root, of_default_bus_match_table, lookup,
-				    parent);
+	int ret;
+
+	/*
+	 * fw_devlink_pause/resume() are only safe to be called around top
+	 * level device addition due to locking constraints.
+	 */
+	if (!root)
+		fw_devlink_pause();
+
+	ret = of_platform_populate(root, of_default_bus_match_table, lookup,
+				   parent);
+
+	if (!root)
+		fw_devlink_resume();
+	return ret;
 }
 EXPORT_SYMBOL_GPL(of_platform_default_populate);
 
@@ -538,9 +551,7 @@ static int __init of_platform_default_populate_init(void)
 	}
 
 	/* Populate everything else. */
-	fw_devlink_pause();
 	of_platform_default_populate(NULL, NULL, NULL);
-	fw_devlink_resume();
 
 	return 0;
 }
When commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when
adding all top level devices") optimized the fwnode parsing when all top
level devices are added, it missed out optimizing this for platform
where the top level devices are added through the init_machine() path.

This commit does the optimization for all paths by simply moving the
fw_devlink_pause/resume() inside of_platform_default_populate().

Reported-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)