
[v2,1/2] arm: stm32mp: activate data cache in SPL and before relocation

Message ID 20200403105644.v2.1.I2ff601b652f4995a3401dc67c2369a4187046ed8@changeid
State Superseded
Series arm: stm32mp1: activate data cache in SPL and before relocation

Commit Message

Patrick Delaunay April 3, 2020, 9:25 a.m. UTC
Activate the data cache in SPL and in U-Boot before relocation.

In arch_cpu_init(), the function early_enable_caches() sets up the early
TLB, early_tlb[], located in the .data section, and marks as cacheable:
- for SPL, all the SYSRAM
- for U-Boot, all the DDR

After relocation, the function enable_caches() (called by board_r)
reconfigures the MMU with the new TLB location (reserved in
board_f.c::reserve_mmu) and re-enables the data cache.

This patch reduces the execution time, in particular:
- for the device tree parsing in the U-Boot pre-reloc stage
  (dm_extended_scan_fdt => dm_scan_fdt)
- for the I2C timing computation in SPL (stm32_i2c_choose_solution())

For example, the results on the STM32MP157C-DK2 board are:
   1.6 s gain for the trusted boot chain with TF-A
   2.2 s gain for the basic boot chain with SPL

As the TLB is placed in the .data section, the binary size increases and so
does the time the ROM code takes to load the SPL (30 ms on DK2).

Signed-off-by: Patrick Delaunay <patrick.delaunay at st.com>
---

Changes in v2:
- create a new function early_enable_caches
- use TLB in .data section
- use the default weak dram_bank_mmu_setup() and
  use mmu_set_region_dcache_behaviour() to setup
  the early MMU configuration
- enable data cache on DDR in SPL, after DDR controller initialization

 arch/arm/mach-stm32mp/cpu.c | 43 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

Comments

Marek Vasut April 3, 2020, 9:31 p.m. UTC | #1
On 4/3/20 11:25 AM, Patrick Delaunay wrote:
[...]
> diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
> index 36a9205819..c22c1a9bbc 100644
> --- a/arch/arm/mach-stm32mp/cpu.c
> +++ b/arch/arm/mach-stm32mp/cpu.c
> @@ -75,6 +75,12 @@
>  #define PKG_SHIFT	27
>  #define PKG_MASK	GENMASK(2, 0)
>  
> +/*
> + * early TLB in the .data section so that it does not get cleared,
> + * with 16kB alignment (see TTBR0_BASE_ADDR_MASK)
> + */
> +u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);

Can you early-malloc this one?
(why do you need this in __section("data")?)

[...]
Patrick Delaunay April 9, 2020, 6:32 p.m. UTC | #2
Dear Marek,

> From: Marek Vasut <marex at denx.de>
> Sent: Friday, April 3, 2020 23:32
> 
> On 4/3/20 11:25 AM, Patrick Delaunay wrote:
> [...]
> > diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
> > index 36a9205819..c22c1a9bbc 100644
> > --- a/arch/arm/mach-stm32mp/cpu.c
> > +++ b/arch/arm/mach-stm32mp/cpu.c
> > @@ -75,6 +75,12 @@
> >  #define PKG_SHIFT	27
> >  #define PKG_MASK	GENMASK(2, 0)
> >
> > +/*
> > + * early TLB in the .data section so that it does not get cleared,
> > + * with 16kB alignment (see TTBR0_BASE_ADDR_MASK)
> > + */
> > +u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);
> 
> Can you early-malloc this one?

I tried to use early malloc and it fails because my code in arch_cpu_init() is executed before
the early pool initialization done in spl_common_init(), called by spl_early_init().
So it is too late for my use case.

And if I initialize the MMU and the cache after this function, it is too late, as
dm_init_and_scan() and the FDT parsing are also called in spl_common_init().

> (why do you need this in __section("data")?)

I tried to use .bss and it fails because the BSS is reset to 0 in SPL
after board_init_f(), so the MMU table is cleared without notice.

In fact the BSS is not available: board_init_f() can only use stack variables
and global_data (see README:258).

While investigating the issue, I found CONFIG_SPL_EARLY_BSS,
which explains this point:

config SPL_EARLY_BSS
	depends on ARM && !ARM64
	bool "Allows initializing BSS early before entering board_init_f"
	help
	  On some platform we have sufficient memory available early on to
	  allow setting up and using a basic BSS prior to entering
	  board_init_f. Activating this option will also de-activate the
	  clearing of BSS during the SPL relocation process, thus allowing
	  to carry state from board_init_f to board_init_r by way of BSS.

So it is a compromise between a hardcoded address (end of SYSRAM)
and a global variable in the .data section.

The V2 patch with .data seems more elegant to me (it avoids assumptions on
the U-Boot size for the pre-reloc case).

And if you have a size issue for SPL, you can deactivate the cache for SPL only
(CONFIG_SPL_SYS_DCACHE_OFF).

> [...]

Regards

Patrick
Marek Vasut April 10, 2020, 8:15 a.m. UTC | #3
On 4/9/20 8:32 PM, Patrick DELAUNAY wrote:
> Dear Marek,
> 
>> From: Marek Vasut <marex at denx.de>
>> Sent: Friday, April 3, 2020 23:32
>>
>> On 4/3/20 11:25 AM, Patrick Delaunay wrote:
>> [...]
>>> diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
>>> index 36a9205819..c22c1a9bbc 100644
>>> --- a/arch/arm/mach-stm32mp/cpu.c
>>> +++ b/arch/arm/mach-stm32mp/cpu.c
>>> @@ -75,6 +75,12 @@
>>>  #define PKG_SHIFT	27
>>>  #define PKG_MASK	GENMASK(2, 0)
>>>
>>> +/*
>>> + * early TLB in the .data section so that it does not get cleared,
>>> + * with 16kB alignment (see TTBR0_BASE_ADDR_MASK)
>>> + */
>>> +u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);
>>
>> Can you early-malloc this one?
> 
> I tried to use early malloc and it fails because my code in arch_cpu_init() is executed before
> the early pool initialization done in spl_common_init(), called by spl_early_init().
> So it is too late for my use case.
> 
> And if I initialize the MMU and the cache after this function, it is too late, as
> dm_init_and_scan() and the FDT parsing are also called in spl_common_init().

Aha, OK. Can you document it in the commit message? That's a really good
piece of information.

>> (why do you need this in __section("data")?)
> 
> I tried to use .bss and it fails because the BSS is reset to 0 in SPL
> after board_init_f(), so the MMU table is cleared without notice.
> 
> In fact the BSS is not available: board_init_f() can only use stack variables
> and global_data (see README:258).
> 
> While investigating the issue, I found CONFIG_SPL_EARLY_BSS,
> which explains this point:
> 
> config SPL_EARLY_BSS
> 	depends on ARM && !ARM64
> 	bool "Allows initializing BSS early before entering board_init_f"
> 	help
> 	  On some platform we have sufficient memory available early on to
> 	  allow setting up and using a basic BSS prior to entering
> 	  board_init_f. Activating this option will also de-activate the
> 	  clearing of BSS during the SPL relocation process, thus allowing
> 	  to carry state from board_init_f to board_init_r by way of BSS.
> 
> So it is a compromise between a hardcoded address (end of SYSRAM)
> and a global variable in the .data section.
> 
> The V2 patch with .data seems more elegant to me (it avoids assumptions on
> the U-Boot size for the pre-reloc case).
> 
> And if you have a size issue for SPL, you can deactivate the cache for SPL only
> (CONFIG_SPL_SYS_DCACHE_OFF).

OK

Patch

diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
index 36a9205819..c22c1a9bbc 100644
--- a/arch/arm/mach-stm32mp/cpu.c
+++ b/arch/arm/mach-stm32mp/cpu.c
@@ -75,6 +75,12 @@ 
 #define PKG_SHIFT	27
 #define PKG_MASK	GENMASK(2, 0)
 
+/*
+ * early TLB in the .data section so that it does not get cleared,
+ * with 16kB alignment (see TTBR0_BASE_ADDR_MASK)
+ */
+u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);
+
 #if !defined(CONFIG_SPL) || defined(CONFIG_SPL_BUILD)
 #ifndef CONFIG_STM32MP1_TRUSTED
 static void security_init(void)
@@ -186,6 +192,32 @@  u32 get_bootmode(void)
 		    TAMP_BOOT_MODE_SHIFT;
 }
 
+/*
+ * initialize the MMU and activate cache in SPL or in U-Boot pre-reloc stage
+ * MMU/TLB is updated in enable_caches() for U-Boot after relocation
+ * or is deactivated in U-Boot entry function start.S::cpu_init_cp15
+ */
+static void early_enable_caches(void)
+{
+	/* I-cache is already enabled in start.S: cpu_init_cp15 */
+
+	if (CONFIG_IS_ENABLED(SYS_DCACHE_OFF))
+		return;
+
+	gd->arch.tlb_size = PGTABLE_SIZE;
+	gd->arch.tlb_addr = (unsigned long)&early_tlb;
+
+	dcache_enable();
+
+	if (IS_ENABLED(CONFIG_SPL_BUILD))
+		mmu_set_region_dcache_behaviour(STM32_SYSRAM_BASE,
+						STM32_SYSRAM_SIZE,
+						DCACHE_DEFAULT_OPTION);
+	else
+		mmu_set_region_dcache_behaviour(STM32_DDR_BASE, STM32_DDR_SIZE,
+						DCACHE_DEFAULT_OPTION);
+}
+
 /*
  * Early system init
  */
@@ -193,6 +225,8 @@  int arch_cpu_init(void)
 {
 	u32 boot_mode;
 
+	early_enable_caches();
+
 	/* early armv7 timer init: needed for polling */
 	timer_init();
 
@@ -225,7 +259,14 @@  int arch_cpu_init(void)
 
 void enable_caches(void)
 {
-	/* Enable D-cache. I-cache is already enabled in start.S */
+	/* I-cache is already enabled in start.S: icache_enable() not needed */
+
+	/* deactivate the data cache, early enabled in arch_cpu_init() */
+	dcache_disable();
+	/*
+	 * update MMU after relocation and enable the data cache
+	 * warning: the TLB location is updated in board_f.c::reserve_mmu
+	 */
 	dcache_enable();
 }