diff mbox series

[v4,1/2] arm: stm32mp: activate data cache in SPL and before relocation

Message ID 20200430163010.v4.1.I2ff601b652f4995a3401dc67c2369a4187046ed8@changeid
State Accepted
Commit 7e8471cae5c6614c54b9cfae2746d7299bd47a0c
Headers show
Series arm: stm32mp1: activate data cache in SPL and before relocation | expand

Commit Message

Patrick Delaunay April 30, 2020, 2:30 p.m. UTC
Activate the data cache in SPL and in U-Boot before relocation.

In arch_cpu_init(), the function early_enable_caches() sets the early
TLB, early_tlb[] located .init section, and set cacheable:
- for SPL, all the SYSRAM
- for U-Boot, all the DDR

After relocation, the function enable_caches() (called by board_r)
reconfigures the MMU with new TLB location (reserved in
board_f.c::reserve_mmu) and re-enable the data cache.

This patch allows to reduce the execution time, particularly
- for the device tree parsing in U-Boot pre-reloc stage
  (dm_extended_scan_fd =>dm_scan_fdt)
- in I2C timing computation in SPL (stm32_i2c_choose_solution())

For example, the result on STM32MP157C-DK2 board is:
   1,6s gain for trusted boot chain with TF-A
   2,2s gain for basic boot chain with SPL

For information, as TLB is added in .data section, the binary size
increased and the SPL load time by ROM code increased (30ms on DK2).

But early malloc can't be used for TLB because arch_cpu_init()
is executed before the early poll initialization done in spl_common_init()
called by spl_early_init() So it too late for this use case.
And if I initialize the MMU and the cache after this function it is
too late, as dm_init_and_scan and fdt parsing is also called in
spl_common_init().

And .BSS can be used in board_init_f(): only stack and global can use
before BSS init done in board_init_r().

So .data is the better solution without hardcoded location but if you
have size issue for SPL you can deactivate cache for SPL only
(with CONFIG_SPL_SYS_DCACHE_OFF).

Reviewed-by: Patrice Chotard <patrice.chotard at st.com>
Signed-off-by: Patrick Delaunay <patrick.delaunay at st.com>
---

Changes in v4:
- fix commit message and comment and add Patrice Chotard reviewed-by

Changes in v3:
- add Information in commit-message on early malloc and .BSS

Changes in v2:
- create a new function early_enable_caches
- use TLB in .init section
- use the default weak dram_bank_mmu_setup() and
  use mmu_set_region_dcache_behaviour() to setup
  the early MMU configuration
- enable data cache on DDR in SPL, after DDR controller initialization

 arch/arm/mach-stm32mp/cpu.c | 43 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

Comments

Patrick Delaunay May 14, 2020, 9:39 a.m. UTC | #1
Hi,

> From: Patrick DELAUNAY <patrick.delaunay at st.com>
> Sent: jeudi 30 avril 2020 16:30
> 
> Activate the data cache in SPL and in U-Boot before relocation.
> 
> In arch_cpu_init(), the function early_enable_caches() sets the early TLB,
> early_tlb[] located .init section, and set cacheable:
> - for SPL, all the SYSRAM
> - for U-Boot, all the DDR
> 
> After relocation, the function enable_caches() (called by board_r) reconfigures the
> MMU with new TLB location (reserved in
> board_f.c::reserve_mmu) and re-enable the data cache.
> 
> This patch allows to reduce the execution time, particularly
> - for the device tree parsing in U-Boot pre-reloc stage
>   (dm_extended_scan_fd =>dm_scan_fdt)
> - in I2C timing computation in SPL (stm32_i2c_choose_solution())
> 
> For example, the result on STM32MP157C-DK2 board is:
>    1,6s gain for trusted boot chain with TF-A
>    2,2s gain for basic boot chain with SPL
> 
> For information, as TLB is added in .data section, the binary size increased and
> the SPL load time by ROM code increased (30ms on DK2).
> 
> But early malloc can't be used for TLB because arch_cpu_init() is executed before
> the early poll initialization done in spl_common_init() called by spl_early_init() So it
> too late for this use case.
> And if I initialize the MMU and the cache after this function it is too late, as
> dm_init_and_scan and fdt parsing is also called in spl_common_init().
> 
> And .BSS can be used in board_init_f(): only stack and global can use before BSS
> init done in board_init_r().
> 
> So .data is the better solution without hardcoded location but if you have size
> issue for SPL you can deactivate cache for SPL only (with
> CONFIG_SPL_SYS_DCACHE_OFF).
> 
> Reviewed-by: Patrice Chotard <patrice.chotard at st.com>
> Signed-off-by: Patrick Delaunay <patrick.delaunay at st.com>
> ---
> 
> Changes in v4:
> - fix commit message and comment and add Patrice Chotard reviewed-by
> 
> Changes in v3:
> - add Information in commit-message on early malloc and .BSS
> 
> Changes in v2:
> - create a new function early_enable_caches
> - use TLB in .init section
> - use the default weak dram_bank_mmu_setup() and
>   use mmu_set_region_dcache_behaviour() to setup
>   the early MMU configuration
> - enable data cache on DDR in SPL, after DDR controller initialization
> 
>  arch/arm/mach-stm32mp/cpu.c | 43
> ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 

Applied to u-boot-stm/master, thanks!

Regards

Patrick
diff mbox series

Patch

diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
index 74d03fa7dd..bbffa3b9ff 100644
--- a/arch/arm/mach-stm32mp/cpu.c
+++ b/arch/arm/mach-stm32mp/cpu.c
@@ -75,6 +75,12 @@ 
 #define PKG_SHIFT	27
 #define PKG_MASK	GENMASK(2, 0)
 
+/*
+ * early TLB into the .data section so that it not get cleared
+ * with 16kB allignment (see TTBR0_BASE_ADDR_MASK)
+ */
+u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);
+
 #if !defined(CONFIG_SPL) || defined(CONFIG_SPL_BUILD)
 #ifndef CONFIG_TFABOOT
 static void security_init(void)
@@ -186,6 +192,32 @@  u32 get_bootmode(void)
 		    TAMP_BOOT_MODE_SHIFT;
 }
 
+/*
+ * initialize the MMU and activate cache in SPL or in U-Boot pre-reloc stage
+ * MMU/TLB is updated in enable_caches() for U-Boot after relocation
+ * or is deactivated in U-Boot entry function start.S::cpu_init_cp15
+ */
+static void early_enable_caches(void)
+{
+	/* I-cache is already enabled in start.S: cpu_init_cp15 */
+
+	if (CONFIG_IS_ENABLED(SYS_DCACHE_OFF))
+		return;
+
+	gd->arch.tlb_size = PGTABLE_SIZE;
+	gd->arch.tlb_addr = (unsigned long)&early_tlb;
+
+	dcache_enable();
+
+	if (IS_ENABLED(CONFIG_SPL_BUILD))
+		mmu_set_region_dcache_behaviour(STM32_SYSRAM_BASE,
+						STM32_SYSRAM_SIZE,
+						DCACHE_DEFAULT_OPTION);
+	else
+		mmu_set_region_dcache_behaviour(STM32_DDR_BASE, STM32_DDR_SIZE,
+						DCACHE_DEFAULT_OPTION);
+}
+
 /*
  * Early system init
  */
@@ -193,6 +225,8 @@  int arch_cpu_init(void)
 {
 	u32 boot_mode;
 
+	early_enable_caches();
+
 	/* early armv7 timer init: needed for polling */
 	timer_init();
 
@@ -225,7 +259,14 @@  int arch_cpu_init(void)
 
 void enable_caches(void)
 {
-	/* Enable D-cache. I-cache is already enabled in start.S */
+	/* I-cache is already enabled in start.S: icache_enable() not needed */
+
+	/* deactivate the data cache, early enabled in arch_cpu_init() */
+	dcache_disable();
+	/*
+	 * update MMU after relocation and enable the data cache
+	 * warning: the TLB location udpated in board_f.c::reserve_mmu
+	 */
 	dcache_enable();
 }