[4/4] arm64: align PHYS_OFFSET to block size

Message ID 1427125016-3873-5-git-send-email-ard.biesheuvel@linaro.org
State New

Commit Message

Ard Biesheuvel March 23, 2015, 3:36 p.m. UTC
This aligns PHYS_OFFSET down to a boundary that allows system
RAM to be mapped using the largest blocks available, i.e., 1 GB
blocks with 4 KB pages or 512 MB blocks with 64 KB pages.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/head.S | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Ard Biesheuvel March 26, 2015, 6:22 a.m. UTC | #1
On 25 March 2015 at 15:59, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Mar 23, 2015 at 04:36:56PM +0100, Ard Biesheuvel wrote:
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 16134608eecf..fd8434753372 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -49,13 +49,15 @@
>>  #ifdef CONFIG_ARM64_64K_PAGES
>>  #define BLOCK_SHIFT  PAGE_SHIFT
>>  #define BLOCK_SIZE   PAGE_SIZE
>> -#define TABLE_SHIFT  PMD_SHIFT
>>  #else
>>  #define BLOCK_SHIFT  SECTION_SHIFT
>>  #define BLOCK_SIZE   SECTION_SIZE
>> -#define TABLE_SHIFT  PUD_SHIFT
>>  #endif
>>
>> +#define TABLE_SHIFT  (BLOCK_SHIFT + PAGE_SHIFT - 3)
>> +#define TABLE_SIZE   (1 << TABLE_SHIFT)
>> +#define TABLE_MASK   (~(TABLE_SIZE - 1))
>> +
>>  #define KERNEL_START _text
>>  #define KERNEL_END   _end
>>
>> @@ -237,7 +239,10 @@ ENTRY(stext)
>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
>>
>>       adrp    x24, __PHYS_OFFSET
>> -     mov     x23, #KIMAGE_OFFSET
>> +     and     x23, x24, #~TABLE_MASK          // image offset
>> +     and     x24, x24, #TABLE_MASK           // PHYS_OFFSET
>> +     mov     x0, #KIMAGE_OFFSET
>> +     add     x23, x23, x0
>
> I'm still trying to figure out how this works. Does the code imply that
> the kernel image can only be loaded within a block size of the
> PHYS_OFFSET? If that's the case, it's not too flexible.
>

For now, yes.

> My thoughts were to make PHYS_OFFSET permanently 0 (i.e. get rid of it
> entirely) or at least compute it at DT parsing time. I'm more inclined
> for making it 0 assuming that it doesn't break anything else (vmemmap
> virtual range may get slightly bigger but still not significant,
> basically max_phys_addr / sizeof(struct page)).
>

Making it zero would be an improvement, I suppose.
Ard Biesheuvel March 26, 2015, 6:23 a.m. UTC | #2
On 25 March 2015 at 15:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Mar 23, 2015 at 04:36:56PM +0100, Ard Biesheuvel wrote:
>> This aligns PHYS_OFFSET down to an alignment that allows system
>> RAM to be mapped using the largest blocks available, i.e., 1 GB
>> blocks on 4 KB pages or 512 MB blocks on 64 KB pages.
>
> Does this assume that the platform RAM starts at such block aligned
> addresses? If not, is the unaligned range ignored?
>

No, it doesn't. The unaligned range is not mapped by the early code
(which only maps from the start of the image).
Later on, it just maps whatever is in the DT nodes.
Ard Biesheuvel March 27, 2015, 1:16 p.m. UTC | #3
On 26 March 2015 at 07:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 25 March 2015 at 15:59, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Mon, Mar 23, 2015 at 04:36:56PM +0100, Ard Biesheuvel wrote:
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index 16134608eecf..fd8434753372 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>>> @@ -49,13 +49,15 @@
>>>  #ifdef CONFIG_ARM64_64K_PAGES
>>>  #define BLOCK_SHIFT  PAGE_SHIFT
>>>  #define BLOCK_SIZE   PAGE_SIZE
>>> -#define TABLE_SHIFT  PMD_SHIFT
>>>  #else
>>>  #define BLOCK_SHIFT  SECTION_SHIFT
>>>  #define BLOCK_SIZE   SECTION_SIZE
>>> -#define TABLE_SHIFT  PUD_SHIFT
>>>  #endif
>>>
>>> +#define TABLE_SHIFT  (BLOCK_SHIFT + PAGE_SHIFT - 3)
>>> +#define TABLE_SIZE   (1 << TABLE_SHIFT)
>>> +#define TABLE_MASK   (~(TABLE_SIZE - 1))
>>> +
>>>  #define KERNEL_START _text
>>>  #define KERNEL_END   _end
>>>
>>> @@ -237,7 +239,10 @@ ENTRY(stext)
>>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
>>>
>>>       adrp    x24, __PHYS_OFFSET
>>> -     mov     x23, #KIMAGE_OFFSET
>>> +     and     x23, x24, #~TABLE_MASK          // image offset
>>> +     and     x24, x24, #TABLE_MASK           // PHYS_OFFSET
>>> +     mov     x0, #KIMAGE_OFFSET
>>> +     add     x23, x23, x0
>>
>> I'm still trying to figure out how this works. Does the code imply that
>> the kernel image can only be loaded within a block size of the
>> PHYS_OFFSET? If that's the case, it's not too flexible.
>>
>
> For now, yes.
>
>> My thoughts were to make PHYS_OFFSET permanently 0 (i.e. get rid of it
>> entirely) or at least compute it at DT parsing time. I'm more inclined
>> for making it 0 assuming that it doesn't break anything else (vmemmap
>> virtual range may get slightly bigger but still not significant,
>> basically max_phys_addr / sizeof(struct page)).
>>
>
> Making it zero would be an improvement, I suppose

Actually, wouldn't that reintroduce a similar VA range problem to the
one I fixed the other day?

On Seattle, with its DRAM at 0x80_0000_0000, you wouldn't have enough
space after PAGE_OFFSET.
Ard Biesheuvel March 30, 2015, 2 p.m. UTC | #4
On 30 March 2015 at 15:49, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Mar 27, 2015 at 02:16:19PM +0100, Ard Biesheuvel wrote:
>> On 26 March 2015 at 07:22, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 25 March 2015 at 15:59, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> On Mon, Mar 23, 2015 at 04:36:56PM +0100, Ard Biesheuvel wrote:
>> >>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> >>> index 16134608eecf..fd8434753372 100644
>> >>> --- a/arch/arm64/kernel/head.S
>> >>> +++ b/arch/arm64/kernel/head.S
>> >>> @@ -49,13 +49,15 @@
>> >>>  #ifdef CONFIG_ARM64_64K_PAGES
>> >>>  #define BLOCK_SHIFT  PAGE_SHIFT
>> >>>  #define BLOCK_SIZE   PAGE_SIZE
>> >>> -#define TABLE_SHIFT  PMD_SHIFT
>> >>>  #else
>> >>>  #define BLOCK_SHIFT  SECTION_SHIFT
>> >>>  #define BLOCK_SIZE   SECTION_SIZE
>> >>> -#define TABLE_SHIFT  PUD_SHIFT
>> >>>  #endif
>> >>>
>> >>> +#define TABLE_SHIFT  (BLOCK_SHIFT + PAGE_SHIFT - 3)
>> >>> +#define TABLE_SIZE   (1 << TABLE_SHIFT)
>> >>> +#define TABLE_MASK   (~(TABLE_SIZE - 1))
>> >>> +
>> >>>  #define KERNEL_START _text
>> >>>  #define KERNEL_END   _end
>> >>>
>> >>> @@ -237,7 +239,10 @@ ENTRY(stext)
>> >>>       bl      el2_setup                       // Drop to EL1, w20=cpu_boot_mode
>> >>>
>> >>>       adrp    x24, __PHYS_OFFSET
>> >>> -     mov     x23, #KIMAGE_OFFSET
>> >>> +     and     x23, x24, #~TABLE_MASK          // image offset
>> >>> +     and     x24, x24, #TABLE_MASK           // PHYS_OFFSET
>> >>> +     mov     x0, #KIMAGE_OFFSET
>> >>> +     add     x23, x23, x0
>> >>
>> >> I'm still trying to figure out how this works. Does the code imply that
>> >> the kernel image can only be loaded within a block size of the
>> >> PHYS_OFFSET? If that's the case, it's not too flexible.
>> >
>> > For now, yes.
>
> Can we defer the setting of PHYS_OFFSET until we parse the DT memory
> nodes?
>

I experimented a bit with that, but it is quite hairy. Any
manipulation of the page tables goes through __va/__pa, so you need a
valid PHYS_OFFSET there to ensure they point at the right physical
region. But PHYS_OFFSET also needs to be small enough for the DT
parsing code not to disregard regions that are below it. And then
there is the memblock limit to ensure that early dynamically allocated
page tables come from a region that is already mapped.

I think it may be doable, but it would require some significant
hacking, e.g., call early_init_scan_dt() at its physical address with
only the ID map loaded and the MMU and caches on, and only start
populating the virtual address space after that. Or at least only
populate the lower half, i.e., the mappings below PAGE_OFFSET for the
kernel and the FDT.

>> >> My thoughts were to make PHYS_OFFSET permanently 0 (i.e. get rid of it
>> >> entirely) or at least compute it at DT parsing time. I'm more inclined
>> >> for making it 0 assuming that it doesn't break anything else (vmemmap
>> >> virtual range may get slightly bigger but still not significant,
>> >> basically max_phys_addr / sizeof(struct page)).
>> >
>> > Making it zero would be an improvement, I suppose
>>
>> Actually, wouldn't that reintroduce a similar VA range problem to the
>> one I fixed the other day?
>>
>> On Seattle, with its DRAM at 0x80_0000_0000, you wouldn't have enough
>> space after PAGE_OFFSET
>
> Ah, yes. When I thought about this PHYS_OFFSET == 0 in the past, we
> didn't have any patches and the VA range had to be sufficiently large.
>
> --
> Catalin
Ard Biesheuvel March 30, 2015, 6:08 p.m. UTC | #5
On 30 March 2015 at 17:00, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Mar 30, 2015 at 04:00:31PM +0200, Ard Biesheuvel wrote:
>> On 30 March 2015 at 15:49, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > Can we defer the setting of PHYS_OFFSET until we parse the DT memory
>> > nodes?
>>
>> I experimented a bit with that, but it is quite hairy. Any
>> manipulation of the page tables goes through __va/__pa, so you need a
>> valid PHYS_OFFSET there to ensure they point at the right physical
>> region.
>
> Yes, so we need to set PHYS_OFFSET before we start using __va/__pa.
>
>> But PHYS_OFFSET also needs to be small enough for the DT
>> parsing code not to disregard regions that are below it.
>
> With PHYS_OFFSET as 0 initially, no regions would be dropped. But we
> could write a simplified early_init_dt_add_memory_arch() which avoids
> the PHYS_OFFSET check while setting the actual PHYS_OFFSET to the
> minimal address detected. I can see the generic function only uses
> __pa() to get the PHYS_OFFSET.
>

Excellent idea.

>> And then
>> there is the memblock limit to ensure that early dynamically allocated
>> page tables come from a region that is already mapped.
>
> By the time we start using memblock allocations, we have a PHYS_OFFSET
> set. The early DT parsing does not require any memory allocations AFAIK.
> We need to make sure that setup_machine_fdt creates a VA mapping of the
> DT and does not require the __va() macro (I thought I'd seen some
> patches using the fixmap for the DT).
>

Yes, so did I :-)

But using early_fixmap() implies using the ordinary page table
manipulation code, which uses __pa/__va.
So instead, I should refactor those patches to simply pick a VA offset
and map the FDT there from head.S.

>> I think it may be doable, but it would require some significant
>> hacking, e.g., call early_init_scan_dt() at its physical address with
>> only the ID map loaded and the MMU and caches on, and only after that
>> start populating the virtual address space. Or at least only populate
>> the lower half, i.e., mappings below PAGE_OFFSET for the kernel and
>> the FDT
>
> With some form of your patches, we already decouple the PAGE_OFFSET from
> the kernel text mapping. We map the latter at some very high
> KERNEL_PAGE_OFFSET, the DT via fixmap (which is part of the kernel data
> section, so mapped at the high KERNEL_PAGE_OFFSET). Once we start
> calling early_init_dt_add_memory_arch(), we set the real PAGE_OFFSET and
> are free to use __pa/__va after setup_machine_fdt(). The actual linear
> mappings will be created later via paging_init().
>

OK, that sounds like it could work.

The only thing to note (and this should answer Mark's question in
the other thread) is that the pgdir region should be mapped via the
linear mapping as well.
The alternative is to make __va() check whether the input physical
address falls within the pgdir region, and subtract PAGE_OFFSET -
KERNEL_PAGE_OFFSET from the return value, but this is less trivial
than the __pa() counterpart, which only needs to check a single bit.

So I propose we define KERNEL_PAGE_OFFSET as (PAGE_OFFSET -
SZ_64M) and FDT_PAGE_OFFSET (or whatever you want to call it) as
(PAGE_OFFSET - SZ_2M).
That way, the kernel, FDT and module space always share the same
128 MB window, which we can move around in the vmalloc space later if
we want to implement kASLR.
Patch

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 16134608eecf..fd8434753372 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -49,13 +49,15 @@ 
 #ifdef CONFIG_ARM64_64K_PAGES
 #define BLOCK_SHIFT	PAGE_SHIFT
 #define BLOCK_SIZE	PAGE_SIZE
-#define TABLE_SHIFT	PMD_SHIFT
 #else
 #define BLOCK_SHIFT	SECTION_SHIFT
 #define BLOCK_SIZE	SECTION_SIZE
-#define TABLE_SHIFT	PUD_SHIFT
 #endif
 
+#define TABLE_SHIFT	(BLOCK_SHIFT + PAGE_SHIFT - 3)
+#define TABLE_SIZE	(1 << TABLE_SHIFT)
+#define TABLE_MASK	(~(TABLE_SIZE - 1))
+
 #define KERNEL_START	_text
 #define KERNEL_END	_end
 
@@ -237,7 +239,10 @@  ENTRY(stext)
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
 
 	adrp	x24, __PHYS_OFFSET
-	mov	x23, #KIMAGE_OFFSET
+	and	x23, x24, #~TABLE_MASK		// image offset
+	and	x24, x24, #TABLE_MASK		// PHYS_OFFSET
+	mov	x0, #KIMAGE_OFFSET
+	add	x23, x23, x0
 
 	bl	set_cpu_boot_mode_flag
 	bl	__create_page_tables		// x25=TTBR0, x26=TTBR1