[v3,3/7] arm64: split off early mapping code from early_fixmap_init()

Message ID 1447672998-20981-4-git-send-email-ard.biesheuvel@linaro.org
State New

Commit Message

Ard Biesheuvel Nov. 16, 2015, 11:23 a.m. UTC
This splits off and generalises the population of the statically
allocated fixmap page tables so that we may reuse it later for
the linear mapping once we move the kernel text mapping out of it.

This also involves taking into account that table entries at any of
the levels we are populating may have been populated already, since
the fixmap mapping might not be disjoint up to the pgd level anymore
from other early mappings.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 arch/arm64/include/asm/compiler.h |  2 +
 arch/arm64/kernel/vmlinux.lds.S   | 12 ++--
 arch/arm64/mm/mmu.c               | 60 ++++++++++++++------
 3 files changed, 51 insertions(+), 23 deletions(-)

-- 
1.9.1


Comments

Mark Rutland Dec. 3, 2015, 12:18 p.m. UTC | #1
Hi Ard,

Apologies that it's taken me so long to get around to this...

On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> This splits off and generalises the population of the statically
> allocated fixmap page tables so that we may reuse it later for
> the linear mapping once we move the kernel text mapping out of it.
>
> This also involves taking into account that table entries at any of
> the levels we are populating may have been populated already, since
> the fixmap mapping might not be disjoint up to the pgd level anymore
> from other early mappings.

As a heads-up, for avoiding TLB conflicts, I'm currently working on an
alternative way of creating the kernel page tables which will definitely
conflict here, and may or may not supersede this approach.

By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
allocate page tables from anywhere via memblock, and temporarily map
them as we need to.

That would avoid the need for the bootstrap tables. In head.S we'd only
need to create a temporary (coarse-grained, RWX) kernel mapping (with
the fixmap bolted on). Later we would create a whole new set of tables
with a fine-grained kernel mapping and a full linear mapping using the
new fixmap entries to temporarily map tables, then switch over to those
atomically.
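
To sketch the idea (FIX_PMD and the helpers below are hypothetical
names, not code from this series):

static pmd_t *pmd_set_fixmap(phys_addr_t pmd_phys)
{
        /* map the memblock-allocated table through a dedicated slot */
        set_fixmap(FIX_PMD, pmd_phys);
        return (pmd_t *)fix_to_virt(FIX_PMD);
}

static void pmd_clear_fixmap(void)
{
        clear_fixmap(FIX_PMD);
}

A table walker would map each newly allocated table like this, populate
it, and unmap it again, so the tables are never written through the
linear (or kernel) mapping.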

Otherwise, one minor comment below.

> +static void __init bootstrap_early_mapping(unsigned long addr,
> +					   struct bootstrap_pgtables *reg,
> +					   bool pte_level)

The only caller in this patch passes true for pte_level.

Can we not introduce the argument when it is first needed? Or at least
have something in the commit message as to why we'll need it later?

>  	/*
>  	 * The boot-ioremap range spans multiple pmds, for which
> -	 * we are not preparted:
> +	 * we are not prepared:
>  	 */

I cannot wait to see this typo go!

Otherwise, this looks fine to me.

Thanks,
Mark.

Ard Biesheuvel Dec. 3, 2015, 1:31 p.m. UTC | #2
On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> Apologies that it's taken me so long to get around to this...
>
> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> This splits off and generalises the population of the statically
>> allocated fixmap page tables so that we may reuse it later for
>> the linear mapping once we move the kernel text mapping out of it.
>>
>> This also involves taking into account that table entries at any of
>> the levels we are populating may have been populated already, since
>> the fixmap mapping might not be disjoint up to the pgd level anymore
>> from other early mappings.
>
> As a heads-up, for avoiding TLB conflicts, I'm currently working on an
> alternative way of creating the kernel page tables which will definitely
> conflict here, and may or may not supersede this approach.
>
> By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> allocate page tables from anywhere via memblock, and temporarily map
> them as we need to.
>

Interesting. So how are you dealing with the va<->pa translations (and
vice versa) that occur all over the place in create_mapping() et al?

> That would avoid the need for the bootstrap tables. In head.S we'd only
> need to create a temporary (coarse-grained, RWX) kernel mapping (with
> the fixmap bolted on). Later we would create a whole new set of tables
> with a fine-grained kernel mapping and a full linear mapping using the
> new fixmap entries to temporarily map tables, then switch over to those
> atomically.
>

If we change back to a full linear mapping, are we back to not putting
the Image astride a 1GB/32MB/512MB boundary (depending on page size)?

Anyway, to illustrate where I am headed with this: in my next version
of this series, I intend to move the kernel mapping to the start of
the vmalloc area, which gets moved up 64 MB to make room for the
module area (which also moves down). That way, we can still load
modules as before, but no longer have a need for a dedicated carveout
for the kernel below PAGE_OFFSET.
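
Roughly, the resulting VA layout would be (illustrative only; the
exact values depend on VA_BITS and the page size):

        modules        : [VMALLOC_START - 64 MB .. VMALLOC_START)
        kernel Image   : at VMALLOC_START, with its VM area reserved
        vmalloc space  : the remainder of the (shifted) vmalloc area
        linear mapping : from PAGE_OFFSET upwards, RAM only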

The next step is then to move the kernel Image up inside the vmalloc
area based on some randomness we get from the bootloader, and relocate
it in place (using the same approach as in the patches I sent out
beginning of this year). I have implemented module PLTs so that the
Image and the modules no longer need to be within 128 MB of each
other, which means that we can have full KASLR for modules and Image,
and also place the kernel anywhere in physical memory. The module PLTs
would be a runtime penalty only, i.e., a KASLR capable kernel running
without KASLR would not incur the penalty of branching via PLTs. The
only build time option is -mcmodel=large for modules so that data
symbol references are absolute, but that is unlikely to hurt
performance.
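
To sketch what such a veneer could look like (the field layout is
indicative only, not the final implementation):

/*
 * One PLT entry per out-of-range branch target: four instructions
 * that materialise an arbitrary 64-bit address in x16 and branch to
 * it, lifting the +/-128 MB range limit of a direct bl.
 */
struct plt_entry {
        __le32  mov0;   /* movn x16, #imm16                */
        __le32  mov1;   /* movk x16, #imm16, lsl #16       */
        __le32  mov2;   /* movk x16, #imm16, lsl #32       */
        __le32  br;     /* br   x16                        */
};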

> Otherwise, one minor comment below.
>
>> +static void __init bootstrap_early_mapping(unsigned long addr,
>> +                                        struct bootstrap_pgtables *reg,
>> +                                        bool pte_level)
>
> The only caller in this patch passes true for pte_level.
>
> Can we not introduce the argument when it is first needed? Or at least
> have something in the commit message as to why we'll need it later?
>

Yes, that should be possible.

>>       /*
>>        * The boot-ioremap range spans multiple pmds, for which
>> -      * we are not preparted:
>> +      * we are not prepared:
>>        */
>
> I cannot wait to see this typo go!
>
> Otherwise, this looks fine to me.
>

Thanks Mark

Mark Rutland Dec. 3, 2015, 1:59 p.m. UTC | #3
On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi Ard,
> >
> > Apologies that it's taken me so long to get around to this...
> >
> > On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> >> This splits off and generalises the population of the statically
> >> allocated fixmap page tables so that we may reuse it later for
> >> the linear mapping once we move the kernel text mapping out of it.
> >>
> >> This also involves taking into account that table entries at any of
> >> the levels we are populating may have been populated already, since
> >> the fixmap mapping might not be disjoint up to the pgd level anymore
> >> from other early mappings.
> >
> > As a heads-up, for avoiding TLB conflicts, I'm currently working on an
> > alternative way of creating the kernel page tables which will definitely
> > conflict here, and may or may not supersede this approach.
> >
> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> > allocate page tables from anywhere via memblock, and temporarily map
> > them as we need to.
> >
>
> Interesting. So how are you dealing with the va<->pa translations (and
> vice versa) that occur all over the place in create_mapping() et al?

By rewriting create_mapping() et al to not do that ;)

That's requiring a fair amount of massaging, but so far I've not hit
anything that renders the approach impossible.

> > That would avoid the need for the bootstrap tables. In head.S we'd only
> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
> > the fixmap bolted on). Later we would create a whole new set of tables
> > with a fine-grained kernel mapping and a full linear mapping using the
> > new fixmap entries to temporarily map tables, then switch over to those
> > atomically.
> >
>
> If we change back to a full linear mapping, are we back to not putting
> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?

I'm not exactly sure what you mean here.

The kernel mapping may inhibit using large section mappings, but this is
necessary anyway due to permission changes at sub-section granularity
(e.g. in fixup_init).

The idea is that when the kernel tables are set up, things are mapped at
the largest possible granularity that permits later permission changes
without breaking/making sections (such that we can avoid TLB conflicts).

So we'd map the kernel and memory in segments, where no two segments
share a common last-level entry (i.e. they're all at least page-aligned,
and don't share a section with another segment).

We'd have separate segments for:
* memory below TEXT_OFFSET
* text
* rodata
* init
* altinstr (I think this can be folded into rodata)
* bss / data, tables
* memory above _end

Later I think it should be relatively simple to move the memory segment
mapping for split-VA.
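
As a sketch, the kernel part could then be mapped along these lines
(map_kernel_segment() is an assumed helper that maps [start, end) into
the new tables; the segment boundaries are indicative only):

static void __init map_kernel(pgd_t *pgd)
{
        /* each segment gets its own last-level entries and permissions */
        map_kernel_segment(pgd, _stext, _etext, PAGE_KERNEL_EXEC);
        map_kernel_segment(pgd, __start_rodata, __end_rodata, PAGE_KERNEL);
        map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
        map_kernel_segment(pgd, _sdata, _end, PAGE_KERNEL);
}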

> Anyway, to illustrate where I am headed with this: in my next version
> of this series, I intend to move the kernel mapping to the start of
> the vmalloc area, which gets moved up 64 MB to make room for the
> module area (which also moves down). That way, we can still load
> modules as before, but no longer have a need for a dedicated carveout
> for the kernel below PAGE_OFFSET.

Ok.

> The next step is then to move the kernel Image up inside the vmalloc
> area based on some randomness we get from the bootloader, and relocate
> it in place (using the same approach as in the patches I sent out
> beginning of this year). I have implemented module PLTs so that the
> Image and the modules no longer need to be within 128 MB of each
> other, which means that we can have full KASLR for modules and Image,
> and also place the kernel anywhere in physical memory. The module PLTs
> would be a runtime penalty only, i.e., a KASLR capable kernel running
> without KASLR would not incur the penalty of branching via PLTs. The
> only build time option is -mcmodel=large for modules so that data
> symbol references are absolute, but that is unlikely to hurt
> performance.

I'm certainly interested in seeing this!

Thanks,
Mark.

Ard Biesheuvel Dec. 3, 2015, 2:05 p.m. UTC | #4
On 3 December 2015 at 14:59, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
>> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Hi Ard,
>> >
>> > Apologies that it's taken me so long to get around to this...
>> >
>> > On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> >> This splits off and generalises the population of the statically
>> >> allocated fixmap page tables so that we may reuse it later for
>> >> the linear mapping once we move the kernel text mapping out of it.
>> >>
>> >> This also involves taking into account that table entries at any of
>> >> the levels we are populating may have been populated already, since
>> >> the fixmap mapping might not be disjoint up to the pgd level anymore
>> >> from other early mappings.
>> >
>> > As a heads-up, for avoiding TLB conflicts, I'm currently working on an
>> > alternative way of creating the kernel page tables which will definitely
>> > conflict here, and may or may not supersede this approach.
>> >
>> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
>> > allocate page tables from anywhere via memblock, and temporarily map
>> > them as we need to.
>> >
>>
>> Interesting. So how are you dealing with the va<->pa translations (and
>> vice versa) that occur all over the place in create_mapping() et al?
>
> By rewriting create_mapping() et al to not do that ;)
>
> That's requiring a fair amount of massaging, but so far I've not hit
> anything that renders the approach impossible.
>
>> > That would avoid the need for the bootstrap tables. In head.S we'd only
>> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
>> > the fixmap bolted on). Later we would create a whole new set of tables
>> > with a fine-grained kernel mapping and a full linear mapping using the
>> > new fixmap entries to temporarily map tables, then switch over to those
>> > atomically.
>> >
>>
>> If we change back to a full linear mapping, are we back to not putting
>> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
>
> I'm not exactly sure what you mean here.
>

Apologies, I misread 'linear mapping' as 'id mapping', which of course
are two different things entirely.

> The kernel mapping may inhibit using large section mappings, but this is
> necessary anyway due to permission changes at sub-section granularity
> (e.g. in fixup_init).
>
> The idea is that when the kernel tables are set up, things are mapped at
> the largest possible granularity that permits later permission changes
> without breaking/making sections (such that we can avoid TLB conflicts).
>
> So we'd map the kernel and memory in segments, where no two segments
> share a common last-level entry (i.e. they're all at least page-aligned,
> and don't share a section with another segment).
>
> We'd have separate segments for:
> * memory below TEXT_OFFSET
> * text
> * rodata
> * init
> * altinstr (I think this can be folded into rodata)
> * bss / data, tables
> * memory above _end
>
> Later I think it should be relatively simple to move the memory segment
> mapping for split-VA.
>

I'd need to see it to understand, I guess, but getting rid of the
pa<->va translations is definitely an improvement for the stuff I am
trying to do, and would probably make it a lot cleaner.

>> Anyway, to illustrate where I am headed with this: in my next version

>> of this series, I intend to move the kernel mapping to the start of

>> the vmalloc area, which gets moved up 64 MB to make room for the

>> module area (which also moves down). That way, we can still load

>> modules as before, but no longer have a need for a dedicated carveout

>> for the kernel below PAGE_OFFSET.

>

> Ok.

>

>> The next step is then to move the kernel Image up inside the vmalloc

>> area based on some randomness we get from the bootloader, and relocate

>> it in place (using the same approach as in the patches I sent out

>> beginning of this year). I have implemented module PLTs so that the

>> Image and the modules no longer need to be within 128 MB of each

>> other, which means that we can have full KASLR for modules and Image,

>> and also place the kernel anywhere in physical memory.The module PLTs

>> would be a runtime penalty only, i.e., a KASLR capable kernel running

>> without KASLR would not incur the penalty of branching via PLTs. The

>> only build time option is -mcmodel=large for modules so that data

>> symbol references are absolute, but that is unlike to hurt

>> performance.

>

> I'm certainly interested in seeing this!

>


I have patches for all of this, only they don't live on the same branch yet :-)

Catalin Marinas Dec. 7, 2015, 4:08 p.m. UTC | #5
On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
> > As a heads-up, for avoiding TLB conflicts, I'm currently working on an
> > alternative way of creating the kernel page tables which will definitely
> > conflict here, and may or may not supersede this approach.
> >
> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
> > allocate page tables from anywhere via memblock, and temporarily map
> > them as we need to.
[...]
> > That would avoid the need for the bootstrap tables. In head.S we'd only
> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
> > the fixmap bolted on). Later we would create a whole new set of tables
> > with a fine-grained kernel mapping and a full linear mapping using the
> > new fixmap entries to temporarily map tables, then switch over to those
> > atomically.

If we separate the kernel image mapping from the linear one, I think
things would be slightly simpler to avoid TLB conflicts (but I haven't
looked at Mark's patches yet).

> If we change back to a full linear mapping, are we back to not putting
> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
>
> Anyway, to illustrate where I am headed with this: in my next version
> of this series, I intend to move the kernel mapping to the start of
> the vmalloc area, which gets moved up 64 MB to make room for the
> module area (which also moves down). That way, we can still load
> modules as before, but no longer have a need for a dedicated carveout
> for the kernel below PAGE_OFFSET.

This makes sense, I guess it can be easily added to the existing series
just by changing the KIMAGE_OFFSET macro.

> The next step is then to move the kernel Image up inside the vmalloc
> area based on some randomness we get from the bootloader, and relocate
> it in place (using the same approach as in the patches I sent out
> beginning of this year). I have implemented module PLTs so that the
> Image and the modules no longer need to be within 128 MB of each
> other, which means that we can have full KASLR for modules and Image,
> and also place the kernel anywhere in physical memory. The module PLTs
> would be a runtime penalty only, i.e., a KASLR capable kernel running
> without KASLR would not incur the penalty of branching via PLTs. The
> only build time option is -mcmodel=large for modules so that data
> symbol references are absolute, but that is unlikely to hurt
> performance.

I guess full KASLR would be conditional on a config option.

-- 
Catalin

Ard Biesheuvel Dec. 7, 2015, 4:13 p.m. UTC | #6
On 7 December 2015 at 17:08, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Dec 03, 2015 at 02:31:19PM +0100, Ard Biesheuvel wrote:
>> On 3 December 2015 at 13:18, Mark Rutland <mark.rutland@arm.com> wrote:
>> > As a heads-up, for avoiding TLB conflicts, I'm currently working on an
>> > alternative way of creating the kernel page tables which will definitely
>> > conflict here, and may or may not supersede this approach.
>> >
>> > By adding new FIX_{PGD,PUD,PMD,PTE} indices to the fixmap, we can
>> > allocate page tables from anywhere via memblock, and temporarily map
>> > them as we need to.
> [...]
>> > That would avoid the need for the bootstrap tables. In head.S we'd only
>> > need to create a temporary (coarse-grained, RWX) kernel mapping (with
>> > the fixmap bolted on). Later we would create a whole new set of tables
>> > with a fine-grained kernel mapping and a full linear mapping using the
>> > new fixmap entries to temporarily map tables, then switch over to those
>> > atomically.
>
> If we separate the kernel image mapping from the linear one, I think
> things would be slightly simpler to avoid TLB conflicts (but I haven't
> looked at Mark's patches yet).
>
>> If we change back to a full linear mapping, are we back to not putting
>> the Image astride a 1GB/32MB/512MB boundary (depending on page size)?
>>
>> Anyway, to illustrate where I am headed with this: in my next version
>> of this series, I intend to move the kernel mapping to the start of
>> the vmalloc area, which gets moved up 64 MB to make room for the
>> module area (which also moves down). That way, we can still load
>> modules as before, but no longer have a need for a dedicated carveout
>> for the kernel below PAGE_OFFSET.
>
> This makes sense, I guess it can be easily added to the existing series
> just by changing the KIMAGE_OFFSET macro.
>

Indeed. The only difference is that the VM area needs to be reserved
explicitly, to prevent vmalloc() from reusing it.
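
Something along these lines, presumably (a sketch using
vm_area_add_early(); the address and size expressions are
placeholders):

static void __init reserve_kimage_vm_area(void)
{
        static struct vm_struct kimage_vm;

        kimage_vm.addr  = (void *)VMALLOC_START;   /* assumed image base */
        kimage_vm.size  = round_up((unsigned long)(_end - _text), PAGE_SIZE);
        kimage_vm.flags = VM_MAP;

        /* keeps vmalloc() from handing out the image's VA range */
        vm_area_add_early(&kimage_vm);
}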

>> The next step is then to move the kernel Image up inside the vmalloc
>> area based on some randomness we get from the bootloader, and relocate
>> it in place (using the same approach as in the patches I sent out
>> beginning of this year). I have implemented module PLTs so that the
>> Image and the modules no longer need to be within 128 MB of each
>> other, which means that we can have full KASLR for modules and Image,
>> and also place the kernel anywhere in physical memory. The module PLTs
>> would be a runtime penalty only, i.e., a KASLR capable kernel running
>> without KASLR would not incur the penalty of branching via PLTs. The
>> only build time option is -mcmodel=large for modules so that data
>> symbol references are absolute, but that is unlikely to hurt
>> performance.
>
> I guess full KASLR would be conditional on a config option.
>

Yes. But it would be nice if the only build time penalty is the use of
-mcmodel=large for modules, so that distro kernels can enable KASLR
unconditionally (especially since -mcmodel=large is likely to be
enabled for distro kernels anyway, due to the A53 erratum that
requires it).

Will Deacon Dec. 8, 2015, 12:40 p.m. UTC | #7
On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
> Apologies that it's taken me so long to get around to this...
>
> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> > This splits off and generalises the population of the statically
> > allocated fixmap page tables so that we may reuse it later for
> > the linear mapping once we move the kernel text mapping out of it.
> >
> > This also involves taking into account that table entries at any of
> > the levels we are populating may have been populated already, since
> > the fixmap mapping might not be disjoint up to the pgd level anymore
> > from other early mappings.
>
> As a heads-up, for avoiding TLB conflicts, I'm currently working on an
> alternative way of creating the kernel page tables which will definitely
> conflict here, and may or may not supersede this approach.

Given that the Christmas break is around the corner and your TLB series
is probably going to take some time to get right, I suggest we persevere
with Ard's current patch series for 4.5 and merge the TLB conflict solution
for 4.6. I don't want us to end up in a situation where this is needlessly
blocked on something that isn't quite ready.

Any objections? If not, Ard -- can you post a new version of this, please?

Will

Ard Biesheuvel Dec. 8, 2015, 1:29 p.m. UTC | #8
On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
>> Apologies that it's taken me so long to get around to this...
>>
>> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> > This splits off and generalises the population of the statically
>> > allocated fixmap page tables so that we may reuse it later for
>> > the linear mapping once we move the kernel text mapping out of it.
>> >
>> > This also involves taking into account that table entries at any of
>> > the levels we are populating may have been populated already, since
>> > the fixmap mapping might not be disjoint up to the pgd level anymore
>> > from other early mappings.
>>
>> As a heads-up, for avoiding TLB conflicts, I'm currently working on an
>> alternative way of creating the kernel page tables which will definitely
>> conflict here, and may or may not supersede this approach.
>
> Given that the Christmas break is around the corner and your TLB series
> is probably going to take some time to get right, I suggest we persevere
> with Ard's current patch series for 4.5 and merge the TLB conflict solution
> for 4.6. I don't want us to end up in a situation where this is needlessly
> blocked on something that isn't quite ready.
>
> Any objections? If not, Ard -- can you post a new version of this, please?
>

Happy to post a new version, with the following remarks:
- my current private tree has evolved in the meantime, and I am now
putting the kernel image at the base of the vmalloc region (and the
module region right before it)
- I think Mark's changes would allow me to deobfuscate the VA bias
that redirects __va() translations into the kernel VA space rather
than the linear mapping

Will Deacon Dec. 8, 2015, 1:51 p.m. UTC | #9
On Tue, Dec 08, 2015 at 02:29:33PM +0100, Ard Biesheuvel wrote:
> On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
> >> Apologies that it's taken me so long to get around to this...
> >>
> >> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
> >> > This splits off and generalises the population of the statically
> >> > allocated fixmap page tables so that we may reuse it later for
> >> > the linear mapping once we move the kernel text mapping out of it.
> >> >
> >> > This also involves taking into account that table entries at any of
> >> > the levels we are populating may have been populated already, since
> >> > the fixmap mapping might not be disjoint up to the pgd level anymore
> >> > from other early mappings.
> >>
> >> As a heads-up, for avoiding TLB conflicts, I'm currently working on an
> >> alternative way of creating the kernel page tables which will definitely
> >> conflict here, and may or may not supersede this approach.
> >
> > Given that the Christmas break is around the corner and your TLB series
> > is probably going to take some time to get right, I suggest we persevere
> > with Ard's current patch series for 4.5 and merge the TLB conflict solution
> > for 4.6. I don't want us to end up in a situation where this is needlessly
> > blocked on something that isn't quite ready.
> >
> > Any objections? If not, Ard -- can you post a new version of this, please?
> >
>
> Happy to post a new version, with the following remarks:
> - my current private tree has evolved in the meantime, and I am now
> putting the kernel image at the base of the vmalloc region (and the
> module region right before it)
> - I think Mark's changes would allow me to deobfuscate the VA bias
> that redirects __va() translations into the kernel VA space rather
> than the linear mapping

I'll leave that up to you. I'm just trying to avoid you growing a dependency
on something that's unlikely to make it for 4.5. If Mark separates out the
parts you need, perhaps that offers us some middle ground.

Will

Ard Biesheuvel Dec. 15, 2015, 7:19 p.m. UTC | #10
On 8 December 2015 at 14:51, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 08, 2015 at 02:29:33PM +0100, Ard Biesheuvel wrote:
>> On 8 December 2015 at 13:40, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Dec 03, 2015 at 12:18:40PM +0000, Mark Rutland wrote:
>> >> Apologies that it's taken me so long to get around to this...
>> >>
>> >> On Mon, Nov 16, 2015 at 12:23:14PM +0100, Ard Biesheuvel wrote:
>> >> > This splits off and generalises the population of the statically
>> >> > allocated fixmap page tables so that we may reuse it later for
>> >> > the linear mapping once we move the kernel text mapping out of it.
>> >> >
>> >> > This also involves taking into account that table entries at any of
>> >> > the levels we are populating may have been populated already, since
>> >> > the fixmap mapping might not be disjoint up to the pgd level anymore
>> >> > from other early mappings.
>> >>
>> >> As a heads-up, for avoiding TLB conflicts, I'm currently working on an
>> >> alternative way of creating the kernel page tables which will definitely
>> >> conflict here, and may or may not supersede this approach.
>> >
>> > Given that the Christmas break is around the corner and your TLB series
>> > is probably going to take some time to get right, I suggest we persevere
>> > with Ard's current patch series for 4.5 and merge the TLB conflict solution
>> > for 4.6. I don't want us to end up in a situation where this is needlessly
>> > blocked on something that isn't quite ready.
>> >
>> > Any objections? If not, Ard -- can you post a new version of this, please?
>> >
>>
>> Happy to post a new version, with the following remarks:
>> - my current private tree has evolved in the meantime, and I am now
>> putting the kernel image at the base of the vmalloc region (and the
>> module region right before it)
>> - I think Mark's changes would allow me to deobfuscate the VA bias
>> that redirects __va() translations into the kernel VA space rather
>> than the linear mapping
>
> I'll leave that up to you. I'm just trying to avoid you growing a dependency
> on something that's unlikely to make it for 4.5. If Mark separates out the
> parts you need, perhaps that offers us some middle ground.
>

I have played around with Mark's code a bit, and it looks like a huge
improvement for the split VA patches as well: I have a patch that
removes early_fixmap_init()'s dependency on the linear mapping, and,
combined with Mark's patches that use the fixmap for manipulating the
page tables, I no longer seem to need the VA bias to redirect __va()
translations into the kernel mapping early on.

Patch

diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
index ee35fd0f2236..dd342af63673 100644
--- a/arch/arm64/include/asm/compiler.h
+++ b/arch/arm64/include/asm/compiler.h
@@ -27,4 +27,6 @@ 
  */
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
+#define __pgdir		__attribute__((section(".pgdir"),aligned(PAGE_SIZE)))
+
 #endif	/* __ASM_COMPILER_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 1ee2c3937d4e..87a596246ec7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -164,11 +164,13 @@  SECTIONS
 
 	BSS_SECTION(0, 0, 0)
 
-	. = ALIGN(PAGE_SIZE);
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	swapper_pg_dir = .;
-	. += SWAPPER_DIR_SIZE;
+	.pgdir (NOLOAD) : ALIGN(PAGE_SIZE) {
+		idmap_pg_dir = .;
+		. += IDMAP_DIR_SIZE;
+		swapper_pg_dir = .;
+		. += SWAPPER_DIR_SIZE;
+		*(.pgdir)
+	}
 
 	_end = .;
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 32ddd893da9a..4f397a87c2be 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -396,6 +396,44 @@  static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
 }
 #endif
 
+struct bootstrap_pgtables {
+	pte_t	pte[PTRS_PER_PTE];
+	pmd_t	pmd[PTRS_PER_PMD > 1 ? PTRS_PER_PMD : 0];
+	pud_t	pud[PTRS_PER_PUD > 1 ? PTRS_PER_PUD : 0];
+};
+
+static void __init bootstrap_early_mapping(unsigned long addr,
+					   struct bootstrap_pgtables *reg,
+					   bool pte_level)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd)) {
+		clear_page(reg->pud);
+		memblock_reserve(__pa(reg->pud), PAGE_SIZE);
+		pgd_populate(&init_mm, pgd, reg->pud);
+	}
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		clear_page(reg->pmd);
+		memblock_reserve(__pa(reg->pmd), PAGE_SIZE);
+		pud_populate(&init_mm, pud, reg->pmd);
+	}
+
+	if (!pte_level)
+		return;
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd)) {
+		clear_page(reg->pte);
+		memblock_reserve(__pa(reg->pte), PAGE_SIZE);
+		pmd_populate_kernel(&init_mm, pmd, reg->pte);
+	}
+}
+
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
@@ -598,14 +636,6 @@  void vmemmap_free(unsigned long start, unsigned long end)
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_PGTABLE_LEVELS > 2
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_PGTABLE_LEVELS > 3
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
-
 static inline pud_t * fixmap_pud(unsigned long addr)
 {
 	pgd_t *pgd = pgd_offset_k(addr);
@@ -635,21 +665,15 @@  static inline pte_t * fixmap_pte(unsigned long addr)
 
 void __init early_fixmap_init(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
+	static struct bootstrap_pgtables fixmap_bs_pgtables __pgdir;
 	pmd_t *pmd;
-	unsigned long addr = FIXADDR_START;
 
-	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
-	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	bootstrap_early_mapping(FIXADDR_START, &fixmap_bs_pgtables, true);
+	pmd = fixmap_pmd(FIXADDR_START);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not preparted:
+	 * we are not prepared:
 	 */
 	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));