Message ID | 1398857782-1525-1-git-send-email-steve.capper@linaro.org |
---|---|
State | Accepted |
Commit | 206a2a73a62d37c8b8f6ddd3180c202b2e7298ab |

On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K
> granule.
>
> This patch adjusts the create_mapping logic s.t. when mapping physical
> memory on boot, we attempt to use a 1GB block if both the VA and PA
> start and end are 1GB aligned. This both reduces the levels of lookup
> required to resolve a kernel logical address, as well as reduces TLB
> pressure on cores that support 1GB TLB entries.
>
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
>
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> three level 64KB configuration at some point.
>
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> a PGD, and these are all level 2 descriptors.
>
> Under three level 64K, a PUD would be equivalent to a PGD which would
> be a level 1 descriptor thus may not be a block.
>
> Comments/critique/testers welcome.

It seems like a great idea. I have to admit that I don't understand
the existing code, but what are the page sizes used here?

Does the code always use the largest possible page size, or does
it just use either small pages or 1G pages?

In combination with the contiguous page hint, we should be able
to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
combination for boot-time mappings on a 4K page size kernel,
or 64KB/1M/512M/8G on a 64KB page size kernel.

	Arnd

On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K
> > granule.
> >
> > This patch adjusts the create_mapping logic s.t. when mapping physical
> > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > start and end are 1GB aligned. This both reduces the levels of lookup
> > required to resolve a kernel logical address, as well as reduces TLB
> > pressure on cores that support 1GB TLB entries.
> >
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> >
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > three level 64KB configuration at some point.
> >
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > a PGD, and these are all level 2 descriptors.
> >
> > Under three level 64K, a PUD would be equivalent to a PGD which would
> > be a level 1 descriptor thus may not be a block.
> >
> > Comments/critique/testers welcome.
>
> It seems like a great idea. I have to admit that I don't understand
> the existing code, but what are the page sizes used here?

Actually, I think it was your idea ;-). I remember you talking about
increasing the mapping size when 4-level page tables were being
discussed. (I think I should have added a Reported-by, would be happy
to if you want?).

With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
And with a 4KB granule, the original code will map 2MB blocks if
possible, and 4KB otherwise.

The patch will make the 4KB granule case also map 1GB blocks if
possible.

>
> Does the code always use the largest possible page size, or does
> it just use either small pages or 1G pages?

The code will put down the largest mappings it can. As the physical
memory sizes/address are very likely to be aligned to whatever block
size we use; we are likely to achieve the maximum size for our
mappings.

>
> In combination with the contiguous page hint, we should be able
> to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> combination for boot-time mappings on a 4K page size kernel,
> or 64KB/1M/512M/8G on a 64KB page size kernel.
>

A contiguous hint could be applied to these mappings. The logic would
be a bit more complicated though when we consider different granules.
For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
adopt a 16KB granule, we use 32 entries for a level 2 lookup and
128 entries for a level 3 lookup...

The largest TLB entry sizes that I am aware of in play are the block
sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
the moment for adding the contiguous logic.

Cheers,
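
The entry counts quoted in the reply above fix how much address space one contiguous run would cover. The following is a minimal user-space sketch of that arithmetic, not kernel code: `enum granule`, `contig_entries()` and `contig_span()` are names made up for illustration, and the counts are simply the ones stated in the message (16 for 4KB, 32 for 64KB, 32 at level 2 and 128 at level 3 for 16KB).

```c
#include <assert.h>
#include <stdint.h>

/* Translation granules discussed in the thread (illustrative enum). */
enum granule { GRANULE_4K, GRANULE_16K, GRANULE_64K };

/*
 * Number of adjacent entries grouped by the contiguous hint, using the
 * counts quoted in the thread. 'level' is the lookup level of the entry
 * (1, 2 or 3); only the 16KB granule uses different counts per level.
 */
static unsigned int contig_entries(enum granule g, int level)
{
	assert(level >= 1 && level <= 3);

	switch (g) {
	case GRANULE_4K:
		return 16;
	case GRANULE_16K:
		return level == 3 ? 128 : 32;
	case GRANULE_64K:
		return 32;
	}
	return 0;
}

/* Bytes covered by one contiguous run of entries of 'entry_size' bytes. */
static uint64_t contig_span(enum granule g, int level, uint64_t entry_size)
{
	return (uint64_t)contig_entries(g, level) * entry_size;
}
```

For a 4KB granule, `contig_span(GRANULE_4K, 3, 4096)` gives 64KB, and the 2MB and 1GB blocks give 32MB and 16GB, matching the 4K list in the question. With 32 entries, a 64KB granule gives 2MB and 16GB runs; the 1M and 8G figures in the question correspond to 16 entries instead.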

On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > granule.
> > >
> > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > required to resolve a kernel logical address, as well as reduces TLB
> > > pressure on cores that support 1GB TLB entries.
> > >
> > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > ---
> > > Hello,
> > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > Also, this has been tested with Jungseok's 4 level patch.
> > >
> > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > three level 64KB configuration at some point.
> > >
> > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > a PGD, and these are all level 2 descriptors.
> > >
> > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > be a level 1 descriptor thus may not be a block.
> > >
> > > Comments/critique/testers welcome.
> >
> > It seems like a great idea. I have to admit that I don't understand
> > the existing code, but what are the page sizes used here?
>
> Actually, I think it was your idea ;-). I remember you talking about
> increasing the mapping size when 4-level page tables were being
> discussed. (I think I should have added a Reported-by, would be happy
> to if you want?).

I completely forgot we had talked about this.

> With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> And with a 4KB granule, the original code will map 2MB blocks if
> possible, and 4KB otherwise.
>
> The patch will make the 4KB granule case also map 1GB blocks if
> possible.

Ok.

> > In combination with the contiguous page hint, we should be able
> > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > combination for boot-time mappings on a 4K page size kernel,
> > or 64KB/1M/512M/8G on a 64KB page size kernel.
>
> A contiguous hint could be applied to these mappings. The logic would
> be a bit more complicated though when we consider different granules.
> For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> 128 entries for a level 3 lookup...
>
> The largest TLB entry sizes that I am aware of in play are the block
> sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> the moment for adding the contiguous logic.

Is that an architecture limit, or specific to the Cortex-A53/A57
implementations?

	Arnd

On Thu, May 01, 2014 at 03:36:05PM +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2014 09:54:12 Steve Capper wrote:
> > On Wed, Apr 30, 2014 at 08:11:26PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 30 April 2014 12:36:22 Steve Capper wrote:
> > > > We have the capability to map 1GB level 1 blocks when using a 4K
> > > > granule.
> > > >
> > > > This patch adjusts the create_mapping logic s.t. when mapping physical
> > > > memory on boot, we attempt to use a 1GB block if both the VA and PA
> > > > start and end are 1GB aligned. This both reduces the levels of lookup
> > > > required to resolve a kernel logical address, as well as reduces TLB
> > > > pressure on cores that support 1GB TLB entries.
> > > >
> > > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > > ---
> > > > Hello,
> > > > This patch has been tested on the FastModel for 4K and 64K pages.
> > > > Also, this has been tested with Jungseok's 4 level patch.
> > > >
> > > > I put in the explicit check for PAGE_SHIFT, as I am anticipating a
> > > > three level 64KB configuration at some point.
> > > >
> > > > With two level 64K, a PUD is equivalent to a PMD which is equivalent to
> > > > a PGD, and these are all level 2 descriptors.
> > > >
> > > > Under three level 64K, a PUD would be equivalent to a PGD which would
> > > > be a level 1 descriptor thus may not be a block.
> > > >
> > > > Comments/critique/testers welcome.
> > >
> > > It seems like a great idea. I have to admit that I don't understand
> > > the existing code, but what are the page sizes used here?
> >
> > Actually, I think it was your idea ;-). I remember you talking about
> > increasing the mapping size when 4-level page tables were being
> > discussed. (I think I should have added a Reported-by, would be happy
> > to if you want?).
>
> I completely forgot we had talked about this.
>
> > With a 64KB granule, we'll map 512MB blocks if possible, otherwise 64K.
> > And with a 4KB granule, the original code will map 2MB blocks if
> > possible, and 4KB otherwise.
> >
> > The patch will make the 4KB granule case also map 1GB blocks if
> > possible.
>
> Ok.
>
> > > In combination with the contiguous page hint, we should be able
> > > to theoretically support 4KB/64KB/2M/32M/1G/16G TLBs in any
> > > combination for boot-time mappings on a 4K page size kernel,
> > > or 64KB/1M/512M/8G on a 64KB page size kernel.
> >
> > A contiguous hint could be applied to these mappings. The logic would
> > be a bit more complicated though when we consider different granules.
> > For 4KB we chain together 16 entries, for 64KB we use 32. If/when we
> > adopt a 16KB granule, we use 32 entries for a level 2 lookup and
> > 128 entries for a level 3 lookup...
> >
> > The largest TLB entry sizes that I am aware of in play are the block
> > sizes (i.e. 2MB, 512MB, 1GB). So I don't think we'll get any benefit at
> > the moment for adding the contiguous logic.
>
> Is that an architecture limit, or specific to the Cortex-A53/A57
> implementations?

Those are the TLBs that are documented for the Cortex-A53 and
Cortex-A57. I have an idea of what the architectural limit is, but I
will need to seek confirmation on it.

Cheers,

On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> We have the capability to map 1GB level 1 blocks when using a 4K granule.
>
> This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> support 1GB TLB entries.
>
> Signed-off-by: Steve Capper <steve.capper@linaro.org>
> ---
> Hello,
> This patch has been tested on the FastModel for 4K and 64K pages.
> Also, this has been tested with Jungseok's 4 level patch.
>
> I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> some point.
>
> With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> 2 descriptors.
>
> Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> not be a block.
>
> Comments/critique/testers welcome.

Hi, Steve

I've tested on my platform, and it works well.

If SoC design follows "Principles of ARM Memory Maps",
PA should be supposed to be 1GB aligned. Thus, I think
this patch is effective against them.

Best Regards
Jungseok Lee

On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4d29332..867e979 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>  	pud = pud_offset(pgd, addr);
>  	do {
>  		next = pud_addr_end(addr, end);
> -		alloc_init_pmd(pud, addr, next, phys);
> +
> +		/*
> +		 * For 4K granule only, attempt to put down a 1GB block
> +		 */
> +		if ((PAGE_SHIFT == 12) &&
> +		    ((addr | next | phys) & ~PUD_MASK) == 0) {
> +			pud_t old_pud = *pud;
> +			set_pud(pud, __pud(phys | prot_sect_kernel));
> +
> +			if (!pud_none(old_pud))
> +				flush_tlb_all();

We could even free the original pmd here. I think a
memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
(untested, and you need to define pud_pfn).

On Fri, May 02, 2014 at 10:03:02AM +0900, Jungseok Lee wrote:
> On Wednesday, April 30, 2014 8:36 PM, Steve Capper wrote:
> > We have the capability to map 1GB level 1 blocks when using a 4K granule.
> >
> > This patch adjusts the create_mapping logic s.t. when mapping physical memory on boot, we attempt to
> > use a 1GB block if both the VA and PA start and end are 1GB aligned. This both reduces the levels of
> > lookup required to resolve a kernel logical address, as well as reduces TLB pressure on cores that
> > support 1GB TLB entries.
> >
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > ---
> > Hello,
> > This patch has been tested on the FastModel for 4K and 64K pages.
> > Also, this has been tested with Jungseok's 4 level patch.
> >
> > I put in the explicit check for PAGE_SHIFT, as I am anticipating a three level 64KB configuration at
> > some point.
> >
> > With two level 64K, a PUD is equivalent to a PMD which is equivalent to a PGD, and these are all level
> > 2 descriptors.
> >
> > Under three level 64K, a PUD would be equivalent to a PGD which would be a level 1 descriptor thus may
> > not be a block.
> >
> > Comments/critique/testers welcome.
>
> Hi, Steve
>
> I've tested on my platform, and it works well.
>

Thanks for giving this a go!

> If SoC design follows "Principles of ARM Memory Maps",
> PA should be supposed to be 1GB aligned. Thus, I think
> this patch is effective against them.
>
> Best Regards
> Jungseok Lee
>

On Fri, May 02, 2014 at 09:51:21AM +0100, Catalin Marinas wrote:
> On Wed, Apr 30, 2014 at 12:36:22PM +0100, Steve Capper wrote:
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 4d29332..867e979 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
> >  	pud = pud_offset(pgd, addr);
> >  	do {
> >  		next = pud_addr_end(addr, end);
> > -		alloc_init_pmd(pud, addr, next, phys);
> > +
> > +		/*
> > +		 * For 4K granule only, attempt to put down a 1GB block
> > +		 */
> > +		if ((PAGE_SHIFT == 12) &&
> > +		    ((addr | next | phys) & ~PUD_MASK) == 0) {
> > +			pud_t old_pud = *pud;
> > +			set_pud(pud, __pud(phys | prot_sect_kernel));
> > +
> > +			if (!pud_none(old_pud))
> > +				flush_tlb_all();
>
> We could even free the original pmd here. I think a
> memblock_free(pud_pfn(old_pud) << PAGE_SHIFT, PAGE_SIZE) should do
> (untested, and you need to define pud_pfn).

I see what you mean, we will potentially have an unused page in our
swapper_pg_dir array. I'll have a think, and add some logic to remove
the redundant page.

Cheers,
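
The review comment above notes that `pud_pfn` would still need to be defined. One possible shape is sketched below as a user-space model, purely as an assumption: the `sketch_` names, the 48-bit physical address mask and the `sketch_pud_t` type are hypothetical and are not taken from the kernel tree or from the patch. The idea is to strip the attribute bits from the old table descriptor and shift the remaining address down by the page shift.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for a 4K granule with 48-bit physical addresses. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PHYS_MASK	((UINT64_C(1) << 48) - 1)
#define SKETCH_TYPE_MASK	UINT64_C(0x3)
#define SKETCH_TYPE_TABLE	UINT64_C(0x3)	/* bits[1:0] == 0b11: table */

/* Hypothetical stand-in for the kernel's pud_t. */
typedef struct { uint64_t pud; } sketch_pud_t;

/*
 * Page frame number of the next-level (pmd) table that a table
 * descriptor points at: mask off the upper attribute bits, then drop
 * the low descriptor bits by shifting down to a frame number.
 */
static uint64_t sketch_pud_pfn(sketch_pud_t pud)
{
	assert((pud.pud & SKETCH_TYPE_MASK) == SKETCH_TYPE_TABLE);
	return (pud.pud & SKETCH_PHYS_MASK) >> SKETCH_PAGE_SHIFT;
}

/* Physical address of that table, as would be handed to a free routine. */
static uint64_t sketch_pud_table_phys(sketch_pud_t pud)
{
	return sketch_pud_pfn(pud) << SKETCH_PAGE_SHIFT;
}
```

In this model, the value of `sketch_pud_table_phys(old_pud)` together with `SKETCH_PAGE_SIZE` corresponds to the two arguments of the suggested `memblock_free()` call.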

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4d29332..867e979 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -234,7 +234,20 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		alloc_init_pmd(pud, addr, next, phys);
+
+		/*
+		 * For 4K granule only, attempt to put down a 1GB block
+		 */
+		if ((PAGE_SHIFT == 12) &&
+		    ((addr | next | phys) & ~PUD_MASK) == 0) {
+			pud_t old_pud = *pud;
+			set_pud(pud, __pud(phys | prot_sect_kernel));
+
+			if (!pud_none(old_pud))
+				flush_tlb_all();
+		} else {
+			alloc_init_pmd(pud, addr, next, phys);
+		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
 }
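
The alignment test in the hunk above can be tried outside the kernel. The following is a small user-space sketch, assuming a 4K granule where one level 1 entry covers 1GB; the `SKETCH_` constants and `can_use_1g_block()` are illustrative names that mirror, but are not, the kernel's `PAGE_SHIFT`, `PUD_MASK` and the condition in `alloc_init_pud()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a 4K granule: a level 1 (PUD) entry covers 1GB. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PUD_SHIFT	30
#define SKETCH_PUD_SIZE		(UINT64_C(1) << SKETCH_PUD_SHIFT)
#define SKETCH_PUD_MASK		(~(SKETCH_PUD_SIZE - 1))

/*
 * True when the virtual range [addr, next) and the physical start
 * address are all 1GB aligned, so a single level 1 block can map it.
 * OR-ing the three values lets one mask test check them together: any
 * low bit set in any of them survives the OR and fails the test.
 */
static bool can_use_1g_block(uint64_t addr, uint64_t next, uint64_t phys)
{
	/* The sketch assumes a non-empty range that does not wrap. */
	assert(addr < next);

	return SKETCH_PAGE_SHIFT == 12 &&
	       ((addr | next | phys) & ~SKETCH_PUD_MASK) == 0;
}
```

A range such as `can_use_1g_block(0x40000000, 0x80000000, 0x80000000)` passes, while a physical start of `0x80200000` (2MB aligned only) fails and would take the `alloc_init_pmd()` path instead.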

We have the capability to map 1GB level 1 blocks when using a 4K
granule.

This patch adjusts the create_mapping logic s.t. when mapping physical
memory on boot, we attempt to use a 1GB block if both the VA and PA
start and end are 1GB aligned. This both reduces the levels of lookup
required to resolve a kernel logical address, as well as reduces TLB
pressure on cores that support 1GB TLB entries.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
Hello,
This patch has been tested on the FastModel for 4K and 64K pages.
Also, this has been tested with Jungseok's 4 level patch.

I put in the explicit check for PAGE_SHIFT, as I am anticipating a
three level 64KB configuration at some point.

With two level 64K, a PUD is equivalent to a PMD which is equivalent to
a PGD, and these are all level 2 descriptors.

Under three level 64K, a PUD would be equivalent to a PGD which would
be a level 1 descriptor thus may not be a block.

Comments/critique/testers welcome.

Cheers,