From patchwork Wed May 8 09:52:33 2013
X-Patchwork-Submitter: Steve Capper <steve.capper@linaro.org>
X-Patchwork-Id: 16781
From: Steve Capper <steve.capper@linaro.org>
To: linux-mm@kvack.org, x86@kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Michal Hocko, Ken Chen, Mel Gorman, Catalin Marinas, Will Deacon,
	patches@linaro.org, Steve Capper
Subject: [RFC PATCH v2 01/11] mm: hugetlb: Copy huge_pmd_share from x86 to mm.
Date: Wed, 8 May 2013 10:52:33 +0100
Message-Id: <1368006763-30774-2-git-send-email-steve.capper@linaro.org>
X-Mailer: git-send-email 1.7.2.5
In-Reply-To: <1368006763-30774-1-git-send-email-steve.capper@linaro.org>
References: <1368006763-30774-1-git-send-email-steve.capper@linaro.org>

Under x86, multiple puds can be made to reference the same bank of huge
pmds, provided that they represent a full PUD_SIZE of shared huge memory
that is aligned to a PUD_SIZE boundary. The code that shares pmds does not
require any architecture-specific knowledge beyond the fact that pmds can
be indexed, so it can benefit other architectures as well.

This patch copies the huge pmd sharing (and unsharing) logic from x86/ to
mm/, and introduces a new config option to activate it:
CONFIG_ARCH_WANT_HUGE_PMD_SHARE

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
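Note (illustration only, not part of the commit): pmd sharing only kicks in
for mappings that cover at least one full, PUD_SIZE-aligned region of a
shared hugetlb file. On x86_64 with 4K pages, PUD_SIZE is 1GB, so a
userspace mapping along the lines of the sketch below is the kind of
workload that can end up sharing a pmd page between processes. The
hugetlbfs path, the 2GB length (chosen so that at least one fully aligned
gigabyte is covered wherever the kernel places the mapping) and the fork()
are illustrative assumptions; enough huge pages must be reserved beforehand.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_LEN	(2UL << 30)	/* 2GB: always covers >= 1 aligned GB */

int main(void)
{
	/* hypothetical hugetlbfs mount point and file name */
	int fd = open("/mnt/huge/shared_area", O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return 1;

	/*
	 * MAP_SHARED hugetlb mapping: every process mapping this file over
	 * the same range is a candidate for sharing the pmd page covering
	 * each fully aligned PUD_SIZE chunk, instead of allocating its own.
	 */
	char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	if (fork() == 0) {		/* child and parent both fault it in */
		p[0] = 1;
		_exit(0);
	}
	p[0] = 1;
	wait(NULL);

	munmap(p, MAP_LEN);
	close(fd);
	return 0;
}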
 include/linux/hugetlb.h |   4 ++
 mm/hugetlb.c            | 122 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 16e4e9a..795c32d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -68,6 +68,10 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 int dequeue_hwpoisoned_huge_page(struct page *page);
 void copy_huge_page(struct page *dst, struct page *src);
 
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
+#endif
+
 extern unsigned long hugepages_treat_as_movable;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
 extern int sysctl_hugetlb_shm_group;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ca9a7c6..41179b0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3142,6 +3142,128 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
 	hugetlb_acct_memory(h, -(chg - freed));
 }
 
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+static unsigned long page_table_shareable(struct vm_area_struct *svma,
+				struct vm_area_struct *vma,
+				unsigned long addr, pgoff_t idx)
+{
+	unsigned long saddr = ((idx - svma->vm_pgoff) << PAGE_SHIFT) +
+				svma->vm_start;
+	unsigned long sbase = saddr & PUD_MASK;
+	unsigned long s_end = sbase + PUD_SIZE;
+
+	/* Allow segments to share if only one is marked locked */
+	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
+
+	/*
+	 * match the virtual addresses, permission and the alignment of the
+	 * page table page.
+	 */
+	if (pmd_index(addr) != pmd_index(saddr) ||
+	    vm_flags != svm_flags ||
+	    sbase < svma->vm_start || svma->vm_end < s_end)
+		return 0;
+
+	return saddr;
+}
+
+static int vma_shareable(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long base = addr & PUD_MASK;
+	unsigned long end = base + PUD_SIZE;
+
+	/*
+	 * check on proper vm_flags and page table alignment
+	 */
+	if (vma->vm_flags & VM_MAYSHARE &&
+	    vma->vm_start <= base && end <= vma->vm_end)
+		return 1;
+	return 0;
+}
+
+/*
+ * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
+ * and returns the corresponding pte. While this is not necessary for the
+ * !shared pmd case because we can allocate the pmd later as well, it makes the
+ * code much cleaner. pmd allocation is essential for the shared case because
+ * pud has to be populated inside the same i_mmap_mutex section - otherwise
+ * racing tasks could either miss the sharing (see huge_pte_offset) or select a
+ * bad pmd for sharing.
+ */
+pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+{
+	struct vm_area_struct *vma = find_vma(mm, addr);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
+			vma->vm_pgoff;
+	struct vm_area_struct *svma;
+	unsigned long saddr;
+	pte_t *spte = NULL;
+	pte_t *pte;
+
+	if (!vma_shareable(vma, addr))
+		return (pte_t *)pmd_alloc(mm, pud, addr);
+
+	mutex_lock(&mapping->i_mmap_mutex);
+	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
+		if (svma == vma)
+			continue;
+
+		saddr = page_table_shareable(svma, vma, addr, idx);
+		if (saddr) {
+			spte = huge_pte_offset(svma->vm_mm, saddr);
+			if (spte) {
+				get_page(virt_to_page(spte));
+				break;
+			}
+		}
+	}
+
+	if (!spte)
+		goto out;
+
+	spin_lock(&mm->page_table_lock);
+	if (pud_none(*pud))
+		pud_populate(mm, pud,
+				(pmd_t *)((unsigned long)spte & PAGE_MASK));
+	else
+		put_page(virt_to_page(spte));
+	spin_unlock(&mm->page_table_lock);
+out:
+	pte = (pte_t *)pmd_alloc(mm, pud, addr);
+	mutex_unlock(&mapping->i_mmap_mutex);
+	return pte;
+}
+
+/*
+ * unmap huge page backed by shared pte.
+ *
+ * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
+ * indicated by page_count > 1, unmap is achieved by clearing pud and
+ * decrementing the ref count. If count == 1, the pte page is not shared.
+ *
+ * called with vma->vm_mm->page_table_lock held.
+ *
+ * returns: 1 successfully unmapped a shared pte page
+ *	    0 the underlying pte page is not shared, or it is the last user
+ */
+int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
+{
+	pgd_t *pgd = pgd_offset(mm, *addr);
+	pud_t *pud = pud_offset(pgd, *addr);
+
+	BUG_ON(page_count(virt_to_page(ptep)) == 0);
+	if (page_count(virt_to_page(ptep)) == 1)
+		return 0;
+
+	pud_clear(pud);
+	put_page(virt_to_page(ptep));
+	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
+	return 1;
+}
+#endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
+
 #ifdef CONFIG_MEMORY_FAILURE
 
 /* Should be called in hugetlb_lock */
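
For context (again not part of the patch), the sketch below shows how an
unmap path is expected to consume huge_pmd_unshare(); it is modelled
loosely on __unmap_hugepage_range() in mm/hugetlb.c and the function name
is hypothetical. A return of 1 means the pud entry has been cleared and the
shared pmd page's refcount dropped, so there is no pte left to clear at
that address, and *address may be advanced so the caller's loop can skip
over the remainder of the shared range.

/*
 * Abridged sketch, not part of this patch: a caller pattern in the style
 * of __unmap_hugepage_range(). Assumes mm/hugetlb.c context.
 */
static void unmap_huge_range_sketch(struct mm_struct *mm,
				    struct vm_area_struct *vma,
				    unsigned long start, unsigned long end)
{
	unsigned long sz = huge_page_size(hstate_vma(vma));
	unsigned long address;
	pte_t *ptep;

	spin_lock(&mm->page_table_lock);
	for (address = start; address < end; address += sz) {
		ptep = huge_pte_offset(mm, address);
		if (!ptep)
			continue;

		/* Shared pmd page dropped: nothing left to clear here. */
		if (huge_pmd_unshare(mm, &address, ptep))
			continue;

		/* ... otherwise clear and free the huge pte as usual ... */
	}
	spin_unlock(&mm->page_table_lock);
}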