From patchwork Tue May 6 15:30:04 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Capper X-Patchwork-Id: 29715 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-qc0-f199.google.com (mail-qc0-f199.google.com [209.85.216.199]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id C2F5C20534 for ; Tue, 6 May 2014 15:32:50 +0000 (UTC) Received: by mail-qc0-f199.google.com with SMTP id i17sf21118523qcy.2 for ; Tue, 06 May 2014 08:32:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:subject:date:message-id :in-reply-to:references:cc:precedence:list-id:list-unsubscribe :list-archive:list-post:list-help:list-subscribe:mime-version:sender :errors-to:x-original-sender:x-original-authentication-results :mailing-list:content-type:content-transfer-encoding; bh=QC9gfc3dpuXGfZhh0MJViyVuEsRHMQhOxzBk7zUBgiw=; b=lUYUBvFCQ4EHMpQVNCk7AcNj54SHlwALTzUoTZPyuviI23Igv/AUR7oxYzz5RONhLS PaeMQ6D6M4e1YJWPZTWf1+ilTVd/q/eVGZme8X7V+mOqDsTJVvfUnBQOFOyqb/Tt5XHb dlAJ99vToG8LHNNgd1BrTsHoYqyc8HBTZQL6HUHTlUqCavj2u2DSMbhXfUhnw3UjdDwr bV3fn6wHx7BSpz1qNWDqNr2TryFEnqPNRiAeSjdDLF6brHOr9GYCecRFLQQnVOcXb9sc kMk8dW3usyguk1SzkjmGqiX2O4wkDhkDazIQvkTW6VPtsJPis2qpxk0mYqRMv7cENcai Mi/Q== X-Gm-Message-State: ALoCoQkqSj50BfBZF0IFGDocFv44rvXQ+HY2ou4P91ZpGuGVsjflfGcdNBUvdEpcH6S2Os5zsvOO X-Received: by 10.236.130.101 with SMTP id j65mr1093422yhi.29.1399390370524; Tue, 06 May 2014 08:32:50 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.94.145 with SMTP id g17ls832043qge.76.gmail; Tue, 06 May 2014 08:32:50 -0700 (PDT) X-Received: by 10.52.3.168 with SMTP id d8mr885528vdd.79.1399390370248; Tue, 06 May 2014 08:32:50 -0700 (PDT) Received: from mail-vc0-f177.google.com (mail-vc0-f177.google.com [209.85.220.177]) by mx.google.com with ESMTPS id oh5si2381734vcb.140.2014.05.06.08.32.50 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 06 May 2014 08:32:50 -0700 (PDT) Received-SPF: none (google.com: patch+caf_=patchwork-forward=linaro.org@linaro.org does not designate permitted sender hosts) client-ip=209.85.220.177; Received: by mail-vc0-f177.google.com with SMTP id if11so1154623vcb.8 for ; Tue, 06 May 2014 08:32:50 -0700 (PDT) X-Received: by 10.220.7.131 with SMTP id d3mr1068092vcd.45.1399390370103; Tue, 06 May 2014 08:32:50 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.220.221.72 with SMTP id ib8csp233204vcb; Tue, 6 May 2014 08:32:47 -0700 (PDT) X-Received: by 10.140.26.243 with SMTP id 106mr50634358qgv.91.1399390366382; Tue, 06 May 2014 08:32:46 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id 68si5310167qgn.164.2014.05.06.08.32.40 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 May 2014 08:32:46 -0700 (PDT) Received-SPF: none (google.com: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org does not designate permitted sender hosts) client-ip=2001:1868:205::9; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1WhhKl-0000CX-39; Tue, 06 May 2014 15:31:07 +0000 Received: from mail-we0-f179.google.com ([74.125.82.179]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1WhhKT-0008DS-8l for linux-arm-kernel@lists.infradead.org; Tue, 06 May 2014 15:30:50 +0000 Received: by mail-we0-f179.google.com with SMTP id q59so6919001wes.10 for ; Tue, 06 May 2014 08:30:26 -0700 (PDT) X-Received: by 10.194.94.71 with SMTP id da7mr3028220wjb.56.1399390226323; Tue, 06 May 2014 08:30:26 -0700 (PDT) Received: from marmot.wormnet.eu (marmot.wormnet.eu. [188.246.204.87]) by mx.google.com with ESMTPSA id ho2sm2330106wib.15.2014.05.06.08.30.25 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 May 2014 08:30:25 -0700 (PDT) From: Steve Capper To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH V5 1/6] mm: Introduce a general RCU get_user_pages_fast. Date: Tue, 6 May 2014 16:30:04 +0100 Message-Id: <1399390209-1756-2-git-send-email-steve.capper@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20140506_083049_687988_FCEAB1EF X-CRM114-Status: GOOD ( 22.23 ) X-Spam-Score: -0.7 (/) X-Spam-Report: SpamAssassin version 3.3.2 on bombadil.infradead.org summary: Content analysis details: (-0.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [74.125.82.179 listed in list.dnswl.org] Cc: anders.roxell@linaro.org, peterz@infradead.org, gary.robertson@linaro.org, will.deacon@arm.com, Steve Capper , akpm@linux-foundation.org, christoffer.dall@linaro.org X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: steve.capper@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: patch+caf_=patchwork-forward=linaro.org@linaro.org does not designate permitted sender hosts) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 A general RCU implementation of get_user_pages_fast. It is based on the PowerPC implementation. The lockless page cache protocols are used as this implementation assumes that TLB invalidations do not necessarily need to be broadcast via IPI. This implementation does however assume that THP splits will broadcast an IPI, and this is why interrupts are disabled in the fast_gup walker (otherwise calls to rcu_read_(un)lock would suffice). Signed-off-by: Steve Capper --- mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 301 insertions(+) create mode 100644 mm/gup.c diff --git a/mm/Kconfig b/mm/Kconfig index ebe5880..8848a16 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -134,6 +134,9 @@ config HAVE_MEMBLOCK config HAVE_MEMBLOCK_NODE_MAP boolean +config HAVE_RCU_GUP + boolean + config ARCH_DISCARD_MEMBLOCK boolean diff --git a/mm/Makefile b/mm/Makefile index b484452..83e6ac2 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -29,6 +29,7 @@ else endif obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o +obj-$(CONFIG_HAVE_RCU_GUP) += gup.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o diff --git a/mm/gup.c b/mm/gup.c new file mode 100644 index 0000000..b35296f --- /dev/null +++ b/mm/gup.c @@ -0,0 +1,297 @@ +/* + * mm/gup.c + * + * Copyright (C) 2014 Linaro Ltd. + * + * Based on arch/powerpc/mm/gup.c which is: + * Copyright (C) 2008 Nick Piggin + * Copyright (C) 2008 Novell Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include + +#ifdef __HAVE_ARCH_PTE_SPECIAL +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + pte_t *ptep, *ptem; + int ret = 0; + + ptem = ptep = pte_offset_map(&pmd, addr); + do { + pte_t pte = ACCESS_ONCE(*ptep); + struct page *page; + + if (!pte_present(pte) || pte_special(pte) + || (write && !pte_write(pte))) + goto pte_unmap; + + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + page = pte_page(pte); + + if (!page_cache_get_speculative(page)) + goto pte_unmap; + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + goto pte_unmap; + } + + pages[*nr] = page; + (*nr)++; + + } while (ptep++, addr += PAGE_SIZE, addr != end); + + ret = 1; + +pte_unmap: + pte_unmap(ptem); + return ret; +} +#else + +/* + * If we can't determine whether or not a pte is special, then fail immediately + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not + * to be special. + */ +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + return 0; +} +#endif /* __HAVE_ARCH_PTE_SPECIAL */ + +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (!pmd_present(orig) || (write && !pmd_write(orig))) + return 0; + + refs = 0; + head = pmd_page(orig); + page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + /* + * Any tail pages need their mapcount reference taken before we + * return. (This allows the THP code to bump their ref count when + * they are split into base pages). + */ + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + pmd_t origpmd = __pmd(pud_val(orig)); + int refs; + + if (!pmd_present(origpmd) || (write && !pmd_write(origpmd))) + return 0; + + refs = 0; + head = pmd_page(origpmd); + page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pud_val(orig) != pud_val(*pudp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pmd_t *pmdp; + + pmdp = pmd_offset(&pud, addr); + do { + pmd_t pmd = ACCESS_ONCE(*pmdp); + next = pmd_addr_end(addr, end); + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + return 0; + + if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) { + if (!gup_huge_pmd(pmd, pmdp, addr, next, write, + pages, nr)) + return 0; + } else { + if (!gup_pte_range(pmd, addr, next, write, pages, nr)) + return 0; + } + } while (pmdp++, addr = next, addr != end); + + return 1; +} + +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pud_t *pudp; + + pudp = pud_offset(pgdp, addr); + do { + pud_t pud = ACCESS_ONCE(*pudp); + next = pud_addr_end(addr, end); + if (pud_none(pud)) + return 0; + if (pud_huge(pud)) { + if (!gup_huge_pud(pud, pudp, addr, next, write, + pages, nr)) + return 0; + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) + return 0; + } while (pudp++, addr = next, addr != end); + + return 1; +} + +/* + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall + * back to the regular GUP. + */ +int __get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, len, end; + unsigned long next, flags; + pgd_t *pgdp; + int nr = 0; + + start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + start, len))) + return 0; + + /* + * Disable interrupts, we use the nested form as we can already + * have interrupts disabled by get_futex_key. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h + * for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. + */ + + local_irq_save(flags); + pgdp = pgd_offset(mm, addr); + do { + next = pgd_addr_end(addr, end); + if (pgd_none(*pgdp)) + break; + else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr)) + break; + } while (pgdp++, addr = next, addr != end); + local_irq_restore(flags); + + return nr; +} + +int get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + int nr, ret; + + start &= PAGE_MASK; + nr = __get_user_pages_fast(start, nr_pages, write, pages); + ret = nr; + + if (nr < nr_pages) { + /* Try to get the remaining pages with get_user_pages */ + start += nr << PAGE_SHIFT; + pages += nr; + + down_read(&mm->mmap_sem); + ret = get_user_pages(current, mm, start, + nr_pages - nr, write, 0, pages, NULL); + up_read(&mm->mmap_sem); + + /* Have to be a bit careful with return values */ + if (nr > 0) { + if (ret < 0) + ret = nr; + else + ret += nr; + } + } + + return ret; +}