From patchwork Thu Nov 19 16:05:59 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 329481
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox, "Kirill A. Shutemov"
Cc: Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Yang Shi, Michal Hocko, John Hubbard,
    Ralph Campbell, David Nellans, Zi Yan
Subject: [PATCH 1/7] XArray: Fix splitting to non-zero orders
Date: Thu, 19 Nov 2020 11:05:59 -0500
Message-Id: <20201119160605.1272425-2-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: "Matthew Wilcox (Oracle)"

Splitting an order-4 entry into order-2 entries would leave the array
containing pointers to 000040008000c000 instead of 000044448888cccc.
This is a one-character fix, but the test suite is also enhanced to
check this case.

Reported-by: Zi Yan
Signed-off-by: Matthew Wilcox (Oracle)
---
 lib/test_xarray.c | 26 ++++++++++++++------------
 lib/xarray.c      |  4 ++--
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 8294f43f4981..8b1c318189ce 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -1530,24 +1530,24 @@ static noinline void check_store_range(struct xarray *xa)
 
 #ifdef CONFIG_XARRAY_MULTI
 static void check_split_1(struct xarray *xa, unsigned long index,
-		unsigned int order)
+		unsigned int order, unsigned int new_order)
 {
-	XA_STATE(xas, xa, index);
-	void *entry;
-	unsigned int i = 0;
+	XA_STATE_ORDER(xas, xa, index, new_order);
+	unsigned int i;
 
 	xa_store_order(xa, index, order, xa, GFP_KERNEL);
 
 	xas_split_alloc(&xas, xa, order, GFP_KERNEL);
 	xas_lock(&xas);
 	xas_split(&xas, xa, order);
+	for (i = 0; i < (1 << order); i += (1 << new_order))
+		__xa_store(xa, index + i, xa_mk_index(index + i), 0);
 	xas_unlock(&xas);
 
-	xa_for_each(xa, index, entry) {
-		XA_BUG_ON(xa, entry != xa);
-		i++;
+	for (i = 0; i < (1 << order); i++) {
+		unsigned int val = index + (i & ~((1 << new_order) - 1));
+		XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val));
 	}
-	XA_BUG_ON(xa, i != 1 << order);
 
 	xa_set_mark(xa, index, XA_MARK_0);
 	XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0));
@@ -1557,14 +1557,16 @@ static void check_split_1(struct xarray *xa, unsigned long index,
 
 static noinline void check_split(struct xarray *xa)
 {
-	unsigned int order;
+	unsigned int order, new_order;
 
 	XA_BUG_ON(xa, !xa_empty(xa));
 
 	for (order = 1; order < 2 * XA_CHUNK_SHIFT; order++) {
-		check_split_1(xa, 0, order);
-		check_split_1(xa, 1UL << order, order);
-		check_split_1(xa, 3UL << order, order);
+		for (new_order = 0; new_order < order; new_order++) {
+			check_split_1(xa, 0, order, new_order);
+			check_split_1(xa, 1UL << order, order, new_order);
+			check_split_1(xa, 3UL << order, order, new_order);
+		}
 	}
 }
 #else
diff --git a/lib/xarray.c b/lib/xarray.c
index fc70e37c4c17..74915ba018c4 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1012,7 +1012,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
 
 	do {
 		unsigned int i;
-		void *sibling;
+		void *sibling = NULL;
 		struct xa_node *node;
 
 		node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
@@ -1022,7 +1022,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
 		for (i = 0; i < XA_CHUNK_SIZE; i++) {
 			if ((i & mask) == 0) {
 				RCU_INIT_POINTER(node->slots[i], entry);
-				sibling = xa_mk_sibling(0);
+				sibling = xa_mk_sibling(i);
 			} else {
 				RCU_INIT_POINTER(node->slots[i], sibling);
 			}
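The slot layouts named in the commit message can be reproduced with a few
lines of standalone userspace C (an illustration only, not kernel code):
the first loop prints the canonical index each slot resolves to after an
order-4 entry at index 0 is split into order-2 entries, using the same mask
the new test asserts; the second reproduces the buggy xa_mk_sibling(0)
layout.

	#include <stdio.h>

	int main(void)
	{
		unsigned int order = 4, new_order = 2;
		unsigned int mask = (1u << new_order) - 1;
		unsigned int i;

		printf("expected: ");
		for (i = 0; i < (1u << order); i++)
			printf("%x", i & ~mask);		/* 000044448888cccc */
		printf("\nbuggy:    ");
		for (i = 0; i < (1u << order); i++)
			printf("%x", (i & mask) ? 0 : i);	/* 000040008000c000 */
		printf("\n");
		return 0;
	}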
From patchwork Thu Nov 19 16:06:00 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 329482
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox, "Kirill A. Shutemov"
Cc: Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Yang Shi, Michal Hocko, John Hubbard,
    Ralph Campbell, David Nellans, Zi Yan
Subject: [PATCH 2/7] mm: huge_memory: add new debugfs interface to trigger split huge page on any page range.
Date: Thu, 19 Nov 2020 11:06:00 -0500
Message-Id: <20201119160605.1272425-3-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

Huge pages in the process with the given pid and in the given virtual
address range are split. This is used to test the split huge page
function. In addition, a testing program is added to
tools/testing/selftests/vm to exercise the interface by splitting both
PMD THPs and PTE-mapped THPs.

Signed-off-by: Zi Yan
---
 mm/huge_memory.c                              |  98 ++++++
 mm/internal.h                                 |   1 +
 mm/migrate.c                                  |   2 +-
 tools/testing/selftests/vm/.gitignore         |   1 +
 tools/testing/selftests/vm/Makefile           |   1 +
 .../selftests/vm/split_huge_page_test.c       | 313 ++++++++++++++++++
 6 files changed, 415 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/split_huge_page_test.c

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1bf51d3f2f2d..88d8b7fce5d7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -7,6 +7,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2934,10 +2935,107 @@ static int split_huge_pages_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(split_huge_pages_fops, NULL, split_huge_pages_set,
 		"%llu\n");
 
+static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppops)
+{
+	static DEFINE_MUTEX(mutex);
+	ssize_t ret;
+	char input_buf[80]; /* hold pid, start_vaddr, end_vaddr */
+	int pid;
+	unsigned long vaddr_start, vaddr_end, addr;
+	nodemask_t task_nodes;
+	struct mm_struct *mm;
+	unsigned long total = 0, split = 0;
+
+	ret = mutex_lock_interruptible(&mutex);
+	if (ret)
+		return ret;
+
+	ret = -EFAULT;
+
+	memset(input_buf, 0, 80);
+	if (copy_from_user(input_buf, buf, min_t(size_t, count, 80)))
+		goto out;
+
+	input_buf[79] = '\0';
+	ret = sscanf(input_buf, "%d,0x%lx,0x%lx", &pid, &vaddr_start, &vaddr_end);
+	if (ret != 3) {
+		ret = -EINVAL;
+		goto out;
+	}
+	vaddr_start &= PAGE_MASK;
+	vaddr_end &= PAGE_MASK;
+
+	ret = strlen(input_buf);
+	pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx]\n",
+		 pid, vaddr_start, vaddr_end);
+
+	mm = find_mm_struct(pid, &task_nodes);
+	if (IS_ERR(mm)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mmap_read_lock(mm);
+	/*
+	 * always increase addr by PAGE_SIZE, since we could have a PTE page
+	 * table filled with PTE-mapped THPs, each of which is distinct.
+	 */
+	for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+		unsigned int follflags;
+		struct page *page;
+
+		if (!vma || addr < vma->vm_start || !vma_migratable(vma))
+			break;
+
+		/* FOLL_DUMP to ignore special (like zero) pages */
+		follflags = FOLL_GET | FOLL_DUMP;
+		page = follow_page(vma, addr, follflags);
+
+		if (IS_ERR(page))
+			break;
+		if (!page)
+			break;
+
+		if (!is_transparent_hugepage(page))
+			continue;
+
+		total++;
+		if (!can_split_huge_page(compound_head(page), NULL))
+			continue;
+
+		if (!trylock_page(page))
+			continue;
+
+		if (!split_huge_page(page))
+			split++;
+
+		unlock_page(page);
+		put_page(page);
+	}
+	mmap_read_unlock(mm);
+	mmput(mm);
+
+	pr_debug("%lu of %lu THP split\n", split, total);
+out:
+	mutex_unlock(&mutex);
+	return ret;
+
+}
+
+static const struct file_operations split_huge_pages_in_range_pid_fops = {
+	.owner	 = THIS_MODULE,
+	.write	 = split_huge_pages_in_range_pid_write,
+	.llseek  = no_llseek,
+};
+
 static int __init split_huge_pages_debugfs(void)
 {
 	debugfs_create_file("split_huge_pages", 0200, NULL, NULL,
 			    &split_huge_pages_fops);
+	debugfs_create_file("split_huge_pages_in_range_pid", 0200, NULL, NULL,
+			    &split_huge_pages_in_range_pid_fops);
 	return 0;
 }
 late_initcall(split_huge_pages_debugfs);
diff --git a/mm/internal.h b/mm/internal.h
index fbebc3ff288c..b94b2d96e47a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -627,4 +627,5 @@ struct migration_target_control {
 bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end);
 void page_cache_free_page(struct address_space *mapping, struct page *page);
 
+struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 6dfc7ea08f78..8fb328f9e180 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1853,7 +1853,7 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 	return nr_pages ? -EFAULT : 0;
 }
 
-static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
+struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
 {
 	struct task_struct *task;
 	struct mm_struct *mm;
diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index c8deddc81e7a..da92ded5a27c 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -23,3 +23,4 @@ write_to_hugetlbfs
 hmm-tests
 memfd_secret
 local_config.*
+split_huge_page_test
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 9ab98946fbf2..1cc4e5b76dac 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -43,6 +43,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_FILES += split_huge_page_test
 
 ifeq ($(ARCH),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/split_huge_page_test.c b/tools/testing/selftests/vm/split_huge_page_test.c
new file mode 100644
index 000000000000..cd2ced8c1261
--- /dev/null
+++ b/tools/testing/selftests/vm/split_huge_page_test.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include
+#include
+#include "numa.h"
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+uint64_t pagesize;
+unsigned int pageshift;
+uint64_t pmd_pagesize;
+
+#define PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"
+#define SMAP_PATH "/proc/self/smaps"
+#define INPUT_MAX 80
+
+#define PFN_MASK ((1UL<<55)-1)
+#define KPF_THP (1UL<<22)
+
+int is_backed_by_thp(char *vaddr, int pagemap_file, int kpageflags_file)
+{
+	uint64_t paddr;
+	uint64_t page_flags;
+
+	if (pagemap_file) {
+		pread(pagemap_file, &paddr, sizeof(paddr),
+			((long)vaddr >> pageshift) * sizeof(paddr));
+
+		if (kpageflags_file) {
+			pread(kpageflags_file, &page_flags, sizeof(page_flags),
+				(paddr & PFN_MASK) * sizeof(page_flags));
+
+			return !!(page_flags & KPF_THP);
+		}
+	}
+	return 0;
+}
+
+
+static uint64_t read_pmd_pagesize(void)
+{
+	int fd;
+	char buf[20];
+	ssize_t num_read;
+
+	fd = open(PMD_SIZE_PATH, O_RDONLY);
+	if (fd == -1) {
+		perror("Open hpage_pmd_size failed");
+		exit(EXIT_FAILURE);
+	}
+	num_read = read(fd, buf, 19);
+	if (num_read < 1) {
+		close(fd);
+		perror("Read hpage_pmd_size failed");
+		exit(EXIT_FAILURE);
+	}
+	buf[num_read] = '\0';
+	close(fd);
+
+	return strtoul(buf, NULL, 10);
+}
+
+static int write_file(const char *path, const char *buf, size_t buflen)
+{
+	int fd;
+	ssize_t numwritten;
+
+	fd = open(path, O_WRONLY);
+	if (fd == -1)
+		return 0;
+
+	numwritten = write(fd, buf, buflen - 1);
+	close(fd);
+	if (numwritten < 1)
+		return 0;
+
+	return (unsigned int) numwritten;
+}
+
+static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end)
+{
+	char input[INPUT_MAX];
+	int ret;
+
+	ret = snprintf(input, INPUT_MAX, "%d,0x%lx,0x%lx", pid, vaddr_start,
+			vaddr_end);
+	if (ret >= INPUT_MAX) {
+		printf("%s: Debugfs input is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	if (!write_file(SPLIT_DEBUGFS, input, ret + 1)) {
+		perror(SPLIT_DEBUGFS);
+		exit(EXIT_FAILURE);
+	}
+}
+
+#define MAX_LINE_LENGTH 500
+
+static bool check_for_pattern(FILE *fp, const char *pattern, char *buf)
+{
+	while (fgets(buf, MAX_LINE_LENGTH, fp) != NULL) {
+		if (!strncmp(buf, pattern, strlen(pattern)))
+			return true;
+	}
+	return false;
+}
+
+static uint64_t check_huge(void *addr)
+{
+	uint64_t thp = 0;
+	int ret;
+	FILE *fp;
+	char buffer[MAX_LINE_LENGTH];
+	char addr_pattern[MAX_LINE_LENGTH];
+
+	ret = snprintf(addr_pattern, MAX_LINE_LENGTH, "%08lx-",
+		       (unsigned long) addr);
+	if (ret >= MAX_LINE_LENGTH) {
+		printf("%s: Pattern is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+
+	fp = fopen(SMAP_PATH, "r");
+	if (!fp) {
+		printf("%s: Failed to open file %s\n", __func__, SMAP_PATH);
+		exit(EXIT_FAILURE);
+	}
+	if (!check_for_pattern(fp, addr_pattern, buffer))
+		goto err_out;
+
+	/*
+	 * Fetch the AnonHugePages: in the same block and check the number of
+	 * hugepages.
+	 */
+	if (!check_for_pattern(fp, "AnonHugePages:", buffer))
+		goto err_out;
+
+	if (sscanf(buffer, "AnonHugePages:%10ld kB", &thp) != 1) {
+		printf("Reading smap error\n");
+		exit(EXIT_FAILURE);
+	}
+
+err_out:
+	fclose(fp);
+	return thp;
+}
+
+void split_pmd_thp(void)
+{
+	char *one_page;
+	size_t len = 4 * pmd_pagesize;
+	uint64_t thp_size;
+	size_t i;
+
+	one_page = memalign(pmd_pagesize, len);
+
+	madvise(one_page, len, MADV_HUGEPAGE);
+
+	for (i = 0; i < len; i++)
+		one_page[i] = (char)i;
+
+	thp_size = check_huge(one_page);
+	if (!thp_size) {
+		printf("No THP is allocatd");
+		exit(EXIT_FAILURE);
+	}
+
+	/* split all possible huge pages */
+	write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len);
+
+	for (i = 0; i < len; i++)
+		if (one_page[i] != (char)i) {
+			printf("%ld byte corrupted\n", i);
+			exit(EXIT_FAILURE);
+		}
+
+
+	thp_size = check_huge(one_page);
+	if (thp_size) {
+		printf("Still %ld kB AnonHugePages not split\n", thp_size);
+		exit(EXIT_FAILURE);
+	}
+
+	printf("Split huge pages successful\n");
+	free(one_page);
+}
+
+void split_pte_mapped_thp(void)
+{
+	char *one_page, *pte_mapped, *pte_mapped2;
+	size_t len = 4 * pmd_pagesize;
+	uint64_t thp_size;
+	size_t i;
+	const char *pagemap_template = "/proc/%d/pagemap";
+	const char *kpageflags_proc = "/proc/kpageflags";
+	char pagemap_proc[255];
+	int pagemap_fd;
+	int kpageflags_fd;
+
+	if (snprintf(pagemap_proc, 255, pagemap_template, getpid()) < 0) {
+		perror("get pagemap proc error");
+		exit(EXIT_FAILURE);
+	}
+	pagemap_fd = open(pagemap_proc, O_RDONLY);
+
+	if (pagemap_fd == -1) {
+		perror("read pagemap:");
+		exit(EXIT_FAILURE);
+	}
+
+	kpageflags_fd = open(kpageflags_proc, O_RDONLY);
+
+	if (kpageflags_fd == -1) {
+		perror("read kpageflags:");
+		exit(EXIT_FAILURE);
+	}
+
+	one_page = mmap((void *)(1UL << 30), len, PROT_READ | PROT_WRITE,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+
+	madvise(one_page, len, MADV_HUGEPAGE);
+
+	for (i = 0; i < len; i++)
+		one_page[i] = (char)i;
+
+	thp_size = check_huge(one_page);
+	if (!thp_size) {
+		printf("No THP is allocatd");
+		exit(EXIT_FAILURE);
+	}
+
+	pte_mapped = mremap(one_page, pagesize, pagesize, MREMAP_MAYMOVE);
+
+	for (i = 1; i < 4; i++) {
+		pte_mapped2 = mremap(one_page + pmd_pagesize * i + pagesize * i,
+				     pagesize, pagesize,
+				     MREMAP_MAYMOVE|MREMAP_FIXED,
+				     pte_mapped + pagesize * i);
+		if (pte_mapped2 == (char *)-1) {
+			perror("mremap failed");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	/* smap does not show THPs after mremap, use kpageflags instead */
+	thp_size = 0;
+	for (i = 0; i < pagesize * 4; i++)
+		if (i % pagesize == 0 &&
+		    is_backed_by_thp(&pte_mapped[i], pagemap_fd, kpageflags_fd))
+			thp_size++;
+
+	if (thp_size != 4) {
+		printf("Some THPs are missing during mremap\n");
+		exit(EXIT_FAILURE);
+	}
+
+	/* split all possible huge pages */
+	write_debugfs(getpid(), (uint64_t)pte_mapped,
+		      (uint64_t)pte_mapped + pagesize * 4);
+
+	/* smap does not show THPs after mremap, use kpageflags instead */
+	thp_size = 0;
+	for (i = 0; i < pagesize * 4; i++) {
+		if (pte_mapped[i] != (char)i) {
+			printf("%ld byte corrupted\n", i);
+			exit(EXIT_FAILURE);
+		}
+		if (i % pagesize == 0 &&
+		    is_backed_by_thp(&pte_mapped[i], pagemap_fd, kpageflags_fd))
+			thp_size++;
+	}
+
+	if (thp_size) {
+		printf("Still %ld THPs not split\n", thp_size);
+		exit(EXIT_FAILURE);
+	}
+
+	printf("Split PTE-mapped huge pages successful\n");
+	munmap(one_page, len);
+	close(pagemap_fd);
+	close(kpageflags_fd);
+}
+
+int main(int argc, char **argv)
+{
+	if (geteuid() != 0) {
+		printf("Please run the benchmark as root\n");
+		exit(EXIT_FAILURE);
+	}
+
+	pagesize = getpagesize();
+	pageshift = ffs(pagesize) - 1;
+	pmd_pagesize = read_pmd_pagesize();
+
+	split_pmd_thp();
+	split_pte_mapped_thp();
+
+	return 0;
+}
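A minimal standalone client for the new interface might look like the
sketch below (illustrative only, not part of the patch; it assumes debugfs
is mounted at /sys/kernel/debug, must run as root, and mirrors the
selftest's write_debugfs() helper; the kernel-side handler parses the input
with sscanf("%d,0x%lx,0x%lx") and returns the input length on success):

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"

	int main(int argc, char **argv)
	{
		char input[80];
		unsigned long start, end;
		int pid, fd;

		if (argc != 4) {
			fprintf(stderr, "usage: %s <pid> <0xstart> <0xend>\n", argv[0]);
			return EXIT_FAILURE;
		}
		pid = atoi(argv[1]);
		start = strtoul(argv[2], NULL, 16);
		end = strtoul(argv[3], NULL, 16);

		/* same "%d,0x%lx,0x%lx" format the kernel-side sscanf() expects */
		snprintf(input, sizeof(input), "%d,0x%lx,0x%lx", pid, start, end);

		fd = open(SPLIT_DEBUGFS, O_WRONLY);
		if (fd < 0) {
			perror(SPLIT_DEBUGFS);
			return EXIT_FAILURE;
		}
		/* the write handler returns the parsed input length on success */
		if (write(fd, input, strlen(input)) < 0) {
			perror("write");
			close(fd);
			return EXIT_FAILURE;
		}
		close(fd);
		return EXIT_SUCCESS;
	}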
From patchwork Thu Nov 19 16:06:01 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 328747
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox, "Kirill A. Shutemov"
Cc: Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Yang Shi, Michal Hocko, John Hubbard,
    Ralph Campbell, David Nellans, Zi Yan
Subject: [PATCH 3/7] mm: memcg: make memcg huge page split support any order split.
Date: Thu, 19 Nov 2020 11:06:01 -0500
Message-Id: <20201119160605.1272425-4-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

mem_cgroup_split_huge_fixup() sets memcg information for the pages after
the split. A new parameter, new_order, is added to tell the new page
order; it is always 0 for now. This prepares for upcoming changes to
support splitting a huge page to any lower order.

Signed-off-by: Zi Yan
Reviewed-by: Ralph Campbell
Acked-by: Roman Gushchin
---
 include/linux/memcontrol.h | 5 +++--
 mm/huge_memory.c           | 2 +-
 mm/memcontrol.c            | 6 +++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a8d5daf95988..39707feae505 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1062,7 +1062,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void mem_cgroup_split_huge_fixup(struct page *head);
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_order);
 #endif
 
 #else /* CONFIG_MEMCG */
@@ -1396,7 +1396,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	return 0;
 }
 
-static inline void mem_cgroup_split_huge_fixup(struct page *head)
+static inline void mem_cgroup_split_huge_fixup(struct page *head,
+					       unsigned int new_order)
 {
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88d8b7fce5d7..d7ab5cac5851 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2428,7 +2428,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	lruvec = mem_cgroup_page_lruvec(head, pgdat);
 
 	/* complete memcg works before add pages to LRU */
-	mem_cgroup_split_huge_fixup(head);
+	mem_cgroup_split_huge_fixup(head, 0);
 
 	if (PageAnon(head) && PageSwapCache(head)) {
 		swp_entry_t entry = { .val = page_private(head) };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index de5869dd354d..4521ed3a51b7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3223,15 +3223,15 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
  * Because tail pages are not marked as "used", set it. We're under
  * pgdat->lru_lock and migration entries setup in all page mappings.
  */
-void mem_cgroup_split_huge_fixup(struct page *head)
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_order)
 {
 	struct mem_cgroup *memcg = page_memcg(head);
-	int i;
+	int i, new_nr = 1 << new_order;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	for (i = 1; i < thp_nr_pages(head); i++) {
+	for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
 		css_get(&memcg->css);
 		head[i].memcg_data = (unsigned long)memcg;
 	}
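The stride of the new fixup loop can be checked in isolation. The following
standalone sketch (plain userspace C, illustrative only, not kernel code)
counts the memcg_data writes the loop performs for an order-9 THP (512 base
pages) at two values of new_order — one write per new_order-sized chunk
except the head:

	#include <stdio.h>

	static unsigned int fixup_writes(unsigned int nr, unsigned int new_order)
	{
		unsigned int new_nr = 1u << new_order, i, n = 0;

		/* same bounds and stride as the new kernel loop */
		for (i = new_nr; i < nr; i += new_nr)
			n++;
		return n;
	}

	int main(void)
	{
		printf("order-9 -> order-0: %u writes\n", fixup_writes(512, 0)); /* 511 */
		printf("order-9 -> order-2: %u writes\n", fixup_writes(512, 2)); /* 127 */
		return 0;
	}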
From patchwork Thu Nov 19 16:06:02 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 328744
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox, "Kirill A. Shutemov"
Cc: Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Yang Shi, Michal Hocko, John Hubbard,
    Ralph Campbell, David Nellans, Zi Yan
Subject: [PATCH 4/7] mm: page_owner: add support for splitting to any order in split page_owner.
Date: Thu, 19 Nov 2020 11:06:02 -0500
Message-Id: <20201119160605.1272425-5-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

This adds a new_order parameter to set the new page order in page owner,
and uses old_order instead of nr to make the parameters consistent. It
prepares for upcoming changes to support splitting a huge page to any
lower order.

Signed-off-by: Zi Yan
---
 include/linux/page_owner.h | 10 ++++++----
 mm/huge_memory.c           |  3 ++-
 mm/page_alloc.c            |  2 +-
 mm/page_owner.c            | 13 +++++++------
 4 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3468794f83d2..9caaed51403c 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -11,7 +11,8 @@ extern struct page_ext_operations page_owner_ops;
 extern void __reset_page_owner(struct page *page, unsigned int order);
 extern void __set_page_owner(struct page *page,
 			unsigned int order, gfp_t gfp_mask);
-extern void __split_page_owner(struct page *page, unsigned int nr);
+extern void __split_page_owner(struct page *page, unsigned int old_order,
+			unsigned int new_order);
 extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
 extern void __set_page_owner_migrate_reason(struct page *page, int reason);
 extern void __dump_page_owner(struct page *page);
@@ -31,10 +32,11 @@ static inline void set_page_owner(struct page *page,
 	__set_page_owner(page, order, gfp_mask);
 }
 
-static inline void split_page_owner(struct page *page, unsigned int nr)
+static inline void split_page_owner(struct page *page, unsigned int old_order,
+			unsigned int new_order)
 {
 	if (static_branch_unlikely(&page_owner_inited))
-		__split_page_owner(page, nr);
+		__split_page_owner(page, old_order, new_order);
 }
 static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
 {
@@ -60,7 +62,7 @@ static inline void set_page_owner(struct page *page,
 {
 }
 static inline void split_page_owner(struct page *page,
-			unsigned int order)
+			unsigned int old_order, unsigned int new_order)
 {
 }
 static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d7ab5cac5851..aae7405a0989 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2422,6 +2422,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	struct lruvec *lruvec;
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
+	unsigned int order = thp_order(head);
 	unsigned int nr = thp_nr_pages(head);
 	int i;
 
@@ -2458,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	ClearPageCompound(head);
 
-	split_page_owner(head, nr);
+	split_page_owner(head, order, 0);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63d8d8b72c10..414f26950190 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3297,7 +3297,7 @@ void split_page(struct page *page, unsigned int order)
 
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
-	split_page_owner(page, 1 << order);
+	split_page_owner(page, order, 0);
 }
 EXPORT_SYMBOL_GPL(split_page);
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index b735a8eafcdb..00a679a1230b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -204,19 +204,20 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
 	page_owner->last_migrate_reason = reason;
 }
 
-void __split_page_owner(struct page *page, unsigned int nr)
+void __split_page_owner(struct page *page, unsigned int old_order,
+			unsigned int new_order)
 {
-	int i;
-	struct page_ext *page_ext = lookup_page_ext(page);
+	int i, old_nr = 1 << old_order, new_nr = 1 << new_order;
+	struct page_ext *page_ext;
 	struct page_owner *page_owner;
 
 	if (unlikely(!page_ext))
 		return;
 
-	for (i = 0; i < nr; i++) {
+	for (i = 0; i < old_nr; i += new_nr) {
+		page_ext = lookup_page_ext(page + i);
 		page_owner = get_page_owner(page_ext);
-		page_owner->order = 0;
-		page_ext = page_ext_next(page_ext);
+		page_owner->order = new_order;
 	}
 }
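The effect of the new __split_page_owner() loop can likewise be illustrated
standalone (userspace C, illustrative only, not kernel code): for an
order-4 page split to order-2, only the four new heads are visited and each
records order = new_order, where the old code walked all 16 subpages and
set order = 0.

	#include <stdio.h>

	int main(void)
	{
		unsigned int old_order = 4, new_order = 2;
		unsigned int old_nr = 1u << old_order, new_nr = 1u << new_order;
		unsigned int i;

		/* one page_owner record per new compound head */
		for (i = 0; i < old_nr; i += new_nr)
			printf("subpage %2u: page_owner->order = %u\n", i, new_order);
		return 0;
	}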
From patchwork Thu Nov 19 16:06:03 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 329483
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox, "Kirill A. Shutemov"
Cc: Roman Gushchin, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Yang Shi, Michal Hocko, John Hubbard,
    Ralph Campbell, David Nellans, Zi Yan
Subject: [PATCH 5/7] mm: thp: split huge page to any lower order pages.
Date: Thu, 19 Nov 2020 11:06:03 -0500
Message-Id: <20201119160605.1272425-6-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

To split a THP to any lower order pages, we need to re-form the THPs on
the subpages at the given order and add page refcounts based on the new
page order. We also need to reinitialize page_deferred_list after
removing the page from the split_queue; otherwise a subsequent split
will see list corruption when checking page_deferred_list again. This
has many uses, such as minimizing the number of pages after truncating a
pagecache THP. For anonymous THPs, we can only split them to order-0 as
before, until support for any-size anonymous THPs is added.

Signed-off-by: Zi Yan
---
 include/linux/huge_mm.h |   8 +++
 mm/huge_memory.c        | 119 +++++++++++++++++++++++++++++-----------
 mm/swap.c               |   1 -
 3 files changed, 96 insertions(+), 32 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7723deda33e2..0c856f805617 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -182,6 +182,8 @@ bool is_transparent_hugepage(struct page *page);
 
 bool can_split_huge_page(struct page *page, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
+int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order);
 static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list(page, NULL);
@@ -385,6 +387,12 @@ split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	return 0;
 }
+static inline int
+split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order)
+{
+	return 0;
+}
 static inline int split_huge_page(struct page *page)
 {
 	return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aae7405a0989..cc70f70862d8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2325,12 +2325,14 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 
 static void unmap_page(struct page *page)
 {
-	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
-		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_RMAP_LOCKED;
 	bool unmap_success;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
+	if (thp_order(page) >= HPAGE_PMD_ORDER)
+		ttu_flags |= TTU_SPLIT_HUGE_PMD;
+
 	if (PageAnon(page))
 		ttu_flags |= TTU_SPLIT_FREEZE;
 
@@ -2338,21 +2340,23 @@ static void unmap_page(struct page *page)
 	VM_BUG_ON_PAGE(!unmap_success, page);
 }
 
-static void remap_page(struct page *page, unsigned int nr)
+static void remap_page(struct page *page, unsigned int nr, unsigned int new_nr)
 {
-	int i;
-	if (PageTransHuge(page)) {
+	unsigned int i;
+
+	if (thp_nr_pages(page) == nr) {
 		remove_migration_ptes(page, page, true);
 	} else {
-		for (i = 0; i < nr; i++)
+		for (i = 0; i < nr; i += new_nr)
 			remove_migration_ptes(page + i, page + i, true);
 	}
 }
 
 static void __split_huge_page_tail(struct page *head, int tail,
-		struct lruvec *lruvec, struct list_head *list)
+		struct lruvec *lruvec, struct list_head *list, unsigned int new_order)
 {
 	struct page *page_tail = head + tail;
+	unsigned long compound_head_flag = new_order ? (1L << PG_head) : 0;
 
 	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
 
@@ -2376,6 +2380,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 #ifdef CONFIG_64BIT
 			 (1L << PG_arch_2) |
 #endif
+			 compound_head_flag |
 			 (1L << PG_dirty)));
 
 	/* ->mapping in first tail page is compound_mapcount */
@@ -2384,7 +2389,10 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
 
-	/* Page flags must be visible before we make the page non-compound. */
+	/*
+	 * Page flags must be visible before we make the page non-compound or
+	 * a compound page in new_order.
+	 */
 	smp_wmb();
 
 	/*
@@ -2394,10 +2402,15 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	 * which needs correct compound_head().
 	 */
 	clear_compound_head(page_tail);
+	if (new_order) {
+		prep_compound_page(page_tail, new_order);
+		thp_prep(page_tail);
+	}
 
 	/* Finally unfreeze refcount. Additional reference from page cache. */
-	page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
-					  PageSwapCache(head)));
+	page_ref_unfreeze(page_tail, 1 + ((!PageAnon(head) ||
+					   PageSwapCache(head)) ?
+					  thp_nr_pages(page_tail) : 0));
 
 	if (page_is_young(head))
 		set_page_young(page_tail);
@@ -2415,7 +2428,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end, unsigned long flags, unsigned int new_order)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2424,12 +2437,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned long offset = 0;
 	unsigned int order = thp_order(head);
 	unsigned int nr = thp_nr_pages(head);
+	unsigned int new_nr = 1 << new_order;
 	int i;
 
 	lruvec = mem_cgroup_page_lruvec(head, pgdat);
 
 	/* complete memcg works before add pages to LRU */
-	mem_cgroup_split_huge_fixup(head, 0);
+	mem_cgroup_split_huge_fixup(head, new_order);
 
 	if (PageAnon(head) && PageSwapCache(head)) {
 		swp_entry_t entry = { .val = page_private(head) };
@@ -2439,46 +2453,54 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
-	for (i = nr - 1; i >= 1; i--) {
-		__split_huge_page_tail(head, i, lruvec, list);
+	for (i = nr - new_nr; i >= new_nr; i -= new_nr) {
+		__split_huge_page_tail(head, i, lruvec, list, new_order);
 		/* Some pages can be beyond i_size: drop them from page cache */
 		if (head[i].index >= end) {
 			ClearPageDirty(head + i);
 			__delete_from_page_cache(head + i, NULL);
 			if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
-				shmem_uncharge(head->mapping->host, 1);
+				shmem_uncharge(head->mapping->host, new_nr);
 			put_page(head + i);
 		} else if (!PageAnon(page)) {
 			__xa_store(&head->mapping->i_pages, head[i].index,
 					head + i, 0);
 		} else if (swap_cache) {
+			/*
+			 * split anonymous THPs (including swapped out ones) to
+			 * non-zero order not supported
+			 */
+			VM_BUG_ON(new_order);
 			__xa_store(&swap_cache->i_pages, offset + i,
 					head + i, 0);
 		}
 	}
 
-	ClearPageCompound(head);
+	if (!new_order)
+		ClearPageCompound(head);
+	else
+		set_compound_order(head, new_order);
 
-	split_page_owner(head, order, 0);
+	split_page_owner(head, order, new_order);
 
 	/* See comment in __split_huge_page_tail() */
 	if (PageAnon(head)) {
 		/* Additional pin to swap cache */
 		if (PageSwapCache(head)) {
-			page_ref_add(head, 2);
+			page_ref_add(head, 1 + new_nr);
 			xa_unlock(&swap_cache->i_pages);
 		} else {
 			page_ref_inc(head);
 		}
 	} else {
 		/* Additional pin to page cache */
-		page_ref_add(head, 2);
+		page_ref_add(head, 1 + new_nr);
 		xa_unlock(&head->mapping->i_pages);
 	}
 
 	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
 
-	remap_page(head, nr);
+	remap_page(head, nr, new_nr);
 
 	if (PageSwapCache(head)) {
 		swp_entry_t entry = { .val = page_private(head) };
@@ -2486,7 +2508,14 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		split_swap_cluster(entry);
 	}
 
-	for (i = 0; i < nr; i++) {
+	/*
+	 * set page to its compound_head when split to THPs, so that GUP pin and
+	 * PG_locked are transferred to the right after-split page
+	 */
+	if (new_order)
+		page = compound_head(page);
+
+	for (i = 0; i < nr; i += new_nr) {
 		struct page *subpage = head + i;
 		if (subpage == page)
 			continue;
@@ -2604,37 +2633,61 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
  *
+ * See split_huge_page_to_list_to_order() for more details.
+ *
+ * Returns 0 if the hugepage is split successfully.
+ * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
+ * us.
+ */
+int split_huge_page_to_list(struct page *page, struct list_head *list)
+{
+	return split_huge_page_to_list_to_order(page, list, 0);
+}
+
+/*
+ * This function splits huge page into pages in @new_order. @page can point to
+ * any subpage of huge page to split. Split doesn't change the position of
+ * @page.
+ *
  * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
  * The huge page must be locked.
  *
  * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
  *
- * Both head page and tail pages will inherit mapping, flags, and so on from
- * the hugepage.
+ * Pages in new_order will inherit mapping, flags, and so on from the hugepage.
  *
- * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
- * they are not mapped.
+ * GUP pin and PG_locked transferred to @page or the compound page @page belongs
+ * to. Rest subpages can be freed if they are not mapped.
  *
  * Returns 0 if the hugepage is split successfully.
  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
  * us.
  */
-int split_huge_page_to_list(struct page *page, struct list_head *list)
+int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order)
 {
 	struct page *head = compound_head(page);
 	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
-	XA_STATE(xas, &head->mapping->i_pages, head->index);
+	/* reset xarray order to new order after split */
+	XA_STATE_ORDER(xas, &head->mapping->i_pages, head->index, new_order);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
 	unsigned long flags;
 	pgoff_t end;
 
+	VM_BUG_ON(thp_order(head) <= new_order);
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(head), head);
 	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
+	/* Cannot split THP to order-1 (no order-1 THPs) */
+	VM_BUG_ON(new_order == 1);
+
+	/* Split anonymous THP to non-zero order not support */
+	VM_BUG_ON(PageAnon(head) && new_order);
+
 	if (PageWriteback(head))
 		return -EBUSY;
 
@@ -2720,18 +2773,22 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		spin_unlock(&ds_queue->split_queue_lock);
 		if (mapping) {
 			if (PageSwapBacked(head))
 				__dec_lruvec_page_state(head, NR_SHMEM_THPS);
-			else
+			else if (!new_order)
+				/*
+				 * Decrease THP stats only if split to normal
+				 * pages
+				 */
 				__mod_lruvec_page_state(head, NR_FILE_THPS,
 						-thp_nr_pages(head));
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end, flags, new_order);
 		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
@@ -2746,7 +2803,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 fail:		if (mapping)
 			xas_unlock(&xas);
 		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
-		remap_page(head, thp_nr_pages(head));
+		remap_page(head, thp_nr_pages(head), 1);
 		ret = -EBUSY;
 	}
 
diff --git a/mm/swap.c b/mm/swap.c
index b667870c8a0b..8f53ab593438 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -983,7 +983,6 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 			struct lruvec *lruvec, struct list_head *list)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
-	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
 	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
 	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
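The new unfreeze arithmetic in __split_huge_page_tail() maps out as in the
sketch below (standalone userspace C, illustrative only; the bool arguments
stand in for PageAnon() and PageSwapCache(), and new_nr for
thp_nr_pages(page_tail)):

	#include <stdbool.h>
	#include <stdio.h>

	/* refcount each after-split tail head is unfrozen to: 1 (base) plus,
	 * for page-cache or swap-cache pages, one reference per base page
	 * now covered by that tail */
	static int unfreeze_count(bool anon, bool swapcache, unsigned int new_order)
	{
		unsigned int new_nr = 1u << new_order;

		return 1 + ((!anon || swapcache) ? new_nr : 0);
	}

	int main(void)
	{
		/* order-0 split of a file THP: 1 + 1 = 2, as before */
		printf("file, new_order=0: %d\n", unfreeze_count(false, false, 0));
		/* order-2 split of a file THP: 1 + 4 page-cache refs = 5 */
		printf("file, new_order=2: %d\n", unfreeze_count(false, false, 2));
		/* anonymous, not in swap cache: always 1 */
		printf("anon, new_order=0: %d\n", unfreeze_count(true, false, 0));
		return 0;
	}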
From patchwork Thu Nov 19 16:06:04 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 328746
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox , "Kirill A . Shutemov"
Cc: Roman Gushchin , Andrew Morton , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Yang Shi , Michal Hocko , John Hubbard , Ralph Campbell , David Nellans , Zi Yan
Subject: [PATCH 6/7] mm: truncate: split thp to a non-zero order if possible.
Date: Thu, 19 Nov 2020 11:06:04 -0500
Message-Id: <20201119160605.1272425-7-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

To minimize the number of pages left after a truncation, we do not need to split a THP all the way down to order-0 when truncating it. A truncated THP has at most three parts: the part before the truncation offset, the part to be truncated, and the part remaining at the end. Use the smallest non-zero size among the three to decide the order the THP is split to.

Signed-off-by: Zi Yan
---
 mm/truncate.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 20bd17538ec2..2e93d702f2c6 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -237,7 +237,9 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
 bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
 {
 	loff_t pos = page_offset(page);
-	unsigned int offset, length;
+	unsigned int offset, length, remaining, min_subpage_size = PAGE_SIZE;
+	unsigned int new_order;
+
 	if (pos < start)
 		offset = start - pos;
@@ -248,6 +250,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
 		length = length - offset;
 	else
 		length = end + 1 - pos - offset;
+	remaining = thp_size(page) - offset - length;
 
 	wait_on_page_writeback(page);
 	if (length == thp_size(page)) {
@@ -267,7 +270,29 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
 	do_invalidatepage(page, offset, length);
 	if (!PageTransHuge(page))
 		return true;
-	return split_huge_page(page) == 0;
+
+	/*
+	 * Find the non-zero minimum of offset, length, and remaining and use
+	 * it to decide the new order of the page after the split.
+	 */
+	if (offset && remaining)
+		min_subpage_size = min_t(unsigned int,
+					 min_t(unsigned int, offset, length),
+					 remaining);
+	else if (!offset)
+		min_subpage_size = min_t(unsigned int, length, remaining);
+	else /* remaining == 0 */
+		min_subpage_size = min_t(unsigned int, length, offset);
+
+	min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
+
+	new_order = ilog2(min_subpage_size / PAGE_SIZE);
+
+	/* order-1 THP not supported, downgrade to order-0 */
+	if (new_order == 1)
+		new_order = 0;
+
+	return split_huge_page_to_list_to_order(page, NULL, new_order) == 0;
 }
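The order computation above is easy to sanity-check outside the kernel. Here is a standalone userspace sketch, an illustration rather than kernel code: PAGE_SIZE is fixed at 4 kB, and min_u()/ilog2_u() are local stand-ins for the kernel's min_t() and ilog2().

#include <stdio.h>

#define PAGE_SIZE 4096u

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Integer log2; fine for the page counts used here. */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Reimplementation of the order choice in truncate_inode_partial_page(). */
static unsigned int truncate_split_order(unsigned int offset,
					 unsigned int length,
					 unsigned int remaining)
{
	unsigned int min_subpage_size;
	unsigned int new_order;

	if (offset && remaining)
		min_subpage_size = min_u(min_u(offset, length), remaining);
	else if (!offset)
		min_subpage_size = min_u(length, remaining);
	else	/* remaining == 0 */
		min_subpage_size = min_u(length, offset);

	if (min_subpage_size < PAGE_SIZE)
		min_subpage_size = PAGE_SIZE;

	new_order = ilog2_u(min_subpage_size / PAGE_SIZE);

	/* order-1 THP not supported, downgrade to order-0 */
	if (new_order == 1)
		new_order = 0;
	return new_order;
}

int main(void)
{
	/*
	 * Truncating the last 64 kB of a 2 MB THP: offset = 2 MB - 64 kB,
	 * length = 64 kB, remaining = 0. The smallest non-zero part is
	 * 64 kB = 16 pages, so the THP is split to order-4.
	 */
	printf("%u\n", truncate_split_order(2048 * 1024 - 64 * 1024,
					    64 * 1024, 0)); /* prints 4 */

	/*
	 * An 8 kB hole punched 16 kB into a 2 MB THP: the smallest non-zero
	 * part is 8 kB = 2 pages, i.e. order-1, which is downgraded to
	 * order-0 because order-1 THPs do not exist.
	 */
	printf("%u\n", truncate_split_order(16 * 1024, 8 * 1024,
					    2048 * 1024 - 24 * 1024)); /* prints 0 */
	return 0;
}

Truncating the tail of a THP leaves remaining == 0, so the chosen order is driven by whichever of the kept and truncated parts is smaller.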
From patchwork Thu Nov 19 16:06:05 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 329484
From: Zi Yan
To: linux-mm@kvack.org, Matthew Wilcox , "Kirill A . Shutemov"
Cc: Roman Gushchin , Andrew Morton , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Yang Shi , Michal Hocko , John Hubbard , Ralph Campbell , David Nellans , Zi Yan
Subject: [PATCH 7/7] mm: huge_memory: enable debugfs to split huge pages to any order.
Date: Thu, 19 Nov 2020 11:06:05 -0500
Message-Id: <20201119160605.1272425-8-zi.yan@sent.com>
In-Reply-To: <20201119160605.1272425-1-zi.yan@sent.com>
References: <20201119160605.1272425-1-zi.yan@sent.com>

From: Zi Yan

Extend the debugfs interface so it can split huge pages to any order. It is used to test split_huge_page_to_list_to_order() for pagecache THPs. Also add test cases that exercise split_huge_page_to_list_to_order() via debugfs, via truncating a file, and via punching holes in a file.

Signed-off-by: Zi Yan
---
 mm/huge_memory.c                              |  13 +-
 .../selftests/vm/split_huge_page_test.c       | 192 ++++++++++++++++--
 2 files changed, 186 insertions(+), 19 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cc70f70862d8..d6ce7be65fb2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2999,7 +2999,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
 	static DEFINE_MUTEX(mutex);
 	ssize_t ret;
 	char input_buf[80]; /* hold pid, start_vaddr, end_vaddr */
-	int pid;
+	int pid, to_order = 0;
 	unsigned long vaddr_start, vaddr_end, addr;
 	nodemask_t task_nodes;
 	struct mm_struct *mm;
@@ -3016,8 +3016,9 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
 		goto out;
 	input_buf[79] = '\0';
 
-	ret = sscanf(input_buf, "%d,0x%lx,0x%lx", &pid, &vaddr_start, &vaddr_end);
-	if (ret != 3) {
+	ret = sscanf(input_buf, "%d,0x%lx,0x%lx,%d", &pid, &vaddr_start, &vaddr_end, &to_order);
+	/* cannot split to order-1, as order-1 THPs do not exist */
+	if ((ret != 3 && ret != 4) || to_order == 1) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -3025,8 +3026,8 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
 	vaddr_end &= PAGE_MASK;
 
 	ret = strlen(input_buf);
-	pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx]\n",
-		 pid, vaddr_start, vaddr_end);
+	pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx], to order: %d\n",
+		 pid, vaddr_start, vaddr_end, to_order);
 
 	mm = find_mm_struct(pid, &task_nodes);
 	if (IS_ERR(mm)) {
@@ -3066,7 +3067,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
 		if (!trylock_page(page))
 			continue;
 
-		if (!split_huge_page(page))
+		if (!split_huge_page_to_list_to_order(page, NULL, to_order))
 			split++;
 
 		unlock_page(page);
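The selftest changes below drive this interface through write_debugfs(); reduced to its essentials, a write to the new debugfs file looks like the following sketch. This is a hypothetical userspace helper, not part of the patch; the path and the "pid,0xstart,0xend,order" format are taken from the sscanf() above.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"

/*
 * Ask the kernel to split every THP mapped in [start, end) of process
 * @pid down to @order. Returns 0 on success, -1 on error.
 */
static int split_range_to_order(int pid, unsigned long start,
				unsigned long end, int order)
{
	char input[80];
	int len, fd, ret = 0;

	/* "pid,0xstart,0xend,order", matching the kernel's sscanf format */
	len = snprintf(input, sizeof(input), "%d,0x%lx,0x%lx,%d",
		       pid, start, end, order);
	if (len < 0 || len >= (int)sizeof(input))
		return -1;

	fd = open(SPLIT_DEBUGFS, O_WRONLY);
	if (fd < 0)
		return -1;
	/* the selftest writes the terminating NUL as well, hence len + 1 */
	if (write(fd, input, len + 1) != len + 1)
		ret = -1;
	close(fd);
	return ret;
}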
diff --git a/tools/testing/selftests/vm/split_huge_page_test.c b/tools/testing/selftests/vm/split_huge_page_test.c
index cd2ced8c1261..bfd35ae9cfd2 100644
--- a/tools/testing/selftests/vm/split_huge_page_test.c
+++ b/tools/testing/selftests/vm/split_huge_page_test.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 uint64_t pagesize;
 unsigned int pageshift;
@@ -24,6 +25,7 @@ uint64_t pmd_pagesize;
 
 #define PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
 #define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"
 #define SMAP_PATH "/proc/self/smaps"
+#define THP_FS_PATH "/mnt/thp_fs"
 
 #define INPUT_MAX 80
 #define PFN_MASK ((1UL<<55)-1)
 
@@ -89,19 +91,20 @@ static int write_file(const char *path, const char *buf, size_t buflen)
 	return (unsigned int) numwritten;
 }
 
-static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end)
+static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end, int order)
 {
 	char input[INPUT_MAX];
 	int ret;
 
-	ret = snprintf(input, INPUT_MAX, "%d,0x%lx,0x%lx", pid, vaddr_start,
-			vaddr_end);
+	ret = snprintf(input, INPUT_MAX, "%d,0x%lx,0x%lx,%d", pid, vaddr_start,
+			vaddr_end, order);
 	if (ret >= INPUT_MAX) {
 		printf("%s: Debugfs input is too long\n", __func__);
 		exit(EXIT_FAILURE);
 	}
 
-	if (!write_file(SPLIT_DEBUGFS, input, ret + 1)) {
+	/* order == 1 is an invalid input that should be detected. */
+	if (order != 1 && !write_file(SPLIT_DEBUGFS, input, ret + 1)) {
 		perror(SPLIT_DEBUGFS);
 		exit(EXIT_FAILURE);
 	}
@@ -118,7 +121,7 @@ static bool check_for_pattern(FILE *fp, const char *pattern, char *buf)
 	return false;
 }
 
-static uint64_t check_huge(void *addr)
+static uint64_t check_huge(void *addr, const char *prefix)
 {
 	uint64_t thp = 0;
 	int ret;
@@ -143,13 +146,13 @@ static uint64_t check_huge(void *addr, const char *prefix)
 		goto err_out;
 
 	/*
-	 * Fetch the AnonHugePages: in the same block and check the number of
+	 * Fetch the @prefix in the same block and check the number of
 	 * hugepages.
 	 */
-	if (!check_for_pattern(fp, "AnonHugePages:", buffer))
+	if (!check_for_pattern(fp, prefix, buffer))
 		goto err_out;
 
-	if (sscanf(buffer, "AnonHugePages:%10ld kB", &thp) != 1) {
+	if (sscanf(&buffer[strlen(prefix)], "%10ld kB", &thp) != 1) {
 		printf("Reading smap error\n");
 		exit(EXIT_FAILURE);
 	}
@@ -173,14 +176,14 @@ void split_pmd_thp(void)
 	for (i = 0; i < len; i++)
 		one_page[i] = (char)i;
 
-	thp_size = check_huge(one_page);
+	thp_size = check_huge(one_page, "AnonHugePages:");
 	if (!thp_size) {
 		printf("No THP is allocatd");
 		exit(EXIT_FAILURE);
 	}
 
 	/* split all possible huge pages */
-	write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len);
+	write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len, 0);
 
 	for (i = 0; i < len; i++)
 		if (one_page[i] != (char)i) {
@@ -189,7 +192,7 @@ void split_pmd_thp(void)
 		}
 
-	thp_size = check_huge(one_page);
+	thp_size = check_huge(one_page, "AnonHugePages:");
 	if (thp_size) {
 		printf("Still %ld kB AnonHugePages not split\n", thp_size);
 		exit(EXIT_FAILURE);
@@ -237,7 +240,7 @@ void split_pte_mapped_thp(void)
 	for (i = 0; i < len; i++)
 		one_page[i] = (char)i;
 
-	thp_size = check_huge(one_page);
+	thp_size = check_huge(one_page, "AnonHugePages:");
 	if (!thp_size) {
 		printf("No THP is allocatd");
 		exit(EXIT_FAILURE);
@@ -270,7 +273,7 @@ void split_pte_mapped_thp(void)
 	/* split all possible huge pages */
 	write_debugfs(getpid(), (uint64_t)pte_mapped,
-		      (uint64_t)pte_mapped + pagesize * 4);
+		      (uint64_t)pte_mapped + pagesize * 4, 0);
 
 	/* smap does not show THPs after mremap, use kpageflags instead */
 	thp_size = 0;
@@ -295,19 +298,182 @@ void split_pte_mapped_thp(void)
 	close(kpageflags_fd);
 }
 
+void create_pagecache_thp_and_fd(size_t fd_size, int *fd, char **addr)
+{
+	const char testfile[] = THP_FS_PATH "/test";
+	size_t i;
+	int dummy = 0;
+
+	srand(time(NULL));
+
+	*fd = open(testfile, O_CREAT | O_RDWR, 0664);
+	if (*fd == -1) {
+		perror("Failed to create a file at "THP_FS_PATH);
+		exit(EXIT_FAILURE);
+	}
+
+	for (i = 0; i < fd_size; i++) {
+		unsigned char byte = (unsigned char)i;
+
+		write(*fd, &byte, sizeof(byte));
+	}
+	close(*fd);
+	sync();
+	*fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
+	if (*fd == -1) {
+		perror("open drop_caches");
drop_caches"); + exit(EXIT_FAILURE); + } + if (write(*fd, "3", 1) != 1) { + perror("write to drop_caches"); + exit(EXIT_FAILURE); + } + close(*fd); + + *fd = open(testfile, O_RDWR); + if (*fd == -1) { + perror("Failed to open a file at "THP_FS_PATH); + exit(EXIT_FAILURE); + } + + *addr = mmap(NULL, fd_size, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0); + if (*addr == (char *)-1) { + perror("cannot mmap"); + exit(1); + } + madvise(*addr, fd_size, MADV_HUGEPAGE); + + for (size_t i = 0; i < fd_size; i++) + dummy += *(*addr + i); + + if (!check_huge(*addr, "FilePmdMapped:")) { + printf("No pagecache THP generated, please mount a filesystem " + "supporting pagecache THP at "THP_FS_PATH"\n"); + exit(EXIT_FAILURE); + } +} + +void split_thp_in_pagecache_to_order(size_t fd_size, int order) +{ + int fd; + char *addr; + size_t i; + + create_pagecache_thp_and_fd(fd_size, &fd, &addr); + + printf("split %ld kB pagecache page to order %d ... ", fd_size >> 10, order); + write_debugfs(getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order); + + for (i = 0; i < fd_size; i++) + if (*(addr + i) != (char)i) { + printf("%lu byte corrupted in the file\n"); + exit(EXIT_FAILURE); + } + + close(fd); + printf("done\n"); +} + +void truncate_thp_in_pagecache_to_order(size_t fd_size, int order) +{ + int fd; + char *addr; + size_t i; + + create_pagecache_thp_and_fd(fd_size, &fd, &addr); + + printf("truncate %ld kB pagecache page to size %lu kB ... ", fd_size >> 10, 4UL << order); + ftruncate(fd, pagesize << order); + + for (i = 0; i < (pagesize << order); i++) + if (*(addr + i) != (char)i) { + printf("%lu byte corrupted in the file\n"); + exit(EXIT_FAILURE); + } + + close(fd); + printf("done\n"); +} + +void punch_hole_in_pagecache_thp(size_t fd_size, off_t offset[], off_t len[], int n) +{ + int fd, j; + char *addr; + size_t i; + + create_pagecache_thp_and_fd(fd_size, &fd, &addr); + + for (j = 0; j < n; j++) { + printf("addr: %lx, punch a hole at offset %ld kB with len %ld kB ... ", + addr, offset[j] >> 10, len[j] >> 10); + fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, offset[j], len[j]); + printf("done\n"); + } + + for (i = 0; i < fd_size; i++) { + int in_hole = 0; + + for (j = 0; j < n; j++) + if (i >= offset[j] && i <= (offset[j] + len[j])) { + in_hole = 1; + break; + } + + if (in_hole) + continue; + if (*(addr + i) != (char)i) { + printf("%lu byte corrupted in the file\n", i); + exit(EXIT_FAILURE); + } + } + + close(fd); + +} + int main(int argc, char **argv) { + int i; + size_t fd_size; + off_t offset[2], len[2]; + if (geteuid() != 0) { printf("Please run the benchmark as root\n"); exit(EXIT_FAILURE); } + setbuf(stdout, NULL); + pagesize = getpagesize(); pageshift = ffs(pagesize) - 1; pmd_pagesize = read_pmd_pagesize(); + fd_size = 2 * pmd_pagesize; split_pmd_thp(); split_pte_mapped_thp(); + for (i = 8; i >= 0; i--) + split_thp_in_pagecache_to_order(fd_size, i); + + /* + * for i is 1, truncate code in the kernel should create order-0 pages + * instead of order-1 THPs, since order-1 THP is not supported. No error + * is expected. + */ + for (i = 8; i >= 0; i--) + truncate_thp_in_pagecache_to_order(fd_size, i); + + offset[0] = 123 * pagesize; + offset[1] = 4 * pagesize; + len[0] = 200 * pagesize; + len[1] = 16 * pagesize; + punch_hole_in_pagecache_thp(fd_size, offset, len, 2); + + offset[0] = 259 * pagesize; + offset[1] = 33 * pagesize; + len[0] = 129 * pagesize; + len[1] = 16 * pagesize; + punch_hole_in_pagecache_thp(fd_size, offset, len, 2); + return 0; }