From patchwork Tue Mar 19 17:26:07 2024
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 781446
From: Gregory Price <gregory.price@memverge.com>
To: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
 ying.huang@intel.com, dan.j.williams@intel.com, honggyu.kim@sk.com,
 corbet@lwn.net, arnd@arndb.de, luto@kernel.org,
 akpm@linux-foundation.org, shuah@kernel.org
Subject: [RFC v3 1/3] mm/migrate: refactor add_page_for_migration for code re-use
Date: Tue, 19 Mar 2024 13:26:07 -0400
Message-Id: <20240319172609.332900-2-gregory.price@memverge.com>
In-Reply-To: <20240319172609.332900-1-gregory.price@memverge.com>
References: <20240319172609.332900-1-gregory.price@memverge.com>

add_page_for_migration presently does two actions:

1) validates the page is present and migratable
2) isolates the page from the LRU and puts it into the migration list

Break add_page_for_migration into two functions:

add_page_for_migration      - isolate the page from the LRU and add it to the list
add_virt_page_for_migration - validate the page and call the above

add_page_for_migration does not require the mm_struct, and so it can be
re-used for a physically-addressed version of move_pages.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 mm/migrate.c | 84 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 34 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c27b1f8097d4..27071a07ffbb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2066,6 +2066,46 @@ static int do_move_pages_to_node(struct list_head *pagelist, int node)
 	return err;
 }
 
+/*
+ * Isolates the page from the LRU and puts it into the given pagelist
+ * Returns:
+ *     errno - if the page cannot be isolated
+ *     0 - when it doesn't have to be migrated because it is already on the
+ *         target node
+ *     1 - when it has been queued
+ */
+static int add_page_for_migration(struct page *page,
+				  struct folio *folio,
+				  int node,
+				  struct list_head *pagelist,
+				  bool migrate_all)
+{
+	if (folio_is_zone_device(folio))
+		return -ENOENT;
+
+	if (folio_nid(folio) == node)
+		return 0;
+
+	if (page_mapcount(page) > 1 && !migrate_all)
+		return -EACCES;
+
+	if (folio_test_hugetlb(folio)) {
+		if (isolate_hugetlb(folio, pagelist))
+			return 1;
+		return -EBUSY;
+	}
+
+	if (!folio_isolate_lru(folio))
+		return -EBUSY;
+
+	list_add_tail(&folio->lru, pagelist);
+	node_stat_mod_folio(folio,
+			    NR_ISOLATED_ANON + folio_is_file_lru(folio),
+			    folio_nr_pages(folio));
+
+	return 1;
+}
+
 /*
  * Resolves the given address to a struct page, isolates it from the LRU and
  * puts it to the given pagelist.
@@ -2075,19 +2115,19 @@ static int do_move_pages_to_node(struct list_head *pagelist, int node)
  * target node
  * 1 - when it has been queued
  */
-static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
-		int node, struct list_head *pagelist, bool migrate_all)
+static int add_virt_page_for_migration(struct mm_struct *mm,
+		const void __user *p, int node, struct list_head *pagelist,
+		bool migrate_all)
 {
 	struct vm_area_struct *vma;
 	unsigned long addr;
 	struct page *page;
 	struct folio *folio;
-	int err;
+	int err = -EFAULT;
 
 	mmap_read_lock(mm);
 	addr = (unsigned long)untagged_addr_remote(mm, p);
 
-	err = -EFAULT;
 	vma = vma_lookup(mm, addr);
 	if (!vma || !vma_migratable(vma))
 		goto out;
@@ -2095,41 +2135,17 @@ static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
 	/* FOLL_DUMP to ignore special (like zero) pages */
 	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 
-	err = PTR_ERR(page);
-	if (IS_ERR(page))
-		goto out;
-
 	err = -ENOENT;
 	if (!page)
 		goto out;
 
-	folio = page_folio(page);
-	if (folio_is_zone_device(folio))
-		goto out_putfolio;
-
-	err = 0;
-	if (folio_nid(folio) == node)
-		goto out_putfolio;
+	err = PTR_ERR(page);
+	if (IS_ERR(page))
+		goto out;
 
-	err = -EACCES;
-	if (page_mapcount(page) > 1 && !migrate_all)
-		goto out_putfolio;
+	folio = page_folio(page);
+	err = add_page_for_migration(page, folio, node, pagelist, migrate_all);
 
-	err = -EBUSY;
-	if (folio_test_hugetlb(folio)) {
-		if (isolate_hugetlb(folio, pagelist))
-			err = 1;
-	} else {
-		if (!folio_isolate_lru(folio))
-			goto out_putfolio;
-
-		err = 1;
-		list_add_tail(&folio->lru, pagelist);
-		node_stat_mod_folio(folio,
-			NR_ISOLATED_ANON + folio_is_file_lru(folio),
-			folio_nr_pages(folio));
-	}
-out_putfolio:
 	/*
 	 * Either remove the duplicate refcount from folio_isolate_lru()
 	 * or drop the folio ref if it was not isolated.
@@ -2229,7 +2245,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 * Errors in the page lookup or isolation are not fatal and we simply
 		 * report them via status
 		 */
-		err = add_page_for_migration(mm, p, current_node, &pagelist,
+		err = add_virt_page_for_migration(mm, p, current_node, &pagelist,
 				flags & MPOL_MF_MOVE_ALL);
 
 		if (err > 0) {

From patchwork Tue Mar 19 17:26:08 2024
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 781238
From: Gregory Price <gregory.price@memverge.com>
To: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
 ying.huang@intel.com, dan.j.williams@intel.com, honggyu.kim@sk.com,
 corbet@lwn.net, arnd@arndb.de, luto@kernel.org,
 akpm@linux-foundation.org, shuah@kernel.org
Subject: [RFC v3 2/3] mm/migrate: Create move_phys_pages syscall
Date: Tue, 19 Mar 2024 13:26:08 -0400
Message-Id: <20240319172609.332900-3-gregory.price@memverge.com>
In-Reply-To: <20240319172609.332900-1-gregory.price@memverge.com>
References: <20240319172609.332900-1-gregory.price@memverge.com>

Similar to the move_pages system call, but instead of taking a pid and
a list of virtual addresses, this system call takes a list of physical
addresses.

Because there is no task to validate the memory policy against, each
page needs to be interrogated to determine whether the migration is
valid, as do all tasks that map it. This is accomplished via an
rmap_walk on the folio containing the page, interrogating every task
that maps the page (by way of each task's vma).

Each page must be interrogated individually, which should be considered
when using this interface to migrate shared regions.

The remaining logic is the same as the move_pages syscall. One change
to do_pages_move is made (to check whether an mm_struct is passed) in
order to re-use the existing migration code.
Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 include/linux/syscalls.h                |   5 +
 include/uapi/asm-generic/unistd.h       |   8 +-
 kernel/sys_ni.c                         |   1 +
 mm/migrate.c                            | 206 +++++++++++++++++++++++-
 tools/include/uapi/asm-generic/unistd.h |   8 +-
 7 files changed, 222 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 5f8591ce7f25..250c00281029 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -466,3 +466,4 @@
 459	i386	lsm_get_self_attr	sys_lsm_get_self_attr
 460	i386	lsm_set_self_attr	sys_lsm_set_self_attr
 461	i386	lsm_list_modules	sys_lsm_list_modules
+462	i386	move_phys_pages		sys_move_phys_pages
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 7e8d46f4147f..a928df7c6f52 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -383,6 +383,7 @@
 459	common	lsm_get_self_attr	sys_lsm_get_self_attr
 460	common	lsm_set_self_attr	sys_lsm_set_self_attr
 461	common	lsm_list_modules	sys_lsm_list_modules
+462	common	move_phys_pages		sys_move_phys_pages
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 77eb9b0e7685..575ba9d26e30 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -840,6 +840,11 @@ asmlinkage long sys_move_pages(pid_t pid, unsigned long nr_pages,
 				const int __user *nodes,
 				int __user *status,
 				int flags);
+asmlinkage long sys_move_phys_pages(unsigned long nr_pages,
+				const void __user * __user *pages,
+				const int __user *nodes,
+				int __user *status,
+				int flags);
 asmlinkage long sys_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig,
 		siginfo_t __user *uinfo);
 asmlinkage long sys_perf_event_open(
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 75f00965ab15..13bc8dd16d6b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -842,8 +842,14 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
+/* CONFIG_MMU only */
+#ifndef __ARCH_NOMMU
+#define __NR_move_phys_pages 462
+__SYSCALL(__NR_move_phys_pages, sys_move_phys_pages)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index faad00cce269..254915fd1e2c 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -196,6 +196,7 @@ COND_SYSCALL(migrate_pages);
 COND_SYSCALL(move_pages);
 COND_SYSCALL(set_mempolicy_home_node);
 COND_SYSCALL(cachestat);
+COND_SYSCALL(move_phys_pages);
 COND_SYSCALL(perf_event_open);
 COND_SYSCALL(accept4);
diff --git a/mm/migrate.c b/mm/migrate.c
index 27071a07ffbb..7213703441f8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2182,9 +2182,119 @@ static int move_pages_and_store_status(int node,
 	return store_status(status, start, node, i - start);
 }
 
+struct rmap_page_ctxt {
+	bool found;
+	bool migratable;
+	bool node_allowed;
+	int node;
+};
+
+/*
+ * Walks each vma mapping a given page and determines if those
+ * vma's are both migratable, and that the target node is within
+ * the allowed cpuset of the owning task.
+ */
+static bool phys_page_migratable(struct folio *folio,
+				 struct vm_area_struct *vma,
+				 unsigned long address,
+				 void *arg)
+{
+	struct rmap_page_ctxt *ctxt = arg;
+#ifdef CONFIG_MEMCG
+	struct task_struct *owner = vma->vm_mm->owner;
+	nodemask_t task_nodes = cpuset_mems_allowed(owner);
+#else
+	nodemask_t task_nodes = node_possible_map;
+#endif
+
+	ctxt->found = true;
+	ctxt->migratable &= vma_migratable(vma);
+	ctxt->node_allowed &= node_isset(ctxt->node, task_nodes);
+
+	return ctxt->migratable && ctxt->node_allowed;
+}
+
+static struct folio *phys_migrate_get_folio(struct page *page)
+{
+	struct folio *folio;
+
+	folio = page_folio(page);
+	if (!folio_test_lru(folio) || !folio_try_get(folio))
+		return NULL;
+	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+		folio_put(folio);
+		folio = NULL;
+	}
+	return folio;
+}
+
+/*
+ * Validates the physical address is online and migratable. Walks the folio
+ * containing the page to validate the vma is migratable and the cpuset node
+ * restrictions. Then calls add_page_for_migration to isolate it from the
+ * LRU and place it into the given pagelist.
+ * Returns:
+ *     errno - if the page is not online, migratable, or can't be isolated
+ *     0 - when it doesn't have to be migrated because it is already on the
+ *         target node
+ *     1 - when it has been queued
+ */
+static int add_phys_page_for_migration(const void __user *p, int node,
+				       struct list_head *pagelist,
+				       bool migrate_all)
+{
+	unsigned long pfn;
+	struct page *page;
+	struct folio *folio;
+	int err;
+	struct rmap_page_ctxt rmctxt = {
+		.found = false,
+		.migratable = true,
+		.node_allowed = true,
+		.node = node
+	};
+	struct rmap_walk_control rwc = {
+		.rmap_one = phys_page_migratable,
+		.arg = &rmctxt
+	};
+
+	pfn = ((unsigned long)p) >> PAGE_SHIFT;
+	page = pfn_to_online_page(pfn);
+	if (!page || PageTail(page))
+		return -ENOENT;
+
+	folio = phys_migrate_get_folio(page);
+	if (!folio)
+		return -ENOENT;
+
+	rmap_walk(folio, &rwc);
+
+	if (!rmctxt.found)
+		err = -ENOENT;
+	else if (!rmctxt.migratable)
+		err = -EFAULT;
+	else if (!rmctxt.node_allowed)
+		err = -EACCES;
+	else
+		err = add_page_for_migration(page, folio, node, pagelist,
+					     migrate_all);
+
+	folio_put(folio);
+
+	return err;
+}
+
 /*
  * Migrate an array of page address onto an array of nodes and fill
  * the corresponding array of status.
+ *
+ * When the mm argument is not NULL, task_nodes is expected to be the
+ * cpuset nodemask for the task which owns the mm_struct, and the
+ * values located in (*pages) are expected to be virtual addresses.
+ *
+ * When the mm argument is NULL, the values located at (*pages) are
+ * expected to be physical addresses, and task_nodes is expected to
+ * be empty.
  */
 static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			 unsigned long nr_pages,
@@ -2226,7 +2336,14 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			goto out_flush;
 
 		err = -EACCES;
-		if (!node_isset(node, task_nodes))
+		/*
+		 * if mm is NULL, then the pages are addressed via physical
+		 * address and the task_nodes structure is empty. Validation
+		 * of migratability is deferred to add_phys_page_for_migration
+		 * where vma's that map the address will have their node_mask
+		 * checked to ensure the requested node bit is set.
+		 */
+		if (mm && !node_isset(node, task_nodes))
 			goto out_flush;
 
 		if (current_node == NUMA_NO_NODE) {
@@ -2243,10 +2360,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 
 		/*
 		 * Errors in the page lookup or isolation are not fatal and we simply
-		 * report them via status
+		 * report them via status.
+		 *
+		 * If mm is NULL, then p is treated as a physical address.
 		 */
-		err = add_virt_page_for_migration(mm, p, current_node, &pagelist,
-				flags & MPOL_MF_MOVE_ALL);
+		if (mm)
+			err = add_virt_page_for_migration(mm, p, current_node,
+					&pagelist, flags & MPOL_MF_MOVE_ALL);
+		else
+			err = add_phys_page_for_migration(p, current_node,
+					&pagelist, flags & MPOL_MF_MOVE_ALL);
+
 
 		if (err > 0) {
 			/* The page is successfully queued for migration */
@@ -2334,6 +2458,37 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 	mmap_read_unlock(mm);
 }
 
+/*
+ * Determine the nodes of the pages pointed to by the physical addresses
+ * in the pages array, and store those node values in the status array
+ */
+static void do_phys_pages_stat_array(unsigned long nr_pages,
+				     const void __user **pages, int *status)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++) {
+		unsigned long pfn = (unsigned long)(*pages) >> PAGE_SHIFT;
+		struct page *page = pfn_to_online_page(pfn);
+		int err = -ENOENT;
+
+		if (!page)
+			goto set_status;
+
+		get_page(page);
+
+		if (!is_zone_device_page(page))
+			err = page_to_nid(page);
+
+		put_page(page);
+set_status:
+		*status = err;
+
+		pages++;
+		status++;
+	}
+}
+
 static int get_compat_pages_array(const void __user *chunk_pages[],
 				  const void __user * __user *pages,
 				  unsigned long chunk_nr)
@@ -2376,7 +2531,10 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 			break;
 		}
 
-		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+		if (mm)
+			do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+		else
+			do_phys_pages_stat_array(chunk_nr, chunk_pages, chunk_status);
 
 		if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
 			break;
@@ -2449,7 +2607,7 @@ static int kernel_move_pages(pid_t pid, unsigned long nr_pages,
 	nodemask_t task_nodes;
 
 	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+	if (flags & ~(MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
 		return -EINVAL;
 
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
@@ -2477,6 +2635,42 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
 }
 
+/*
+ * Move a list of physically-addressed pages to the list of target nodes
+ */
+static int kernel_move_phys_pages(unsigned long nr_pages,
+				  const void __user * __user *pages,
+				  const int __user *nodes,
+				  int __user *status, int flags)
+{
+	nodemask_t dummy_nodes;
+
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!nodes)
+		return do_pages_stat(NULL, nr_pages, pages, status);
+
+	/*
+	 * When the mm argument to do_pages_move is null, the task_nodes
+	 * argument is ignored, so pass in an empty nodemask as a dummy.
+	 */
+	nodes_clear(dummy_nodes);
+	return do_pages_move(NULL, dummy_nodes, nr_pages, pages, nodes, status,
+			     flags);
+}
+
+SYSCALL_DEFINE5(move_phys_pages, unsigned long, nr_pages,
+		const void __user * __user *, pages,
+		const int __user *, nodes,
+		int __user *, status, int, flags)
+{
+	return kernel_move_phys_pages(nr_pages, pages, nodes, status, flags);
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * Returns true if this is a safe migration target node for misplaced NUMA
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 75f00965ab15..13bc8dd16d6b 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -842,8 +842,14 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
+/* CONFIG_MMU only */
+#ifndef __ARCH_NOMMU
+#define __NR_move_phys_pages 462
+__SYSCALL(__NR_move_phys_pages, sys_move_phys_pages)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
 
 /*
  * 32 bit systems traditionally used different

From patchwork Tue Mar 19 17:26:09 2024
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 781445
From: Gregory Price <gregory.price@memverge.com>
To: linux-mm@kvack.org
Cc: linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
 ying.huang@intel.com, dan.j.williams@intel.com, honggyu.kim@sk.com,
 corbet@lwn.net, arnd@arndb.de, luto@kernel.org,
 akpm@linux-foundation.org, shuah@kernel.org
Subject: [RFC v3 3/3] ktest: sys_move_phys_pages ktest
Date: Tue, 19 Mar 2024 13:26:09 -0400
Message-Id: <20240319172609.332900-4-gregory.price@memverge.com>
In-Reply-To: <20240319172609.332900-1-gregory.price@memverge.com>
References: <20240319172609.332900-1-gregory.price@memverge.com>

Implement a simple kselftest that looks up the physical address of a
page via /proc/self/pagemap and migrates the page based on that
information.
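The pagemap lookup performed by the test can be condensed as below. This is an illustrative sketch, not part of the patch: `virt_to_phys` here is a hypothetical userspace helper (unrelated to the kernel function of the same name), and the entry layout (bit 63 = page present, bits 0-54 = PFN) follows the kernel's pagemap documentation. Note that without CAP_SYS_ADMIN, recent kernels report the PFN field as zero.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Translate a virtual address of the calling process into a physical
 * address via /proc/self/pagemap. Each pagemap entry is a native-endian
 * uint64_t: bit 63 = page present, bits 0-54 = PFN.
 * Returns 0 on failure or if the page is not present.
 */
static uint64_t virt_to_phys(void *vaddr)
{
	uint64_t entry = 0;
	long psize = sysconf(_SC_PAGESIZE);
	off_t offset = ((uintptr_t)vaddr / psize) * sizeof(entry);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 0;
	if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
		close(fd);
		return 0;
	}
	close(fd);

	if (!(entry & (1ULL << 63)))		/* page not present */
		return 0;

	/* PFN * page size, plus the offset within the page */
	return (entry & ((1ULL << 55) - 1)) * psize
		+ ((uintptr_t)vaddr % psize);
}
```

Because pagemap entries are native-endian 64-bit values, a single pread() into a uint64_t is sufficient; the byte-by-byte getc() loop with explicit endianness handling in the test below is a more conservative way of doing the same read.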
Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 tools/testing/selftests/mm/migration.c | 99 ++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 6908569ef406..c005c98dbdc1 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -5,6 +5,8 @@
  */
 
 #include "../kselftest_harness.h"
+#include
+#include
 #include
 #include
 #include
@@ -14,11 +16,17 @@
 #include
 #include
 #include
+#include
 
 #define TWOMEG (2<<20)
 #define RUNTIME (20)
 
+#define GET_BIT(X, Y) ((X & ((uint64_t)1<<Y)) >> Y)
+#define GET_PFN(X) (X & 0x7FFFFFFFFFFFFFull)
 #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
+#define PAGEMAP_ENTRY 8
+const int __endian_bit = 1;
+#define is_bigendian() ((*(char *)&__endian_bit) == 0)
 
 FIXTURE(migration)
 {
@@ -94,6 +102,45 @@ int migrate(uint64_t *ptr, int n1, int n2)
 	return 0;
 }
 
+int migrate_phys(uint64_t paddr, int n1, int n2)
+{
+	int ret, tmp;
+	int status = 0;
+	struct timespec ts1, ts2;
+
+	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
+		return -1;
+
+	while (1) {
+		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
+			return -1;
+
+		if (ts2.tv_sec - ts1.tv_sec >= RUNTIME)
+			return 0;
+
+		/*
+		 * FIXME: move_phys_pages was syscall 462 during RFC.
+		 * Update this when an official syscall number is adopted
+		 * and the libnuma interface is implemented.
+		 */
+		ret = syscall(462, 1, (void **) &paddr, &n2, &status,
+			      MPOL_MF_MOVE_ALL);
+		if (ret) {
+			if (ret > 0)
+				printf("Didn't migrate %d pages\n", ret);
+			else
+				perror("Couldn't migrate pages");
+			return -2;
+		}
+
+		tmp = n2;
+		n2 = n1;
+		n1 = tmp;
+	}
+
+	return 0;
+}
+
 void *access_mem(void *ptr)
 {
 	volatile uint64_t y = 0;
@@ -199,4 +246,56 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME)
 		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
 }
 
+/*
+ * Same as the basic migration, but test move_phys_pages.
+ */
+TEST_F_TIMEOUT(migration, phys_addr, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	uint64_t pagemap_val = 0, paddr, file_offset;
+	unsigned char c_buf[PAGEMAP_ENTRY];
+	int i, c, status;
+	FILE *f;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	ptr = mmap(NULL, TWOMEG, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+
+	memset(ptr, 0xde, TWOMEG);
+
+	/* PFN of ptr from /proc/self/pagemap */
+	f = fopen("/proc/self/pagemap", "rb");
+	file_offset = ((uint64_t)ptr) / getpagesize() * PAGEMAP_ENTRY;
+	status = fseek(f, file_offset, SEEK_SET);
+	ASSERT_EQ(status, 0);
+	for (i = 0; i < PAGEMAP_ENTRY; i++) {
+		c = getc(f);
+		ASSERT_NE(c, EOF);
+		/* handle endian differences */
+		if (is_bigendian())
+			c_buf[i] = c;
+		else
+			c_buf[PAGEMAP_ENTRY - i - 1] = c;
+	}
+	fclose(f);
+
+	for (i = 0; i < PAGEMAP_ENTRY; i++)
+		pagemap_val = (pagemap_val << 8) + c_buf[i];
+
+	ASSERT_TRUE(GET_BIT(pagemap_val, 63));
+	/* This reports a pfn, we need to shift this by page size */
+	paddr = GET_PFN(pagemap_val) << __builtin_ctz(getpagesize());
+
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate_phys(paddr, self->n1, self->n2), 0);
+	for (i = 0; i < self->nthreads - 1; i++)
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+}
+
 TEST_HARNESS_MAIN