[v27,0/6] Implement IOCTL to get and optionally clear info about PTEs

Message ID	20230808104309.357852-1-usama.anjum@collabora.com
Headers	Return-Path: <linux-kselftest-owner@vger.kernel.org>
sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 9CDAA66071FA; Tue, 8 Aug 2023 11:43:15 +0100 (BST)
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, Michał Mirosław <emmir@google.com>, Andrei Vagin <avagin@gmail.com>, Danylo Mocherniuk <mdanylo@google.com>, Paul Gofman <pgofman@codeweavers.com>, Cyrill Gorcunov <gorcunov@gmail.com>, Mike Rapoport <rppt@kernel.org>, Nadav Amit <namit@vmware.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Shuah Khan <shuah@kernel.org>, Christian Brauner <brauner@kernel.org>, Yang Shi <shy828301@gmail.com>, Vlastimil Babka <vbabka@suse.cz>, "Liam R . Howlett" <Liam.Howlett@Oracle.com>, Yun Zhou <yun.zhou@windriver.com>, Suren Baghdasaryan <surenb@google.com>, Alex Sierra <alex.sierra@amd.com>, Muhammad Usama Anjum <usama.anjum@collabora.com>, Matthew Wilcox <willy@infradead.org>, Pasha Tatashin <pasha.tatashin@soleen.com>, Axel Rasmussen <axelrasmussen@google.com>, "Gustavo A . R . Silva" <gustavoars@kernel.org>, Dan Williams <dan.j.williams@intel.com>, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH <gregkh@linuxfoundation.org>, kernel@collabora.com
Subject: [PATCH v27 0/6] Implement IOCTL to get and optionally clear info about PTEs
Date: Tue, 8 Aug 2023 15:43:03 +0500
Message-Id: <20230808104309.357852-1-usama.anjum@collabora.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	Implement IOCTL to get and optionally clear info about PTEs
[v27,0/6] Implement IOCTL to get and optionally clear info about PTEs
[v27,1/6] userfaultfd: UFFD_FEATURE_WP_ASYNC
[v27,2/6] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
[v27,3/6] fs/proc/task_mmu: Add fast paths to get/clear PAGE_IS_WRITTEN flag
[v27,4/6] tools headers UAPI: Update linux/fs.h with the kernel sources
[v27,5/6] mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
[v27,6/6] selftests: mm: add pagemap ioctl tests

*Changes in v27:*
- Handle review comments and minor improvements
- Add performance improvement patch on top with test for easy review

*Changes in v26:*
- Code restructuring and API changes in PAGEMAP_IOCTL

*Changes in v25:*
- Do proper filtering on hole as well (hole got missed earlier)

*Changes in v24:*
- Rebase on top of next-20230710
- Place WP markers in case of hole as well

*Changes in v23:*
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity

*Changes in v22:*
- Interface change:
  - Replace [start, start + len) with [start, end)
  - Return the ending address of the address walk in start

*Changes in v21:*
- Abort walk instead of returning error if WP is to be performed on
  partial hugetlb

*Changes in v20:*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO

*Changes in v19:*
- Minor changes and interface updates

*Changes in v18:*
- Rebase on top of next-20230613
- Minor updates

*Changes in v17:*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch

*Changes in v16:*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back

*Changes in v15:*
- Build fix (Add missed build fix in RESEND)

*Changes in v14:*
- Fix build error caused by #ifdef added at last minute in some configs

*Changes in v13:*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers, flush the TLB
  only once

*Changes in v12:*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates

*Changes in v11:*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
  UFFDIO_WRITEPROTECT

*Changes in v10:*
- Add specific condition to return error if hugetlb is used with wp
  async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code

*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation

*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
  flags

*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the
Windows GetWriteWatch() and ResetWriteWatch() syscalls [1].
GetWriteWatch() retrieves the addresses of the pages that have been
written to in a region of virtual memory. This syscall is used in
Windows applications and games, and it is currently emulated in
userspace in a rather slow manner. Our goal is to enhance the kernel so
that the call can be translated efficiently. Some out-of-tree hack
patches are currently used to emulate it efficiently in some kernels;
we intend to replace those with these patches. Gaming on Linux as a
whole would benefit from this, so this code would have many users.

The CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.

Andrei defines the following uses of this code:
* It is more granular and allows us to track changed pages more
  effectively. The current interface can clear dirty bits for the entire
  process only. In addition, reading info about pages is a separate
  operation. This means we must freeze the process to read information
  about all its pages and reset dirty bits, and only then can we start
  dumping pages. The information about pages becomes more and more
  outdated while we are processing pages.
  The new interface solves both of these downsides. First, it allows us
  to read PTE bits and clear the soft-dirty bit atomically. This means
  that CRIU will not need to freeze processes to pre-dump their memory.
  Second, it clears soft-dirty bits for a specified region of memory.
  This means CRIU will have up-to-date info about pages at the moment of
  dumping them.
* The new interface has to be much faster because basic page filtering
  happens in the kernel. With the old interface, we have to read pagemap
  for each page.

*Implementation Evolution (Short Summary)*
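For reference, the "old interface" described in the motivation above can be sketched in C roughly as follows. This is a minimal illustration, not code from this series: the helper names (pm_entry_soft_dirty, count_soft_dirty, clear_soft_dirty_all) are hypothetical, and it assumes a kernel built with CONFIG_MEM_SOFT_DIRTY. It relies only on the documented pagemap layout (bit 55 of each 64-bit entry is the soft-dirty flag) and on writing "4" to /proc/self/clear_refs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a 64-bit pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY (1ULL << 55)

/* Hypothetical helper: 1 if the entry is soft-dirty, else 0. */
int pm_entry_soft_dirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}

/*
 * Hypothetical helper: count soft-dirty pages in [start, end) the old
 * way, with one pagemap read per page. Returns the count, or -1 on error.
 */
long count_soft_dirty(uintptr_t start, uintptr_t end)
{
	long page_size = sysconf(_SC_PAGESIZE);
	long count = 0;
	int fd;

	if (page_size <= 0)
		return -1;
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	for (uintptr_t addr = start; addr < end; addr += (uintptr_t)page_size) {
		uint64_t entry;
		/* One 8-byte entry per virtual page, indexed by page number. */
		off_t off = (off_t)(addr / (uintptr_t)page_size) *
			    (off_t)sizeof(entry);

		if (pread(fd, &entry, sizeof(entry), off) !=
		    (ssize_t)sizeof(entry)) {
			close(fd);
			return -1;
		}
		count += pm_entry_soft_dirty(entry);
	}
	close(fd);
	return count;
}

/*
 * Hypothetical helper: reset soft-dirty bits. Writing "4" to clear_refs
 * affects the whole process; there is no per-range variant.
 */
int clear_soft_dirty_all(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = (write(fd, "4", 1) == 1) ? 0 : -1;
	close(fd);
	return ret;
}
```

The sketch shows both downsides listed above: reading and clearing are separate steps, so pages written between count_soft_dirty() and clear_soft_dirty_all() can be missed, and the clear cannot be limited to one region.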
