From patchwork Thu Jun 15 14:11:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 693160 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0055FEB64D9 for ; Thu, 15 Jun 2023 14:12:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240650AbjFOOMJ (ORCPT ); Thu, 15 Jun 2023 10:12:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230500AbjFOOMI (ORCPT ); Thu, 15 Jun 2023 10:12:08 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E22D211D; Thu, 15 Jun 2023 07:12:06 -0700 (PDT) Received: from localhost.localdomain (unknown [119.155.33.163]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id BFAD56606ECA; Thu, 15 Jun 2023 15:11:56 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1686838324; bh=yO5D5OhI46lT8VszB72WzEq061hKVVn3i8/5rBhwkEY=; h=From:To:Cc:Subject:Date:From; b=W47N2uF8fH9nGk6Qyw0NN4aZLM4RUTSfq0e3qiQEPeSVsRYlUUM7nDL+zFZTTjmBb BYw+P1coE0SIX0Gnh5sgrsJwqNkrqWCX0XZ9LfG8jlJs4D/FQXamofKaI/h/iHNPjh NOe98I1FfCG80zA2XgRyps4AvqyLHhgCgcIs2N3rRoeKGL1v6szMh/d3zG74GhjCu5 ZngeCcmNDwaxBbSLW2pLyUH8nzfQc1oznw3SPBpLj0W+IwFxYDucYznE4wPKBydRIl gZIIUwGnc+HKn9Vo3JtqNb/iD6PGIK+J4kTAToRi6Nz4D6Xc0Y7nxz0ZM6SYw9cpdw IvSeHgfLYXXcA== From: Muhammad Usama Anjum To: Peter Xu , David Hildenbrand , Andrew Morton , =?utf-8?b?TWljaGHFgiBNaXJvc8WC?= =?utf-8?b?YXc=?= , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Mike Rapoport , Nadav Amit Cc: Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Muhammad Usama Anjum , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: [PATCH v19 0/5] Implement IOCTL to get and optionally clear info about PTEs Date: Thu, 15 Jun 2023 19:11:39 +0500 Message-Id: <20230615141144.665148-1-usama.anjum@collabora.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org *Changes in v19* - Minor changes and interface updates *Changes in v18* - Rebase on top of next-20230613 - Minor updates *Changes in v17* - Rebase on top of next-20230606 - Minor improvements in PAGEMAP_SCAN IOCTL patch *Changes in v16* - Fix a corner case - Add exclusive PM_SCAN_OP_WP back *Changes in v15* - Build fix (Add missed build fix in RESEND) *Changes in v14* - Fix build error caused by #ifdef added at last minute in some configs *Changes in v13* - Rebase on top of next-20230414 - Give-up on using uffd_wp_range() and write new helpers, flush tlb only once *Changes in v12* - Update and other memory types to UFFD_FEATURE_WP_ASYNC - Rebaase on top of next-20230406 - Review updates *Changes in v11* - Rebase on top of next-20230307 - Base patches on UFFD_FEATURE_WP_UNPOPULATED - Do a lot of cosmetic changes and review updates - Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT *Changes in v10* - Add specific condition to return error if hugetlb is used with wp async - Move changes in tools/include/uapi/linux/fs.h to separate patch - Add documentation *Changes in v9:* - Correct fault resolution for userfaultfd wp async - Fix build warnings and errors which were happening on some configs - Simplify pagemap ioctl's code *Changes in v8:* - Update uffd async wp implementation - Improve PAGEMAP_IOCTL implementation *Changes in v7:* - Add uffd wp async - Update the IOCTL to use uffd under the hood instead of soft-dirty flags *Motivation* The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of the pages that are written to in a region of virtual memory. This syscall is used in Windows applications and games etc. This syscall is being emulated in pretty slow manner in userspace. Our purpose is to enhance the kernel such that we translate it efficiently in a better way. Currently some out of tree hack patches are being used to efficiently emulate it in some kernels. We intend to replace those with these patches. So the whole gaming on Linux can effectively get benefit from this. It means there would be tons of users of this code. CRIU use case [2] was mentioned by Andrei and Danylo: > Use cases for migrating sparse VMAs are binaries sanitized with ASAN, > MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of > shadow memory [4]. Being able to migrate such binaries allows to highly > reduce the amount of work needed to identify and fix post-migration > crashes, which happen constantly. Andrei's defines the following uses of this code: * it is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages, reset dirty bits, only then we can start dumping pages. The information about pages becomes more and more outdated, while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have actual info about pages to the moment of dumping them. * The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page. *Implementation Evolution (Short Summary)*