From patchwork Tue Jan 24 05:42:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alistair Popple
X-Patchwork-Id: 646834
From: Alistair Popple
To: linux-mm@kvack.org, cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, jgg@nvidia.com, jhubbard@nvidia.com,
 tjmercier@google.com, hannes@cmpxchg.org, surenb@google.com,
 mkoutny@suse.com, daniel@ffwll.ch, Alistair Popple,
 linuxppc-dev@lists.ozlabs.org, linux-fpga@vger.kernel.org,
 linux-rdma@vger.kernel.org, virtualization@lists.linux-foundation.org,
 kvm@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org,
 bpf@vger.kernel.org, rds-devel@oss.oracle.com,
 linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 01/19] mm: Introduce vm_account
Date: Tue, 24 Jan 2023 16:42:30 +1100
Message-Id: <748338ffe4c42d86669923159fe0426808ecb04d.1674538665.git-series.apopple@nvidia.com>
X-Mailer: git-send-email 2.39.0
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

Kernel drivers that pin pages should account these pages against
either user->locked_vm or mm->pinned_vm and fail the pinning if
RLIMIT_MEMLOCK is exceeded and CAP_IPC_LOCK isn't held.

Currently drivers open-code this accounting and use various methods to
update the atomic variables and check against the limits, leading to
various bugs and inconsistencies. To fix this, introduce a standard
interface for charging pinned and locked memory. As this involves
taking references on kernel objects such as mm_struct or user_struct,
we introduce a new vm_account struct to hold these references. Several
helper functions are then introduced to grab references and check
limits.

As the way these limits are charged and enforced is visible to
userspace, we need to be careful not to break existing applications by
charging to different counters. As a result, the vm_account functions
support accounting to different counters as required.

A future change will extend this to also account against a cgroup for
pinned pages.
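
For illustration only, the charge-and-check scheme described above can
be modelled in userspace with C11 atomics. This is a minimal sketch,
assuming a single counter and a caller-supplied limit in pages; the
names pin_counter, pin_charge, pin_uncharge and PIN_NO_LIMIT are
hypothetical and are not part of this series.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for mm->pinned_vm or user->locked_vm. */
struct pin_counter {
	atomic_ulong pages;
};

/* Hypothetical "no limit" marker, playing the role of RLIM_INFINITY. */
#define PIN_NO_LIMIT ULONG_MAX

/*
 * Charge npages unless that would push the counter past limit_pages.
 * Returns 0 on success and -ENOMEM if the limit would be exceeded.
 * Overflow of cur + npages is not handled in this sketch.
 */
static int pin_charge(struct pin_counter *c, unsigned long npages,
		      unsigned long limit_pages)
{
	unsigned long cur = atomic_load(&c->pages);
	unsigned long new;

	do {
		new = cur + npages;
		if (limit_pages != PIN_NO_LIMIT && new > limit_pages)
			return -ENOMEM;
		/* On failure, cur is refreshed with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(&c->pages, &cur, new));

	return 0;
}

/* Uncharge npages; the caller must have charged them earlier. */
static void pin_uncharge(struct pin_counter *c, unsigned long npages)
{
	atomic_fetch_sub(&c->pages, npages);
}
```

With a limit of 8 pages, charging 4 pages succeeds, while a further
charge of 5 pages fails with -ENOMEM and leaves the counter at 4.
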
Signed-off-by: Alistair Popple
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-fpga@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: io-uring@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: bpf@vger.kernel.org
Cc: rds-devel@oss.oracle.com
Cc: linux-kselftest@vger.kernel.org
---
 include/linux/mm_types.h | 87 ++++++++++++++++++++++++++++++++++++++++-
 mm/util.c                | 89 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 176 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9757067..7de2168 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1085,4 +1085,91 @@ enum fault_flag {
 
 typedef unsigned int __bitwise zap_flags_t;
 
+/**
+ * enum vm_account_flags - Determine how pinned/locked memory is accounted.
+ * @VM_ACCOUNT_TASK: Account pinned memory to mm->pinned_vm.
+ * @VM_ACCOUNT_BYPASS: Don't enforce rlimit on any charges.
+ * @VM_ACCOUNT_USER: Account locked memory to user->locked_vm.
+ *
+ * Determines which statistic pinned/locked memory is accounted
+ * against. All limits will be enforced against RLIMIT_MEMLOCK and the
+ * pins cgroup if CONFIG_CGROUP_PINS is enabled.
+ *
+ * New drivers should use VM_ACCOUNT_TASK. VM_ACCOUNT_USER is used by
+ * pre-existing drivers to maintain existing accounting against
+ * user->locked_vm rather than mm->pinned_vm.
+ *
+ * VM_ACCOUNT_BYPASS may also be specified to bypass rlimit
+ * checks. Typically this is used to cache CAP_IPC_LOCK from when a
+ * driver is first initialised. Note that this does not bypass cgroup
+ * limit checks.
+ */
+enum vm_account_flags {
+	VM_ACCOUNT_TASK = 0,
+	VM_ACCOUNT_BYPASS = 1,
+	VM_ACCOUNT_USER = 2,
+};
+
+struct vm_account {
+	struct task_struct *task;
+	union {
+		struct mm_struct *mm;
+		struct user_struct *user;
+	} a;
+	enum vm_account_flags flags;
+};
+
+/**
+ * vm_account_init - Initialise a new struct vm_account.
+ * @vm_account: pointer to uninitialised vm_account.
+ * @task: task to charge against.
+ * @user: user to charge against. Must be non-NULL for VM_ACCOUNT_USER.
+ * @flags: flags to use when charging to vm_account.
+ *
+ * Initialise a new uninitialised struct vm_account. Takes references
+ * on the task/mm/user/cgroup as required although callers must ensure
+ * any references passed in remain valid for the duration of this
+ * call.
+ */
+void vm_account_init(struct vm_account *vm_account, struct task_struct *task,
+		struct user_struct *user, enum vm_account_flags flags);
+/**
+ * vm_account_init_current - Initialise a new struct vm_account.
+ * @vm_account: pointer to uninitialised vm_account.
+ *
+ * Helper to initialise a vm_account for the common case of charging
+ * with VM_ACCOUNT_TASK against current.
+ */
+void vm_account_init_current(struct vm_account *vm_account);
+
+/**
+ * vm_account_release - Release a struct vm_account.
+ * @vm_account: pointer to initialised vm_account.
+ *
+ * Drop any object references obtained by vm_account_init(). The
+ * vm_account must not be used after calling this unless reinitialised
+ * with vm_account_init().
+ */
+void vm_account_release(struct vm_account *vm_account);
+
+/**
+ * vm_account_pinned - Charge pinned or locked memory to the vm_account.
+ * @vm_account: pointer to an initialised vm_account.
+ * @npages: number of pages to charge.
+ *
+ * Return: 0 on success, -ENOMEM if a limit would be exceeded.
+ *
+ * Note: All pages must be explicitly uncharged with
+ * vm_unaccount_pinned() prior to releasing the vm_account with
+ * vm_account_release().
+ */
+int vm_account_pinned(struct vm_account *vm_account, unsigned long npages);
+
+/**
+ * vm_unaccount_pinned - Uncharge pinned or locked memory from the vm_account.
+ * @vm_account: pointer to an initialised vm_account.
+ * @npages: number of pages to uncharge.
+ */
+void vm_unaccount_pinned(struct vm_account *vm_account, unsigned long npages);
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/mm/util.c b/mm/util.c
index b56c92f..af40b1e 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -430,6 +430,95 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
 }
 #endif
 
+void vm_account_init(struct vm_account *vm_account, struct task_struct *task,
+		struct user_struct *user, enum vm_account_flags flags)
+{
+	vm_account->task = get_task_struct(task);
+
+	if (flags & VM_ACCOUNT_USER) {
+		vm_account->a.user = get_uid(user);
+	} else {
+		mmgrab(task->mm);
+		vm_account->a.mm = task->mm;
+	}
+
+	vm_account->flags = flags;
+}
+EXPORT_SYMBOL_GPL(vm_account_init);
+
+void vm_account_init_current(struct vm_account *vm_account)
+{
+	vm_account_init(vm_account, current, NULL, VM_ACCOUNT_TASK);
+}
+EXPORT_SYMBOL_GPL(vm_account_init_current);
+
+void vm_account_release(struct vm_account *vm_account)
+{
+	put_task_struct(vm_account->task);
+	if (vm_account->flags & VM_ACCOUNT_USER)
+		free_uid(vm_account->a.user);
+	else
+		mmdrop(vm_account->a.mm);
+}
+EXPORT_SYMBOL_GPL(vm_account_release);
+
+/*
+ * Charge pages with an atomic compare and swap. Returns -ENOMEM on
+ * failure, 1 on success and 0 for retry.
+ */
+static int vm_account_cmpxchg(struct vm_account *vm_account,
+				unsigned long npages, unsigned long lock_limit)
+{
+	u64 cur_pages, new_pages;
+
+	if (vm_account->flags & VM_ACCOUNT_USER)
+		cur_pages = atomic_long_read(&vm_account->a.user->locked_vm);
+	else
+		cur_pages = atomic64_read(&vm_account->a.mm->pinned_vm);
+
+	new_pages = cur_pages + npages;
+	if (lock_limit != RLIM_INFINITY && new_pages > lock_limit)
+		return -ENOMEM;
+
+	if (vm_account->flags & VM_ACCOUNT_USER) {
+		return atomic_long_cmpxchg(&vm_account->a.user->locked_vm,
+					   cur_pages, new_pages) == cur_pages;
+	} else {
+		return atomic64_cmpxchg(&vm_account->a.mm->pinned_vm,
+					cur_pages, new_pages) == cur_pages;
+	}
+}
+
+int vm_account_pinned(struct vm_account *vm_account, unsigned long npages)
+{
+	unsigned long lock_limit = RLIM_INFINITY;
+	int ret;
+
+	if (!(vm_account->flags & VM_ACCOUNT_BYPASS) && !capable(CAP_IPC_LOCK))
+		lock_limit = task_rlimit(vm_account->task,
+					RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	while (true) {
+		ret = vm_account_cmpxchg(vm_account, npages, lock_limit);
+		if (ret > 0)
+			break;
+		else if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vm_account_pinned);
+
+void vm_unaccount_pinned(struct vm_account *vm_account, unsigned long npages)
+{
+	if (vm_account->flags & VM_ACCOUNT_USER)
+		atomic_long_sub(npages, &vm_account->a.user->locked_vm);
+	else
+		atomic64_sub(npages, &vm_account->a.mm->pinned_vm);
+}
+EXPORT_SYMBOL_GPL(vm_unaccount_pinned);
+
 /**
  * __account_locked_vm - account locked pages to an mm's locked_vm
  * @mm: mm to account against

From patchwork Tue Jan 24 05:42:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alistair Popple
X-Patchwork-Id: 646833
From: Alistair Popple
To: linux-mm@kvack.org, cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, jgg@nvidia.com, jhubbard@nvidia.com,
 tjmercier@google.com, hannes@cmpxchg.org, surenb@google.com,
 mkoutny@suse.com, daniel@ffwll.ch, Alistair Popple, Shuah Khan,
 linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 19/19] selftests/vm: Add pins-cgroup selftest for mlock/mmap
Date: Tue, 24 Jan 2023 16:42:48 +1100
Message-Id: <9e24fb108c8fae43c18e0189d86cbb978848fbb0.1674538665.git-series.apopple@nvidia.com>
X-Mailer: git-send-email 2.39.0
Precedence: bulk
X-Mailing-List: linux-kselftest@vger.kernel.org

Add some basic tests of mlock/mmap cgroup accounting for pinned
memory.

Signed-off-by: Alistair Popple
Cc: Shuah Khan
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kselftest@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 MAINTAINERS                              |   1 +-
 tools/testing/selftests/vm/Makefile      |   1 +-
 tools/testing/selftests/vm/pins-cgroup.c | 271 ++++++++++++++++++++++++-
 3 files changed, 273 insertions(+)
 create mode 100644 tools/testing/selftests/vm/pins-cgroup.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f8526e2..4c4eed9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5387,6 +5387,7 @@ L:	cgroups@vger.kernel.org
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	mm/pins_cgroup.c
+F:	tools/testing/selftests/vm/pins-cgroup.c
 
 CORETEMP HARDWARE MONITORING DRIVER
 M:	Fenghua Yu
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 89c14e4..0653720 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -56,6 +56,7 @@ TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
 TEST_GEN_PROGS += ksm_functional_tests
+TEST_GEN_FILES += pins-cgroup
 
 ifeq ($(MACHINE),x86_64)
 CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
diff --git a/tools/testing/selftests/vm/pins-cgroup.c b/tools/testing/selftests/vm/pins-cgroup.c
new file mode 100644
index 0000000..c2eabc2
--- /dev/null
+++ b/tools/testing/selftests/vm/pins-cgroup.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../kselftest_harness.h"
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define CGROUP_TEMP "/sys/fs/cgroup/pins_XXXXXX"
+#define PINS_MAX (-1UL)
+
+FIXTURE(pins_cg)
+{
+	char *cg_path;
+	long page_size;
+};
+
+static char *cgroup_new(void)
+{
+	char *cg;
+
+	cg = malloc(sizeof(CGROUP_TEMP));
+	strcpy(cg, CGROUP_TEMP);
+	if (!mkdtemp(cg)) {
+		perror("Failed to create cgroup");
+		return NULL;
+	}
+
+	return cg;
+}
+
+static int cgroup_add_proc(char *cg, pid_t pid)
+{
+	char *cg_proc;
+	FILE *f;
+	int ret = 0;
+
+	if (asprintf(&cg_proc, "%s/cgroup.procs", cg) < 0)
+		return -1;
+
+	f = fopen(cg_proc, "w");
+	free(cg_proc);
+	if (!f)
+		return -1;
+
+	if (fprintf(f, "%ld\n", (long) pid) < 0)
+		ret = -1;
+
+	fclose(f);
+	return ret;
+}
+
+static int cgroup_set_limit(char *cg, unsigned long limit)
+{
+	char *cg_pins_max;
+	FILE *f;
+	int ret = 0;
+
+	if (asprintf(&cg_pins_max, "%s/pins.max", cg) < 0)
+		return -1;
+
+	f = fopen(cg_pins_max, "w");
+	free(cg_pins_max);
+	if (!f)
+		return -1;
+
+	if (limit != PINS_MAX) {
+		if (fprintf(f, "%lu\n", limit) < 0)
+			ret = -1;
+	} else {
+		if (fprintf(f, "max\n") < 0)
+			ret = -1;
+	}
+
+	fclose(f);
+	return ret;
+}
+
+FIXTURE_SETUP(pins_cg)
+{
+	char *cg_subtree_control;
+	FILE *f;
+
+	if (asprintf(&cg_subtree_control,
+			"/sys/fs/cgroup/cgroup.subtree_control") < 0)
+		return;
+
+	f = fopen(cg_subtree_control, "w");
+	free(cg_subtree_control);
+	if (!f)
+		return;
+
+	fprintf(f, "+pins\n");
+	fclose(f);
+
+	self->cg_path = cgroup_new();
+	self->page_size = sysconf(_SC_PAGE_SIZE);
+}
+
+FIXTURE_TEARDOWN(pins_cg)
+{
+	cgroup_add_proc("/sys/fs/cgroup", getpid());
+
+	rmdir(self->cg_path);
+	free(self->cg_path);
+}
+
+static long cgroup_pins(char *cg)
+{
+	long pin_count;
+	char *cg_pins_current;
+	FILE *f;
+	int ret;
+
+	if (asprintf(&cg_pins_current, "%s/pins.current", cg) < 0)
+		return -1;
+
+	f = fopen(cg_pins_current, "r");
+	if (!f) {
+		printf("Can't open %s\n", cg_pins_current);
+		getchar();
+		free(cg_pins_current);
+		return -2;
+	}
+
+	free(cg_pins_current);
+
+	if (fscanf(f, "%ld", &pin_count) == EOF)
+		ret = -3;
+	else
+		ret = pin_count;
+
+	fclose(f);
+	return ret;
+}
+
+static int set_rlim_memlock(unsigned long size)
+{
+	struct rlimit rlim_memlock = {
+		.rlim_cur = size,
+		.rlim_max = size,
+	};
+	cap_t cap;
+	cap_value_t capability[1] = { CAP_IPC_LOCK };
+
+	/*
+	 * Many of the rlimit checks are skipped if a process has
+	 * CAP_IPC_LOCK. As this test should be run as root we need to
+	 * explicitly drop it.
+	 */
+	cap = cap_get_proc();
+	if (!cap)
+		return -1;
+	if (cap_set_flag(cap, CAP_EFFECTIVE, 1, capability, CAP_CLEAR))
+		return -1;
+	if (cap_set_proc(cap))
+		return -1;
+	return setrlimit(RLIMIT_MEMLOCK, &rlim_memlock);
+}
+
+TEST_F(pins_cg, basic)
+{
+	pid_t child_pid;
+	long page_size = self->page_size;
+	char *p;
+
+	ASSERT_EQ(cgroup_add_proc(self->cg_path, getpid()), 0);
+	p = mmap(NULL, 32*page_size, PROT_READ | PROT_WRITE,
+		MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	ASSERT_NE(p, MAP_FAILED);
+
+	ASSERT_EQ(cgroup_pins(self->cg_path), 0);
+	memset(p, 0, 16*page_size);
+	ASSERT_EQ(mlock(p, page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 1);
+	ASSERT_EQ(mlock(p + page_size, page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 2);
+	ASSERT_EQ(mlock(p, page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 2);
+	ASSERT_EQ(mlock(p, 4*page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 4);
+	ASSERT_EQ(munlock(p + 2*page_size, 2*page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 2);
+	ASSERT_EQ(cgroup_set_limit(self->cg_path, 8), 0);
+	ASSERT_EQ(mlock(p, 16*page_size), -1);
+	ASSERT_EQ(errno, ENOMEM);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 2);
+	ASSERT_EQ(cgroup_set_limit(self->cg_path, PINS_MAX), 0);
+
+	/* check mremap() a locked region correctly accounts locked pages */
+	ASSERT_EQ(mlock(p, 32*page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 32);
+	p = mremap(p, 32*page_size, 64*page_size, MREMAP_MAYMOVE);
+	ASSERT_NE(p, MAP_FAILED);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 64);
+	ASSERT_EQ(munmap(p + 32*page_size, 32*page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 32);
+	p = mremap(p, 32*page_size, 32*page_size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP);
+	ASSERT_NE(p, MAP_FAILED);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 32);
+	ASSERT_EQ(munlock(p, 32*page_size), 0);
+
+	/* mremap() a locked region should fail if limit exceeded */
+	ASSERT_EQ(set_rlim_memlock(32*page_size), 0);
+	ASSERT_EQ(mlock(p, 32*page_size), 0);
+	ASSERT_EQ(mremap(p, 32*page_size, 64*page_size, 0), MAP_FAILED);
+	ASSERT_EQ(munlock(p, 32*page_size), 0);
+
+	/* Exceeds rlimit, expected to fail */
+	ASSERT_EQ(set_rlim_memlock(16*page_size), 0);
+	ASSERT_EQ(mlock(p, 32*page_size), -1);
+	ASSERT_EQ(errno, ENOMEM);
+
+	/* memory in the child isn't locked so shouldn't increase pin_cg count */
+	ASSERT_EQ(mlock(p, 16*page_size), 0);
+	child_pid = fork();
+	if (!child_pid) {
+		ASSERT_EQ(cgroup_pins(self->cg_path), 16);
+		ASSERT_EQ(mlock(p, 16*page_size), 0);
+		ASSERT_EQ(cgroup_pins(self->cg_path), 32);
+		return;
+
+	}
+	waitpid(child_pid, NULL, 0);
+
+	/* check that child exit uncharged the pins */
+	ASSERT_EQ(cgroup_pins(self->cg_path), 16);
+}
+
+TEST_F(pins_cg, mmap)
+{
+	char *p;
+
+	ASSERT_EQ(cgroup_add_proc(self->cg_path, getpid()), 0);
+	p = mmap(NULL, 4*self->page_size, PROT_READ | PROT_WRITE,
+		MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED, -1, 0);
+	ASSERT_NE(p, MAP_FAILED);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 4);
+}
+
+/*
+ * Test moving to a different cgroup.
+ */
+TEST_F(pins_cg, move_cg)
+{
+	char *p, *new_cg;
+
+	ASSERT_EQ(cgroup_add_proc(self->cg_path, getpid()), 0);
+	p = mmap(NULL, 16*self->page_size, PROT_READ | PROT_WRITE,
+		MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	ASSERT_NE(p, MAP_FAILED);
+	memset(p, 0, 16*self->page_size);
+	ASSERT_EQ(mlock(p, 16*self->page_size), 0);
+	ASSERT_EQ(cgroup_pins(self->cg_path), 16);
+	ASSERT_NE(new_cg = cgroup_new(), NULL);
+	ASSERT_EQ(cgroup_add_proc(new_cg, getpid()), 0);
+	ASSERT_EQ(cgroup_pins(new_cg), 16);
+	ASSERT_EQ(cgroup_add_proc(self->cg_path, getpid()), 0);
+	rmdir(new_cg);
+}
+TEST_HARNESS_MAIN
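
The page counts asserted at the start of the basic test follow from
locking being tracked per page: locking a page that is already locked
does not charge it again. The sketch below models that with one flag
per page. It is an illustration under that assumption only, not part
of the selftest; lock_model, model_mlock, model_munlock, model_pins
and MODEL_PAGES are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model size; large enough for the ranges used by the test. */
#define MODEL_PAGES 64

/* One flag per page: true while the page is locked. */
struct lock_model {
	bool locked[MODEL_PAGES];
};

/* Mark npages pages starting at page index first as locked. */
static void model_mlock(struct lock_model *m, size_t first, size_t npages)
{
	for (size_t i = first; i < first + npages && i < MODEL_PAGES; i++)
		m->locked[i] = true;
}

/* Mark npages pages starting at page index first as unlocked. */
static void model_munlock(struct lock_model *m, size_t first, size_t npages)
{
	for (size_t i = first; i < first + npages && i < MODEL_PAGES; i++)
		m->locked[i] = false;
}

/* Count locked pages; the test expects this value in pins.current. */
static long model_pins(const struct lock_model *m)
{
	long count = 0;

	for (size_t i = 0; i < MODEL_PAGES; i++)
		count += m->locked[i];

	return count;
}
```

Replaying the first calls of the basic test on this model (lock page
0, lock page 1, lock page 0 again, lock pages 0-3, unlock pages 2-3)
yields 1, 2, 2, 4 and 2 locked pages, matching the asserted values.
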