From patchwork Tue Feb 11 21:31:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 208788 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED03CC3B189 for ; Tue, 11 Feb 2020 21:31:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B52EE20708 for ; Tue, 11 Feb 2020 21:31:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="mUJSPgUA" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727043AbgBKVby (ORCPT ); Tue, 11 Feb 2020 16:31:54 -0500 Received: from mail-pl1-f202.google.com ([209.85.214.202]:49158 "EHLO mail-pl1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727555AbgBKVbw (ORCPT ); Tue, 11 Feb 2020 16:31:52 -0500 Received: by mail-pl1-f202.google.com with SMTP id w17so5262398plq.16 for ; Tue, 11 Feb 2020 13:31:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=kGkCsRCXCke3LD6wVlrVVYkJSRxB78g6/0L4LHQ32rE=; b=mUJSPgUAtI2pZ72EeYVZn1R9j3GF6Zp3ABkPoxmE5SW8nmYAfuNM9HBfLaRsfrKleC okdLIMyHTLehZJ9z4e3W/9dUwlqAZ9m0JEhNOQgVt2aHlHKYBlHCDNtdQEbUa+LQKF4c Bl96ZkdGFO1J87Vog3yEk7yVD3XvJH8ps3s30fsp54WV2epHKqfw44CbWhKIin3VU0N/ XsfXcNEIO7I4P67FLES9WXIiTGkJwFVSDIuOi1QyKdro6UZ+aghERNJ08w5oW0BYeTEL dfXfl9Qv0zg+CopSbbL8HDU8ZgtCGl9uvvLAG/B4wLiYuqTTahX1W5AATD2zGj9r0JrJ Q2xQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=kGkCsRCXCke3LD6wVlrVVYkJSRxB78g6/0L4LHQ32rE=; b=eINGUVB8ALVeu9nJSydQjuV/jzaycgxPvwSnKkwCfkY17/AcIsOiA/SicHp5yXWKLb Ii29nieYvylcHkgnAVvNTBJ/5kqKjwjJdJ8jQuK5BkPA9rLlOPoob+UuAJwqCIFyKyCu 9dfERPzkF9/gfYEtXG5AwFN00Y9EXHc+o6kvyvLXcQr9JKcJmuDddYlMP4ql9IuLSYvA SJZxN0+yZ5iyk7wlW3mpgHXsR3u6bV8uJT4D7iHVIg9bWXYPBxKsGCNOVIF8tyQ5RmSm EP0fjVmA+/HSmCFGShkyjZPg+pCQI/yE6QXKm/dr9NdkOISZDUAcVG/xLnI1VaRfoLQy zeMA== X-Gm-Message-State: APjAAAXiR5wk/idBDkYtkQvmZP+Uq1xD6AAi6kl1J3HPGOgA0+MnmSSZ ONCnIyFxy6P9gUjsRh0fDNYXa9g4yu8NJpqUZQ== X-Google-Smtp-Source: APXvYqwEHOt/2rgweTNtAw7GzXs+nZ2z12OsetB8YzF1D+ZxlbEjo/gmhFM07rkt+14t6EDd/yx/LhwVaNSUbhTmPw== X-Received: by 2002:a63:6c82:: with SMTP id h124mr8846472pgc.328.1581456711610; Tue, 11 Feb 2020 13:31:51 -0800 (PST) Date: Tue, 11 Feb 2020 13:31:22 -0800 In-Reply-To: <20200211213128.73302-1-almasrymina@google.com> Message-Id: <20200211213128.73302-3-almasrymina@google.com> Mime-Version: 1.0 References: <20200211213128.73302-1-almasrymina@google.com> X-Mailer: git-send-email 2.25.0.225.g125e21ebc7-goog Subject: [PATCH v12 3/9] hugetlb_cgroup: add reservation accounting for private mappings From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Normally the pointer to the cgroup to uncharge hangs off the struct page, and gets queried when it's time to free the page. With hugetlb_cgroup reservations, this is not possible. Because it's possible for a page to be reserved by one task and actually faulted in by another task. The best place to put the hugetlb_cgroup pointer to uncharge for reservations is in the resv_map. But, because the resv_map has different semantics for private and shared mappings, the code patch to charge/uncharge shared and private mappings is different. This patch implements charging and uncharging for private mappings. For private mappings, the counter to uncharge is in resv_map->reservation_counter. On initializing the resv_map this is set to NULL. On reservation of a region in private mapping, the tasks hugetlb_cgroup is charged and the hugetlb_cgroup is placed is resv_map->reservation_counter. On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter. Signed-off-by: Mina Almasry Reviewed-by: Mike Kravetz Acked-by: David Rientjes --- Changes in v12: - Very minor interface update for readability. Changes in v11: - Refactored hugetlb_cgroup_uncharge_conuter a bit to eliminate unnecessary #ifdefs. - Added resv_map_set_hugetlb_cgroup_uncharge_info() to eliminate #ifdefs in the middle of hugetlb logic. Changes in v10: - Fixed cases where the mapping is private but cgroup accounting is disabled. Changes in V9: - Updated for reparenting of hugetlb reservation accounting. --- include/linux/hugetlb.h | 10 ++++++++ include/linux/hugetlb_cgroup.h | 39 +++++++++++++++++++++++++--- mm/hugetlb.c | 47 +++++++++++++++++++++++++++++++--- mm/hugetlb_cgroup.c | 41 +++++------------------------ 4 files changed, 97 insertions(+), 40 deletions(-) -- 2.25.0.225.g125e21ebc7-goog diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index dea6143aa0685..5491932ea5758 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -46,6 +46,16 @@ struct resv_map { long adds_in_progress; struct list_head region_cache; long region_cache_count; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * On private mappings, the counter to uncharge reservations is stored + * here. If these fields are 0, then either the mapping is shared, or + * cgroup accounting is disabled for this resv_map. + */ + struct page_counter *reservation_counter; + unsigned long pages_per_hpage; + struct cgroup_subsys_state *css; +#endif }; extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 5443f4548d523..9974817eb995f 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -27,6 +27,33 @@ struct hugetlb_cgroup; #define HUGETLB_CGROUP_MIN_ORDER 2 #ifdef CONFIG_CGROUP_HUGETLB +enum hugetlb_memory_event { + HUGETLB_MAX, + HUGETLB_NR_MEMORY_EVENTS, +}; + +struct hugetlb_cgroup { + struct cgroup_subsys_state css; + + /* + * the counter to account for hugepages from hugetlb. + */ + struct page_counter hugepage[HUGE_MAX_HSTATE]; + + /* + * the counter to account for hugepage reservations from hugetlb. + */ + struct page_counter rsvd_hugepage[HUGE_MAX_HSTATE]; + + atomic_long_t events[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; + atomic_long_t events_local[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; + + /* Handle for "hugetlb.events" */ + struct cgroup_file events_file[HUGE_MAX_HSTATE]; + + /* Handle for "hugetlb.events.local" */ + struct cgroup_file events_local_file[HUGE_MAX_HSTATE]; +}; static inline struct hugetlb_cgroup * __hugetlb_cgroup_from_page(struct page *page, bool rsvd) @@ -102,9 +129,9 @@ extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); extern void hugetlb_cgroup_uncharge_cgroup_rsvd(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); -extern void hugetlb_cgroup_uncharge_counter(struct page_counter *p, - unsigned long nr_pages, - struct cgroup_subsys_state *css); +extern void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, + unsigned long start, + unsigned long end); extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct page *oldhpage, @@ -193,6 +220,12 @@ hugetlb_cgroup_uncharge_cgroup_rsvd(int idx, unsigned long nr_pages, { } +static inline void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, + unsigned long start, + unsigned long end) +{ +} + static inline void hugetlb_cgroup_file_init(void) { } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 357817e33fb34..ac7bab0afafe7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -650,6 +650,25 @@ static void set_vma_private_data(struct vm_area_struct *vma, vma->vm_private_data = (void *)value; } +static void +resv_map_set_hugetlb_cgroup_uncharge_info(struct resv_map *resv_map, + struct hugetlb_cgroup *h_cg, + struct hstate *h) +{ +#ifdef CONFIG_CGROUP_HUGETLB + if (!h_cg || !h) { + resv_map->reservation_counter = NULL; + resv_map->pages_per_hpage = 0; + resv_map->css = NULL; + } else { + resv_map->reservation_counter = + &h_cg->rsvd_hugepage[hstate_index(h)]; + resv_map->pages_per_hpage = pages_per_huge_page(h); + resv_map->css = &h_cg->css; + } +#endif +} + struct resv_map *resv_map_alloc(void) { struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL); @@ -666,6 +685,13 @@ struct resv_map *resv_map_alloc(void) INIT_LIST_HEAD(&resv_map->regions); resv_map->adds_in_progress = 0; + /* + * Initialize these to 0. On shared mappings, 0's here indicate these + * fields don't do cgroup accounting. On private mappings, these will be + * re-initialized to the proper values, to indicate that hugetlb cgroup + * reservations are to be un-charged from here. + */ + resv_map_set_hugetlb_cgroup_uncharge_info(resv_map, NULL, NULL); INIT_LIST_HEAD(&resv_map->region_cache); list_add(&rg->link, &resv_map->region_cache); @@ -3190,9 +3216,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) end = vma_hugecache_offset(h, vma, vma->vm_end); reserve = (end - start) - region_count(resv, start, end); - - kref_put(&resv->refs, resv_map_release); - + hugetlb_cgroup_uncharge_counter(resv, start, end); if (reserve) { /* * Decrement reserve counts. The global reserve count may be @@ -3201,6 +3225,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) gbl_reserve = hugepage_subpool_put_pages(spool, reserve); hugetlb_acct_memory(h, -gbl_reserve); } + + kref_put(&resv->refs, resv_map_release); } static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) @@ -4547,6 +4573,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; + struct hugetlb_cgroup *h_cg; long gbl_reserve; /* This should never happen */ @@ -4580,12 +4607,26 @@ int hugetlb_reserve_pages(struct inode *inode, chg = region_chg(resv_map, from, to); } else { + /* Private mapping. */ resv_map = resv_map_alloc(); if (!resv_map) return -ENOMEM; chg = to - from; + if (hugetlb_cgroup_charge_cgroup_rsvd( + hstate_index(h), chg * pages_per_huge_page(h), + &h_cg)) { + kref_put(&resv_map->refs, resv_map_release); + return -ENOMEM; + } + + /* + * Since this branch handles private mappings, we attach the + * counter to uncharge for this reservation off resv_map. + */ + resv_map_set_hugetlb_cgroup_uncharge_info(resv_map, h_cg, h); + set_vma_resv_map(vma, resv_map); set_vma_resv_flags(vma, HPAGE_RESV_OWNER); } diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index b8246389fb153..bae7bf6b8372f 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -23,34 +23,6 @@ #include #include -enum hugetlb_memory_event { - HUGETLB_MAX, - HUGETLB_NR_MEMORY_EVENTS, -}; - -struct hugetlb_cgroup { - struct cgroup_subsys_state css; - - /* - * the counter to account for hugepages from hugetlb. - */ - struct page_counter hugepage[HUGE_MAX_HSTATE]; - - /* - * the counter to account for hugepage reservations from hugetlb. - */ - struct page_counter rsvd_hugepage[HUGE_MAX_HSTATE]; - - atomic_long_t events[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; - atomic_long_t events_local[HUGE_MAX_HSTATE][HUGETLB_NR_MEMORY_EVENTS]; - - /* Handle for "hugetlb.events" */ - struct cgroup_file events_file[HUGE_MAX_HSTATE]; - - /* Handle for "hugetlb.events.local" */ - struct cgroup_file events_local_file[HUGE_MAX_HSTATE]; -}; - #define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) #define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) #define MEMFILE_ATTR(val) ((val) & 0xffff) @@ -408,15 +380,16 @@ void hugetlb_cgroup_uncharge_cgroup_rsvd(int idx, unsigned long nr_pages, __hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg, true); } -void hugetlb_cgroup_uncharge_counter(struct page_counter *p, - unsigned long nr_pages, - struct cgroup_subsys_state *css) +void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, unsigned long start, + unsigned long end) { - if (hugetlb_cgroup_disabled() || !p || !css) + if (hugetlb_cgroup_disabled() || !resv || !resv->reservation_counter || + !resv->css) return; - page_counter_uncharge(p, nr_pages); - css_put(css); + page_counter_uncharge(resv->reservation_counter, + (end - start) * resv->pages_per_hpage); + css_put(resv->css); } enum { From patchwork Tue Feb 11 21:31:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 208787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E588CC3B187 for ; Tue, 11 Feb 2020 21:31:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BF7CE20708 for ; Tue, 11 Feb 2020 21:31:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="vwmuNQcG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727772AbgBKVb7 (ORCPT ); Tue, 11 Feb 2020 16:31:59 -0500 Received: from mail-pj1-f73.google.com ([209.85.216.73]:41427 "EHLO mail-pj1-f73.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727763AbgBKVb5 (ORCPT ); Tue, 11 Feb 2020 16:31:57 -0500 Received: by mail-pj1-f73.google.com with SMTP id ds13so2389604pjb.6 for ; Tue, 11 Feb 2020 13:31:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=BVcYMblHo6641wnjh0BNcs5SfcxEom2W5cCD1WNtrzk=; b=vwmuNQcG7XVQMnweR9mQ+8XSocH1puBK5otPbvWSQe43bLTleyrp59ERdrTx9+/9Er L14Wzt7Mq4cGzIz38mn+Ahao4lVxZXvVU/zncGAaJEQxcgKD80QqK+s0xeZKaWtyo+Ke xIr2Zy8tfT+TnOqhEt8dIXGKwguFCCVWQwLbXHwo+fUs3Ii34N9z+OW7HA9kLiOoz0Za BaGN3n5ZRRSwlfiHcNiwyHuKWlPhWov6Dael5DM9/NBmkr4tgEgVWr8d/qWdIpKwF6+x Oa1I8UtxT5ggEpSdJgQOnzOEUuCQ2R5dgXh0hZRbYNt6vb4As9/lFHaaXxMnEbGLOIFw Ea9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=BVcYMblHo6641wnjh0BNcs5SfcxEom2W5cCD1WNtrzk=; b=H/UfVFxfo/nWObETh7hyfqlWLGzlmeS9ADWUrjcL55tXh8MuQpzeqbzriBTku3epIr sdnnCX367uo8+KW/RZyxEoZF3oaanS7gE1I/TyzcJlbQJut1tq2qU/XwFhwbifZ5E2KE enGDiW/8l0a+uvORx/63u47gR5dSykEvyvOpdlC4eFqEQmLvH60i86PkOIxzJOd0q9fo 287nC1407mCwK3QKIX2K1inUW2B7Ovm342Ww9Ftwpe6Sz5VmO5Tw0UmGt6TmbQeRXxZF BE+5/PGUtih0q+yu/roClHGfjzJk7yQbO6JKFQHyXEuodE6Ii4M7BlC7xPHSzTFR29v/ UEUQ== X-Gm-Message-State: APjAAAUOl/YqyMFNozS6yld+WOSd2/jW+GDqjihUKvkFZTmNU3FvTjxa Bt7CPpliq6Lp+SLWVjMzVyEocTzhn00+1Jk6HA== X-Google-Smtp-Source: APXvYqwh97REuZatMpAdAJxZ020CapGXzTC0ReAK88RHoEJQ1sKa2W/C/PD20sjeKiwM5DMnEOcerdklPjH4wFOZSA== X-Received: by 2002:a65:56c6:: with SMTP id w6mr9188220pgs.167.1581456716306; Tue, 11 Feb 2020 13:31:56 -0800 (PST) Date: Tue, 11 Feb 2020 13:31:24 -0800 In-Reply-To: <20200211213128.73302-1-almasrymina@google.com> Message-Id: <20200211213128.73302-5-almasrymina@google.com> Mime-Version: 1.0 References: <20200211213128.73302-1-almasrymina@google.com> X-Mailer: git-send-email 2.25.0.225.g125e21ebc7-goog Subject: [PATCH v12 5/9] hugetlb_cgroup: add accounting for shared mappings From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives in the resv_map entries, in file_region->reservation_counter. After a call to region_chg, we charge the approprate hugetlb_cgroup, and if successful, we pass on the hugetlb_cgroup info to a follow up region_add call. When a file_region entry is added to the resv_map via region_add, we put the pointer to that cgroup in file_region->reservation_counter. If charging doesn't succeed, we report the error to the caller, so that the kernel fails the reservation. On region_del, which is when the hugetlb memory is unreserved, we also uncharge the file_region->reservation_counter. Signed-off-by: Mina Almasry --- Changes in v12: - Removed per-file-region pages_per_hpage field. - Rearranged some function params. - Fixed goto out_put_pages error condition. Changes in v11: - Created new function, hugetlb_cgroup_uncharge_file_region to clean up some #ifdefs. - Moved file_region definition to hugetlb.h. - Added copy_hugetlb_cgroup_uncharge_info function to clean up more #ifdefs in the middle of hugetlb code. Changes in v10: - Deleted duplicated code snippet. Changes in V9: - Updated for hugetlb reservation repareting. --- include/linux/hugetlb.h | 35 ++++++++ include/linux/hugetlb_cgroup.h | 10 +++ mm/hugetlb.c | 148 +++++++++++++++++++++------------ mm/hugetlb_cgroup.c | 15 ++++ 4 files changed, 154 insertions(+), 54 deletions(-) -- 2.25.0.225.g125e21ebc7-goog diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5491932ea5758..50480d16bd334 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -57,6 +57,41 @@ struct resv_map { struct cgroup_subsys_state *css; #endif }; + +/* + * Region tracking -- allows tracking of reservations and instantiated pages + * across the pages in a mapping. + * + * The region data structures are embedded into a resv_map and protected + * by a resv_map's lock. The set of regions within the resv_map represent + * reservations for huge pages, or huge pages that have already been + * instantiated within the map. The from and to elements are huge page + * indicies into the associated mapping. from indicates the starting index + * of the region. to represents the first index past the end of the region. + * + * For example, a file region structure with from == 0 and to == 4 represents + * four huge pages in a mapping. It is important to note that the to element + * represents the first element past the end of the region. This is used in + * arithmetic as 4(to) - 0(from) = 4 huge pages in the region. + * + * Interval notation of the form [from, to) will be used to indicate that + * the endpoint from is inclusive and to is exclusive. + */ +struct file_region { + struct list_head link; + long from; + long to; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * On shared mappings, each reserved region appears as a struct + * file_region in resv_map. These fields hold the info needed to + * uncharge each reservation. + */ + struct page_counter *reservation_counter; + struct cgroup_subsys_state *css; +#endif +}; + extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 9974817eb995f..a09d4164ba910 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -133,11 +133,21 @@ extern void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, unsigned long start, unsigned long end); +extern void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, + struct file_region *rg, + unsigned long nr_pages); + extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage); #else +static inline void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, + struct file_region *rg, + unsigned long nr_pages) +{ +} + static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { return NULL; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 57739e1aedeca..a9171c3cbed6b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -220,31 +220,6 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma) return subpool_inode(file_inode(vma->vm_file)); } -/* - * Region tracking -- allows tracking of reservations and instantiated pages - * across the pages in a mapping. - * - * The region data structures are embedded into a resv_map and protected - * by a resv_map's lock. The set of regions within the resv_map represent - * reservations for huge pages, or huge pages that have already been - * instantiated within the map. The from and to elements are huge page - * indicies into the associated mapping. from indicates the starting index - * of the region. to represents the first index past the end of the region. - * - * For example, a file region structure with from == 0 and to == 4 represents - * four huge pages in a mapping. It is important to note that the to element - * represents the first element past the end of the region. This is used in - * arithmetic as 4(to) - 0(from) = 4 huge pages in the region. - * - * Interval notation of the form [from, to) will be used to indicate that - * the endpoint from is inclusive and to is exclusive. - */ -struct file_region { - struct list_head link; - long from; - long to; -}; - /* Helper that removes a struct file_region from the resv_map cache and returns * it for use. */ @@ -266,6 +241,41 @@ get_file_region_entry_from_cache(struct resv_map *resv, long from, long to) return nrg; } +static void copy_hugetlb_cgroup_uncharge_info(struct file_region *nrg, + struct file_region *rg) +{ +#ifdef CONFIG_CGROUP_HUGETLB + nrg->reservation_counter = rg->reservation_counter; + nrg->css = rg->css; + if (rg->css) + css_get(rg->css); +#endif +} + +/* Helper that records hugetlb_cgroup uncharge info. */ +static void record_hugetlb_cgroup_uncharge_info(struct hugetlb_cgroup *h_cg, + struct hstate *h, + struct resv_map *resv, + struct file_region *nrg) +{ +#ifdef CONFIG_CGROUP_HUGETLB + if (h_cg) { + nrg->reservation_counter = + &h_cg->rsvd_hugepage[hstate_index(h)]; + nrg->css = &h_cg->css; + if (!resv->pages_per_hpage) + resv->pages_per_hpage = pages_per_huge_page(h); + /* pages_per_hpage should be the same for all entries in + * a resv_map. + */ + VM_BUG_ON(resv->pages_per_hpage != pages_per_huge_page(h)); + } else { + nrg->reservation_counter = NULL; + nrg->css = NULL; + } +#endif +} + /* Must be called with resv->lock held. Calling this with count_only == true * will count the number of pages to be added but will not modify the linked * list. If regions_needed != NULL and count_only == true, then regions_needed @@ -273,7 +283,9 @@ get_file_region_entry_from_cache(struct resv_map *resv, long from, long to) * add the regions for this range. */ static long add_reservation_in_range(struct resv_map *resv, long f, long t, - long *regions_needed, bool count_only) + struct hugetlb_cgroup *h_cg, + struct hstate *h, long *regions_needed, + bool count_only) { long add = 0; struct list_head *head = &resv->regions; @@ -312,6 +324,8 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, if (!count_only) { nrg = get_file_region_entry_from_cache( resv, last_accounted_offset, rg->from); + record_hugetlb_cgroup_uncharge_info(h_cg, h, + resv, nrg); list_add(&nrg->link, rg->link.prev); } else if (regions_needed) *regions_needed += 1; @@ -328,6 +342,7 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, if (!count_only) { nrg = get_file_region_entry_from_cache( resv, last_accounted_offset, t); + record_hugetlb_cgroup_uncharge_info(h_cg, h, resv, nrg); list_add(&nrg->link, rg->link.prev); } else if (regions_needed) *regions_needed += 1; @@ -355,7 +370,8 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, * 1 page will only require at most 1 entry. */ static long region_add(struct resv_map *resv, long f, long t, - long in_regions_needed) + long in_regions_needed, struct hstate *h, + struct hugetlb_cgroup *h_cg) { long add = 0, actual_regions_needed = 0, i = 0; struct file_region *trg = NULL, *rg = NULL; @@ -367,7 +383,8 @@ static long region_add(struct resv_map *resv, long f, long t, retry: /* Count how many regions are actually needed to execute this add. */ - add_reservation_in_range(resv, f, t, &actual_regions_needed, true); + add_reservation_in_range(resv, f, t, NULL, NULL, &actual_regions_needed, + true); /* * Check for sufficient descriptors in the cache to accommodate @@ -405,7 +422,7 @@ static long region_add(struct resv_map *resv, long f, long t, goto retry; } - add = add_reservation_in_range(resv, f, t, NULL, false); + add = add_reservation_in_range(resv, f, t, h_cg, h, NULL, false); resv->adds_in_progress -= in_regions_needed; @@ -453,7 +470,8 @@ static long region_chg(struct resv_map *resv, long f, long t, spin_lock(&resv->lock); /* Count how many hugepages in this range are NOT respresented. */ - chg = add_reservation_in_range(resv, f, t, out_regions_needed, true); + chg = add_reservation_in_range(resv, f, t, NULL, NULL, + out_regions_needed, true); if (*out_regions_needed == 0) *out_regions_needed = 1; @@ -589,11 +607,17 @@ static long region_del(struct resv_map *resv, long f, long t) /* New entry for end of split region */ nrg->from = t; nrg->to = rg->to; + + copy_hugetlb_cgroup_uncharge_info(nrg, rg); + INIT_LIST_HEAD(&nrg->link); /* Original entry is trimmed */ rg->to = f; + hugetlb_cgroup_uncharge_file_region( + resv, rg, nrg->to - nrg->from); + list_add(&nrg->link, &rg->link); nrg = NULL; break; @@ -601,6 +625,8 @@ static long region_del(struct resv_map *resv, long f, long t) if (f <= rg->from && t >= rg->to) { /* Remove entire region */ del += rg->to - rg->from; + hugetlb_cgroup_uncharge_file_region(resv, rg, + rg->to - rg->from); list_del(&rg->link); kfree(rg); continue; @@ -609,9 +635,15 @@ static long region_del(struct resv_map *resv, long f, long t) if (f <= rg->from) { /* Trim beginning of region */ del += t - rg->from; rg->from = t; + + hugetlb_cgroup_uncharge_file_region(resv, rg, + t - rg->from); } else { /* Trim end of region */ del += rg->to - f; rg->to = f; + + hugetlb_cgroup_uncharge_file_region(resv, rg, + rg->to - f); } } @@ -2018,7 +2050,7 @@ static long __vma_reservation_common(struct hstate *h, VM_BUG_ON(dummy_out_regions_needed != 1); break; case VMA_COMMIT_RESV: - ret = region_add(resv, idx, idx + 1, 1); + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); break; @@ -2028,7 +2060,7 @@ static long __vma_reservation_common(struct hstate *h, break; case VMA_ADD_RESV: if (vma->vm_flags & VM_MAYSHARE) { - ret = region_add(resv, idx, idx + 1, 1); + ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); } else { @@ -4686,7 +4718,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; - struct hugetlb_cgroup *h_cg; + struct hugetlb_cgroup *h_cg = NULL; long gbl_reserve, regions_needed = 0; /* This should never happen */ @@ -4727,19 +4759,6 @@ int hugetlb_reserve_pages(struct inode *inode, chg = to - from; - if (hugetlb_cgroup_charge_cgroup_rsvd( - hstate_index(h), chg * pages_per_huge_page(h), - &h_cg)) { - kref_put(&resv_map->refs, resv_map_release); - return -ENOMEM; - } - - /* - * Since this branch handles private mappings, we attach the - * counter to uncharge for this reservation off resv_map. - */ - resv_map_set_hugetlb_cgroup_uncharge_info(resv_map, h_cg, h); - set_vma_resv_map(vma, resv_map); set_vma_resv_flags(vma, HPAGE_RESV_OWNER); } @@ -4749,6 +4768,21 @@ int hugetlb_reserve_pages(struct inode *inode, goto out_err; } + ret = hugetlb_cgroup_charge_cgroup_rsvd( + hstate_index(h), chg * pages_per_huge_page(h), &h_cg); + + if (ret < 0) { + ret = -ENOMEM; + goto out_err; + } + + if (vma && !(vma->vm_flags & VM_MAYSHARE) && h_cg) { + /* For private mappings, the hugetlb_cgroup uncharge info hangs + * of the resv_map. + */ + resv_map_set_hugetlb_cgroup_uncharge_info(resv_map, h_cg, h); + } + /* * There must be enough pages in the subpool for the mapping. If * the subpool has a minimum size, there may be some global @@ -4757,7 +4791,7 @@ int hugetlb_reserve_pages(struct inode *inode, gbl_reserve = hugepage_subpool_get_pages(spool, chg); if (gbl_reserve < 0) { ret = -ENOSPC; - goto out_err; + goto out_uncharge_cgroup; } /* @@ -4766,9 +4800,7 @@ int hugetlb_reserve_pages(struct inode *inode, */ ret = hugetlb_acct_memory(h, gbl_reserve); if (ret < 0) { - /* put back original number of pages, chg */ - (void)hugepage_subpool_put_pages(spool, chg); - goto out_err; + goto out_put_pages; } /* @@ -4783,13 +4815,11 @@ int hugetlb_reserve_pages(struct inode *inode, * else has to be done for private mappings here */ if (!vma || vma->vm_flags & VM_MAYSHARE) { - add = region_add(resv_map, from, to, regions_needed); + add = region_add(resv_map, from, to, regions_needed, h, h_cg); if (unlikely(add < 0)) { hugetlb_acct_memory(h, -gbl_reserve); - /* put back original number of pages, chg */ - (void)hugepage_subpool_put_pages(spool, chg); - goto out_err; + goto out_put_pages; } else if (unlikely(chg > add)) { /* * pages in this range were added to the reserve @@ -4800,12 +4830,22 @@ int hugetlb_reserve_pages(struct inode *inode, */ long rsv_adjust; + hugetlb_cgroup_uncharge_cgroup_rsvd( + hstate_index(h), + (chg - add) * pages_per_huge_page(h), h_cg); + rsv_adjust = hugepage_subpool_put_pages(spool, chg - add); hugetlb_acct_memory(h, -rsv_adjust); } } return 0; +out_put_pages: + /* put back original number of pages, chg */ + (void)hugepage_subpool_put_pages(spool, chg); +out_uncharge_cgroup: + hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h), + chg * pages_per_huge_page(h), h_cg); out_err: if (!vma || vma->vm_flags & VM_MAYSHARE) /* Only call region_abort if the region_chg succeeded but the diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index bae7bf6b8372f..ad777fecad28a 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -392,6 +392,21 @@ void hugetlb_cgroup_uncharge_counter(struct resv_map *resv, unsigned long start, css_put(resv->css); } +void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv, + struct file_region *rg, + unsigned long nr_pages) +{ + if (hugetlb_cgroup_disabled() || !resv || !rg || !nr_pages) + return; + + if (rg->reservation_counter && resv->pages_per_hpage && nr_pages > 0 && + !resv->reservation_counter) { + page_counter_uncharge(rg->reservation_counter, + nr_pages * resv->pages_per_hpage); + css_put(rg->css); + } +} + enum { RES_USAGE, RES_RSVD_USAGE, From patchwork Tue Feb 11 21:31:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 208785 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6256CC352A3 for ; Tue, 11 Feb 2020 21:32:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3A48620708 for ; Tue, 11 Feb 2020 21:32:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="BtgkPLXM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727810AbgBKVcB (ORCPT ); Tue, 11 Feb 2020 16:32:01 -0500 Received: from mail-pj1-f74.google.com ([209.85.216.74]:54217 "EHLO mail-pj1-f74.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727799AbgBKVcA (ORCPT ); Tue, 11 Feb 2020 16:32:00 -0500 Received: by mail-pj1-f74.google.com with SMTP id h6so2320778pju.3 for ; Tue, 11 Feb 2020 13:31:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=bX7NhEua2ERlz39Th9iBgSOUAYmxb/RK4PMlnkgxhw4=; b=BtgkPLXMThgu/syUKS8v1hUUBJzr5rr4sugkb9U9SmG8EucdUE3NmJbmN7UD6vglUU qDnC1J5nk/4LnxTOic1kCFOOGFJJRZ9V/O/kuWTyaiwnYfI3oObcxuAI04Fv/jq+WuEf Y6jvrEVvln+iEZM9lw3JUOpeFGyc+vVRpXSXva72q5J5WDe2JF3XPDxM+U6mXMReiu1D 3n3fys+4aVKQKjoxjbpRbekY39o1sq+Gpb4CeLpVUdJHbfLd9CbY76o22Dpp3UGfd7aY CasIqzzCJj3ozmFDqINalnmQdPE1mexaVaklhVzoGmJA5x1NJsd0K69Sck/ut/8djAFQ MHkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=bX7NhEua2ERlz39Th9iBgSOUAYmxb/RK4PMlnkgxhw4=; b=nsGhQubdWhRdn1wqB23uMP7Sw8OMFYPd5iF+tCEVmnAmKtnl70bfh3bkQYfBK+vBO/ 3Ledypu0cLqBk/gwCA5zKIwyYekzCT4rdG1snv9FF0rD2TTeCHkT8RDIsO48f0f+pMeP 1RHx605DW8t216YVNLkFqSY2qiu7QB9rk/nW3Onqdx/gawY9+phAkXpA4xLZu3utEmIC Nw08br4e8qAFuzfKTyKqi4rlyxFr9rshoejU+Zqlv3+uQy8j89EfjL6azDBOxtxzbR1U JSyGAXR5O0ywgmRksoZqIM+GOwy9eosNaF/JxF/naMrDCcoRv/JD3Nmy1s9sbV1gkEVJ w3gw== X-Gm-Message-State: APjAAAXMvvoGvV/W9vaOYBkBckeKCMyhnTMdgFC/WCGBdAaHOgquuoFq AXCc+rwuRGsZqrENOus5vKVybYCnQKkUofELVA== X-Google-Smtp-Source: APXvYqx0QPMNZu65+W4/JoT3ENr3G6zoSw5leGNsERt3DujmNIf57u3bpNm6BKn3IwAszRlwDO81dHDN9qXxBtx6IQ== X-Received: by 2002:a63:ed49:: with SMTP id m9mr8752932pgk.304.1581456718652; Tue, 11 Feb 2020 13:31:58 -0800 (PST) Date: Tue, 11 Feb 2020 13:31:25 -0800 In-Reply-To: <20200211213128.73302-1-almasrymina@google.com> Message-Id: <20200211213128.73302-6-almasrymina@google.com> Mime-Version: 1.0 References: <20200211213128.73302-1-almasrymina@google.com> X-Mailer: git-send-email 2.25.0.225.g125e21ebc7-goog Subject: [PATCH v12 6/9] hugetlb_cgroup: support noreserve mappings From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Support MAP_NORESERVE accounting as part of the new counter. For each hugepage allocation, at allocation time we check if there is a reservation for this allocation or not. If there is a reservation for this allocation, then this allocation was charged at reservation time, and we don't re-account it. If there is no reserevation for this allocation, we charge the appropriate hugetlb_cgroup. The hugetlb_cgroup to uncharge for this allocation is stored in page[3].private. We use new APIs added in an earlier patch to set this pointer. Signed-off-by: Mina Almasry --- Changes in v12: - Minor rebase to new interface for readability. Changes in v10: - Refactored deferred_reserve check. --- mm/hugetlb.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) -- 2.25.0.225.g125e21ebc7-goog diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a9171c3cbed6b..2d62dd35399db 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1342,6 +1342,8 @@ static void __free_huge_page(struct page *page) clear_page_huge_active(page); hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h), page); + hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h), + pages_per_huge_page(h), page); if (restore_reserve) h->resv_huge_pages++; @@ -2175,6 +2177,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, long gbl_chg; int ret, idx; struct hugetlb_cgroup *h_cg; + bool deferred_reserve; idx = hstate_index(h); /* @@ -2212,9 +2215,19 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, gbl_chg = 1; } + /* If this allocation is not consuming a reservation, charge it now. + */ + deferred_reserve = map_chg || avoid_reserve || !vma_resv_map(vma); + if (deferred_reserve) { + ret = hugetlb_cgroup_charge_cgroup_rsvd( + idx, pages_per_huge_page(h), &h_cg); + if (ret) + goto out_subpool_put; + } + ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); if (ret) - goto out_subpool_put; + goto out_uncharge_cgroup_reservation; spin_lock(&hugetlb_lock); /* @@ -2237,6 +2250,14 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, /* Fall through */ } hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page); + /* If allocation is not consuming a reservation, also store the + * hugetlb_cgroup pointer on the page. + */ + if (deferred_reserve) { + hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h), + h_cg, page); + } + spin_unlock(&hugetlb_lock); set_page_private(page, (unsigned long)spool); @@ -2261,6 +2282,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, out_uncharge_cgroup: hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg); +out_uncharge_cgroup_reservation: + if (deferred_reserve) + hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h), + h_cg); out_subpool_put: if (map_chg || avoid_reserve) hugepage_subpool_put_pages(spool, 1); From patchwork Tue Feb 11 21:31:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 208786 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 994A9C3B187 for ; Tue, 11 Feb 2020 21:32:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 571D12082F for ; Tue, 11 Feb 2020 21:32:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="NsrmgFNa" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727771AbgBKVcH (ORCPT ); Tue, 11 Feb 2020 16:32:07 -0500 Received: from mail-pg1-f202.google.com ([209.85.215.202]:51925 "EHLO mail-pg1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727855AbgBKVcE (ORCPT ); Tue, 11 Feb 2020 16:32:04 -0500 Received: by mail-pg1-f202.google.com with SMTP id m18so8253088pgn.18 for ; Tue, 11 Feb 2020 13:32:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=4xuira85RK5C7baiEl75HifS8yYoxqnRD0HM7qhA1Pk=; b=NsrmgFNaaPmRRsF1ErPNzzUS3DEtnqSD16twwdfB7AfwAzOdme1dSASOK+74Kurx5l uvgMxT31p6dKFVDFPJqnX9SxqbX85R0OV8PGyXrFmfYqq0HFUEPp9xr6MXuKnlw7iq6L QGrk+vud7URxlWLTvxKJJ8a2UmCFEt8XpDy1dkDNFDW2vWTd3j8uFXOtWM6FikEmezFG LYBpzRRl+Mu6nEMW+4dBlqv7DNuaq0f56wnzMIpzhRm2CTYqpVlt85Kc8aSPzJcaucpo j+fwS3DDSjZagpGaUN0BhbglJHfG/N3JEW74d569CqKgt28zP/JBa8tOZymQijp1GNVo plYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=4xuira85RK5C7baiEl75HifS8yYoxqnRD0HM7qhA1Pk=; b=lMeaBOUFtOo9pxF5Jty0GZ8GS/oK0NqPWOgYRfz1AadjXTd6HEUEoBAzEnqPVqK0pz rR0YSt7PFVubw/+BCoz3uPbGk+cCzWojwP343pHjUdzADc0xxtyK9sYZd08S1R2YsmcH jLK1bfjCuNwCYC/dHHTSlPF3Opw4FVyWY/yx4Tr7V3GOR6ayTRDYUpUGIekD9ywIGIne /zY1eYNLqmE4NhpJPkRQ2w9ozcp+ZPIrR8RGT1ZFxYz872CqLRyBs6+rsGvZ8jb19Itd nHTIlO7yj3ouY34y0vbzonB9Idhg1nbnvsftaZaQVDWu2SIvbJnfP27wyoKUVd1AH5sT ks9w== X-Gm-Message-State: APjAAAX1+gX7hkio3KBNLnYxzdQwVZnBY2wIVQ0Ugj4Um1eZ+SURkLOv YQokbBg7MU6dHhD6cGY5L28lG/3tsiSoXNF1Rg== X-Google-Smtp-Source: APXvYqyNty6pa+neFm++9CnM/sZeGqMVnnCr+G5tSt4ow8TYIiavaG+0cSDQ3sMKDhZl7JC9FSwejJ2HVsK0Ig6RKw== X-Received: by 2002:a63:be48:: with SMTP id g8mr5295111pgo.23.1581456723480; Tue, 11 Feb 2020 13:32:03 -0800 (PST) Date: Tue, 11 Feb 2020 13:31:27 -0800 In-Reply-To: <20200211213128.73302-1-almasrymina@google.com> Message-Id: <20200211213128.73302-8-almasrymina@google.com> Mime-Version: 1.0 References: <20200211213128.73302-1-almasrymina@google.com> X-Mailer: git-send-email 2.25.0.225.g125e21ebc7-goog Subject: [PATCH v12 8/9] hugetlb_cgroup: Add hugetlb_cgroup reservation tests From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org, sandipan@linux.ibm.com Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org The tests use both shared and private mapped hugetlb memory, and monitors the hugetlb usage counter as well as the hugetlb reservation counter. They test different configurations such as hugetlb memory usage via hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without MAP_POPULATE. Also add test for hugetlb reservation reparenting, since this is a subtle issue. Signed-off-by: Mina Almasry Cc: sandipan@linux.ibm.com --- Changes in v12: - Fixed tests on machines with non-2MB default hugepage size. Changes in v11: - Modify test to not assume 2MB hugepage size. - Updated resv.* to rsvd.* Changes in v10: - Updated tests to resv.* name changes. Changes in v9: - Added tests for hugetlb reparenting. - Make tests explicitly support cgroup v1 and v2 via script argument. Changes in v6: - Updates tests for cgroups-v2 and NORESERVE allocations. --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 1 + .../selftests/vm/charge_reserved_hugetlb.sh | 575 ++++++++++++++++++ .../selftests/vm/hugetlb_reparenting_test.sh | 244 ++++++++ .../selftests/vm/write_hugetlb_memory.sh | 23 + .../testing/selftests/vm/write_to_hugetlbfs.c | 242 ++++++++ 6 files changed, 1086 insertions(+) create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh create mode 100755 tools/testing/selftests/vm/hugetlb_reparenting_test.sh create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c -- 2.25.0.225.g125e21ebc7-goog diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31b3c98b6d34d..d3bed9407773c 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -14,3 +14,4 @@ virtual_address_range gup_benchmark va_128TBswitch map_fixed_noreplace +write_to_hugetlbfs diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da9..662bb95e84c5d 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -22,6 +22,7 @@ TEST_GEN_FILES += userfaultfd ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range +TEST_GEN_FILES += write_to_hugetlbfs endif TEST_PROGS := run_vmtests diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh new file mode 100755 index 0000000000000..18d33684faade --- /dev/null +++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh @@ -0,0 +1,575 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +set -e + +if [[ $(id -u) -ne 0 ]]; then + echo "This test must be run as root. Skipping..." + exit 0 +fi + +fault_limit_file=limit_in_bytes +reservation_limit_file=rsvd.limit_in_bytes +fault_usage_file=usage_in_bytes +reservation_usage_file=rsvd.usage_in_bytes + +if [[ "$1" == "-cgroup-v2" ]]; then + cgroup2=1 + fault_limit_file=max + reservation_limit_file=rsvd.max + fault_usage_file=current + reservation_usage_file=rsvd.current +fi + +cgroup_path=/dev/cgroup/memory +if [[ ! -e $cgroup_path ]]; then + mkdir -p $cgroup_path + if [[ $cgroup2 ]]; then + mount -t cgroup2 none $cgroup_path + else + mount -t cgroup memory,hugetlb $cgroup_path + fi +fi + +if [[ $cgroup2 ]]; then + echo "+hugetlb" >/dev/cgroup/memory/cgroup.subtree_control +fi + +function cleanup() { + if [[ $cgroup2 ]]; then + echo $$ >$cgroup_path/cgroup.procs + else + echo $$ >$cgroup_path/tasks + fi + + if [[ -e /mnt/huge ]]; then + rm -rf /mnt/huge/* + umount /mnt/huge || echo error + rmdir /mnt/huge + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test1 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test1 + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test2 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test2 + fi + echo 0 >/proc/sys/vm/nr_hugepages + echo CLEANUP DONE +} + +function expect_equal() { + local expected="$1" + local actual="$2" + local error="$3" + + if [[ "$expected" != "$actual" ]]; then + echo "expected ($expected) != actual ($actual): $3" + cleanup + exit 1 + fi +} + +function get_machine_hugepage_size() { + hpz=$(grep -i hugepagesize /proc/meminfo) + kb=${hpz:14:-3} + mb=$(($kb / 1024)) + echo $mb +} + +MB=$(get_machine_hugepage_size) + +function setup_cgroup() { + local name="$1" + local cgroup_limit="$2" + local reservation_limit="$3" + + mkdir $cgroup_path/$name + + echo writing cgroup limit: "$cgroup_limit" + echo "$cgroup_limit" >$cgroup_path/$name/hugetlb.${MB}MB.$fault_limit_file + + echo writing reseravation limit: "$reservation_limit" + echo "$reservation_limit" > \ + $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file + + if [ -e "$cgroup_path/$name/cpuset.cpus" ]; then + echo 0 >$cgroup_path/$name/cpuset.cpus + fi + if [ -e "$cgroup_path/$name/cpuset.mems" ]; then + echo 0 >$cgroup_path/$name/cpuset.mems + fi +} + +function wait_for_hugetlb_memory_to_get_depleted() { + local cgroup="$1" + local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file" + # Wait for hugetlbfs memory to get depleted. + while [ $(cat $path) != 0 ]; do + echo Waiting for hugetlb memory to get depleted. + cat $path + sleep 0.5 + done +} + +function wait_for_hugetlb_memory_to_get_reserved() { + local cgroup="$1" + local size="$2" + + local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file" + # Wait for hugetlbfs memory to get written. + while [ $(cat $path) != $size ]; do + echo Waiting for hugetlb memory reservation to reach size $size. + cat $path + sleep 0.5 + done +} + +function wait_for_hugetlb_memory_to_get_written() { + local cgroup="$1" + local size="$2" + + local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$fault_usage_file" + # Wait for hugetlbfs memory to get written. + while [ $(cat $path) != $size ]; do + echo Waiting for hugetlb memory to reach size $size. + cat $path + sleep 0.5 + done +} + +function write_hugetlbfs_and_get_usage() { + local cgroup="$1" + local size="$2" + local populate="$3" + local write="$4" + local path="$5" + local method="$6" + local private="$7" + local expect_failure="$8" + local reserve="$9" + + # Function return values. + reservation_failed=0 + oom_killed=0 + hugetlb_difference=0 + reserved_difference=0 + + local hugetlb_usage=$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file + local reserved_usage=$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file + + local hugetlb_before=$(cat $hugetlb_usage) + local reserved_before=$(cat $reserved_usage) + + echo + echo Starting: + echo hugetlb_usage="$hugetlb_before" + echo reserved_usage="$reserved_before" + echo expect_failure is "$expect_failure" + + output=$(mktemp) + set +e + if [[ "$method" == "1" ]] || [[ "$method" == 2 ]] || + [[ "$private" == "-r" ]] && [[ "$expect_failure" != 1 ]]; then + + bash write_hugetlb_memory.sh "$size" "$populate" "$write" \ + "$cgroup" "$path" "$method" "$private" "-l" "$reserve" 2>&1 | tee $output & + + local write_result=$? + local write_pid=$! + + until grep -q -i "DONE" $output; do + echo waiting for DONE signal. + if ! ps $write_pid > /dev/null + then + echo "FAIL: The write died" + cleanup + exit 1 + fi + sleep 0.5 + done + + echo ================= write_hugetlb_memory.sh output is: + cat $output + echo ================= end output. + + if [[ "$populate" == "-o" ]] || [[ "$write" == "-w" ]]; then + wait_for_hugetlb_memory_to_get_written "$cgroup" "$size" + elif [[ "$reserve" != "-n" ]]; then + wait_for_hugetlb_memory_to_get_reserved "$cgroup" "$size" + else + # This case doesn't produce visible effects, but we still have + # to wait for the async process to start and execute... + sleep 0.5 + fi + + echo write_result is $write_result + else + bash write_hugetlb_memory.sh "$size" "$populate" "$write" \ + "$cgroup" "$path" "$method" "$private" "$reserve" + local write_result=$? + + if [[ "$reserve" != "-n" ]]; then + wait_for_hugetlb_memory_to_get_reserved "$cgroup" "$size" + fi + fi + set -e + + if [[ "$write_result" == 1 ]]; then + reservation_failed=1 + fi + + # On linus/master, the above process gets SIGBUS'd on oomkill, with + # return code 135. On earlier kernels, it gets actual oomkill, with return + # code 137, so just check for both conditions in case we're testing + # against an earlier kernel. + if [[ "$write_result" == 135 ]] || [[ "$write_result" == 137 ]]; then + oom_killed=1 + fi + + local hugetlb_after=$(cat $hugetlb_usage) + local reserved_after=$(cat $reserved_usage) + + echo After write: + echo hugetlb_usage="$hugetlb_after" + echo reserved_usage="$reserved_after" + + hugetlb_difference=$(($hugetlb_after - $hugetlb_before)) + reserved_difference=$(($reserved_after - $reserved_before)) +} + +function cleanup_hugetlb_memory() { + set +e + local cgroup="$1" + if [[ "$(pgrep -f write_to_hugetlbfs)" != "" ]]; then + echo killing write_to_hugetlbfs + killall -2 write_to_hugetlbfs + wait_for_hugetlb_memory_to_get_depleted $cgroup + fi + set -e + + if [[ -e /mnt/huge ]]; then + rm -rf /mnt/huge/* + umount /mnt/huge + rmdir /mnt/huge + fi +} + +function run_test() { + local size=$(($1 * ${MB} * 1024 * 1024)) + local populate="$2" + local write="$3" + local cgroup_limit=$(($4 * ${MB} * 1024 * 1024)) + local reservation_limit=$(($5 * ${MB} * 1024 * 1024)) + local nr_hugepages="$6" + local method="$7" + local private="$8" + local expect_failure="$9" + local reserve="${10}" + + # Function return values. + hugetlb_difference=0 + reserved_difference=0 + reservation_failed=0 + oom_killed=0 + + echo nr hugepages = "$nr_hugepages" + echo "$nr_hugepages" >/proc/sys/vm/nr_hugepages + + setup_cgroup "hugetlb_cgroup_test" "$cgroup_limit" "$reservation_limit" + + mkdir -p /mnt/huge + mount -t hugetlbfs -o pagesize=${MB}M,size=256M none /mnt/huge + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test" "$size" "$populate" \ + "$write" "/mnt/huge/test" "$method" "$private" "$expect_failure" \ + "$reserve" + + cleanup_hugetlb_memory "hugetlb_cgroup_test" + + local final_hugetlb=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB}MB.$fault_usage_file) + local final_reservation=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.${MB}MB.$reservation_usage_file) + + echo $hugetlb_difference + echo $reserved_difference + expect_equal "0" "$final_hugetlb" "final hugetlb is not zero" + expect_equal "0" "$final_reservation" "final reservation is not zero" +} + +function run_multiple_cgroup_test() { + local size1="$1" + local populate1="$2" + local write1="$3" + local cgroup_limit1="$4" + local reservation_limit1="$5" + + local size2="$6" + local populate2="$7" + local write2="$8" + local cgroup_limit2="$9" + local reservation_limit2="${10}" + + local nr_hugepages="${11}" + local method="${12}" + local private="${13}" + local expect_failure="${14}" + local reserve="${15}" + + # Function return values. + hugetlb_difference1=0 + reserved_difference1=0 + reservation_failed1=0 + oom_killed1=0 + + hugetlb_difference2=0 + reserved_difference2=0 + reservation_failed2=0 + oom_killed2=0 + + echo nr hugepages = "$nr_hugepages" + echo "$nr_hugepages" >/proc/sys/vm/nr_hugepages + + setup_cgroup "hugetlb_cgroup_test1" "$cgroup_limit1" "$reservation_limit1" + setup_cgroup "hugetlb_cgroup_test2" "$cgroup_limit2" "$reservation_limit2" + + mkdir -p /mnt/huge + mount -t hugetlbfs -o pagesize=${MB}M,size=256M none /mnt/huge + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test1" "$size1" \ + "$populate1" "$write1" "/mnt/huge/test1" "$method" "$private" \ + "$expect_failure" "$reserve" + + hugetlb_difference1=$hugetlb_difference + reserved_difference1=$reserved_difference + reservation_failed1=$reservation_failed + oom_killed1=$oom_killed + + local cgroup1_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB}MB.$fault_usage_file + local cgroup1_reservation_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.${MB}MB.$reservation_usage_file + local cgroup2_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB}MB.$fault_usage_file + local cgroup2_reservation_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.${MB}MB.$reservation_usage_file + + local usage_before_second_write=$(cat $cgroup1_hugetlb_usage) + local reservation_usage_before_second_write=$(cat $cgroup1_reservation_usage) + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test2" "$size2" \ + "$populate2" "$write2" "/mnt/huge/test2" "$method" "$private" \ + "$expect_failure" "$reserve" + + hugetlb_difference2=$hugetlb_difference + reserved_difference2=$reserved_difference + reservation_failed2=$reservation_failed + oom_killed2=$oom_killed + + expect_equal "$usage_before_second_write" \ + "$(cat $cgroup1_hugetlb_usage)" "Usage changed." + expect_equal "$reservation_usage_before_second_write" \ + "$(cat $cgroup1_reservation_usage)" "Reservation usage changed." + + cleanup_hugetlb_memory + + local final_hugetlb=$(cat $cgroup1_hugetlb_usage) + local final_reservation=$(cat $cgroup1_reservation_usage) + + expect_equal "0" "$final_hugetlb" \ + "hugetlbt_cgroup_test1 final hugetlb is not zero" + expect_equal "0" "$final_reservation" \ + "hugetlbt_cgroup_test1 final reservation is not zero" + + local final_hugetlb=$(cat $cgroup2_hugetlb_usage) + local final_reservation=$(cat $cgroup2_reservation_usage) + + expect_equal "0" "$final_hugetlb" \ + "hugetlb_cgroup_test2 final hugetlb is not zero" + expect_equal "0" "$final_reservation" \ + "hugetlb_cgroup_test2 final reservation is not zero" +} + +cleanup + +for populate in "" "-o"; do + for method in 0 1 2; do + for private in "" "-r"; do + for reserve in "" "-n"; do + + # Skip mmap(MAP_HUGETLB | MAP_SHARED). Doesn't seem to be supported. + if [[ "$method" == 1 ]] && [[ "$private" == "" ]]; then + continue + fi + + # Skip populated shmem tests. Doesn't seem to be supported. + if [[ "$method" == 2"" ]] && [[ "$populate" == "-o" ]]; then + continue + fi + + if [[ "$method" == 2"" ]] && [[ "$reserve" == "-n" ]]; then + continue + fi + + cleanup + echo + echo + echo + echo Test normal case. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + run_test 5 "$populate" "" 10 10 10 "$method" "$private" "0" "$reserve" + + echo Memory charged to hugtlb=$hugetlb_difference + echo Memory charged to reservation=$reserved_difference + + if [[ "$populate" == "-o" ]]; then + expect_equal "$((5 * $MB * 1024 * 1024))" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." + else + expect_equal "0" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." + fi + + if [[ "$reserve" != "-n" ]] || [[ "$populate" == "-o" ]]; then + expect_equal "$((5 * $MB * 1024 * 1024))" "$reserved_difference" \ + "Reserved memory not charged to reservation usage." + else + expect_equal "0" "$reserved_difference" \ + "Reserved memory not charged to reservation usage." + fi + + echo 'PASS' + + cleanup + echo + echo + echo + echo Test normal case with write. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + run_test 5 "$populate" '-w' 5 5 10 "$method" "$private" "0" "$reserve" + + echo Memory charged to hugtlb=$hugetlb_difference + echo Memory charged to reservation=$reserved_difference + + expect_equal "$((5 * $MB * 1024 * 1024))" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." + + expect_equal "$((5 * $MB * 1024 * 1024))" "$reserved_difference" \ + "Reserved memory not charged to reservation usage." + + echo 'PASS' + + cleanup + continue + echo + echo + echo + echo Test more than reservation case. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + + if [ "$reserve" != "-n" ]; then + run_test "5" "$populate" '' "10" "2" "10" "$method" "$private" "1" \ + "$reserve" + + expect_equal "1" "$reservation_failed" "Reservation succeeded." + fi + + echo 'PASS' + + cleanup + + echo + echo + echo + echo Test more than cgroup limit case. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + + # Not sure if shm memory can be cleaned up when the process gets sigbus'd. + if [[ "$method" != 2 ]]; then + run_test 5 "$populate" "-w" 2 10 10 "$method" "$private" "1" "$reserve" + + expect_equal "1" "$oom_killed" "Not oom killed." + fi + echo 'PASS' + + cleanup + + echo + echo + echo + echo Test normal case, multiple cgroups. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + run_multiple_cgroup_test "3" "$populate" "" "10" "10" "5" \ + "$populate" "" "10" "10" "10" \ + "$method" "$private" "0" "$reserve" + + echo Memory charged to hugtlb1=$hugetlb_difference1 + echo Memory charged to reservation1=$reserved_difference1 + echo Memory charged to hugtlb2=$hugetlb_difference2 + echo Memory charged to reservation2=$reserved_difference2 + + if [[ "$reserve" != "-n" ]] || [[ "$populate" == "-o" ]]; then + expect_equal "3" "$reserved_difference1" \ + "Incorrect reservations charged to cgroup 1." + + expect_equal "5" "$reserved_difference2" \ + "Incorrect reservation charged to cgroup 2." + + else + expect_equal "0" "$reserved_difference1" \ + "Incorrect reservations charged to cgroup 1." + + expect_equal "0" "$reserved_difference2" \ + "Incorrect reservation charged to cgroup 2." + fi + + if [[ "$populate" == "-o" ]]; then + expect_equal "3" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." + + expect_equal "5" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." + + else + expect_equal "0" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." + + expect_equal "0" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." + fi + echo 'PASS' + + cleanup + echo + echo + echo + echo Test normal case with write, multiple cgroups. + echo private=$private, populate=$populate, method=$method, reserve=$reserve + run_multiple_cgroup_test "3" "$populate" "-w" "10" "10" "5" \ + "$populate" "-w" "10" "10" "10" \ + "$method" "$private" "0" "$reserve" + + echo Memory charged to hugtlb1=$hugetlb_difference1 + echo Memory charged to reservation1=$reserved_difference1 + echo Memory charged to hugtlb2=$hugetlb_difference2 + echo Memory charged to reservation2=$reserved_difference2 + + expect_equal "3" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." + + expect_equal "3" "$reserved_difference1" \ + "Incorrect reservation charged to cgroup 1." + + expect_equal "5" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." + + expect_equal "5" "$reserved_difference2" \ + "Incorrected reservation charged to cgroup 2." + echo 'PASS' + + cleanup + + done # reserve + done # private + done # populate +done # method + +umount $cgroup_path +rmdir $cgroup_path diff --git a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh new file mode 100755 index 0000000000000..d11d1febccc3b --- /dev/null +++ b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh @@ -0,0 +1,244 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +set -e + +if [[ $(id -u) -ne 0 ]]; then + echo "This test must be run as root. Skipping..." + exit 0 +fi + +usage_file=usage_in_bytes + +if [[ "$1" == "-cgroup-v2" ]]; then + cgroup2=1 + usage_file=current +fi + +CGROUP_ROOT='/dev/cgroup/memory' +MNT='/mnt/huge/' + +if [[ ! -e $CGROUP_ROOT ]]; then + mkdir -p $CGROUP_ROOT + if [[ $cgroup2 ]]; then + mount -t cgroup2 none $CGROUP_ROOT + sleep 1 + echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control + else + mount -t cgroup memory,hugetlb $CGROUP_ROOT + fi +fi + +function get_machine_hugepage_size() { + hpz=$(grep -i hugepagesize /proc/meminfo) + kb=${hpz:14:-3} + mb=$(($kb / 1024)) + echo $mb +} + +MB=$(get_machine_hugepage_size) + +function cleanup() { + echo cleanup + set +e + rm -rf "$MNT"/* 2>/dev/null + umount "$MNT" 2>/dev/null + rmdir "$MNT" 2>/dev/null + rmdir "$CGROUP_ROOT"/a/b 2>/dev/null + rmdir "$CGROUP_ROOT"/a 2>/dev/null + rmdir "$CGROUP_ROOT"/test1 2>/dev/null + echo 0 >/proc/sys/vm/nr_hugepages + set -e +} + +function assert_state() { + local expected_a="$1" + local expected_a_hugetlb="$2" + local expected_b="" + local expected_b_hugetlb="" + + if [ ! -z ${3:-} ] && [ ! -z ${4:-} ]; then + expected_b="$3" + expected_b_hugetlb="$4" + fi + local tolerance=$((5 * 1024 * 1024)) + + local actual_a + actual_a="$(cat "$CGROUP_ROOT"/a/memory.$usage_file)" + if [[ $actual_a -lt $(($expected_a - $tolerance)) ]] || + [[ $actual_a -gt $(($expected_a + $tolerance)) ]]; then + echo actual a = $((${actual_a%% *} / 1024 / 1024)) MB + echo expected a = $((${expected_a%% *} / 1024 / 1024)) MB + echo fail + + cleanup + exit 1 + fi + + local actual_a_hugetlb + actual_a_hugetlb="$(cat "$CGROUP_ROOT"/a/hugetlb.${MB}MB.$usage_file)" + if [[ $actual_a_hugetlb -lt $(($expected_a_hugetlb - $tolerance)) ]] || + [[ $actual_a_hugetlb -gt $(($expected_a_hugetlb + $tolerance)) ]]; then + echo actual a hugetlb = $((${actual_a_hugetlb%% *} / 1024 / 1024)) MB + echo expected a hugetlb = $((${expected_a_hugetlb%% *} / 1024 / 1024)) MB + echo fail + + cleanup + exit 1 + fi + + if [[ -z "$expected_b" || -z "$expected_b_hugetlb" ]]; then + return + fi + + local actual_b + actual_b="$(cat "$CGROUP_ROOT"/a/b/memory.$usage_file)" + if [[ $actual_b -lt $(($expected_b - $tolerance)) ]] || + [[ $actual_b -gt $(($expected_b + $tolerance)) ]]; then + echo actual b = $((${actual_b%% *} / 1024 / 1024)) MB + echo expected b = $((${expected_b%% *} / 1024 / 1024)) MB + echo fail + + cleanup + exit 1 + fi + + local actual_b_hugetlb + actual_b_hugetlb="$(cat "$CGROUP_ROOT"/a/b/hugetlb.${MB}MB.$usage_file)" + if [[ $actual_b_hugetlb -lt $(($expected_b_hugetlb - $tolerance)) ]] || + [[ $actual_b_hugetlb -gt $(($expected_b_hugetlb + $tolerance)) ]]; then + echo actual b hugetlb = $((${actual_b_hugetlb%% *} / 1024 / 1024)) MB + echo expected b hugetlb = $((${expected_b_hugetlb%% *} / 1024 / 1024)) MB + echo fail + + cleanup + exit 1 + fi +} + +function setup() { + echo 100 >/proc/sys/vm/nr_hugepages + mkdir "$CGROUP_ROOT"/a + sleep 1 + if [[ $cgroup2 ]]; then + echo "+hugetlb +memory" >$CGROUP_ROOT/a/cgroup.subtree_control + else + echo 0 >$CGROUP_ROOT/a/cpuset.mems + echo 0 >$CGROUP_ROOT/a/cpuset.cpus + fi + + mkdir "$CGROUP_ROOT"/a/b + + if [[ ! $cgroup2 ]]; then + echo 0 >$CGROUP_ROOT/a/b/cpuset.mems + echo 0 >$CGROUP_ROOT/a/b/cpuset.cpus + fi + + mkdir -p "$MNT" + mount -t hugetlbfs none "$MNT" +} + +write_hugetlbfs() { + local cgroup="$1" + local path="$2" + local size="$3" + + if [[ $cgroup2 ]]; then + echo $$ >$CGROUP_ROOT/$cgroup/cgroup.procs + else + echo 0 >$CGROUP_ROOT/$cgroup/cpuset.mems + echo 0 >$CGROUP_ROOT/$cgroup/cpuset.cpus + echo $$ >"$CGROUP_ROOT/$cgroup/tasks" + fi + ./write_to_hugetlbfs -p "$path" -s "$size" -m 0 -o + if [[ $cgroup2 ]]; then + echo $$ >$CGROUP_ROOT/cgroup.procs + else + echo $$ >"$CGROUP_ROOT/tasks" + fi + echo +} + +set -e + +size=$((${MB} * 1024 * 1024 * 25)) # 50MB = 25 * 2MB hugepages. + +cleanup + +echo +echo +echo Test charge, rmdir, uncharge +setup +echo mkdir +mkdir $CGROUP_ROOT/test1 + +echo write +write_hugetlbfs test1 "$MNT"/test $size + +echo rmdir +rmdir $CGROUP_ROOT/test1 +mkdir $CGROUP_ROOT/test1 + +echo uncharge +rm -rf /mnt/huge/* + +cleanup + +echo done +echo +echo +if [[ ! $cgroup2 ]]; then + echo "Test parent and child hugetlb usage" + setup + + echo write + write_hugetlbfs a "$MNT"/test $size + + echo Assert memory charged correctly for parent use. + assert_state 0 $size 0 0 + + write_hugetlbfs a/b "$MNT"/test2 $size + + echo Assert memory charged correctly for child use. + assert_state 0 $(($size * 2)) 0 $size + + rmdir "$CGROUP_ROOT"/a/b + sleep 5 + echo Assert memory reparent correctly. + assert_state 0 $(($size * 2)) + + rm -rf "$MNT"/* + umount "$MNT" + echo Assert memory uncharged correctly. + assert_state 0 0 + + cleanup +fi + +echo +echo +echo "Test child only hugetlb usage" +echo setup +setup + +echo write +write_hugetlbfs a/b "$MNT"/test2 $size + +echo Assert memory charged correctly for child only use. +assert_state 0 $(($size)) 0 $size + +rmdir "$CGROUP_ROOT"/a/b +echo Assert memory reparent correctly. +assert_state 0 $size + +rm -rf "$MNT"/* +umount "$MNT" +echo Assert memory uncharged correctly. +assert_state 0 0 + +cleanup + +echo ALL PASS + +umount $CGROUP_ROOT +rm -rf $CGROUP_ROOT diff --git a/tools/testing/selftests/vm/write_hugetlb_memory.sh b/tools/testing/selftests/vm/write_hugetlb_memory.sh new file mode 100644 index 0000000000000..d3d0d108924d4 --- /dev/null +++ b/tools/testing/selftests/vm/write_hugetlb_memory.sh @@ -0,0 +1,23 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +set -e + +size=$1 +populate=$2 +write=$3 +cgroup=$4 +path=$5 +method=$6 +private=$7 +want_sleep=$8 +reserve=$9 + +echo "Putting task in cgroup '$cgroup'" +echo $$ > /dev/cgroup/memory/"$cgroup"/cgroup.procs + +echo "Method is $method" + +set +e +./write_to_hugetlbfs -p "$path" -s "$size" "$write" "$populate" -m "$method" \ + "$private" "$want_sleep" "$reserve" diff --git a/tools/testing/selftests/vm/write_to_hugetlbfs.c b/tools/testing/selftests/vm/write_to_hugetlbfs.c new file mode 100644 index 0000000000000..110bc4e4015d6 --- /dev/null +++ b/tools/testing/selftests/vm/write_to_hugetlbfs.c @@ -0,0 +1,242 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program reserves and uses hugetlb memory, supporting a bunch of + * scenarios needed by the charged_reserved_hugetlb.sh test. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Global definitions. */ +enum method { + HUGETLBFS, + MMAP_MAP_HUGETLB, + SHM, + MAX_METHOD +}; + + +/* Global variables. */ +static const char *self; +static char *shmaddr; +static int shmid; + +/* + * Show usage and exit. + */ +static void exit_usage(void) +{ + printf("Usage: %s -p -s " + "[-m <0=hugetlbfs | 1=mmap(MAP_HUGETLB)>] [-l] [-r] " + "[-o] [-w] [-n]\n", + self); + exit(EXIT_FAILURE); +} + +void sig_handler(int signo) +{ + printf("Received %d.\n", signo); + if (signo == SIGINT) { + printf("Deleting the memory\n"); + if (shmdt((const void *)shmaddr) != 0) { + perror("Detach failure"); + shmctl(shmid, IPC_RMID, NULL); + exit(4); + } + + shmctl(shmid, IPC_RMID, NULL); + printf("Done deleting the memory\n"); + } + exit(2); +} + +int main(int argc, char **argv) +{ + int fd = 0; + int key = 0; + int *ptr = NULL; + int c = 0; + int size = 0; + char path[256] = ""; + enum method method = MAX_METHOD; + int want_sleep = 0, private = 0; + int populate = 0; + int write = 0; + int reserve = 1; + + unsigned long i; + + if (signal(SIGINT, sig_handler) == SIG_ERR) + err(1, "\ncan't catch SIGINT\n"); + + /* Parse command-line arguments. */ + setvbuf(stdout, NULL, _IONBF, 0); + self = argv[0]; + + while ((c = getopt(argc, argv, "s:p:m:owlrn")) != -1) { + switch (c) { + case 's': + size = atoi(optarg); + break; + case 'p': + strncpy(path, optarg, sizeof(path)); + break; + case 'm': + if (atoi(optarg) >= MAX_METHOD) { + errno = EINVAL; + perror("Invalid -m."); + exit_usage(); + } + method = atoi(optarg); + break; + case 'o': + populate = 1; + break; + case 'w': + write = 1; + break; + case 'l': + want_sleep = 1; + break; + case 'r': + private + = 1; + break; + case 'n': + reserve = 0; + break; + default: + errno = EINVAL; + perror("Invalid arg"); + exit_usage(); + } + } + + if (strncmp(path, "", sizeof(path)) != 0) { + printf("Writing to this path: %s\n", path); + } else { + errno = EINVAL; + perror("path not found"); + exit_usage(); + } + + if (size != 0) { + printf("Writing this size: %d\n", size); + } else { + errno = EINVAL; + perror("size not found"); + exit_usage(); + } + + if (!populate) + printf("Not populating.\n"); + else + printf("Populating.\n"); + + if (!write) + printf("Not writing to memory.\n"); + + if (method == MAX_METHOD) { + errno = EINVAL; + perror("-m Invalid"); + exit_usage(); + } else + printf("Using method=%d\n", method); + + if (!private) + printf("Shared mapping.\n"); + else + printf("Private mapping.\n"); + + if (!reserve) + printf("NO_RESERVE mapping.\n"); + else + printf("RESERVE mapping.\n"); + + switch (method) { + case HUGETLBFS: + printf("Allocating using HUGETLBFS.\n"); + fd = open(path, O_CREAT | O_RDWR, 0777); + if (fd == -1) + err(1, "Failed to open file."); + + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, + (private ? MAP_PRIVATE : MAP_SHARED) | + (populate ? MAP_POPULATE : 0) | + (reserve ? 0 : MAP_NORESERVE), + fd, 0); + + if (ptr == MAP_FAILED) { + close(fd); + err(1, "Error mapping the file"); + } + break; + case MMAP_MAP_HUGETLB: + printf("Allocating using MAP_HUGETLB.\n"); + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, + (private ? (MAP_PRIVATE | MAP_ANONYMOUS) : + MAP_SHARED) | + MAP_HUGETLB | (populate ? MAP_POPULATE : 0) | + (reserve ? 0 : MAP_NORESERVE), + -1, 0); + + if (ptr == MAP_FAILED) + err(1, "mmap"); + + printf("Returned address is %p\n", ptr); + break; + case SHM: + printf("Allocating using SHM.\n"); + shmid = shmget(key, size, + SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W); + if (shmid < 0) { + shmid = shmget(++key, size, + SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W); + if (shmid < 0) + err(1, "shmget"); + } + printf("shmid: 0x%x, shmget key:%d\n", shmid, key); + + ptr = shmat(shmid, NULL, 0); + if (ptr == (int *)-1) { + perror("Shared memory attach failure"); + shmctl(shmid, IPC_RMID, NULL); + exit(2); + } + printf("shmaddr: %p\n", ptr); + + break; + default: + errno = EINVAL; + err(1, "Invalid method."); + } + + if (write) { + printf("Writing to memory.\n"); + memset(ptr, 1, size); + } + + if (want_sleep) { + /* Signal to caller that we're done. */ + printf("DONE\n"); + + /* Hold memory until external kill signal is delivered. */ + while (1) + sleep(100); + } + + if (method == HUGETLBFS) + close(fd); + + return 0; +}