From: Muchun Song
To: viro@zeniv.linux.org.uk, jack@suse.cz, amir73il@gmail.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, shakeelb@google.com, guro@fb.com, songmuchun@bytedance.com, alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com, chris@chrisdown.name, richard.weiyang@gmail.com, vbabka@suse.cz, mathieu.desnoyers@efficios.com, posk@google.com, jannh@google.com, iamjoonsoo.kim@lge.com, daniel.vetter@ffwll.ch, longman@redhat.com, walken@google.com, christian.brauner@ubuntu.com, ebiederm@xmission.com, keescook@chromium.org, krisman@collabora.com, esyr@redhat.com, surenb@google.com, elver@google.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com
Subject: [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page
Date: Mon, 1 Mar 2021 14:22:23 +0800
Message-Id: <20210301062227.59292-2-songmuchun@bytedance.com>
In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com>

The unit of charging slab objects is bytes, while the unit of charging
kmem pages is PAGE_SIZE. So if we want to reuse the obj_cgroup APIs to
charge kmem pages, we should pass PAGE_SIZE (as the third parameter) to
obj_cgroup_charge(). Because the charging size is a whole number of
pages, going through the objcg stock is pointless: we already know the
charging size. So we can skip touching the objcg stock entirely and
introduce obj_cgroup_{un}charge_page() to charge or uncharge kmem pages
directly. A later patch in this series reuses these helpers to
charge/uncharge kmem pages.

This is just code movement without any functional change.
Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2db2aeac8a9e..2eafbae504ac 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3060,6 +3060,34 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static inline void obj_cgroup_uncharge_page(struct obj_cgroup *objcg,
+					    unsigned int nr_pages)
+{
+	rcu_read_lock();
+	__memcg_kmem_uncharge(obj_cgroup_memcg(objcg), nr_pages);
+	rcu_read_unlock();
+}
+
+static int obj_cgroup_charge_page(struct obj_cgroup *objcg, gfp_t gfp,
+				  unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3184,11 +3212,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages) {
-			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
-			rcu_read_unlock();
-		}
+		if (nr_pages)
+			obj_cgroup_uncharge_page(old, nr_pages);
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
@@ -3246,7 +3271,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-	struct mem_cgroup *memcg;
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
@@ -3263,24 +3287,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * refill_obj_stock(), called from this function or
 	 * independently later.
 	 */
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
 	nr_pages = size >> PAGE_SHIFT;
 	nr_bytes = size & (PAGE_SIZE - 1);
 
 	if (nr_bytes)
 		nr_pages += 1;
 
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	ret = obj_cgroup_charge_page(objcg, gfp, nr_pages);
 	if (!ret && nr_bytes)
 		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-	css_put(&memcg->css);
 	return ret;
 }
 
From: Muchun Song
Subject: [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
Date: Mon, 1 Mar 2021 14:22:24 +0800
Message-Id: <20210301062227.59292-3-songmuchun@bytedance.com>
In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com>

We want to reuse the obj_cgroup APIs to reparent kmem pages when the
memcg is offlined. To do this, we need to store an object cgroup pointer
in page->memcg_data for kmem pages. As a result, page->memcg_data can
have 3 different meanings:

  1) For slab pages, page->memcg_data points to an object cgroups vector.
  2) For kmem pages (excluding slab pages), page->memcg_data points to an
     object cgroup.
  3) For user pages (e.g. LRU pages), page->memcg_data points to a memory
     cgroup.

Currently we always get the memcg associated with a page via page_memcg()
or page_memcg_rcu(). page_memcg_check() is special: it has to be used
when it's not known whether a page has an associated memory cgroup
pointer or an object cgroups vector.

Because page->memcg_data of a kmem page will no longer point to a memory
cgroup after a later patch, page_memcg() and page_memcg_rcu() cannot be
applied to kmem pages. In this patch, we introduce page_memcg_kmem() to
get the memcg associated with kmem pages, and make page_memcg() and
page_memcg_rcu() no longer apply to kmem pages.

In the end, there are 4 helpers to get the memcg associated with a page.
The usage is as follows.

  1) Get the memory cgroup associated with a non-kmem page (e.g. LRU
     pages).

     - page_memcg()
     - page_memcg_rcu()

  2) Get the memory cgroup associated with a kmem page (excluding slab
     pages).

     - page_memcg_kmem()

  3) Get the memory cgroup associated with a page. It has to be used when
     it's not known whether a page has an associated memory cgroup
     pointer or an object cgroups vector. Returns NULL for slab pages or
     uncharged pages; otherwise, returns the memory cgroup for charged
     pages (e.g. kmem pages, LRU pages).

     - page_memcg_check()

In some places, we use page_memcg() to check whether the page is charged.
Now we introduce the page_memcg_charged() helper to do this.

This is a preparation for reparenting kmem pages. To support reparenting
kmem pages, we just need to adjust page_memcg_kmem() and
page_memcg_check() in a later patch.
Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h | 56 +++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memcontrol.c            | 23 ++++++++++++-----------
 mm/page_alloc.c            |  4 ++--
 3 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..1d2c82464c8c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,14 +358,46 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+/* Return true for charged page, otherwise false. */
+static inline bool page_memcg_charged(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+	return !!memcg_data;
+}
+
 /*
- * page_memcg - get the memory cgroup associated with a page
+ * page_memcg_kmem - get the memory cgroup associated with a kmem page.
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have a proper
+ * memory cgroup pointer. It is only suitable for kmem pages which means
+ * PageMemcgKmem() returns true for this page.
+ */
+static inline struct mem_cgroup *page_memcg_kmem(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
@@ -378,27 +410,30 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 	unsigned long memcg_data = page->memcg_data;
 
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
 
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)memcg_data;
 }
 
 /*
- * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)memcg_data;
 }
 
 /*
@@ -1072,6 +1107,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 
 struct mem_cgroup;
 
+static inline bool page_memcg_charged(struct page *page)
+{
+	return false;
+}
+
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
 	return NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2eafbae504ac..bfd6efe1e196 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -3170,12 +3171,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct mem_cgroup *memcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!page_memcg_charged(page))
 		return;
 
+	memcg = page_memcg_kmem(page);
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
@@ -6831,24 +6833,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
-	 * exclusive access to the page.
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
 	 */
-
-	if (ug->memcg != page_memcg(page)) {
+	memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6881,7 +6884,7 @@ void mem_cgroup_uncharge(struct page *page)
 		return;
 
 	/* Don't touch page->lru of any random page, pre-check: */
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	uncharge_gather_clear(&ug);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f10966e3b4a5..bcb58ae15e24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page_memcg(page) |
+			page_memcg_charged(page) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 		bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page_memcg(page)))
+	if (unlikely(page_memcg_charged(page)))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;

From: Muchun Song
Subject: [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal
Date: Mon, 1 Mar 2021 14:22:25 +0800
Message-Id: <20210301062227.59292-4-songmuchun@bytedance.com>
In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com>

Currently, slab objects are already reparented to their parent memcg on
cgroup removal. But there are still some corner-case objects which are
not reparented (e.g. allocations larger than an order-1 page on SLUB).
Those objects are allocated directly from the buddy allocator and are
charged as kmem to the memcg via __memcg_kmem_charge_page(). Such objects
are not reparented on cgroup removal.

This patch reparents kmem pages on cgroup removal. Doing this is simple
with the help of the obj_cgroup infrastructure. As a result,
page->memcg_data points to an object cgroup for kmem pages.

Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h |  66 +++++++++++--------
 mm/memcontrol.c            | 155 ++++++++++++++++++++++++---------------------
 2 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1d2c82464c8c..27043478220f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -370,23 +370,15 @@ static inline bool page_memcg_charged(struct page *page)
 }
 
 /*
- * page_memcg_kmem - get the memory cgroup associated with a kmem page.
- * @page: a pointer to the page struct
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
  *
- * Returns a pointer to the memory cgroup associated with the kmem page,
- * or NULL. This function assumes that the page is known to have a proper
- * memory cgroup pointer. It is only suitable for kmem pages which means
- * PageMemcgKmem() returns true for this page.
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
  */
-static inline struct mem_cgroup *page_memcg_kmem(struct page *page)
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return READ_ONCE(objcg->memcg);
 }
 
 /*
@@ -462,6 +454,17 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		/*
+		 * The caller must ensure that the returned memcg won't be
+		 * released: e.g. acquire the rcu_read_lock or css_set_lock.
+		 */
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -520,6 +523,24 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+/*
+ * page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroup. It's only safe to call this function
+ * against kmem pages (PageMemcgKmem() returns true).
+ */
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
@@ -530,6 +551,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	return NULL;
 }
+
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	return NULL;
+}
 #endif
 
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
@@ -748,18 +774,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bfd6efe1e196..39cb8c5bf8b2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -856,10 +856,16 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
 	struct mem_cgroup *memcg;
-	pg_data_t *pgdat = page_pgdat(page);
+	pg_data_t *pgdat;
 	struct lruvec *lruvec;
 
-	memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
+	if (PageMemcgKmem(head)) {
+		__mod_lruvec_kmem_state(page_to_virt(head), idx, val);
+		return;
+	}
+
+	pgdat = page_pgdat(head);
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -1056,24 +1062,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 	return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	if (memcg) {
-		/* current->active_memcg must hold a ref. */
-		if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
-			memcg = root_mem_cgroup;
-		else
-			memcg = current->active_memcg;
-	}
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1088,20 +1076,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 }
 
 /**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
-/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
@@ -3148,18 +3122,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_page(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3171,17 +3145,18 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
 	if (!page_memcg_charged(page))
 		return;
 
-	memcg = page_memcg_kmem(page);
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	VM_BUG_ON_PAGE(!PageMemcgKmem(page), page);
+
+	objcg = page_objcg(page);
+	obj_cgroup_uncharge_page(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6798,8 +6773,12 @@ struct uncharge_gather {
 	struct mem_cgroup *memcg;
 	unsigned long nr_pages;
 	unsigned long pgpgout;
-	unsigned long nr_kmem;
 	struct page *dummy_page;
+
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup *objcg;
+	unsigned long nr_kmem;
+#endif
 };
 
 static inline void uncharge_gather_clear(struct uncharge_gather *ug)
@@ -6811,12 +6790,21 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
+#ifdef CONFIG_MEMCG_KMEM
+	if (ug->objcg) {
+		obj_cgroup_uncharge_page(ug->objcg, ug->nr_kmem);
+		/* drop reference from uncharge_kmem_page */
+		obj_cgroup_put(ug->objcg);
+	}
+#endif
+
+	if (!ug->memcg)
+		return;
+
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
-		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
 	}
 
@@ -6826,26 +6814,40 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-	/* drop reference from uncharge_page */
+	/* drop reference from uncharge_user_page */
 	css_put(&ug->memcg->css);
 }
 
-static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+#ifdef CONFIG_MEMCG_KMEM
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
 {
-	unsigned long nr_pages;
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg = page_objcg(page);
 
-	VM_BUG_ON_PAGE(PageLRU(page), page);
+	if (ug->objcg != objcg) {
+		if (ug->objcg) {
+			uncharge_batch(ug);
+			uncharge_gather_clear(ug);
+		}
+		ug->objcg = objcg;
 
-	if (!page_memcg_charged(page))
-		return;
+		/* pairs with obj_cgroup_put in uncharge_batch */
+		obj_cgroup_get(ug->objcg);
+	}
+
+	ug->nr_kmem += compound_nr(page);
+	page->memcg_data = 0;
+	obj_cgroup_put(ug->objcg);
+}
+#else
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
+{
+}
+#endif
+
+static void uncharge_user_page(struct page *page, struct uncharge_gather *ug)
+{
+	struct mem_cgroup *memcg = page_memcg(page);
 
-	/*
-	 * Nobody should be changing or seriously looking at
-	 * page memcg at this point, we have fully exclusive
-	 * access to the page.
-	 */
-	memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
 	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
@@ -6856,18 +6858,30 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
 	}
+	ug->pgpgout++;
+	ug->dummy_page = page;
+
+	ug->nr_pages += compound_nr(page);
+	page->memcg_data = 0;
+	css_put(&ug->memcg->css);
+}
 
-	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
+static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+{
+	VM_BUG_ON_PAGE(PageLRU(page), page);
 
+	if (!page_memcg_charged(page))
+		return;
+
+	/*
+	 * Nobody should be changing or seriously looking at
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
+ */ if (PageMemcgKmem(page)) - ug->nr_kmem += nr_pages; + uncharge_kmem_page(page, ug); else - ug->pgpgout++; - - ug->dummy_page = page; - page->memcg_data = 0; - css_put(&ug->memcg->css); + uncharge_user_page(page, ug); } /** @@ -6910,8 +6924,7 @@ void mem_cgroup_uncharge_list(struct list_head *page_list) uncharge_gather_clear(&ug); list_for_each_entry(page, page_list, lru) uncharge_page(page, &ug); - if (ug.memcg) - uncharge_batch(&ug); + uncharge_batch(&ug); } /** From patchwork Mon Mar 1 06:22:26 2021 X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 389047
From: Muchun Song To: viro@zeniv.linux.org.uk, jack@suse.cz, amir73il@gmail.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com,
songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, shakeelb@google.com, guro@fb.com, songmuchun@bytedance.com, alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com, chris@chrisdown.name, richard.weiyang@gmail.com, vbabka@suse.cz, mathieu.desnoyers@efficios.com, posk@google.com, jannh@google.com, iamjoonsoo.kim@lge.com, daniel.vetter@ffwll.ch, longman@redhat.com, walken@google.com, christian.brauner@ubuntu.com, ebiederm@xmission.com, keescook@chromium.org, krisman@collabora.com, esyr@redhat.com, surenb@google.com, elver@google.com Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com Subject: [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Date: Mon, 1 Mar 2021 14:22:26 +0800 Message-Id: <20210301062227.59292-5-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com> References: <20210301062227.59292-1-songmuchun@bytedance.com>

The remote memcg charging APIs are a mechanism for charging kernel memory to a given memcg. Since they only serve kernel memory accounting, move this infrastructure under CONFIG_MEMCG_KMEM. As a bonus, some functions and variables can be compiled out on !CONFIG_MEMCG_KMEM builds.
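To illustrate what a remote charging scope does, here is a minimal user-space model in C. It is a hypothetical sketch, not kernel code: struct task, charge_alloc() and the explicit task argument to set_active_memcg() are assumptions made for illustration, and the per-CPU int_active_memcg path used in interrupt context is not modelled. It shows the save/restore convention that lets scopes nest: every accounted charge made inside a scope goes to the active target instead of the caller's own group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel's definitions. */
struct mem_cgroup {
	long charged_pages;		/* pages charged to this group so far */
};

struct task {
	struct mem_cgroup *memcg;	 /* group the task itself belongs to */
	struct mem_cgroup *active_memcg; /* remote charging target, or NULL */
};

/*
 * Modelled on set_active_memcg(): install @memcg as the charging target and
 * return the previous target so the caller can restore it when its scope
 * ends. Because the old value is handed back, scopes can nest.
 */
static struct mem_cgroup *set_active_memcg(struct task *t,
					   struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = t->active_memcg;

	t->active_memcg = memcg;
	return old;
}

/*
 * Hypothetical accounted allocation: charge the active remote target if one
 * is set, otherwise the task's own group.
 */
static void charge_alloc(struct task *t, long nr_pages)
{
	struct mem_cgroup *target = t->active_memcg ? t->active_memcg : t->memcg;

	target->charged_pages += nr_pages;
}
```

A caller brackets its allocations as old = set_active_memcg(t, target); ...; set_active_memcg(t, old);, which mirrors the pattern bpf_map_kmalloc_node() uses before this series converts it.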
Signed-off-by: Muchun Song --- include/linux/sched.h | 2 ++ include/linux/sched/mm.h | 2 +- kernel/fork.c | 2 +- mm/memcontrol.c | 4 ++++ 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index ee46f5cab95b..c2d488eddf85 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1314,7 +1314,9 @@ struct task_struct { /* Number of pages to reclaim on returning to userland: */ unsigned int memcg_nr_pages_over_high; +#endif +#ifdef CONFIG_MEMCG_KMEM /* Used by memcontrol for targeted memcg charge: */ struct mem_cgroup *active_memcg; #endif diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 1ae08b8462a4..64a72975270e 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -294,7 +294,7 @@ static inline void memalloc_nocma_restore(unsigned int flags) } #endif -#ifdef CONFIG_MEMCG +#ifdef CONFIG_MEMCG_KMEM DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg); /** * set_active_memcg - Starts the remote memcg charging scope. diff --git a/kernel/fork.c b/kernel/fork.c index d66cd1014211..d66718bc82d5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -942,7 +942,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) tsk->use_memdelay = 0; #endif -#ifdef CONFIG_MEMCG +#ifdef CONFIG_MEMCG_KMEM tsk->active_memcg = NULL; #endif return tsk; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 39cb8c5bf8b2..092dc4588b43 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -76,8 +76,10 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; +#ifdef CONFIG_MEMCG_KMEM /* Active memory cgroup to use from an interrupt context */ DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); +#endif /* Socket memory accounting disabled? 
*/ static bool cgroup_memory_nosocket; @@ -1054,6 +1056,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm) } EXPORT_SYMBOL(get_mem_cgroup_from_mm); +#ifdef CONFIG_MEMCG_KMEM static __always_inline struct mem_cgroup *active_memcg(void) { if (in_interrupt()) @@ -1074,6 +1077,7 @@ static __always_inline bool memcg_kmem_bypass(void) return false; } +#endif /** * mem_cgroup_iter - iterate over memory cgroup hierarchy From patchwork Mon Mar 1 06:22:27 2021 X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 390797
From: Muchun Song To: viro@zeniv.linux.org.uk, jack@suse.cz, amir73il@gmail.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, shakeelb@google.com, guro@fb.com, songmuchun@bytedance.com, alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com, chris@chrisdown.name, richard.weiyang@gmail.com, vbabka@suse.cz, mathieu.desnoyers@efficios.com, posk@google.com, jannh@google.com, iamjoonsoo.kim@lge.com, daniel.vetter@ffwll.ch, longman@redhat.com, walken@google.com, christian.brauner@ubuntu.com, ebiederm@xmission.com, keescook@chromium.org, krisman@collabora.com, esyr@redhat.com, surenb@google.com, elver@google.com Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com Subject: [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging Date: Mon, 1 Mar 2021 14:22:27 +0800 Message-Id: <20210301062227.59292-6-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com> References: <20210301062227.59292-1-songmuchun@bytedance.com>

We spent a lot of effort making slab accounting avoid holding a reference to the memory cgroup, so that a dying cgroup can be freed as soon as possible once it is offlined. However, some users of remote memory cgroup charging (e.g. bpf and fsnotify) still hold a reference to a memory cgroup so that they can charge to it later. Since the slab core already uses the obj_cgroup APIs for memory cgroup charging, these users can hold a reference to an obj_cgroup instead of to the memory cgroup. With this change, the remote memory charging infrastructure no longer holds a reference to the memory cgroup either.
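The indirection this patch relies on can be modelled in a few lines of user-space C. This is a hypothetical sketch, not the kernel implementation: the structures are simplified stand-ins, reference counts are plain integers rather than percpu_ref, and memcg_offline() is an assumed stand-in for the reparenting done when a cgroup goes offline. It shows two properties the commit message depends on. First, a holder of an obj_cgroup reference charges through objcg->memcg, so after offlining its charges follow the cgroup's parent without the holder ever pinning the dying memcg. Second, looking up an objcg from a memcg walks toward the root and stops at the first live objcg, as the get_obj_cgroup_from_mem_cgroup() added by this patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup;

/*
 * Hypothetical stand-in for struct obj_cgroup: a separately refcounted
 * handle that points at the memcg currently paying for its charges.
 */
struct obj_cgroup {
	long refcnt;			/* 0 means the handle is being released */
	struct mem_cgroup *memcg;	/* switched to the parent on offline */
};

struct mem_cgroup {
	struct mem_cgroup *parent;	/* NULL for the root */
	struct obj_cgroup *objcg;	/* NULL once the group is offlined */
	long charged;			/* bytes charged to this group */
};

/* Modelled loosely on obj_cgroup_tryget(): fails if the handle is dying. */
static bool objcg_tryget(struct obj_cgroup *objcg)
{
	if (objcg->refcnt == 0)
		return false;
	objcg->refcnt++;
	return true;
}

/*
 * Sketch of the ancestor walk: take a reference on the objcg of @memcg or
 * of its nearest non-root ancestor that still has a live one. The root
 * (parent == NULL in this model) is never charged; NULL @memcg is tolerated.
 */
static struct obj_cgroup *get_objcg_from_memcg(struct mem_cgroup *memcg)
{
	for (; memcg && memcg->parent; memcg = memcg->parent) {
		struct obj_cgroup *objcg = memcg->objcg;

		if (objcg && objcg_tryget(objcg))
			return objcg;
	}
	return NULL;
}

/* Charging goes through objcg->memcg, so it follows any reparenting. */
static void objcg_charge(struct obj_cgroup *objcg, long bytes)
{
	objcg->memcg->charged += bytes;
}

/*
 * Hypothetical offline step: detach the objcg from @memcg and point it at
 * the parent. Existing objcg holders keep working without pinning @memcg.
 */
static void memcg_offline(struct mem_cgroup *memcg)
{
	memcg->objcg->memcg = memcg->parent;
	memcg->objcg = NULL;
}
```

In this model a long-lived holder such as a bpf map or an fsnotify group keeps only its obj_cgroup reference: charges made before offlining land on the child, charges made afterwards land on the parent, and a fresh lookup from the offlined child resolves to the parent's objcg.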
Signed-off-by: Muchun Song --- fs/buffer.c | 10 ++++-- fs/notify/fanotify/fanotify.c | 6 ++-- fs/notify/fanotify/fanotify_user.c | 2 +- fs/notify/group.c | 3 +- fs/notify/inotify/inotify_fsnotify.c | 8 ++--- fs/notify/inotify/inotify_user.c | 2 +- include/linux/bpf.h | 2 +- include/linux/fsnotify_backend.h | 2 +- include/linux/memcontrol.h | 15 ++++++++ include/linux/sched.h | 4 +-- include/linux/sched/mm.h | 28 +++++++-------- kernel/bpf/syscall.c | 35 +++++++++---------- kernel/fork.c | 2 +- mm/memcontrol.c | 66 ++++++++++++++++++++++++++++-------- 14 files changed, 121 insertions(+), 64 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 591547779dbd..cc99fcf66368 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -842,14 +842,16 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, struct buffer_head *bh, *head; gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT; long offset; - struct mem_cgroup *memcg, *old_memcg; + struct mem_cgroup *memcg; + struct obj_cgroup *objcg, *old_objcg; if (retry) gfp |= __GFP_NOFAIL; /* The page lock pins the memcg */ memcg = page_memcg(page); - old_memcg = set_active_memcg(memcg); + objcg = get_obj_cgroup_from_mem_cgroup(memcg); + old_objcg = set_active_obj_cgroup(objcg); head = NULL; offset = PAGE_SIZE; @@ -868,7 +870,9 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, set_bh_page(bh, page, offset); } out: - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); + if (objcg) + obj_cgroup_put(objcg); return head; /* * In case anything failed, we just free everything we got. 
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1192c9953620..04d24acfffc7 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -530,7 +530,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - struct mem_cgroup *old_memcg; + struct obj_cgroup *old_objcg; struct inode *child = NULL; bool name_event = false; @@ -580,7 +580,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, gfp |= __GFP_RETRY_MAYFAIL; /* Whoever is interested in the event, pays for the allocation. */ - old_memcg = set_active_memcg(group->memcg); + old_objcg = set_active_obj_cgroup(group->objcg); if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); @@ -608,7 +608,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event->pid = get_pid(task_tgid(current)); out: - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); return event; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 9e0c1afac8bd..055ca36d4e0e 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -985,7 +985,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) group->fanotify_data.user = user; group->fanotify_data.flags = flags; atomic_inc(&user->fanotify_listeners); - group->memcg = get_mem_cgroup_from_mm(current->mm); + group->objcg = get_obj_cgroup_from_current(); group->overflow_event = fanotify_alloc_overflow_event(); if (unlikely(!group->overflow_event)) { diff --git a/fs/notify/group.c b/fs/notify/group.c index ffd723ffe46d..fac46b92c16f 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -24,7 +24,8 @@ static void 
fsnotify_final_destroy_group(struct fsnotify_group *group) if (group->ops->free_group_priv) group->ops->free_group_priv(group); - mem_cgroup_put(group->memcg); + if (group->objcg) + obj_cgroup_put(group->objcg); mutex_destroy(&group->mark_mutex); kfree(group); diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 1901d799909b..20835554819a 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -66,7 +66,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, int ret; int len = 0; int alloc_len = sizeof(struct inotify_event_info); - struct mem_cgroup *old_memcg; + struct obj_cgroup *old_objcg; if (name) { len = name->len; @@ -81,12 +81,12 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, /* * Whoever is interested in the event, pays for the allocation. Do not - * trigger OOM killer in the target monitoring memcg as it may have + * trigger OOM killer in the target monitoring objcg as it may have * security repercussion. 
*/ - old_memcg = set_active_memcg(group->memcg); + old_objcg = set_active_obj_cgroup(group->objcg); event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); if (unlikely(!event)) { /* diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index c71be4fb7dc5..5b4de477fcac 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -649,7 +649,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) oevent->name_len = 0; group->max_events = max_events; - group->memcg = get_mem_cgroup_from_mm(current->mm); + group->objcg = get_obj_cgroup_from_current(); spin_lock_init(&group->inotify_data.idr_lock); idr_init(&group->inotify_data.idr); diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cccaef1088ea..b6894e3cd095 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -158,7 +158,7 @@ struct bpf_map { u32 btf_value_type_id; struct btf *btf; #ifdef CONFIG_MEMCG_KMEM - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; #endif char name[BPF_OBJ_NAME_LEN]; u32 btf_vmlinux_value_type_id; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index e5409b83e731..d0303f634da6 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -220,7 +220,7 @@ struct fsnotify_group { * notification list is too * full */ - struct mem_cgroup *memcg; /* memcg to charge allocations */ + struct obj_cgroup *objcg; /* objcg to charge allocations */ /* groups can define private fields here or use the void *private */ union { diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 27043478220f..96e63ec7274a 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1642,6 +1642,7 @@ static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg, int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void 
__memcg_kmem_uncharge_page(struct page *page, int order); +struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg); struct obj_cgroup *get_obj_cgroup_from_current(void); int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); @@ -1692,6 +1693,20 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg) struct mem_cgroup *mem_cgroup_from_obj(void *p); #else +static inline +struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg) +{ + return NULL; +} + +static inline struct obj_cgroup *get_obj_cgroup_from_current(void) +{ + return NULL; +} + +static inline void obj_cgroup_put(struct obj_cgroup *objcg) +{ +} static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) diff --git a/include/linux/sched.h b/include/linux/sched.h index c2d488eddf85..75d5b571edcb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1317,8 +1317,8 @@ struct task_struct { #endif #ifdef CONFIG_MEMCG_KMEM - /* Used by memcontrol for targeted memcg charge: */ - struct mem_cgroup *active_memcg; + /* Used by memcontrol for targeted object cgroup charge: */ + struct obj_cgroup *active_objcg; #endif #ifdef CONFIG_BLK_CGROUP diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 64a72975270e..e713f4290914 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -295,36 +295,34 @@ static inline void memalloc_nocma_restore(unsigned int flags) #endif #ifdef CONFIG_MEMCG_KMEM -DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg); +DECLARE_PER_CPU(struct obj_cgroup *, int_active_objcg); /** - * set_active_memcg - Starts the remote memcg charging scope. - * @memcg: memcg to charge. + * set_active_obj_cgroup - Starts the remote object cgroup charging scope. + * @objcg: object cgroup to charge. * - * This function marks the beginning of the remote memcg charging scope. All the - * __GFP_ACCOUNT allocations till the end of the scope will be charged to the - * given memcg. 
+ * This function marks the beginning of the remote object cgroup charging scope. + * All the __GFP_ACCOUNT allocations till the end of the scope will be charged + * to the given object cgroup. * * NOTE: This function can nest. Users must save the return value and * reset the previous value after their own charging scope is over. */ -static inline struct mem_cgroup * -set_active_memcg(struct mem_cgroup *memcg) +static inline struct obj_cgroup *set_active_obj_cgroup(struct obj_cgroup *objcg) { - struct mem_cgroup *old; + struct obj_cgroup *old; if (in_interrupt()) { - old = this_cpu_read(int_active_memcg); - this_cpu_write(int_active_memcg, memcg); + old = this_cpu_read(int_active_objcg); + this_cpu_write(int_active_objcg, objcg); } else { - old = current->active_memcg; - current->active_memcg = memcg; + old = current->active_objcg; + current->active_objcg = objcg; } return old; } #else -static inline struct mem_cgroup * -set_active_memcg(struct mem_cgroup *memcg) +static inline struct obj_cgroup *set_active_obj_cgroup(struct obj_cgroup *objcg) { return NULL; } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c859bc46d06c..1b078eddf083 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -390,37 +390,38 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock) } #ifdef CONFIG_MEMCG_KMEM -static void bpf_map_save_memcg(struct bpf_map *map) +static void bpf_map_save_objcg(struct bpf_map *map) { - map->memcg = get_mem_cgroup_from_mm(current->mm); + map->objcg = get_obj_cgroup_from_current(); } -static void bpf_map_release_memcg(struct bpf_map *map) +static void bpf_map_release_objcg(struct bpf_map *map) { - mem_cgroup_put(map->memcg); + if (map->objcg) + obj_cgroup_put(map->objcg); } void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, int node) { - struct mem_cgroup *old_memcg; + struct obj_cgroup *old_objcg; void *ptr; - old_memcg = set_active_memcg(map->memcg); + old_objcg = set_active_obj_cgroup(map->objcg); 
ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node); - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); return ptr; } void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) { - struct mem_cgroup *old_memcg; + struct obj_cgroup *old_objcg; void *ptr; - old_memcg = set_active_memcg(map->memcg); + old_objcg = set_active_obj_cgroup(map->objcg); ptr = kzalloc(size, flags | __GFP_ACCOUNT); - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); return ptr; } @@ -428,22 +429,22 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align, gfp_t flags) { - struct mem_cgroup *old_memcg; + struct obj_cgroup *old_objcg; void __percpu *ptr; - old_memcg = set_active_memcg(map->memcg); + old_objcg = set_active_obj_cgroup(map->objcg); ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT); - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); return ptr; } #else -static void bpf_map_save_memcg(struct bpf_map *map) +static void bpf_map_save_objcg(struct bpf_map *map) { } -static void bpf_map_release_memcg(struct bpf_map *map) +static void bpf_map_release_objcg(struct bpf_map *map) { } #endif @@ -454,7 +455,7 @@ static void bpf_map_free_deferred(struct work_struct *work) struct bpf_map *map = container_of(work, struct bpf_map, work); security_bpf_map_free(map); - bpf_map_release_memcg(map); + bpf_map_release_objcg(map); /* implementation dependent freeing */ map->ops->map_free(map); } @@ -877,7 +878,7 @@ static int map_create(union bpf_attr *attr) if (err) goto free_map_sec; - bpf_map_save_memcg(map); + bpf_map_save_objcg(map); err = bpf_map_new_fd(map, f_flags); if (err < 0) { diff --git a/kernel/fork.c b/kernel/fork.c index d66718bc82d5..5a800916ad8d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -943,7 +943,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #endif #ifdef 
CONFIG_MEMCG_KMEM - tsk->active_memcg = NULL; + tsk->active_objcg = NULL; #endif return tsk; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 092dc4588b43..024a0f377eb7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -77,8 +77,8 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; #ifdef CONFIG_MEMCG_KMEM -/* Active memory cgroup to use from an interrupt context */ -DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg); +/* Active object cgroup to use from an interrupt context */ +DEFINE_PER_CPU(struct obj_cgroup *, int_active_objcg); #endif /* Socket memory accounting disabled? */ @@ -1057,18 +1057,18 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm) EXPORT_SYMBOL(get_mem_cgroup_from_mm); #ifdef CONFIG_MEMCG_KMEM -static __always_inline struct mem_cgroup *active_memcg(void) +static __always_inline struct obj_cgroup *active_obj_cgroup(void) { if (in_interrupt()) - return this_cpu_read(int_active_memcg); + return this_cpu_read(int_active_objcg); else - return current->active_memcg; + return current->active_objcg; } static __always_inline bool memcg_kmem_bypass(void) { /* Allow remote memcg charging from any context. */ - if (unlikely(active_memcg())) + if (unlikely(active_obj_cgroup())) return false; /* Memcg to charge can't be determined. 
*/ @@ -2971,26 +2971,47 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) return page_memcg_check(page); } -__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) +__always_inline +struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg) { struct obj_cgroup *objcg = NULL; + + rcu_read_lock(); + for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) { + objcg = rcu_dereference(memcg->objcg); + if (objcg && obj_cgroup_tryget(objcg)) + break; + objcg = NULL; + } + rcu_read_unlock(); + + return objcg; +} + +__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) +{ + struct obj_cgroup *objcg; struct mem_cgroup *memcg; if (memcg_kmem_bypass()) return NULL; rcu_read_lock(); - if (unlikely(active_memcg())) - memcg = active_memcg(); - else - memcg = mem_cgroup_from_task(current); + objcg = active_obj_cgroup(); + if (unlikely(objcg)) { + /* remote object cgroup must hold a reference. */ + obj_cgroup_get(objcg); + goto out; + } + memcg = mem_cgroup_from_task(current); for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { objcg = rcu_dereference(memcg->objcg); if (objcg && obj_cgroup_tryget(objcg)) break; objcg = NULL; } +out: rcu_read_unlock(); return objcg; @@ -5296,16 +5317,33 @@ static struct mem_cgroup *mem_cgroup_alloc(void) return ERR_PTR(error); } +#ifdef CONFIG_MEMCG_KMEM +static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg) +{ + return memcg ? 
memcg->objcg : NULL; +} +#else +static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg) +{ + return NULL; +} +#endif + static struct cgroup_subsys_state * __ref mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) { struct mem_cgroup *parent = mem_cgroup_from_css(parent_css); - struct mem_cgroup *memcg, *old_memcg; + struct mem_cgroup *memcg; + struct obj_cgroup *old_objcg; long error = -ENOMEM; - old_memcg = set_active_memcg(parent); + /* + * The @parent cannot be offlined, so @parent->objcg cannot be freed + * under us. + */ + old_objcg = set_active_obj_cgroup(memcg_obj_cgroup(parent)); memcg = mem_cgroup_alloc(); - set_active_memcg(old_memcg); + set_active_obj_cgroup(old_objcg); if (IS_ERR(memcg)) return ERR_CAST(memcg);