From patchwork Thu Jun  7 02:41:27 2012
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 9147
Message-ID: <4FD014D7.6000605@kernel.org>
Date: Thu, 07 Jun 2012 11:41:27 +0900
From: Minchan Kim <minchan@kernel.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1
MIME-Version: 1.0
Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel
To: Anton Vorontsov
CC: Pekka Enberg, KOSAKI Motohiro, Leonid Moiseichuk, John Stultz,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linaro-kernel@lists.linaro.org, patches@linaro.org,
 kernel-team@android.com
Subject: Re: [PATCH 0/5] Some vmevent fixes...
References: <20120601122118.GA6128@lizard> <4FCC7592.9030403@kernel.org>
 <20120604113811.GA4291@lizard> <4FCD14F1.1030105@gmail.com>
 <20120605083921.GA21745@lizard>
In-Reply-To: <20120605083921.GA21745@lizard>

On 06/05/2012 05:39 PM, Anton Vorontsov wrote:
> On Tue, Jun 05, 2012 at 10:47:18AM +0300, Pekka Enberg wrote:
>> On Mon, Jun 4, 2012 at 11:05 PM, KOSAKI Motohiro wrote:
>>>> Note that 1) and 2) are not problems per se; they are just
>>>> implementation details, easy stuff. Vmevent is basically an ABI/API,
>>>> and I didn't hear from anybody who would object to the vmevent ABI
>>>> idea itself.
>>>> More than this, nobody stops us from implementing an in-kernel
>>>> vmevent API and making the Android low memory killer use it, if we
>>>> want to.
>>>
>>> I have never agreed with the "it's merely an ABI" argument. As long as
>>> the implementation is ugly, I will not agree to the ABI, even if the
>>> syscall interface is very clean.
>>
>> I don't know what discussion you are talking about.
>>
>> I also don't agree that something should be merged just because the
>> ABI is clean. The implementation must also make sense. I don't see how
>> we disagree here at all.
>
> BTW, I wasn't implying that vmevent should be merged just because
> it is a clean ABI, I wasn't implying that it is clean, and I
> didn't propose to merge it at all. :-)
>
> I just don't see any point in trying to scrap vmevent in favour of
> the Android low memory killer. That makes no sense at all, since today
> vmevent is more useful than Android's solution. For vmevent we have
> contributors from Nokia, Samsung, and of course Linaro, plus we
> have a userland killer daemon* for Android (which can work with
> both cgroups and vmevent backends). So vmevent is already the more
> generic solution.
>
> To me it would make more sense if the mm guys told us "scrap
> this all, just use cgroups and their notifications; fix cgroups'
> slab accounting and be happy". Well, I'd understand that.
>
> Anyway, we all know that vmevent is a work in progress, so nobody
> is trying to push it, and nobody is asking to merge it. So far we are
> just discussing possible solutions, and vmevent is a good playground.
>
> So, question to Minchan. Do you have anything particular in mind
> regarding how the vmstat hooks should look? And how would all this
> connect with cgroups, since KOSAKI wants to see it cgroups-aware...

How about the patch below? It is purely pseudocode; I just want to show
my intention, and the formula is not meant to be real math. We need some
finer-grained expression to standardize memory pressure.
For that, we can use several of the VM's parameters: nr_scanned,
nr_reclaimed, order, the dirty page scanning ratio, and so on. We are
also aware of the zone and node, so we can pass a lot of information to
user space if it wants it. To make the lowmem notifier general, these
are a must, I think, and they give us plenty of tools to work with.

Later, as a further step, we could replace this with a memcg-aware
version once memcg reclaim is fully unified with global page reclaim.
Many memcg people have been working on that, so I expect it to happen
sooner or later. But I am not sure memcg really needs it, because
memcg's goal is to limit memory resources among several process groups.
If some process suffers critical latency problems due to a shortage of
free memory, I think it would be better to create a new memcg group with
a tighter limit and put the process into that group.

> p.s. http://git.infradead.org/users/cbou/ulmkd.git
> I haven't updated it for the new vmevent changes, but still,
> its idea should be clear enough.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeb3bc9..eae3d2e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2323,6 +2323,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 }
 
 /*
+ * higher dirty pages, higher pressure
+ * higher nr_scanned, higher pressure
+ * higher nr_reclaimed, lower pressure
+ * higher unmapped pages, lower pressure
+ *
+ * index toward 0 implies memory pressure is heavy.
+ */
+int lowmem_index(struct zone *zone, struct scan_control *sc)
+{
+	int pressure = 1000 *
+		(sc->nr_scanned *
+			(zone_page_state(zone, NR_FILE_DIRTY) * dirty_weight + 1) -
+		 sc->nr_reclaimed -
+		 zone_unmapped_file_pages(zone)) /
+		zone_reclaimable_pages(zone);
+
+	return 1000 - pressure;
+}
+
+void lowmem_notifier(struct zone *zone, int index)
+{
+	if (lowmem_has_interested_zone(zone)) {
+		if (index < sysctl_lowmem_threshold)
+			notify(numa_node_id(), zone, index);
+	}
+}
+
+/*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
  *
@@ -2494,6 +2520,7 @@ loop_again:
 			    !zone_watermark_ok_safe(zone, testorder,
 					high_wmark_pages(zone) + balance_gap,
 					end_zone, 0)) {
+				int index;
 				shrink_zone(zone, &sc);
 				reclaim_state->reclaimed_slab = 0;
@@ -2503,6 +2530,9 @@ loop_again:
 				if (nr_slab == 0 && !zone_reclaimable(zone))
 					zone->all_unreclaimable = 1;
+
+				index = lowmem_index(zone, &sc);
+				lowmem_notifier(zone, index);