From patchwork Fri Jun 1 18:29:46 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 9080
From: John Stultz
To: LKML
Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
 Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel, Dmitry Adamushko,
 Dave Chinner, Neil Brown, Andrea Righi, "Aneesh Kumar K.V", Taras Glek,
 Mike Hommey, Jan Kara
Subject: [PATCH 2/3] [RFC] Add volatile range management code
Date: Fri, 1 Jun 2012 11:29:46 -0700
Message-Id: <1338575387-26972-3-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.7.3.2.146.gca209
In-Reply-To: <1338575387-26972-1-git-send-email-john.stultz@linaro.org>
References: <1338575387-26972-1-git-send-email-john.stultz@linaro.org>

This patch provides the volatile range management code that filesystems
can use when implementing FALLOC_FL_MARK_VOLATILE.

For each mapping, it tracks a collection of page ranges stored in an
interval tree. The code coalesces overlapping and adjacent ranges, and
splits ranges when sub-chunks are removed.

Each range is marked either purged or unpurged, and a per-filesystem
LRU list tracks all of the unpurged ranges for that filesystem.

v2:
* Fix bug in volatile_ranges_get_last_used returning bad start,end values
* Rework for intervaltree renaming
* Optimize volatile_range_lru_size to avoid running through the lru list
  each time.
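The add/remove semantics described above can be modelled in userspace. What follows is only an illustrative sketch and is not code from this patch. It makes three assumptions: a small sorted array stands in for the interval tree, page indices form inclusive [start, end] ranges, and the helper names (vr_add, vr_remove, unpurged_pages) are hypothetical. Overlapping ranges are always absorbed and pass on their purged state. Ranges that are merely adjacent are merged only when their purged state matches. This mirrors the logic of volatile_range_add() and volatile_range_remove().

```c
/* Hypothetical userspace model of volatile range bookkeeping, not kernel code. */
#include <assert.h>
#include <string.h>

#define MAX_RANGES 32

/* A volatile range of page indices, inclusive at both ends (assumption). */
struct vrange {
	unsigned long start, end;
	int purged;
};

/* Sorted by start, never overlapping; stands in for the interval tree. */
static struct vrange ranges[MAX_RANGES];
static int nr_ranges;

static void del_at(int i)
{
	memmove(&ranges[i], &ranges[i + 1],
		(size_t)(nr_ranges - i - 1) * sizeof(ranges[0]));
	nr_ranges--;
}

static void insert_sorted(struct vrange r)
{
	int i = 0;
	assert(nr_ranges < MAX_RANGES);
	while (i < nr_ranges && ranges[i].start < r.start)
		i++;
	memmove(&ranges[i + 1], &ranges[i],
		(size_t)(nr_ranges - i) * sizeof(ranges[0]));
	ranges[i] = r;
	nr_ranges++;
}

/* Mark [start, end] volatile; returns 1 if a purged range was absorbed. */
static int vr_add(unsigned long start, unsigned long end)
{
	int purged = 0, i = 0;
	/* Absorb every overlapping range and inherit its purged state. */
	while (i < nr_ranges) {
		struct vrange *r = &ranges[i];
		if (r->end >= start && r->start <= end) {
			start = r->start < start ? r->start : start;
			end = r->end > end ? r->end : end;
			purged |= r->purged;
			del_at(i);
		} else {
			i++;
		}
	}
	/* Merge adjacent neighbours only when their purged state matches. */
	i = 0;
	while (i < nr_ranges) {
		struct vrange *r = &ranges[i];
		int adjacent = r->end + 1 == start || end + 1 == r->start;
		if (adjacent && r->purged == purged) {
			start = r->start < start ? r->start : start;
			end = r->end > end ? r->end : end;
			del_at(i);
		} else {
			i++;
		}
	}
	insert_sorted((struct vrange){ start, end, purged });
	return purged;
}

/* Mark [start, end] non-volatile; returns 1 if a touched range was purged. */
static int vr_remove(unsigned long start, unsigned long end)
{
	int ret = 0, i = 0;
	while (i < nr_ranges) {
		struct vrange r = ranges[i];	/* copy: del_at() shifts the array */
		if (r.end < start || r.start > end) {
			i++;
			continue;
		}
		ret |= r.purged;
		del_at(i);
		/* Keep the parts outside [start, end]; both set means a split. */
		if (r.start < start)
			insert_sorted((struct vrange){ r.start, start - 1, r.purged });
		if (r.end > end)
			insert_sorted((struct vrange){ end + 1, r.end, r.purged });
	}
	return ret;
}

/* Pages in unpurged ranges; the patch tracks this as unpurged_page_count. */
static unsigned long unpurged_pages(void)
{
	unsigned long total = 0;
	for (int i = 0; i < nr_ranges; i++)
		if (!ranges[i].purged)
			total += ranges[i].end - ranges[i].start + 1;
	return total;
}
```

In this model, marking pages 10-19 and then 20-29 volatile yields the single range [10, 29]. Removing pages 15-17 then splits it into [10, 14] and [18, 29].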
CC: Andrew Morton
CC: Android Kernel Team
CC: Robert Love
CC: Mel Gorman
CC: Hugh Dickins
CC: Dave Hansen
CC: Rik van Riel
CC: Dmitry Adamushko
CC: Dave Chinner
CC: Neil Brown
CC: Andrea Righi
CC: Aneesh Kumar K.V
CC: Taras Glek
CC: Mike Hommey
CC: Jan Kara
Signed-off-by: John Stultz
---
 include/linux/volatile.h |   45 ++++
 mm/Makefile              |    2 +-
 mm/volatile.c            |  509 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 555 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/volatile.h
 create mode 100644 mm/volatile.c

diff --git a/include/linux/volatile.h b/include/linux/volatile.h
new file mode 100644
index 0000000..66737a8
--- /dev/null
+++ b/include/linux/volatile.h
@@ -0,0 +1,45 @@
+#ifndef _LINUX_VOLATILE_H
+#define _LINUX_VOLATILE_H
+
+#include 
+
+struct volatile_fs_head {
+	struct mutex lock;
+	struct list_head lru_head;
+	s64 unpurged_page_count;
+};
+
+
+#define DEFINE_VOLATILE_FS_HEAD(name) struct volatile_fs_head name = {	\
+	.lock = __MUTEX_INITIALIZER(name.lock),				\
+	.lru_head = LIST_HEAD_INIT(name.lru_head),			\
+	.unpurged_page_count = 0,					\
+}
+
+
+static inline void volatile_range_lock(struct volatile_fs_head *head)
+{
+	mutex_lock(&head->lock);
+}
+
+static inline void volatile_range_unlock(struct volatile_fs_head *head)
+{
+	mutex_unlock(&head->lock);
+}
+
+extern long volatile_range_add(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index);
+extern long volatile_range_remove(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index);
+
+extern s64 volatile_range_lru_size(struct volatile_fs_head *head);
+
+extern void volatile_range_clear(struct volatile_fs_head *head,
+					struct address_space *mapping);
+
+extern s64 volatile_ranges_get_last_used(struct volatile_fs_head *head,
+				struct address_space **mapping,
+				loff_t *start, loff_t *end);
+#endif /* _LINUX_VOLATILE_H */
diff --git a/mm/Makefile b/mm/Makefile
index a156285..dc79eb8 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -16,7 +16,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
 			   page_isolation.o mm_init.o mmu_context.o percpu.o \
-			   compaction.o $(mmu-y)
+			   compaction.o volatile.o $(mmu-y)
 obj-y += init-mm.o
 
 ifdef CONFIG_NO_BOOTMEM
diff --git a/mm/volatile.c b/mm/volatile.c
new file mode 100644
index 0000000..f8da602
--- /dev/null
+++ b/mm/volatile.c
@@ -0,0 +1,509 @@
+/* mm/volatile.c
+ *
+ * Volatile page range management.
+ *	Copyright 2011 Linaro
+ *
+ * Based on mm/ashmem.c
+ *	by Robert Love
+ *	Copyright (C) 2008 Google, Inc.
+ *
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * The volatile range management is a helper layer on top of the range tree
+ * code, which is used to help filesystems manage page ranges that are volatile.
+ *
+ * These ranges are stored in a per-mapping range tree, which holds both the
+ * purged and unpurged ranges of that address_space. Unpurged ranges are also
+ * linked together in an lru list that is per-volatile-fs-head (basically
+ * per-filesystem).
+ *
+ * The goal behind volatile ranges is to allow applications to interact
+ * with the kernel's cache management infrastructure.  In particular an
+ * application can say "this memory contains data that might be useful in
+ * the future, but can be reconstructed if necessary, so if the kernel
+ * needs, it can zap and reclaim this memory without having to swap it out."
+ *
+ * The proposed mechanism - at a high level - is for user-space to be able
+ * to say "This memory is volatile" and then later "this memory is no longer
+ * volatile".  If the content of the memory is still available the second
+ * request succeeds.  If not, the memory is marked non-volatile and an
+ * error is returned to denote that the contents have been lost.
+ *
+ * Credits to Neil Brown for the above description.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct volatile_range {
+	struct list_head		lru;
+	struct interval_tree_node	interval_node;
+	unsigned int			purged;
+	struct address_space		*mapping;
+};
+
+
+/*
+ * To avoid bloating the address_space structure, we use
+ * a hash structure to map from address_space mappings to
+ * the interval_tree root that stores volatile ranges
+ */
+static DEFINE_MUTEX(hash_mutex);
+static struct hlist_head *mapping_hash;
+static long mapping_hash_shift = 8;
+struct mapping_hash_entry {
+	struct interval_tree_root root;
+	struct address_space *mapping;
+	struct hlist_node hnode;
+};
+
+
+static inline
+struct interval_tree_root *__mapping_to_root(struct address_space *mapping)
+{
+	struct hlist_node *elem;
+	struct mapping_hash_entry *entry;
+	struct interval_tree_root *ret = NULL;
+
+	hlist_for_each_entry_rcu(entry, elem,
+			&mapping_hash[hash_ptr(mapping, mapping_hash_shift)],
+				hnode)
+		if (entry->mapping == mapping)
+			ret = &entry->root;
+
+	return ret;
+}
+
+
+static inline
+struct interval_tree_root *mapping_to_root(struct address_space *mapping)
+{
+	struct interval_tree_root *ret;
+
+	mutex_lock(&hash_mutex);
+	ret = __mapping_to_root(mapping);
+	mutex_unlock(&hash_mutex);
+	return ret;
+}
+
+
+static inline
+struct interval_tree_root *mapping_allocate_root(struct address_space *mapping)
+{
+	struct mapping_hash_entry *entry;
+	struct interval_tree_root *dblchk;
+	struct interval_tree_root *ret = NULL;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return NULL;
+
+	mutex_lock(&hash_mutex);
+	/* Since we dropped the lock, double check that no one has
+	 * created the same hash entry.
+	 */
+	dblchk = __mapping_to_root(mapping);
+	if (dblchk) {
+		kfree(entry);
+		ret = dblchk;
+		goto out;
+	}
+
+	INIT_HLIST_NODE(&entry->hnode);
+	entry->mapping = mapping;
+	interval_tree_init(&entry->root);
+
+	hlist_add_head_rcu(&entry->hnode,
+		&mapping_hash[hash_ptr(mapping, mapping_hash_shift)]);
+
+	ret = &entry->root;
+out:
+	mutex_unlock(&hash_mutex);
+	return ret;
+}
+
+
+static inline void mapping_free_root(struct interval_tree_root *root)
+{
+	struct mapping_hash_entry *entry;
+
+	mutex_lock(&hash_mutex);
+	entry = container_of(root, struct mapping_hash_entry, root);
+
+	hlist_del_rcu(&entry->hnode);
+	kfree(entry);
+	mutex_unlock(&hash_mutex);
+}
+
+
+/* volatile range helpers */
+static inline void vrange_resize(struct volatile_fs_head *head,
+				struct volatile_range *range,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	s64 old_size, new_size;
+
+	old_size = range->interval_node.end - range->interval_node.start;
+	new_size = end_index-start_index;
+
+	if (!range->purged)
+		head->unpurged_page_count += new_size - old_size;
+
+	range->interval_node.start = start_index;
+	range->interval_node.end = end_index;
+}
+
+static struct volatile_range *vrange_alloc(void)
+{
+	struct volatile_range *new;
+
+	new = kzalloc(sizeof(struct volatile_range), GFP_KERNEL);
+	if (!new)
+		return NULL;
+	interval_tree_node_init(&new->interval_node);
+	return new;
+}
+
+static void vrange_del(struct volatile_fs_head *head,
+				struct interval_tree_root *root,
+				struct volatile_range *vrange)
+{
+	if (!vrange->purged) {
+		head->unpurged_page_count -=
+			vrange->interval_node.end - vrange->interval_node.start;
+		list_del(&vrange->lru);
+	}
+	interval_tree_remove(root, &vrange->interval_node);
+	kfree(vrange);
+}
+
+
+/**
+ * volatile_range_add: Marks a page interval as volatile
+ * @head: per-fs volatile head
+ * @mapping: address space whose range is being marked volatile
+ * @start_index: Starting page in range to be marked volatile
+ * @end_index: Ending page in range to be marked volatile
+ *
+ * Mark a region as volatile. Coalesces overlapping and neighboring regions.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if the range was coalesced with any purged ranges, 0 if it
+ * was not, or -ENOMEM if an allocation failed.
+ */
+long volatile_range_add(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	struct volatile_range *new;
+	struct interval_tree_node *node;
+	struct volatile_range *vrange;
+	struct interval_tree_root *root;
+	int purged = 0;
+	u64 start = (u64)start_index;
+	u64 end = (u64)end_index;
+
+	/* Make sure we're properly locked */
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	/*
+	 * Because the lock might be held in a shrinker, release
+	 * it during allocation.
+	 */
+	mutex_unlock(&head->lock);
+	new = vrange_alloc();
+	mutex_lock(&head->lock);
+	if (!new)
+		return -ENOMEM;
+
+	root = mapping_to_root(mapping);
+	if (!root) {
+		mutex_unlock(&head->lock);
+		root = mapping_allocate_root(mapping);
+		mutex_lock(&head->lock);
+		if (!root) {
+			kfree(new);
+			return -ENOMEM;
+		}
+	}
+
+	/* First, find any existing intervals that overlap */
+	node = interval_tree_in_interval(root, start, end);
+	while (node) {
+		/* Already entirely marked volatile, so we're done */
+		if (node->start < start && node->end > end) {
+			/* don't need the allocated value */
+			kfree(new);
+			return purged;
+		}
+
+		/* Grab containing volatile range */
+		vrange = container_of(node, struct volatile_range,
+						interval_node);
+
+		/* Resize the new range to cover all overlapping ranges */
+		start = min_t(u64, start, node->start);
+		end = max_t(u64, end, node->end);
+
+		/* Inherit purged state from overlapping ranges */
+		purged |= vrange->purged;
+
+
+		node = interval_tree_next_in_interval(&vrange->interval_node,
+								start, end);
+		/* Delete the old range, as we consume it */
+		vrange_del(head, root, vrange);
+	}
+
+	/* Coalesce left-adjacent ranges */
+	node = interval_tree_in_interval(root, start-1, start);
+	if (node) {
+		vrange = container_of(node, struct volatile_range,
+						interval_node);
+		/* Only coalesce if both are either purged or unpurged */
+		if (vrange->purged == purged) {
+			/* resize new range */
+			start = min_t(u64, start, node->start);
+			end = max_t(u64, end, node->end);
+			/* delete old range */
+			vrange_del(head, root, vrange);
+		}
+	}
+
+	/* Coalesce right-adjacent ranges */
+	node = interval_tree_in_interval(root, end, end+1);
+	if (node) {
+		vrange = container_of(node, struct volatile_range,
+						interval_node);
+		/* Only coalesce if both are either purged or unpurged */
+		if (vrange->purged == purged) {
+			/* resize new range */
+			start = min_t(u64, start, node->start);
+			end = max_t(u64, end, node->end);
+			/* delete old range */
+			vrange_del(head, root, vrange);
+		}
+	}
+	/* Assign and store the new range in the range tree */
+	new->mapping = mapping;
+	new->interval_node.start = start;
+	new->interval_node.end = end;
+	new->purged = purged;
+	interval_tree_add(root, &new->interval_node);
+
+	/* Only add unpurged ranges to LRU */
+	if (!purged) {
+		head->unpurged_page_count += end - start;
+		list_add_tail(&new->lru, &head->lru_head);
+	}
+	return purged;
+}
+
+
+/**
+ * volatile_range_remove: Marks a page interval as nonvolatile
+ * @head: per-fs volatile head
+ * @mapping: address space whose range is being marked nonvolatile
+ * @start_index: Starting page in range to be marked nonvolatile
+ * @end_index: Ending page in range to be marked nonvolatile
+ *
+ * Mark a region as nonvolatile, removing any contained pages
+ * from the volatile range tree.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if any portion of the range was purged, 0 if none
+ * was, or -ENOMEM if an allocation failed.
+ */
+long volatile_range_remove(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	struct volatile_range *new;
+	struct interval_tree_node *node;
+	struct interval_tree_root *root;
+	int ret = 0;
+	int used_new = 0;
+	u64 start = (u64)start_index;
+	u64 end = (u64)end_index;
+
+	/* Make sure we're properly locked */
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	/*
+	 * Because the lock might be held in a shrinker, release
+	 * it during allocation.
+	 */
+	mutex_unlock(&head->lock);
+	new = vrange_alloc();
+	mutex_lock(&head->lock);
+	if (!new)
+		return -ENOMEM;
+
+	root = mapping_to_root(mapping);
+	if (!root)
+		goto out;
+
+
+	/* Find any overlapping ranges */
+	node = interval_tree_in_interval(root, start, end);
+	while (node) {
+		struct volatile_range *vrange;
+		vrange = container_of(node, struct volatile_range,
+						interval_node);
+
+		ret |= vrange->purged;
+
+		if (start <= node->start && end >= node->end) {
+			/* delete: volatile range is totally within range */
+			node = interval_tree_next_in_interval(
+						&vrange->interval_node,
+						start, end);
+			vrange_del(head, root, vrange);
+		} else if (node->start >= start) {
+			/* resize: volatile range right-overlaps range */
+			vrange_resize(head, vrange, end+1, node->end);
+			node = interval_tree_next_in_interval(
+						&vrange->interval_node,
+						start, end);
+
+		} else if (node->end <= end) {
+			/* resize: volatile range left-overlaps range */
+			vrange_resize(head, vrange, node->start, start-1);
+			node = interval_tree_next_in_interval(
+						&vrange->interval_node,
+						start, end);
+		} else {
+			/* split: range is totally within a volatile range */
+			used_new = 1; /* we only do this once */
+			new->mapping = mapping;
+			new->interval_node.start = end + 1;
+			new->interval_node.end = node->end;
+			new->purged = vrange->purged;
+			interval_tree_add(root, &new->interval_node);
+			if (!new->purged)
+				list_add_tail(&new->lru, &head->lru_head);
+			vrange_resize(head, vrange, node->start, start-1);
+
+			break;
+		}
+	}
+
+out:
+	if (!used_new)
+		kfree(new);
+
+	return ret;
+}
+
+/**
+ * volatile_range_lru_size: Returns the number of unpurged pages on the lru
+ * @head: per-fs volatile head
+ *
+ * Returns the number of unpurged pages on the LRU
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ */
+s64 volatile_range_lru_size(struct volatile_fs_head *head)
+{
+	WARN_ON(!mutex_is_locked(&head->lock));
+	return head->unpurged_page_count;
+}
+
+
+/**
+ * volatile_ranges_get_last_used: Returns mapping and extent of lru unpurged range
+ * @head: per-fs volatile head
+ * @mapping: double pointer to the mapping whose range is being purged
+ * @start: Pointer to the starting page index of the range being purged
+ * @end: Pointer to the ending page index of the range being purged
+ *
+ * Returns the mapping, start and end values of the least recently used
+ * range. Marks the range as purged and removes it from the LRU.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if a range was returned,
+ * or 0 if no unpurged ranges were found.
+ */
+s64 volatile_ranges_get_last_used(struct volatile_fs_head *head,
+				struct address_space **mapping,
+				loff_t *start, loff_t *end)
+{
+	struct volatile_range *range;
+
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	if (list_empty(&head->lru_head))
+		return 0;
+
+	range = list_first_entry(&head->lru_head, struct volatile_range, lru);
+
+	*start = range->interval_node.start;
+	*end = range->interval_node.end;
+	*mapping = range->mapping;
+
+	head->unpurged_page_count -= *end - *start;
+	list_del(&range->lru);
+	range->purged = 1;
+
+	return 1;
+}
+
+
+/*
+ * Removes all volatile ranges of @mapping and frees its interval tree root.
+ */
+void volatile_range_clear(struct volatile_fs_head *head,
+				struct address_space *mapping)
+{
+	struct volatile_range *tozap;
+	struct interval_tree_root *root;
+
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	root = mapping_to_root(mapping);
+	if (!root)
+		return;
+
+	while (!interval_tree_empty(root)) {
+		struct interval_tree_node *tmp;
+		tmp = interval_tree_root_node(root);
+		tozap = container_of(tmp, struct volatile_range, interval_node);
+		vrange_del(head, root, tozap);
+	}
+	mapping_free_root(root);
+}
+
+
+static int __init volatile_init(void)
+{
+	int i, size;
+
+	size = 1U << mapping_hash_shift;
+	mapping_hash = kzalloc(sizeof(*mapping_hash) * size, GFP_KERNEL);
+	for (i = 0; i < size; i++)
+		INIT_HLIST_HEAD(&mapping_hash[i]);
+
+	return 0;
+}
+arch_initcall(volatile_init);