From patchwork Thu Oct 12 06:16:12 2017
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 115579
From: Nicolas Pitre
To: Alexander Viro, Christoph Hellwig
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-embedded@vger.kernel.org, linux-kernel@vger.kernel.org,
    Chris Brandt
Subject: [PATCH v6 3/4] cramfs: add mmap support
Date: Thu, 12 Oct 2017 02:16:12 -0400
Message-Id: <20171012061613.28705-4-nicolas.pitre@linaro.org>
X-Mailer: git-send-email 2.9.5
In-Reply-To: <20171012061613.28705-1-nicolas.pitre@linaro.org>
References: <20171012061613.28705-1-nicolas.pitre@linaro.org>

When cramfs is used in physical memory, we have the opportunity to map
files directly from ROM into user space, saving on RAM usage. This
gives us Execute-In-Place (XIP) support.

For a file to be mmap()-able, the map area has to correspond to a range
of uncompressed and contiguous blocks, and in the MMU case it also has
to be page aligned. A version of mkcramfs with appropriate support is
necessary to create such a filesystem image.

In the MMU case it may happen that the vma structure extends beyond the
actual file size. This is notably the case in binfmt_elf.c:elf_map().
The file's last block may also be shared with other files and therefore
cannot be mapped as is. Rather than refusing to mmap such a file, we do
a "mixed" map and let the regular fault handler populate the unmapped
area with RAM-backed pages. In practice the unmapped area is seldom
accessed, so page faults might never occur before this area is
discarded.

In the non-MMU case it is the get_unmapped_area method that is
responsible for providing the address where the actual data can be
found. No mapping is necessary of course.

Signed-off-by: Nicolas Pitre
Tested-by: Chris Brandt
---
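For illustration only (not part of this patch): a minimal user-space
sketch of the intended use. An ordinary read-only mmap() is all that is
needed; when the file's blocks are uncompressed, contiguous and page
aligned in the image, the ptes end up pointing straight at ROM. The
mount point and file path below are hypothetical.

	/* xip_cat.c - map a file from a cramfs mount, read it in place */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		const char *path = "/mnt/cramfs/banner.txt"; /* hypothetical */
		struct stat st;
		int fd = open(path, O_RDONLY);

		if (fd < 0 || fstat(fd, &st) < 0) {
			perror(path);
			return EXIT_FAILURE;
		}

		/* On no-MMU systems a MAP_SHARED mapping would be used. */
		char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return EXIT_FAILURE;
		}

		fwrite(p, 1, st.st_size, stdout);
		munmap(p, st.st_size);
		close(fd);
		return 0;
	}

Whether the mapping came out direct can be seen in /proc/<pid>/maps,
where remap_pfn_range() substitutes the physical address for the file
offset (see the comment in cramfs_physmem_mmap() below).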
 fs/cramfs/inode.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 209 insertions(+)

-- 
2.9.5

Reviewed-by: Christoph Hellwig

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index d3066a8534..d967904c53 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -15,7 +15,10 @@
 #include <linux/module.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 #include <linux/pagemap.h>
+#include <linux/pfn_t.h>
+#include <linux/ramfs.h>
 #include <linux/init.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -51,6 +54,7 @@ static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
 static const struct super_operations cramfs_ops;
 static const struct inode_operations cramfs_dir_inode_operations;
 static const struct file_operations cramfs_directory_operations;
+static const struct file_operations cramfs_physmem_fops;
 static const struct address_space_operations cramfs_aops;
 
 static DEFINE_MUTEX(read_mutex);
@@ -98,6 +102,10 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
 	case S_IFREG:
 		inode->i_fop = &generic_ro_fops;
 		inode->i_data.a_ops = &cramfs_aops;
+		if (IS_ENABLED(CONFIG_CRAMFS_MTD) &&
+		    CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS &&
+		    CRAMFS_SB(sb)->linear_phys_addr)
+			inode->i_fop = &cramfs_physmem_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &cramfs_dir_inode_operations;
@@ -279,6 +287,207 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset,
 	return NULL;
 }
 
+/*
+ * For a mapping to be possible, we need a range of uncompressed and
+ * contiguous blocks. Return the offset for the first block and number of
+ * valid blocks for which that is true, or zero otherwise.
+ */
+static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages)
+{
+	struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb);
+	int i;
+	u32 *blockptrs, first_block_addr;
+
+	/*
+	 * We can dereference memory directly here as this code may be
+	 * reached only when there is a direct filesystem image mapping
+	 * available in memory.
+	 */
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff * 4);
+	first_block_addr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
+	i = 0;
+	do {
+		u32 block_off = i * (PAGE_SIZE >> CRAMFS_BLK_DIRECT_PTR_SHIFT);
+		u32 expect = (first_block_addr + block_off) |
+			     CRAMFS_BLK_FLAG_DIRECT_PTR |
+			     CRAMFS_BLK_FLAG_UNCOMPRESSED;
+		if (blockptrs[i] != expect) {
+			pr_debug("range: block %d/%d got %#x expects %#x\n",
+				 pgoff+i, pgoff + *pages - 1,
+				 blockptrs[i], expect);
+			if (i == 0)
+				return 0;
+			break;
+		}
+	} while (++i < *pages);
+
+	*pages = i;
+	return first_block_addr << CRAMFS_BLK_DIRECT_PTR_SHIFT;
+}
+
+#ifdef CONFIG_MMU
+
+/*
+ * Return true if the last page of a file in the filesystem image contains
+ * some other data that doesn't belong to that file. It is assumed that the
+ * last block is CRAMFS_BLK_FLAG_DIRECT_PTR | CRAMFS_BLK_FLAG_UNCOMPRESSED
+ * (verified by cramfs_get_block_range()) and directly accessible in memory.
+ */
+static bool cramfs_last_page_is_shared(struct inode *inode)
+{
+	struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb);
+	u32 partial, last_page, blockaddr, *blockptrs;
+	char *tail_data;
+
+	partial = offset_in_page(inode->i_size);
+	if (!partial)
+		return false;
+	last_page = inode->i_size >> PAGE_SHIFT;
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode));
+	blockaddr = blockptrs[last_page] & ~CRAMFS_BLK_FLAGS;
+	blockaddr <<= CRAMFS_BLK_DIRECT_PTR_SHIFT;
+	tail_data = sbi->linear_virt_addr + blockaddr + partial;
+	return memchr_inv(tail_data, 0, PAGE_SIZE - partial) ? true : false;
+}
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb);
+	unsigned int pages, max_pages, offset;
+	unsigned long address, pgoff = vma->vm_pgoff;
+	char *bailout_reason;
+	int ret;
+
+	ret = generic_file_readonly_mmap(file, vma);
+	if (ret)
+		return ret;
+
+	/*
+	 * Now try to pre-populate ptes for this vma with a direct
+	 * mapping avoiding memory allocation when possible.
+	 */
+
+	/* Could COW work here? */
+	bailout_reason = "vma is writable";
+	if (vma->vm_flags & VM_WRITE)
+		goto bailout;
+
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	bailout_reason = "beyond file limit";
+	if (pgoff >= max_pages)
+		goto bailout;
+	pages = min(vma_pages(vma), max_pages - pgoff);
+
+	offset = cramfs_get_block_range(inode, pgoff, &pages);
+	bailout_reason = "unsuitable block layout";
+	if (!offset)
+		goto bailout;
+	address = sbi->linear_phys_addr + offset;
+	bailout_reason = "data is not page aligned";
+	if (!PAGE_ALIGNED(address))
+		goto bailout;
+
+	/* Don't map the last page if it contains some other data */
+	if (pgoff + pages == max_pages && cramfs_last_page_is_shared(inode)) {
+		pr_debug("mmap: %s: last page is shared\n",
+			 file_dentry(file)->d_name.name);
+		pages--;
+	}
+
+	if (!pages) {
+		bailout_reason = "no suitable block remaining";
+		goto bailout;
+	}
+
+	if (pages == vma_pages(vma)) {
+		/*
+		 * The entire vma is mappable. remap_pfn_range() will
+		 * make it distinguishable from a non-direct mapping
+		 * in /proc/<pid>/maps by substituting the file offset
+		 * with the actual physical address.
+		 */
+		ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+				      pages * PAGE_SIZE, vma->vm_page_prot);
+	} else {
+		/*
+		 * Let's create a mixed map if we can't map it all.
+		 * The normal paging machinery will take care of the
+		 * unpopulated ptes via cramfs_readpage().
+		 */
+		int i;
+		vma->vm_flags |= VM_MIXEDMAP;
+		for (i = 0; i < pages && !ret; i++) {
+			unsigned long off = i * PAGE_SIZE;
+			pfn_t pfn = phys_to_pfn_t(address + off, PFN_DEV);
+			ret = vm_insert_mixed(vma, vma->vm_start + off, pfn);
+		}
+	}
+
+	if (!ret)
+		pr_debug("mapped %s[%lu] at 0x%08lx (%u/%lu pages) "
+			 "to vma 0x%08lx, page_prot 0x%llx\n",
+			 file_dentry(file)->d_name.name, pgoff,
+			 address, pages, vma_pages(vma), vma->vm_start,
+			 (unsigned long long)pgprot_val(vma->vm_page_prot));
+	return ret;
+
+bailout:
+	pr_debug("%s[%lu]: direct mmap impossible: %s\n",
+		 file_dentry(file)->d_name.name, pgoff, bailout_reason);
+	/* Didn't manage any direct map, but normal paging is still possible */
+	return 0;
+}
+
+#else /* CONFIG_MMU */
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
+}
+
+static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, block_pages, max_pages, offset;
+
+	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= max_pages || pages > max_pages - pgoff)
+		return -EINVAL;
+	block_pages = pages;
+	offset = cramfs_get_block_range(inode, pgoff, &block_pages);
+	if (!offset || block_pages != pages)
+		return -ENOSYS;
+	addr = sbi->linear_phys_addr + offset;
+	pr_debug("get_unmapped for %s ofs %#lx siz %lu at 0x%08lx\n",
+		 file_dentry(file)->d_name.name, pgoff*PAGE_SIZE, len, addr);
+	return addr;
+}
+
+static unsigned int cramfs_physmem_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT |
+	       NOMMU_MAP_READ | NOMMU_MAP_EXEC;
+}
+
+#endif /* CONFIG_MMU */
+
+static const struct file_operations cramfs_physmem_fops = {
+	.llseek			= generic_file_llseek,
+	.read_iter		= generic_file_read_iter,
+	.splice_read		= generic_file_splice_read,
+	.mmap			= cramfs_physmem_mmap,
+#ifndef CONFIG_MMU
+	.get_unmapped_area	= cramfs_physmem_get_unmapped_area,
+	.mmap_capabilities	= cramfs_physmem_mmap_capabilities,
+#endif
+};
+
 static void cramfs_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
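
For reference (illustration only, not from the patch): a stand-alone
sketch of the extended block pointer encoding that
cramfs_get_block_range() relies on. The flag and shift values mirror
the uapi definitions introduced earlier in this series (an assumption,
reproduced locally here); the starting image offset is a made-up
example.

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SIZE			4096
	#define CRAMFS_BLK_FLAG_UNCOMPRESSED	(1U << 31)
	#define CRAMFS_BLK_FLAG_DIRECT_PTR	(1U << 30)
	#define CRAMFS_BLK_FLAGS		(CRAMFS_BLK_FLAG_UNCOMPRESSED | \
						 CRAMFS_BLK_FLAG_DIRECT_PTR)
	#define CRAMFS_BLK_DIRECT_PTR_SHIFT	2

	int main(void)
	{
		/* Hypothetical: file data starts at byte 0x10000 in the image. */
		uint32_t first_block_addr = 0x10000 >> CRAMFS_BLK_DIRECT_PTR_SHIFT;

		/*
		 * Directly mappable pages have consecutive block pointers
		 * that advance by one page worth of (shifted) address with
		 * both flags set - the same "expect" value computed in
		 * cramfs_get_block_range().
		 */
		for (int i = 0; i < 4; i++) {
			uint32_t block_off =
				i * (PAGE_SIZE >> CRAMFS_BLK_DIRECT_PTR_SHIFT);
			uint32_t expect = (first_block_addr + block_off) |
					  CRAMFS_BLK_FLAG_DIRECT_PTR |
					  CRAMFS_BLK_FLAG_UNCOMPRESSED;
			printf("page %d: pointer %#x -> data at byte %#x\n",
			       i, expect,
			       (expect & ~CRAMFS_BLK_FLAGS)
					<< CRAMFS_BLK_DIRECT_PTR_SHIFT);
		}
		return 0;
	}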