From patchwork Thu Oct 8 10:59:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 268989 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70141C4363A for ; Thu, 8 Oct 2020 11:00:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EF194217BA for ; Thu, 8 Oct 2020 11:00:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="t0P2dXdT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726115AbgJHLAO (ORCPT ); Thu, 8 Oct 2020 07:00:14 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:54840 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725941AbgJHLAO (ORCPT ); Thu, 8 Oct 2020 07:00:14 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098B05KE124111; Thu, 8 Oct 2020 11:00:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=BIZ9fryrmRrVhcBHB0BNbiwOy/jnq8U+/1IagHmFeJo=; b=t0P2dXdTLbS/Vmm/UULAvlkxZebTyPP6+OsSESJj3qP60/U39oBlXs11L5N6DSbZRQam 8Ky7zSE4CZa9foIghuoeq9yuPPd59Jv2RhawFCj8dTTPdCY6dtdzYCMwmyZIYQE9Fcsz u0P1obsp5N3G6RyIVz3P9kzEkS0IkPfpYPELCGIyKHnOpkAk/AfG80oEXKC0wa7UQPDa vWEIAoDItiIZPtKcbKG1oYZeaxh+9YUjHiTAdtVz6MtA0KU5OIsLXfb++3wmTWmZk1h+ n3h8TpMBhh5W5Aagk9iXTkbreertURsROHOpysu4NKO/SixJ0zbWPnIeeX2th4ONrcXT yw== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 33xhxn6x1d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 08 Oct 2020 11:00:07 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098AuEUi188369; Thu, 8 Oct 2020 11:00:07 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 33y380yv3a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 08 Oct 2020 11:00:07 +0000 Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 098B068j011110; Thu, 8 Oct 2020 11:00:06 GMT Received: from ltp.sg.oracle.com (/10.191.35.225) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 08 Oct 2020 04:00:05 -0700 From: Anand Jain To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: wqu@suse.com, fdmanana@suse.com, dsterba@suse.com, linux-btrfs@vger.kernel.org Subject: [PATCH stable-5.4 1/6] Btrfs: send, allow clone operations within the same file Date: Thu, 8 Oct 2020 18:59:49 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 malwarescore=0 suspectscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 malwarescore=0 bulkscore=0 impostorscore=0 lowpriorityscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 adultscore=0 clxscore=1015 spamscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Filipe Manana commit 11f2069c113e02971b8db6fda62f9b9cd31a030f upstream. For send we currently skip clone operations when the source and destination files are the same. This is so because clone didn't support this case in its early days, but support for it was added back in May 2013 by commit a96fbc72884fcb ("Btrfs: allow file data clone within a file"). This change adds support for it. Example: $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt/sdd $ xfs_io -f -c "pwrite -S 0xab -b 64K 0 64K" /mnt/sdd/foobar $ xfs_io -c "reflink /mnt/sdd/foobar 0 64K 64K" /mnt/sdd/foobar $ btrfs subvolume snapshot -r /mnt/sdd /mnt/sdd/snap $ mkfs.btrfs -f /dev/sde $ mount /dev/sde /mnt/sde $ btrfs send /mnt/sdd/snap | btrfs receive /mnt/sde Without this change file foobar at the destination has a single 128Kb extent: $ filefrag -v /mnt/sde/snap/foobar Filesystem type is: 9123683e File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 31: 0.. 31: 32: last,unknown_loc,delalloc,eof /mnt/sde/snap/foobar: 1 extent found With this we get a single 64Kb extent that is shared at file offsets 0 and 64K, just like in the source filesystem: $ filefrag -v /mnt/sde/snap/foobar Filesystem type is: 9123683e File size of /mnt/sde/snap/foobar is 131072 (32 blocks of 4096 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 15: 3328.. 3343: 16: shared 1: 16.. 31: 3328.. 3343: 16: 3344: last,shared,eof /mnt/sde/snap/foobar: 2 extents found Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain --- fs/btrfs/send.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 6ad216e8178e..48d028b1cfaf 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1257,12 +1257,20 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) */ if (found->root == bctx->sctx->send_root) { /* - * TODO for the moment we don't accept clones from the inode - * that is currently send. We may change this when - * BTRFS_IOC_CLONE_RANGE supports cloning from and to the same - * file. + * If the source inode was not yet processed we can't issue a + * clone operation, as the source extent does not exist yet at + * the destination of the stream. */ - if (ino >= bctx->cur_objectid) + if (ino > bctx->cur_objectid) + return 0; + /* + * We clone from the inode currently being sent as long as the + * source extent is already processed, otherwise we could try + * to clone from an extent that does not exist yet at the + * destination of the stream. + */ + if (ino == bctx->cur_objectid && + offset >= bctx->sctx->cur_inode_next_write_offset) return 0; } From patchwork Thu Oct 8 10:59:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 268988 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34FCEC43467 for ; Thu, 8 Oct 2020 11:00:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8EC93215A4 for ; Thu, 8 Oct 2020 11:00:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="p8EQJRfN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729693AbgJHLAV (ORCPT ); Thu, 8 Oct 2020 07:00:21 -0400 Received: from aserp2130.oracle.com ([141.146.126.79]:57890 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728797AbgJHLAT (ORCPT ); Thu, 8 Oct 2020 07:00:19 -0400 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098B0E4P101656; Thu, 8 Oct 2020 11:00:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=tWPPbUC2AHoVJyZzDwnMDvWIZeZR9GC/fsnwWvFpj1U=; b=p8EQJRfN+TegsQxG+nr/2kVAznUcauLuHFPSkat2c3Nqd0+jZ4vCnHk1gJm0B+7A2yDk qZrWo4g6j+9udRWgKq4drpw2ZlF1Pui2Xdm97Oby8+Qie1YPErUJ2K+orDY3iA4VCZY6 UUH7P6+KewLh+IOokdXQC6foXFZGhpXQ+o6SbA3oHSMYs4x88pMg7sCmGpEvNYjiZqTm ksbOV7/Cwexc4gs/5Od9SC+3RVQTQJKgVrHDYFN25CTF7mxF2a13uTzJbpKZVJckAQkF smEdddHvBUdSXwy8FF1y2GNWr+mVYLynU6ESxuO97pHwAYUkg5AABJ6QHd7epvladh5P 1w== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2130.oracle.com with ESMTP id 33xetb75yh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 08 Oct 2020 11:00:14 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098AuESp188416; Thu, 8 Oct 2020 11:00:13 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 33y380yvb8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 08 Oct 2020 11:00:12 +0000 Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 098B0BQj011154; Thu, 8 Oct 2020 11:00:11 GMT Received: from ltp.sg.oracle.com (/10.191.35.225) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 08 Oct 2020 04:00:11 -0700 From: Anand Jain To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: wqu@suse.com, fdmanana@suse.com, dsterba@suse.com, linux-btrfs@vger.kernel.org Subject: [PATCH stable-5.4 3/6] btrfs: volumes: Use more straightforward way to calculate map length Date: Thu, 8 Oct 2020 18:59:51 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 malwarescore=0 suspectscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 spamscore=0 mlxscore=0 clxscore=1015 priorityscore=1501 adultscore=0 mlxlogscore=999 phishscore=0 impostorscore=0 malwarescore=0 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Qu Wenruo commit 2d974619a77f106f3d1341686dea95c0eaad601f upstream. The old code goes: offset = logical - em->start; length = min_t(u64, em->len - offset, length); Where @length calculation is dependent on offset, it can take reader several more seconds to find it's just the same code as: offset = logical - em->start; length = min_t(u64, em->start + em->len - logical, length); Use above code to make the length calculate independent from other variable, thus slightly increase the readability. Reviewed-by: Johannes Thumshirn Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain --- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 196ddbcd2936..1dba45d43485 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5712,7 +5712,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info, } offset = logical - em->start; - length = min_t(u64, em->len - offset, length); + length = min_t(u64, em->start + em->len - logical, length); stripe_len = map->stripe_len; /* From patchwork Thu Oct 8 10:59:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 268986 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8713FC43467 for ; Thu, 8 Oct 2020 11:02:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1C7F32177B for ; Thu, 8 Oct 2020 11:02:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="v/ASVHOH" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728951AbgJHLCY (ORCPT ); Thu, 8 Oct 2020 07:02:24 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:56610 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725852AbgJHLCW (ORCPT ); Thu, 8 Oct 2020 07:02:22 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098AxWMo123624; Thu, 8 Oct 2020 11:02:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=iA7FsH2yfLhZah7k5IYFQh92WFeqRDxQdU06/NPvqFo=; b=v/ASVHOH03vSN9pD4cxdNEmMrKdnQFxGL8RJ96l5vQlEAoa8+VrX4UGEpi75dlv1Tptd UnlQYJ1WQovkohxPGgRo2zh0DDR30sxMsDAAJCnyMRqAm77kJt6Eo68NW1ko8FI5+AXT PyQ35pt2YZ+nANlB+ucgG4lU9dRl84Lv8APEdU+cfT1mfpXRTE8tN9c8js0hgJH510TD NzWRlXn6Yt5eW5qtzo8Fqb86d3U5gElqjCoCuzWPJAuxuvtjJB2vfW2vDiCwJmYroVzb SCD7Wa8Mw4vuKNLjU9TGHHqeaOex1cvOQpwF1xncc7TivInL/MiPz8gMURH5Ti82Dvop tg== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 33xhxn6xb3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 08 Oct 2020 11:02:16 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098AuE7F188463; Thu, 8 Oct 2020 11:00:15 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3030.oracle.com with ESMTP id 33y380yvdq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 08 Oct 2020 11:00:15 +0000 Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 098B0EUG023479; Thu, 8 Oct 2020 11:00:14 GMT Received: from ltp.sg.oracle.com (/10.191.35.225) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 08 Oct 2020 04:00:13 -0700 From: Anand Jain To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: wqu@suse.com, fdmanana@suse.com, dsterba@suse.com, linux-btrfs@vger.kernel.org Subject: [PATCH stable-5.4 4/6] btrfs: Ensure we trim ranges across block group boundary Date: Thu, 8 Oct 2020 18:59:52 +0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 malwarescore=0 suspectscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 malwarescore=0 bulkscore=0 impostorscore=0 lowpriorityscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 adultscore=0 clxscore=1015 spamscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Qu Wenruo commit 6b7faadd985c990324b5b5bd18cc4ba5c395eb65 upstream. [BUG] When deleting large files (which cross block group boundary) with discard mount option, we find some btrfs_discard_extent() calls only trimmed part of its space, not the whole range: btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50% type: bbio->map_type, in above case, it's SINGLE DATA. start: Logical address of this trim len: Logical length of this trim trimmed: Physically trimmed bytes ratio: trimmed / len Thus leaving some unused space not discarded. [CAUSE] When discard mount option is specified, after a transaction is fully committed (super block written to disk), we begin to cleanup pinned extents in the following call chain: btrfs_commit_transaction() |- btrfs_finish_extent_commit() |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY); |- btrfs_discard_extent() However, pinned extents are recorded in an extent_io_tree, which can merge adjacent extent states. When a large file gets deleted and it has adjacent file extents across block group boundary, we will get a large merged range like this: |<--- BG1 --->|<--- BG2 --->| |//////|<-- Range to discard --->|/////| To discard that range, we have the following calls: btrfs_discard_extent() |- btrfs_map_block() | Returned bbio will end at BG1's end. As btrfs_map_block() | never returns result across block group boundary. |- btrfs_issuse_discard() Issue discard for each stripe. So we will only discard the range in BG1, not the remaining part in BG2. Furthermore, this bug is not that reliably observed, for above case, if there is no other extent in BG2, BG2 will be empty and btrfs will trim all space of BG2, covering up the bug. [FIX] - Allow __btrfs_map_block_for_discard() to modify @length parameter btrfs_map_block() uses its @length paramter to notify the caller how many bytes are mapped in current call. With __btrfs_map_block_for_discard() also modifing the @length, btrfs_discard_extent() now understands when to do extra trim. - Call btrfs_map_block() in a loop until we hit the range end Since we now know how many bytes are mapped each time, we can iterate through each block group boundary and issue correct trim for each range. Reviewed-by: Filipe Manana Reviewed-by: Nikolay Borisov Tested-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Anand Jain --- fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++++++++++---------- fs/btrfs/volumes.c | 6 ++++-- 2 files changed, 35 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ef05cbacef73..a546212594e8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1306,8 +1306,10 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len, int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes, u64 *actual_bytes) { - int ret; + int ret = 0; u64 discarded_bytes = 0; + u64 end = bytenr + num_bytes; + u64 cur = bytenr; struct btrfs_bio *bbio = NULL; @@ -1316,15 +1318,23 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, * associated to its stripes that don't go away while we are discarding. */ btrfs_bio_counter_inc_blocked(fs_info); - /* Tell the block device(s) that the sectors can be discarded */ - ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, bytenr, &num_bytes, - &bbio, 0); - /* Error condition is -ENOMEM */ - if (!ret) { - struct btrfs_bio_stripe *stripe = bbio->stripes; + while (cur < end) { + struct btrfs_bio_stripe *stripe; int i; + num_bytes = end - cur; + /* Tell the block device(s) that the sectors can be discarded */ + ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, cur, + &num_bytes, &bbio, 0); + /* + * Error can be -ENOMEM, -ENOENT (no such chunk mapping) or + * -EOPNOTSUPP. For any such error, @num_bytes is not updated, + * thus we can't continue anyway. + */ + if (ret < 0) + goto out; + stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { u64 bytes; struct request_queue *req_q; @@ -1341,10 +1351,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe->physical, stripe->length, &bytes); - if (!ret) + if (!ret) { discarded_bytes += bytes; - else if (ret != -EOPNOTSUPP) - break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */ + } else if (ret != -EOPNOTSUPP) { + /* + * Logic errors or -ENOMEM, or -EIO, but + * unlikely to happen. + * + * And since there are two loops, explicitly + * go to out to avoid confusion. + */ + btrfs_put_bbio(bbio); + goto out; + } /* * Just in case we get back EOPNOTSUPP for some reason, @@ -1354,7 +1373,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ret = 0; } btrfs_put_bbio(bbio); + cur += num_bytes; } +out: btrfs_bio_counter_dec(fs_info); if (actual_bytes) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1dba45d43485..ad540283d455 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5674,12 +5674,13 @@ void btrfs_put_bbio(struct btrfs_bio *bbio) * replace. */ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info, - u64 logical, u64 length, + u64 logical, u64 *length_ret, struct btrfs_bio **bbio_ret) { struct extent_map *em; struct map_lookup *map; struct btrfs_bio *bbio; + u64 length = *length_ret; u64 offset; u64 stripe_nr; u64 stripe_nr_end; @@ -5713,6 +5714,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info, offset = logical - em->start; length = min_t(u64, em->start + em->len - logical, length); + *length_ret = length; stripe_len = map->stripe_len; /* @@ -6127,7 +6129,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, if (op == BTRFS_MAP_DISCARD) return __btrfs_map_block_for_discard(fs_info, logical, - *length, bbio_ret); + length, bbio_ret); ret = btrfs_get_io_geometry(fs_info, op, logical, *length, &geom); if (ret < 0) From patchwork Thu Oct 8 10:59:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 268987 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNPARSEABLE_RELAY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2692CC4727E for ; Thu, 8 Oct 2020 11:00:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE6182177B for ; Thu, 8 Oct 2020 11:00:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="K6TwWS9z" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729712AbgJHLAa (ORCPT ); Thu, 8 Oct 2020 07:00:30 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:58974 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729704AbgJHLAY (ORCPT ); Thu, 8 Oct 2020 07:00:24 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098B0I3k010622; Thu, 8 Oct 2020 11:00:18 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=corp-2020-01-29; bh=uquMoBZZItGLXnFVj80x0u7kQnKwqnsLLDba9RQOm00=; b=K6TwWS9zjjnHlBPU5/jLZs/nlImFleSHtDtazHayfBwhbV+0aW3AAcYuHUbBZy7sUez9 EbfXMOLlLO/7SC2yyV2ft5/xDfTw0vs2pZPDwhX7N6b0nIhI2qsMiwEUHbmDlEmZNyQv DMLW9EgsMUEYQYZk4QOHvfxDF3z0V0CdaAix4nwUBhinGX6+6F9vDptRBGShkAFBsWNe W9HhFCKJOfoFm2URrHDbuyfDWq+PWzWyl3C/fg+kfPpmTZklWWnIz2mxa6lgfu+eTbmL t5N7z+kx5Frt/RgpZ/8EDeBez/z8Captm7n/jxvyWQKoNkhOp4wme3PxpcxJhhG+juuW hg== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2120.oracle.com with ESMTP id 33ym34v3rs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 08 Oct 2020 11:00:18 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 098AstPH087573; Thu, 8 Oct 2020 11:00:17 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserp3020.oracle.com with ESMTP id 3410k1009c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 08 Oct 2020 11:00:17 +0000 Received: from abhmp0013.oracle.com (abhmp0013.oracle.com [141.146.116.19]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 098B0G5p023487; Thu, 8 Oct 2020 11:00:17 GMT Received: from ltp.sg.oracle.com (/10.191.35.225) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 08 Oct 2020 04:00:16 -0700 From: Anand Jain To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: wqu@suse.com, fdmanana@suse.com, dsterba@suse.com, linux-btrfs@vger.kernel.org Subject: [PATCH stable-5.4 5/6] btrfs: fix RWF_NOWAIT write not failling when we need to cow Date: Thu, 8 Oct 2020 18:59:53 +0800 Message-Id: <3d6c7a6445ab42dfacd86b8b3ab166decc8ee65a.1599750901.git.anand.jain@oracle.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 malwarescore=0 suspectscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9767 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 priorityscore=1501 mlxscore=0 mlxlogscore=999 clxscore=1015 bulkscore=0 spamscore=0 malwarescore=0 phishscore=0 suspectscore=0 adultscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010080081 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Filipe Manana commit 260a63395f90f67d6ab89e4266af9e3dc34a77e9 upstream. If we attempt to do a RWF_NOWAIT write against a file range for which we can only do NOCOW for a part of it, due to the existence of holes or shared extents for example, we proceed with the write as if it were possible to NOCOW the whole range. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/sdj/bar $ chattr +C /mnt/sdj/bar $ xfs_io -d -c "pwrite -S 0xab -b 256K 0 256K" /mnt/bar wrote 262144/262144 bytes at offset 0 256 KiB, 1 ops; 0.0003 sec (694.444 MiB/sec and 2777.7778 ops/sec) $ xfs_io -c "fpunch 64K 64K" /mnt/bar $ sync $ xfs_io -d -c "pwrite -N -V 1 -b 128K -S 0xfe 0 128K" /mnt/bar wrote 131072/131072 bytes at offset 0 128 KiB, 1 ops; 0.0007 sec (160.051 MiB/sec and 1280.4097 ops/sec) This last write should fail with -EAGAIN since the file range from 64K to 128K is a hole. On xfs it fails, as expected, but on ext4 it currently succeeds because apparently it is expensive to check if there are extents allocated for the whole range, but I'll check with the ext4 people. Fix the issue by checking if check_can_nocow() returns a number of NOCOW'able bytes smaller then the requested number of bytes, and if it does return -EAGAIN. Fixes: edf064e7c6fec3 ("btrfs: nowait aio support") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Anand Jain --- fs/btrfs/file.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4e4ddd5629e5..4d0e1f9452a5 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1919,13 +1919,27 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, pos = iocb->ki_pos; count = iov_iter_count(from); if (iocb->ki_flags & IOCB_NOWAIT) { + size_t nocow_bytes = count; + /* * We will allocate space in case nodatacow is not set, * so bail */ if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)) || - check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) { + check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes) <= 0) { + inode_unlock(inode); + return -EAGAIN; + } + + /* check_can_nocow() locks the snapshot lock on success */ + btrfs_end_write_no_snapshotting(root); + /* + * There are holes in the range or parts of the range that must + * be COWed (shared extents, RO block groups, etc), so just bail + * out. + */ + if (nocow_bytes < count) { inode_unlock(inode); return -EAGAIN; }