Message ID | 5ca694ff4f8cff4c0ef6896593a1f1d01fbe956d.1611610947.git.josef@toxicpanda.com |
---|---|
State | Accepted |
Commit | 95c85fba1f64c3249c67f0078a29f8a125078189 |
Headers | show |
Series | btrfs: avoid double put of block group when emptying cluster | expand |
On 25.01.21 г. 23:42 ч., Josef Bacik wrote: > In __btrfs_return_cluster_to_free_space we will bail doing the cleanup > of the cluster if the block group we passed in doesn't match the block > group on the cluster. However we drop a reference to block_group, as > the cluster holds a reference to the block group while it's attached to > the cluster. If cluster->block_group != block_group however then this > is an extra put, which means we'll go negative and free this block group > down the line, leading to a UAF. Was this found by code inspection or did you hit in production. Also why in btrfs_remove_free_space_cache just before __btrfs_return_cluster_to_free_space there is: WARN_ON(cluster->block_group != block_group); IMO this patch should also remove the WARN_ON if it's a valid condition to have the passed bg be different than the one in the cluster. Also that WARN_ON is likely racy since it's done outside of cluster->lock. > > Fix this by simply bailing if the block group we passed in does not > match the block group on the cluster. > > CC: stable@vger.kernel.org > Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering") > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/free-space-cache.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index 0d6dcb5ff963..8be36cc6cbd8 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -2711,8 +2711,10 @@ static void __btrfs_return_cluster_to_free_space( > struct rb_node *node; > > spin_lock(&cluster->lock); > - if (cluster->block_group != block_group) > - goto out; > + if (cluster->block_group != block_group) { > + spin_unlock(&cluster->lock); > + return; > + } > > cluster->block_group = NULL; > cluster->window_start = 0; > @@ -2750,8 +2752,6 @@ static void __btrfs_return_cluster_to_free_space( > entry->offset, &entry->offset_index, bitmap); > } > cluster->root = RB_ROOT; > - > -out: > spin_unlock(&cluster->lock); > btrfs_put_block_group(block_group); > } >
On 1/26/21 4:02 AM, Nikolay Borisov wrote: > > > On 25.01.21 г. 23:42 ч., Josef Bacik wrote: >> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup >> of the cluster if the block group we passed in doesn't match the block >> group on the cluster. However we drop a reference to block_group, as >> the cluster holds a reference to the block group while it's attached to >> the cluster. If cluster->block_group != block_group however then this >> is an extra put, which means we'll go negative and free this block group >> down the line, leading to a UAF. > > Was this found by code inspection or did you hit in production. Also why > in btrfs_remove_free_space_cache just before > __btrfs_return_cluster_to_free_space there is: > It was found in production sort of halfway. I was doing something for WhatsApp and had to convert our block group reference counting to the refcount stuff so I could find where I made a mistake. Turns out this was where the problem was, my stuff had just made it way more likely to happen. I don't have the stack trace because this was like 6 months ago, I'm going through all my WhatsApp magic and getting them actually usable for upstream. > WARN_ON(cluster->block_group != block_group); > > IMO this patch should also remove the WARN_ON if it's a valid condition > to have the passed bg be different than the one in the cluster. Also > that WARN_ON is likely racy since it's done outside of cluster->lock. > Yup that's in a follow up thing, I wanted to get the actual fix out before I got distracted by my mountain of meetings this week. Thanks, Josef
On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote: > On 1/26/21 4:02 AM, Nikolay Borisov wrote: > > On 25.01.21 г. 23:42 ч., Josef Bacik wrote: > >> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup > >> of the cluster if the block group we passed in doesn't match the block > >> group on the cluster. However we drop a reference to block_group, as > >> the cluster holds a reference to the block group while it's attached to > >> the cluster. If cluster->block_group != block_group however then this > >> is an extra put, which means we'll go negative and free this block group > >> down the line, leading to a UAF. > > > > Was this found by code inspection or did you hit in production. Also why > > in btrfs_remove_free_space_cache just before > > __btrfs_return_cluster_to_free_space there is: > > > > It was found in production sort of halfway. I was doing something for WhatsApp > and had to convert our block group reference counting to the refcount stuff so I > could find where I made a mistake. Turns out this was where the problem was, my > stuff had just made it way more likely to happen. I don't have the stack trace > because this was like 6 months ago, I'm going through all my WhatsApp magic and > getting them actually usable for upstream. > > > WARN_ON(cluster->block_group != block_group); > > > > IMO this patch should also remove the WARN_ON if it's a valid condition > > to have the passed bg be different than the one in the cluster. Also > > that WARN_ON is likely racy since it's done outside of cluster->lock. > > > > Yup that's in a follow up thing, I wanted to get the actual fix out before I got > distracted by my mountain of meetings this week. Thanks, Removing the WARN_ON in a separate patch sounds ok to me, this patch clearly fixes the refcounting bug, the warning condition is the same but would need a different reasoning. Nikolay, if you're ok with current patch version let me know if you want a rev-by added.
On 11.02.21 г. 0:50 ч., David Sterba wrote: > On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote: >> On 1/26/21 4:02 AM, Nikolay Borisov wrote: >>> On 25.01.21 г. 23:42 ч., Josef Bacik wrote: >>>> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup >>>> of the cluster if the block group we passed in doesn't match the block >>>> group on the cluster. However we drop a reference to block_group, as >>>> the cluster holds a reference to the block group while it's attached to >>>> the cluster. If cluster->block_group != block_group however then this >>>> is an extra put, which means we'll go negative and free this block group >>>> down the line, leading to a UAF. >>> >>> Was this found by code inspection or did you hit in production. Also why >>> in btrfs_remove_free_space_cache just before >>> __btrfs_return_cluster_to_free_space there is: >>> >> >> It was found in production sort of halfway. I was doing something for WhatsApp >> and had to convert our block group reference counting to the refcount stuff so I >> could find where I made a mistake. Turns out this was where the problem was, my >> stuff had just made it way more likely to happen. I don't have the stack trace >> because this was like 6 months ago, I'm going through all my WhatsApp magic and >> getting them actually usable for upstream. >> >>> WARN_ON(cluster->block_group != block_group); >>> >>> IMO this patch should also remove the WARN_ON if it's a valid condition >>> to have the passed bg be different than the one in the cluster. Also >>> that WARN_ON is likely racy since it's done outside of cluster->lock. >>> >> >> Yup that's in a follow up thing, I wanted to get the actual fix out before I got >> distracted by my mountain of meetings this week. Thanks, > > Removing the WARN_ON in a separate patch sounds ok to me, this patch > clearly fixes the refcounting bug, the warning condition is the same but > would need a different reasoning. > > Nikolay, if you're ok with current patch version let me know if you want > a rev-by added. > Codewise I'm fine with it. However just had another read of the commit message and I think it could be rewritten to be somewhat simpler: It's wrong calling btrfs_put_block_group in __btrfs_return_cluster_to_free_space if the block group passed is different than the block group the cluster represents. As this means the cluster doesn't have a reference to the passed block group. This results in double put and an UAF. What prompted me is that the 2nd and 3rd sentences read somewhat awkward due to starting with 'However'
On Thu, Feb 11, 2021 at 01:25:52PM +0200, Nikolay Borisov wrote: > > > On 11.02.21 г. 0:50 ч., David Sterba wrote: > > On Tue, Jan 26, 2021 at 09:30:45AM -0500, Josef Bacik wrote: > >> On 1/26/21 4:02 AM, Nikolay Borisov wrote: > >>> On 25.01.21 г. 23:42 ч., Josef Bacik wrote: > >>>> In __btrfs_return_cluster_to_free_space we will bail doing the cleanup > >>>> of the cluster if the block group we passed in doesn't match the block > >>>> group on the cluster. However we drop a reference to block_group, as > >>>> the cluster holds a reference to the block group while it's attached to > >>>> the cluster. If cluster->block_group != block_group however then this > >>>> is an extra put, which means we'll go negative and free this block group > >>>> down the line, leading to a UAF. > >>> > >>> Was this found by code inspection or did you hit in production. Also why > >>> in btrfs_remove_free_space_cache just before > >>> __btrfs_return_cluster_to_free_space there is: > >>> > >> > >> It was found in production sort of halfway. I was doing something for WhatsApp > >> and had to convert our block group reference counting to the refcount stuff so I > >> could find where I made a mistake. Turns out this was where the problem was, my > >> stuff had just made it way more likely to happen. I don't have the stack trace > >> because this was like 6 months ago, I'm going through all my WhatsApp magic and > >> getting them actually usable for upstream. > >> > >>> WARN_ON(cluster->block_group != block_group); > >>> > >>> IMO this patch should also remove the WARN_ON if it's a valid condition > >>> to have the passed bg be different than the one in the cluster. Also > >>> that WARN_ON is likely racy since it's done outside of cluster->lock. > >>> > >> > >> Yup that's in a follow up thing, I wanted to get the actual fix out before I got > >> distracted by my mountain of meetings this week. Thanks, > > > > Removing the WARN_ON in a separate patch sounds ok to me, this patch > > clearly fixes the refcounting bug, the warning condition is the same but > > would need a different reasoning. > > > > Nikolay, if you're ok with current patch version let me know if you want > > a rev-by added. > > > > > Codewise I'm fine with it. However just had another read of the commit > message and I think it could be rewritten to be somewhat simpler: > > It's wrong calling btrfs_put_block_group in > __btrfs_return_cluster_to_free_space if the block group passed is > different than the block group the cluster represents. As this means the > cluster doesn't have a reference to the passed block group. This results > in double put and an UAF. > > What prompted me is that the 2nd and 3rd sentences read somewhat awkward > due to starting with 'However' Ok, updated, thanks. I left the last paragraph "Fix that ...".
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 0d6dcb5ff963..8be36cc6cbd8 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2711,8 +2711,10 @@ static void __btrfs_return_cluster_to_free_space( struct rb_node *node; spin_lock(&cluster->lock); - if (cluster->block_group != block_group) - goto out; + if (cluster->block_group != block_group) { + spin_unlock(&cluster->lock); + return; + } cluster->block_group = NULL; cluster->window_start = 0; @@ -2750,8 +2752,6 @@ static void __btrfs_return_cluster_to_free_space( entry->offset, &entry->offset_index, bitmap); } cluster->root = RB_ROOT; - -out: spin_unlock(&cluster->lock); btrfs_put_block_group(block_group); }
In __btrfs_return_cluster_to_free_space we will bail doing the cleanup of the cluster if the block group we passed in doesn't match the block group on the cluster. However we drop a reference to block_group, as the cluster holds a reference to the block group while it's attached to the cluster. If cluster->block_group != block_group however then this is an extra put, which means we'll go negative and free this block group down the line, leading to a UAF. Fix this by simply bailing if the block group we passed in does not match the block group on the cluster. CC: stable@vger.kernel.org Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering") Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/free-space-cache.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)