Message ID | 20200623195323.968867013@linuxfoundation.org |
---|---|
State | Superseded |
Hi,

On Tue, Jun 23, 2020 at 09:57:50PM +0200, Greg Kroah-Hartman wrote:
> From: Bob Peterson <rpeterso@redhat.com>
>
> [ Upstream commit 83d060ca8d90fa1e3feac227f995c013100862d3 ]
>
> Before this patch, transactions could be merged into the system
> transaction by function gfs2_merge_trans(), but the transaction ail
> lists were never merged. Because the ail flushing mechanism can run
> separately, bd elements can be attached to the transaction's buffer
> list during the transaction (trans_add_meta, etc) but quickly moved
> to its ail lists. Later, in function gfs2_trans_end, the transaction
> can be freed (by gfs2_trans_end) while it still has bd elements
> queued to its ail lists, which can cause it to either lose track of
> the bd elements altogether (memory leak) or worse, reference the bd
> elements after the parent transaction has been freed.
>
> Although I've not seen any serious consequences, the problem becomes
> apparent with the previous patch's addition of:
>
>    gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
>
> to function gfs2_trans_free().
>
> This patch adds logic into gfs2_merge_trans() to move the merged
> transaction's ail lists to the sdp transaction. This prevents the
> use-after-free. To do this properly, we need to hold the ail lock,
> so we pass sdp into the function instead of the transaction itself.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/gfs2/log.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index d3f0612e33471..06752db213d21 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -877,8 +877,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
>   * @new: New transaction to be merged
>   */
>
> -static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
> +static void gfs2_merge_trans(struct gfs2_sbd *sdp, struct gfs2_trans *new)
>  {
> +	struct gfs2_trans *old = sdp->sd_log_tr;
> +
>  	WARN_ON_ONCE(!test_bit(TR_ATTACHED, &old->tr_flags));
>
>  	old->tr_num_buf_new += new->tr_num_buf_new;
> @@ -890,6 +892,11 @@ static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
>
>  	list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
>  	list_splice_tail_init(&new->tr_buf, &old->tr_buf);
> +
> +	spin_lock(&sdp->sd_ail_lock);
> +	list_splice_tail_init(&new->tr_ail1_list, &old->tr_ail1_list);
> +	list_splice_tail_init(&new->tr_ail2_list, &old->tr_ail2_list);
> +	spin_unlock(&sdp->sd_ail_lock);
>  }
>
>  static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
> @@ -901,7 +908,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
>  	gfs2_log_lock(sdp);
>
>  	if (sdp->sd_log_tr) {
> -		gfs2_merge_trans(sdp->sd_log_tr, tr);
> +		gfs2_merge_trans(sdp, tr);
>  	} else if (tr->tr_num_buf_new || tr->tr_num_databuf_new) {
>  		gfs2_assert_withdraw(sdp, test_bit(TR_ALLOCED, &tr->tr_flags));
>  		sdp->sd_log_tr = tr;
> --
> 2.25.1

In Debian, two users confirmed issues when writing to a GFS2 partition
with this commit applied. The initial Debian report is at
https://bugs.debian.org/968567 and Daniel Craig reported it to
Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=209217 .

Writing to a gfs2 filesystem fails and results in a soft lockup of the
machine for kernels with that commit applied. I cannot reproduce the
issue myself because I do not have a suitable setup available, but Daniel
described a minimal series of steps to reproduce the issue.

This might also affect other stable series where this commit was
applied, as there was a similar report from someone running 5.4.58 in
https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html .

Regards,
Salvatore
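The list merge described in the commit message above can be illustrated outside the kernel. What follows is only a userspace sketch, not the gfs2 code: `toy_trans`, `toy_merge_trans()`, `toy_ail_lock` and the small list helpers are hypothetical stand-ins written for this example, and a C11 `atomic_flag` spin loop takes the place of `sdp->sd_ail_lock`. It shows the two properties the patch relies on: after the splice, the old transaction owns every entry, and the new transaction's lists are left empty so it can be freed without leaving entries behind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Move every entry of src to the tail of dst, then reset src to empty. */
static void splice_tail_init(struct list_head *src, struct list_head *dst)
{
	if (list_is_empty(src))
		return;

	struct list_head *first = src->next;
	struct list_head *last = src->prev;

	first->prev = dst->prev;
	dst->prev->next = first;
	last->next = dst;
	dst->prev = last;
	init_list_head(src);
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical, reduced transaction: only the two AIL lists. */
struct toy_trans {
	struct list_head ail1;
	struct list_head ail2;
};

/* Assumption: a simple spin lock standing in for sdp->sd_ail_lock. */
static atomic_flag toy_ail_lock = ATOMIC_FLAG_INIT;

static void toy_trans_init(struct toy_trans *tr)
{
	init_list_head(&tr->ail1);
	init_list_head(&tr->ail2);
}

/* Splice both AIL lists of new_tr onto old_tr while holding the lock. */
static void toy_merge_trans(struct toy_trans *old_tr, struct toy_trans *new_tr)
{
	while (atomic_flag_test_and_set_explicit(&toy_ail_lock, memory_order_acquire))
		; /* spin until the lock is free */
	splice_tail_init(&new_tr->ail1, &old_tr->ail1);
	splice_tail_init(&new_tr->ail2, &old_tr->ail2);
	atomic_flag_clear_explicit(&toy_ail_lock, memory_order_release);
}

/* Returns 1 if the merge moved all entries and emptied the source lists. */
int toy_merge_demo(void)
{
	struct toy_trans old_tr, new_tr;
	struct list_head a, b, c;

	toy_trans_init(&old_tr);
	toy_trans_init(&new_tr);
	list_add_tail_entry(&a, &old_tr.ail1);
	list_add_tail_entry(&b, &new_tr.ail1);
	list_add_tail_entry(&c, &new_tr.ail1);

	toy_merge_trans(&old_tr, &new_tr);

	return list_count(&old_tr.ail1) == 3 &&
	       old_tr.ail1.prev == &c &&
	       list_is_empty(&new_tr.ail1) &&
	       list_is_empty(&new_tr.ail2);
}
```

Calling `toy_merge_demo()` from a test harness checks that the old list ends with the last entry of the new list and that both source lists are empty afterwards. Note that every list head is initialized before it is spliced; the sketch would not be safe otherwise.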
On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
> (snip)
>
> Writing to a gfs2 filesystem fails and results in a soft lockup of the
> machine for kernels with that commit applied. I cannot reproduce the
> issue myself because I do not have a suitable setup available, but Daniel
> described a minimal series of steps to reproduce the issue.
>
> This might also affect other stable series where this commit was
> applied, as there was a similar report from someone running 5.4.58 in
> https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html

Can you report this to the gfs2 developers?

thanks,

greg k-h
----- Original Message -----
> On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
(snip)
> Can you report this to the gfs2 developers?
>
> thanks,
>
> greg k-h

Hi Greg,

No need. The patch came from the gfs2 developers. I think he just wants
it added to a stable release.

Regards,

Bob Peterson
GFS2 File System
Hi Greg,

Thanks for your quick reply.

On Fri, Sep 11, 2020 at 01:58:16PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
(snip)
> Can you report this to the gfs2 developers?

Sure! Bob Peterson and Andreas Gruenbacher were already on the
recipient list but I forgot cluster-devel@redhat.com .

I can send a separate report there as a follow-up if still needed.

Regards,
Salvatore
On Fri, Sep 11, 2020 at 2:09 PM Salvatore Bonaccorso <carnil@debian.org> wrote:
> On Fri, Sep 11, 2020 at 01:58:16PM +0200, Greg Kroah-Hartman wrote:
> > Can you report this to the gfs2 developers?
>
> Sure! Bob Peterson and Andreas Gruenbacher were already on the
> recipient list but I forgot cluster-devel@redhat.com .
>
> I can send there a separate report as followup if still needed.

No need right now, we're looking, thanks.

Andreas
On Fri, Sep 11, 2020 at 08:08:35AM -0400, Bob Peterson wrote:
(snip)
> Hi Greg,
>
> No need. The patch came from the gfs2 developers. I think he just wants
> it added to a stable release.

What commit needs to be added to a stable release?

confused,

greg k-h
----- Original Message -----
> On Fri, Sep 11, 2020 at 08:08:35AM -0400, Bob Peterson wrote:
(snip)
> > No need. The patch came from the gfs2 developers. I think he just wants
> > it added to a stable release.
>
> What commit needs to be added to a stable release?
>
> confused,
>
> greg k-h

Sorry Greg,

It's pretty early here and the caffeine hadn't quite hit my system.
The problem is most likely that 4.19.132 is missing this upstream patch:

cbcc89b630447ec7836aa2b9242d9bb1725f5a61

I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got
put into stable without a stable CC but cbcc89b6304 is definitely
required.

I'd like to suggest Salvatore try cherry-picking this patch to see if
it fixes the problem, and if so, perhaps Greg can add it to stable.

Regards,

Bob Peterson
GFS2 File Systems
Hi Bob,

On Fri, Sep 11, 2020 at 08:49:14AM -0400, Bob Peterson wrote:
(snip)
> The problem is most likely that 4.19.132 is missing this upstream patch:
>
> cbcc89b630447ec7836aa2b9242d9bb1725f5a61
>
> I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got
> put into stable without a stable CC but cbcc89b6304 is definitely
> required.
>
> I'd like to suggest Salvatore try cherry-picking this patch to see if
> it fixes the problem, and if so, perhaps Greg can add it to stable.

Thanks, I will ask the affected users whether they can test this (as
said, I cannot test it myself in this case).

If it is true that we also need to cherry-pick
cbcc89b630447ec7836aa2b9242d9bb1725f5a61, then all of v4.14.y, v4.19.y
and v5.4.y would need to have it included as well
(83d060ca8d90fa1e3feac227f995c013100862d3 was applied in v4.14.186,
v4.19.130, v5.4.49 and v5.7.6 (EOL)).

Regards,
Salvatore
Hi Bob, hi Greg,

On Fri, Sep 11, 2020 at 08:49:14AM -0400, Bob Peterson wrote:
(snip)
> The problem is most likely that 4.19.132 is missing this upstream patch:
>
> cbcc89b630447ec7836aa2b9242d9bb1725f5a61
>
> I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got
> put into stable without a stable CC but cbcc89b6304 is definitely
> required.
>
> I'd like to suggest Salvatore try cherry-picking this patch to see if
> it fixes the problem, and if so, perhaps Greg can add it to stable.

I can confirm (Daniel was able to test): applying cbcc89b63044 ("gfs2:
initialize transaction tr_ailX_lists earlier") fixes the issue. So it
would be great if you could pick that up for stable as well, for those
series which had 83d060ca8d90 ("gfs2: fix use-after-free on transaction
ail lists") applied.

Regards,
Salvatore
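The thread does not spell out why cbcc89b63044 is required. Going only by its title ("initialize transaction tr_ailX_lists earlier"), one plausible reading is that the ail list heads have to be set up before gfs2_merge_trans() splices them. The sketch below is a userspace illustration of that general point and not kernel code: the list helpers are re-implemented here for the example, and the link to the reported lockup is an assumption. It shows that a zero-filled list head is not a valid empty list, so an emptiness check on it reports "not empty" and any splice that trusted the check would follow a NULL pointer.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* A valid empty list points back at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same test the circular-list convention uses: empty means next == head. */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/*
 * A head that was only zero-filled (for example by a zeroing allocator)
 * has next == NULL, which is not equal to the head's own address, so it
 * does not look empty even though it holds no entries.
 */
int zeroed_head_looks_empty(void)
{
	struct list_head h;

	memset(&h, 0, sizeof(h));
	return list_is_empty(&h);
}

/* After explicit initialization the same check behaves as expected. */
int initialized_head_looks_empty(void)
{
	struct list_head h;

	init_list_head(&h);
	return list_is_empty(&h);
}
```

Under these assumptions, `zeroed_head_looks_empty()` returns 0 and `initialized_head_looks_empty()` returns 1, which is why the order of initialization and splicing matters for any code built on this list convention.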
----- Original Message -----
> Hi Bob, hi Greg,
>
> On Fri, Sep 11, 2020 at 08:49:14AM -0400, Bob Peterson wrote:
> > ----- Original Message -----
> > > On Fri, Sep 11, 2020 at 08:08:35AM -0400, Bob Peterson wrote:
> > > > ----- Original Message -----
> > > > > On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 09:57:50PM +0200, Greg Kroah-Hartman wrote:
> > > > > > > From: Bob Peterson <rpeterso@redhat.com>
> > > > > > >
> > > > > > > [ Upstream commit 83d060ca8d90fa1e3feac227f995c013100862d3 ]
> > > > > > >
> > > > > > > Before this patch, transactions could be merged into the system
> > > > > > > transaction by function gfs2_merge_trans(), but the transaction ail
> > > > > > > lists were never merged. Because the ail flushing mechanism can run
> > > > > > > separately, bd elements can be attached to the transaction's buffer
> > > > > > > list during the transaction (trans_add_meta, etc) but quickly moved
> > > > > > > to its ail lists. Later, in function gfs2_trans_end, the transaction
> > > > > > > can be freed (by gfs2_trans_end) while it still has bd elements
> > > > > > > queued to its ail lists, which can cause it to either lose track of
> > > > > > > the bd elements altogether (memory leak) or worse, reference the bd
> > > > > > > elements after the parent transaction has been freed.
> > > > > > >
> > > > > > > Although I've not seen any serious consequences, the problem becomes
> > > > > > > apparent with the previous patch's addition of:
> > > > > > >
> > > > > > > gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
> > > > > > >
> > > > > > > to function gfs2_trans_free().
> > > > > > >
> > > > > > > This patch adds logic into gfs2_merge_trans() to move the merged
> > > > > > > transaction's ail lists to the sdp transaction. This prevents the
> > > > > > > use-after-free. To do this properly, we need to hold the ail lock,
> > > > > > > so we pass sdp into the function instead of the transaction itself.
> > > > > > >
> > > > > > > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> > > > > > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > > > > > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > > > (snip)
> > > > > >
> > > > > > In Debian two users confirmed issues on writing on a GFS2 partition
> > > > > > with this commit applied. The initial Debian report is at
> > > > > > https://bugs.debian.org/968567 and Daniel Craig reported it into
> > > > > > Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=209217 .
> > > > > >
> > > > > > Writing to a gfs2 filesystem fails and results in a soft lockup of the
> > > > > > machine for kernels with that commit applied. I cannot reproduce the
> > > > > > issue myself due to not having a respective setup available, but Daniel
> > > > > > described a minimal series of steps to reproduce the issue.
> > > > > >
> > > > > > This might affect as well other stable series where this commit was
> > > > > > applied, as there was a similar report for someone running 5.4.58 in
> > > > > > https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html
> > > > >
> > > > > Can you report this to the gfs2 developers?
> > > > >
> > > > > thanks,
> > > > >
> > > > > greg k-h
> > > >
> > > > Hi Greg,
> > > >
> > > > No need. The patch came from the gfs2 developers. I think he just wants
> > > > it added to a stable release.
> > >
> > > What commit needs to be added to a stable release?
> > >
> > > confused,
> > >
> > > greg k-h
> >
> > Sorry Greg,
> >
> > It's pretty early here and the caffeine hadn't quite hit my system.
> > The problem is most likely that 4.19.132 is missing this upstream patch:
> >
> > cbcc89b630447ec7836aa2b9242d9bb1725f5a61
> >
> > I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got
> > put into stable without a stable CC but cbcc89b6304 is definitely
> > required.
> >
> > I'd like to suggest Salvatore try cherry-picking this patch to see if
> > it fixes the problem, and if so, perhaps Greg can add it to stable.
>
> I can confirm (Daniel was able to test): Applying cbcc89b63044 ("gfs2:
> initialize transaction tr_ailX_lists earlier") fixes the issue. So
> would be great if you can pick that up for stable for those series
> which had 83d060ca8d90 ("gfs2: fix use-after-free on transaction ail
> lists") as well.
>
> Regards,
> Salvatore
>

Hi Greg,

As per Salvatore's email above, can you please cherry-pick GFS2 patch
cbcc89b630447ec7836aa2b9242d9bb1725f5a61 to the stable releases like
4.19 to which ("gfs2: fix use-after-free on transaction ail lists")
(83d060ca8d90fa1e3feac227f995c013100862d3) was applied? Thanks.

Regards,

Bob Peterson
GFS2 File System
On Tue, Sep 15, 2020 at 12:52:01PM -0400, Bob Peterson wrote:
> Hi Greg,
>
> As per Salvatore's email above, can you please cherry-pick GFS2 patch
> cbcc89b630447ec7836aa2b9242d9bb1725f5a61 to the stable releases like
> 4.19 to which ("gfs2: fix use-after-free on transaction ail lists")
> (83d060ca8d90fa1e3feac227f995c013100862d3) was applied? Thanks.

Now queued up, thanks.

greg k-h
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index d3f0612e33471..06752db213d21 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -877,8 +877,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
  * @new: New transaction to be merged
  */
 
-static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
+static void gfs2_merge_trans(struct gfs2_sbd *sdp, struct gfs2_trans *new)
 {
+	struct gfs2_trans *old = sdp->sd_log_tr;
+
 	WARN_ON_ONCE(!test_bit(TR_ATTACHED, &old->tr_flags));
 
 	old->tr_num_buf_new += new->tr_num_buf_new;
@@ -890,6 +892,11 @@ static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
 
 	list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
 	list_splice_tail_init(&new->tr_buf, &old->tr_buf);
+
+	spin_lock(&sdp->sd_ail_lock);
+	list_splice_tail_init(&new->tr_ail1_list, &old->tr_ail1_list);
+	list_splice_tail_init(&new->tr_ail2_list, &old->tr_ail2_list);
+	spin_unlock(&sdp->sd_ail_lock);
 }
 
 static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
@@ -901,7 +908,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 	gfs2_log_lock(sdp);
 
 	if (sdp->sd_log_tr) {
-		gfs2_merge_trans(sdp->sd_log_tr, tr);
+		gfs2_merge_trans(sdp, tr);
 	} else if (tr->tr_num_buf_new || tr->tr_num_databuf_new) {
 		gfs2_assert_withdraw(sdp, test_bit(TR_ALLOCED, &tr->tr_flags));
 		sdp->sd_log_tr = tr;
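For readers less familiar with the kernel list helpers, here is a minimal
userspace sketch (not kernel code) of what the splice added to
gfs2_merge_trans() does. Everything named toy_*, splice_tail_init(),
list_add_tail_entry() and demo_merge() is a hypothetical stand-in written for
this illustration; the sd_ail_lock locking is left out. The sketch also assumes,
going only by the title of cbcc89b63044 ("initialize transaction tr_ailX_lists
earlier"), that the splice relies on both list heads having been initialised
before the merge runs, which is why toy_trans_init() is called up front.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised, empty list points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Move every entry of src to the tail of dst, then reinitialise src.
 * This follows the documented behaviour of list_splice_tail_init().
 * Both heads must already be initialised: an uninitialised src would
 * be walked through garbage pointers.
 */
static void splice_tail_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first, *last;

	if (list_is_empty(src))
		return;
	first = src->next;
	last = src->prev;
	first->prev = dst->prev;
	dst->prev->next = first;
	last->next = dst;
	dst->prev = last;
	init_list_head(src);
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	const struct list_head *p;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical, trimmed-down transaction holding only the two ail lists. */
struct toy_trans {
	struct list_head ail1_list;
	struct list_head ail2_list;
};

/* Initialise the ail lists at creation time, before any merge can happen. */
static void toy_trans_init(struct toy_trans *tr)
{
	init_list_head(&tr->ail1_list);
	init_list_head(&tr->ail2_list);
}

/* Move the incoming transaction's ail entries onto the old transaction. */
static void toy_merge_trans(struct toy_trans *old, struct toy_trans *incoming)
{
	splice_tail_init(&incoming->ail1_list, &old->ail1_list);
	splice_tail_init(&incoming->ail2_list, &old->ail2_list);
}

/* Returns the number of entries on old's ail1 list after the merge. */
size_t demo_merge(void)
{
	struct toy_trans old, incoming;
	struct list_head a, b, c;

	toy_trans_init(&old);
	toy_trans_init(&incoming);
	list_add_tail_entry(&a, &old.ail1_list);
	list_add_tail_entry(&b, &incoming.ail1_list);
	list_add_tail_entry(&c, &incoming.ail1_list);

	toy_merge_trans(&old, &incoming);

	/* The incoming transaction no longer owns any entries, so it could
	 * be freed without leaving entries pointing into freed memory. */
	assert(list_is_empty(&incoming.ail1_list));
	assert(list_is_empty(&incoming.ail2_list));
	return list_count(&old.ail1_list);
}
```

After the merge the incoming transaction's lists are empty, which is the
condition the gfs2_assert_warn() mentioned in the commit message checks for
before a transaction is freed.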