Message ID | 20210625154559.8148-1-lhenriques@suse.de
---|---
State | New
Series | [RFC] ceph: reduce contention in ceph_check_delayed_caps()

On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
<...>
> I'm not sure this approach is viable, unfortunately. Once you've dropped
> the cap_delay_lock, then nothing protects the i_cap_delay_list head
> anymore.
>
> So you could detach these objects and put them on the private list, and
> then once you drop the spinlock another task could find one of them and
> (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
>
> I think we'll need to come up with a different way to do this...

Ugh, yeah, I see what you mean.

Another option I can think of is to time-bound this loop, so that it would
stop after finding the first ci->i_hold_caps_max timestamp that was set
*after* the start of the current run. I'll see if I can come up with an
RFC shortly.

Cheers,
--
Luís

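A rough sketch of what that time-bounded loop could look like, for
illustration only (this is not the posted patch; it assumes that
__cap_delay_requeue() sets ci->i_hold_caps_max to roughly jiffies plus the
mount's caps_wanted_delay_max when an inode is (re)queued):

void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
{
        struct ceph_mount_options *opt = mdsc->fsc->mount_options;
        unsigned long delay_max = opt->caps_wanted_delay_max * HZ;
        unsigned long loop_start = jiffies;
        struct ceph_inode_info *ci;
        struct inode *inode;

        dout("check_delayed_caps\n");
        spin_lock(&mdsc->cap_delay_lock);
        while (!list_empty(&mdsc->cap_delay_list)) {
                ci = list_first_entry(&mdsc->cap_delay_list,
                                      struct ceph_inode_info,
                                      i_cap_delay_list);
                /*
                 * i_hold_caps_max was set to ~(queue time + delay_max), so
                 * an entry queued after loop_start is left for the next run.
                 */
                if (time_before(loop_start, ci->i_hold_caps_max - delay_max))
                        break;
                if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
                    time_before(jiffies, ci->i_hold_caps_max))
                        break;
                list_del_init(&ci->i_cap_delay_list);

                inode = igrab(&ci->vfs_inode);
                if (inode) {
                        spin_unlock(&mdsc->cap_delay_lock);
                        dout("check_delayed_caps on %p\n", inode);
                        ceph_check_caps(ci, 0, NULL);
                        /* avoid calling iput_final() in tick thread */
                        ceph_async_iput(inode);
                        spin_lock(&mdsc->cap_delay_lock);
                }
        }
        spin_unlock(&mdsc->cap_delay_lock);
}

The idea is that any inode requeued while the loop is running ends up with
an i_hold_caps_max beyond loop_start + delay_max, so the loop terminates
instead of chasing newly re-added entries, while cap_delay_lock stays held
across list manipulation exactly as before.
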
On Mon, 2021-06-28 at 10:04 +0100, Luis Henriques wrote:
> On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
> <...>
> > I'm not sure this approach is viable, unfortunately. Once you've dropped
> > the cap_delay_lock, then nothing protects the i_cap_delay_list head
> > anymore.
> >
> > So you could detach these objects and put them on the private list, and
> > then once you drop the spinlock another task could find one of them and
> > (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> >
> > I think we'll need to come up with a different way to do this...
>
> Ugh, yeah, I see what you mean.
>
> Another option I can think of is to time-bound this loop, so that it would
> stop after finding the first ci->i_hold_caps_max timestamp that was set
> *after* the start of the current run. I'll see if I can come up with an
> RFC shortly.
>

Sounds like a reasonable thing to do. The catch there is that those caps
may end up being delayed up to 5s more than they would have been, since
schedule_delayed() always uses a 5s delay. That delay could be made more
dynamic if it becomes an issue.

Maybe have the schedule_delayed() callers calculate and pass in a timeout
and schedule the next run for that point in the future? Then delayed_work()
could schedule the next run to coincide with the timeout of the next entry
on the list.

--
Jeff Layton <jlayton@kernel.org>

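For illustration, a minimal sketch of that more dynamic scheduling
(hypothetical, not code from this series): schedule_delayed() grows a delay
argument, and ceph_check_delayed_caps() is assumed to report how far in the
future the next queued entry times out, so delayed_work() can re-arm the
work item accordingly.

/* Hypothetical: schedule_delayed() takes an explicit delay in jiffies. */
static void schedule_delayed(struct ceph_mds_client *mdsc, unsigned long delay)
{
        if (!delay)
                delay = 5 * HZ;         /* fall back to the current 5s tick */
        schedule_delayed_work(&mdsc->delayed_work,
                              round_jiffies_relative(delay));
}

static void delayed_work(struct work_struct *work)
{
        struct ceph_mds_client *mdsc =
                container_of(work, struct ceph_mds_client, delayed_work.work);
        unsigned long delay;

        /* ... other periodic housekeeping elided ... */

        /*
         * Hypothetical return value: jiffies until the next entry left on
         * cap_delay_list expires, or 0 if the list was fully drained.
         */
        delay = ceph_check_delayed_caps(mdsc);

        schedule_delayed(mdsc, delay);
}

With that, the work item fires roughly when the next cap on the list
actually needs attention instead of always waiting the full 5s.
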
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a5e93b185515..727e41e3b939 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4229,6 +4229,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 {
         struct inode *inode;
         struct ceph_inode_info *ci;
+        LIST_HEAD(caps_list);
 
         dout("check_delayed_caps\n");
         spin_lock(&mdsc->cap_delay_lock);
@@ -4239,19 +4240,23 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
                 if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
                     time_before(jiffies, ci->i_hold_caps_max))
                         break;
-                list_del_init(&ci->i_cap_delay_list);
+                list_move_tail(&ci->i_cap_delay_list, &caps_list);
+        }
+        spin_unlock(&mdsc->cap_delay_lock);
 
+        while (!list_empty(&caps_list)) {
+                ci = list_first_entry(&caps_list,
+                                      struct ceph_inode_info,
+                                      i_cap_delay_list);
+                list_del_init(&ci->i_cap_delay_list);
                 inode = igrab(&ci->vfs_inode);
                 if (inode) {
-                        spin_unlock(&mdsc->cap_delay_lock);
                         dout("check_delayed_caps on %p\n", inode);
                         ceph_check_caps(ci, 0, NULL);
                         /* avoid calling iput_final() in tick thread */
                         ceph_async_iput(inode);
-                        spin_lock(&mdsc->cap_delay_lock);
                 }
         }
-        spin_unlock(&mdsc->cap_delay_lock);
 }
 
 /*

Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can keep looping for quite some time if caps keep being
added back to the mdsc->cap_delay_list. This may result in the watchdog
tainting the kernel with the softlockup flag.

This patch re-arranges the loop through the caps list so that it first
removes all the caps from the list, adding them to a temporary list. Then,
with less lock contention, it calls ceph_check_caps() for each inode. Any
caps added to the list in the meantime will be handled in the next run.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Hi Jeff!

So, I've not based this patch on top of your patchset that gets rid of
ceph_async_iput(), so that it will be easier to backport to stable
kernels. Of course, I'm not 100% sure this qualifies as stable material.

Other than that, I've been testing this patch and I couldn't see anything
breaking. Let me know what you think.

(I *think* I've seen a tracker bug for this in the past but I couldn't
find it. I guess it could be added as a 'Link:' tag.)

Cheers,
--
Luis

 fs/ceph/caps.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)