
[RFC] ceph: reduce contention in ceph_check_delayed_caps()

Message ID 20210625154559.8148-1-lhenriques@suse.de
State New
Series [RFC] ceph: reduce contention in ceph_check_delayed_caps()

Commit Message

Luis Henriques June 25, 2021, 3:45 p.m. UTC
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can keep looping for quite some time if caps keep being
added back to the mdsc->cap_delay_list.  This may result in the watchdog
tainting the kernel with the softlockup flag.

This patch re-arranges the loop through the caps list so that it first
removes all the caps from the list, adding them to a temporary list.  Then,
with less lock contention, it calls ceph_check_caps() for each inode.  Any
caps added to the list in the meantime will be handled in the next run.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Hi Jeff!

So, I've not based this patch on top of your patchset that gets rid of
ceph_async_iput(), to make it easier to backport it to stable kernels.  Of
course, I'm not 100% sure this qualifies as stable material.

Other than that, I've been testing this patch and I couldn't see anything
breaking.  Let me know what you think.

(I *think* I've seen a tracker bug for this in the past but I couldn't
find it.  I guess it could be added as a 'Link:' tag.)

Cheers,
--
Luis

 fs/ceph/caps.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Luis Henriques June 28, 2021, 9:04 a.m. UTC | #1
On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
<...>
> I'm not sure this approach is viable, unfortunately. Once you've dropped
> the cap_delay_lock, then nothing protects the i_cap_delay_list head
> anymore.
> 
> So you could detach these objects and put them on the private list, and
> then once you drop the spinlock another task could find one of them and
> (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> 
> I think we'll need to come up with a different way to do this...

Ugh, yeah I see what you mean.
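
Right, once the inodes sit on the private list nothing stops another task
from requeueing them behind my back.  Paraphrasing __cap_delay_requeue()
from memory (exact code may differ):

	spin_lock(&mdsc->cap_delay_lock);
	if (!list_empty(&ci->i_cap_delay_list))		/* true: ci is on my private list */
		list_del_init(&ci->i_cap_delay_list);	/* unlinks it from that list... */
	__cap_set_timeouts(mdsc, ci);
	list_add_tail(&ci->i_cap_delay_list, &mdsc->cap_delay_list); /* ...and moves it back */
	spin_unlock(&mdsc->cap_delay_lock);

That runs under cap_delay_lock, while the walker of the private list touches
the same list_head with no lock held at all, so both lists can end up
corrupted.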

Another option I can think of is to time-bound this loop, so that it
would stop after finding the first ci->i_hold_caps_max timestamp that was
set *after* the start of the current run.  I'll see if I can come up with
an RFC shortly.
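
Roughly something like this (untested sketch; it assumes i_hold_caps_max is
set to about jiffies + caps_wanted_delay_max when the cap is queued, so a
deadline more than that interval past the start of this run means the entry
was added after we started):

	struct ceph_mount_options *opt = mdsc->fsc->mount_options;
	unsigned long loop_start = jiffies;

	spin_lock(&mdsc->cap_delay_lock);
	while (!list_empty(&mdsc->cap_delay_list)) {
		ci = list_first_entry(&mdsc->cap_delay_list,
				      struct ceph_inode_info,
				      i_cap_delay_list);
		/*
		 * Deadline set after this run started: the cap was queued
		 * behind us, so leave it (and everything after it) for the
		 * next run.
		 */
		if (time_before(loop_start,
				ci->i_hold_caps_max - opt->caps_wanted_delay_max * HZ))
			break;
		/* ... existing loop body (CEPH_I_FLUSH check, igrab, ceph_check_caps) ... */
	}
	spin_unlock(&mdsc->cap_delay_lock);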

Cheers,
--
Luís
Jeff Layton June 28, 2021, 2:44 p.m. UTC | #2
On Mon, 2021-06-28 at 10:04 +0100, Luis Henriques wrote:
> On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
> <...>
> > I'm not sure this approach is viable, unfortunately. Once you've dropped
> > the cap_delay_lock, then nothing protects the i_cap_delay_list head
> > anymore.
> > 
> > So you could detach these objects and put them on the private list, and
> > then once you drop the spinlock another task could find one of them and
> > (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> > 
> > I think we'll need to come up with a different way to do this...
> 
> Ugh, yeah I see what you mean.
> 
> Another option I can think of is to time-bound this loop, so that it
> would stop after finding the first ci->i_hold_caps_max timestamp that was
> set *after* the start of the current run.  I'll see if I can come up with
> an RFC shortly.

Sounds like a reasonable thing to do.

The catch is that those caps may end up being delayed up to 5s longer than
they would have been, since schedule_delayed() always uses a 5s delay. That
delay could be made more dynamic if it becomes an issue.

Maybe have the schedule_delayed() callers calculate a timeout and pass it
in? Then delayed_work() could schedule the next run to coincide with the
timeout of the next entry on the list.
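
A hypothetical sketch of what I mean (today's schedule_delayed() takes no
argument and hardcodes the 5s delay; treat the signature below as
illustrative only):

static void schedule_delayed(struct ceph_mds_client *mdsc, unsigned long delay)
{
	unsigned long max_delay = HZ * 5;	/* keep the historical 5s as default/ceiling */

	if (!delay || delay > max_delay)
		delay = max_delay;
	schedule_delayed_work(&mdsc->delayed_work, round_jiffies_relative(delay));
}

ceph_check_delayed_caps() could then return the remaining time of the first
entry left on mdsc->cap_delay_list (roughly ci->i_hold_caps_max - jiffies),
and delayed_work() would pass that value in, falling back to 5s when the
list is empty.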
-- 
Jeff Layton <jlayton@kernel.org>

Patch

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a5e93b185515..727e41e3b939 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4229,6 +4229,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 {
 	struct inode *inode;
 	struct ceph_inode_info *ci;
+	LIST_HEAD(caps_list);
 
 	dout("check_delayed_caps\n");
 	spin_lock(&mdsc->cap_delay_lock);
@@ -4239,19 +4240,23 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
 		    time_before(jiffies, ci->i_hold_caps_max))
 			break;
-		list_del_init(&ci->i_cap_delay_list);
+		list_move_tail(&ci->i_cap_delay_list, &caps_list);
+	}
+	spin_unlock(&mdsc->cap_delay_lock);
 
+	while (!list_empty(&caps_list)) {
+		ci = list_first_entry(&caps_list,
+				      struct ceph_inode_info,
+				      i_cap_delay_list);
+		list_del_init(&ci->i_cap_delay_list);
 		inode = igrab(&ci->vfs_inode);
 		if (inode) {
-			spin_unlock(&mdsc->cap_delay_lock);
 			dout("check_delayed_caps on %p\n", inode);
 			ceph_check_caps(ci, 0, NULL);
 			/* avoid calling iput_final() in tick thread */
 			ceph_async_iput(inode);
-			spin_lock(&mdsc->cap_delay_lock);
 		}
 	}
-	spin_unlock(&mdsc->cap_delay_lock);
 }
 
 /*