From patchwork Mon Aug 16 13:02:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 498023 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC9B6C4338F for ; Mon, 16 Aug 2021 13:11:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 93D4C632A6 for ; Mon, 16 Aug 2021 13:11:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237177AbhHPNLk (ORCPT ); Mon, 16 Aug 2021 09:11:40 -0400 Received: from mail.kernel.org ([198.145.29.99]:56748 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239653AbhHPNJ7 (ORCPT ); Mon, 16 Aug 2021 09:09:59 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 137E960E78; Mon, 16 Aug 2021 13:09:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1629119367; bh=EEr00xQjhRt4abvlxPqMIxXmKFUHYRW4cSkD8hI5gok=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gKBLe5SjHwd7Dfh7Ld54/6zrshA7RrjFymwQp7yzR8dOYRM0TNmTGj7aS76ojbEsv eOzQH7yNbZoI26Ykf8BFL0W0vyWqOl8fPSaD6IAGwSKNr/5Ix8QT6V1AGLU6w9STfY bK4fVzZ5JtoVPvcYmsx1ewNSuenqv5YwkoYnDIbo= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Mark Nelson , Jeff Layton , Luis Henriques , Ilya Dryomov Subject: [PATCH 5.10 95/96] ceph: take snap_empty_lock atomically with snaprealm refcount change Date: Mon, 16 Aug 2021 15:02:45 +0200 Message-Id: <20210816125438.144877986@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816125434.948010115@linuxfoundation.org> References: <20210816125434.948010115@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Jeff Layton commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream. There is a race in ceph_put_snap_realm. The change to the nref and the spinlock acquisition are not done atomically, so you could decrement nref, and before you take the spinlock, the nref is incremented again. At that point, you end up putting it on the empty list when it shouldn't be there. Eventually __cleanup_empty_realms runs and frees it when it's still in-use. Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and just drop the spinlock if we can get the rwsem. Because these objects can also undergo a 0->1 refcount transition, we must protect that change as well with the spinlock. Increment locklessly unless the value is at 0, in which case we take the spinlock, increment and then take it off the empty list if it did the 0->1 transition. With these changes, I'm removing the dout() messages from these functions, as well as in __put_snap_realm. They've always been racy, and it's better to not print values that may be misleading. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46419 Reported-by: Mark Nelson Signed-off-by: Jeff Layton Reviewed-by: Luis Henriques Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- fs/ceph/snap.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds { lockdep_assert_held(&mdsc->snap_rwsem); - dout("get_realm %p %d -> %d\n", realm, - atomic_read(&realm->nref), atomic_read(&realm->nref)+1); /* - * since we _only_ increment realm refs or empty the empty - * list with snap_rwsem held, adjusting the empty list here is - * safe. we do need to protect against concurrent empty list - * additions, however. + * The 0->1 and 1->0 transitions must take the snap_empty_lock + * atomically with the refcount change. Go ahead and bump the + * nref here, unless it's 0, in which case we take the spinlock + * and then do the increment and remove it from the list. */ - if (atomic_inc_return(&realm->nref) == 1) { - spin_lock(&mdsc->snap_empty_lock); + if (atomic_inc_not_zero(&realm->nref)) + return; + + spin_lock(&mdsc->snap_empty_lock); + if (atomic_inc_return(&realm->nref) == 1) list_del_init(&realm->empty_item); - spin_unlock(&mdsc->snap_empty_lock); - } + spin_unlock(&mdsc->snap_empty_lock); } static void __insert_snap_realm(struct rb_root *root, @@ -208,28 +208,28 @@ static void __put_snap_realm(struct ceph { lockdep_assert_held_write(&mdsc->snap_rwsem); - dout("__put_snap_realm %llx %p %d -> %d\n", realm->ino, realm, - atomic_read(&realm->nref), atomic_read(&realm->nref)-1); + /* + * We do not require the snap_empty_lock here, as any caller that + * increments the value must hold the snap_rwsem. + */ if (atomic_dec_and_test(&realm->nref)) __destroy_snap_realm(mdsc, realm); } /* - * caller needn't hold any locks + * See comments in ceph_get_snap_realm. Caller needn't hold any locks. */ void ceph_put_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm) { - dout("put_snap_realm %llx %p %d -> %d\n", realm->ino, realm, - atomic_read(&realm->nref), atomic_read(&realm->nref)-1); - if (!atomic_dec_and_test(&realm->nref)) + if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock)) return; if (down_write_trylock(&mdsc->snap_rwsem)) { + spin_unlock(&mdsc->snap_empty_lock); __destroy_snap_realm(mdsc, realm); up_write(&mdsc->snap_rwsem); } else { - spin_lock(&mdsc->snap_empty_lock); list_add(&realm->empty_item, &mdsc->snap_empty); spin_unlock(&mdsc->snap_empty_lock); }