From patchwork Sun Aug 8 07:22:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 493831 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE1BAC432BE for ; Sun, 8 Aug 2021 07:23:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B90EB61052 for ; Sun, 8 Aug 2021 07:23:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231301AbhHHHXY (ORCPT ); Sun, 8 Aug 2021 03:23:24 -0400 Received: from mail.kernel.org ([198.145.29.99]:55008 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231328AbhHHHXV (ORCPT ); Sun, 8 Aug 2021 03:23:21 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 7B27C60F02; Sun, 8 Aug 2021 07:23:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1628407382; bh=q18wI37N9Ng3TVmmftyzIFDQXlXvP0TaDJNf2OjhR4I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=o/uyMCtZ+o//OBpmbtn9zalfupwxUGU3Bmd6IAo9uLAbKbX+FYOYTmFORNwEk/YBP tOVmLFbTMqy2YsRKUKqgst6tEWFRaYZKN58uIv/pQgNZJtfeycoQ5hnjQ/hAIQPloK 0u1F83QoNl4gDAJDCv/vtxoeqbNOGNCuIHZHesY0= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Peter Zijlstra (Intel)" , juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jdesfossez@efficios.com, dvhart@infradead.org, bristot@redhat.com, Thomas Gleixner , Zhen Lei , Joe Korty Subject: [PATCH 4.4 02/11] futex: Cleanup refcounting Date: Sun, 8 Aug 2021 09:22:37 +0200 Message-Id: <20210808072217.401551055@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210808072217.322468704@linuxfoundation.org> References: <20210808072217.322468704@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra [ Upstream commit bf92cf3a5100f5a0d5f9834787b130159397cb22 ] Add a put_pit_state() as counterpart for get_pi_state() so the refcounting becomes consistent. Signed-off-by: Peter Zijlstra (Intel) Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Zhen Lei Acked-by: Joe Korty Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -825,7 +825,7 @@ static int refill_pi_state_cache(void) return 0; } -static struct futex_pi_state * alloc_pi_state(void) +static struct futex_pi_state *alloc_pi_state(void) { struct futex_pi_state *pi_state = current->pi_state_cache; @@ -858,6 +858,11 @@ static void pi_state_update_owner(struct } } +static void get_pi_state(struct futex_pi_state *pi_state) +{ + WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount)); +} + /* * Drops a reference to the pi_state object and frees or caches it * when the last reference is gone. @@ -901,7 +906,7 @@ static void put_pi_state(struct futex_pi * Look up the task based on what TID userspace gave us. * We dont trust it. */ -static struct task_struct * futex_find_get_task(pid_t pid) +static struct task_struct *futex_find_get_task(pid_t pid) { struct task_struct *p; @@ -1149,7 +1154,7 @@ static int attach_to_pi_state(u32 __user goto out_einval; out_attach: - atomic_inc(&pi_state->refcount); + get_pi_state(pi_state); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); *ps = pi_state; return 0; @@ -2204,7 +2209,7 @@ retry_private: */ if (requeue_pi) { /* Prepare the waiter to take the rt_mutex. */ - atomic_inc(&pi_state->refcount); + get_pi_state(pi_state); this->pi_state = pi_state; ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex, this->rt_waiter, From patchwork Sun Aug 8 07:22:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 493830 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48E42C4320E for ; Sun, 8 Aug 2021 07:23:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A91561078 for ; Sun, 8 Aug 2021 07:23:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231450AbhHHHX2 (ORCPT ); Sun, 8 Aug 2021 03:23:28 -0400 Received: from mail.kernel.org ([198.145.29.99]:55180 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231359AbhHHHX0 (ORCPT ); Sun, 8 Aug 2021 03:23:26 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id E41CA61028; Sun, 8 Aug 2021 07:23:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1628407387; bh=qc1leVYk35OvjWoUI1YszX8Fq+6Nw1HnuEfGH0P28o4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JrS6bzS2Bp27y4V5GMdTcmX6Amhl4t6iVz1/2MlLVVvuOHdV6Slfa3cNwlqcaDC4V W3FClQfh98+/Vd6ZSs6hBygzq63zfvM7RQLieGRuNrj42IWuN3EtF4yBYDPEVGY9en ty7MR0aKZEsqcwlIB5N9srAFRlFqTrOjPhTXeISc= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Peter Zijlstra (Intel)" , juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jdesfossez@efficios.com, dvhart@infradead.org, bristot@redhat.com, Thomas Gleixner , Zhen Lei , Joe Korty Subject: [PATCH 4.4 04/11] futex: Pull rt_mutex_futex_unlock() out from under hb->lock Date: Sun, 8 Aug 2021 09:22:39 +0200 Message-Id: <20210808072217.464956460@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210808072217.322468704@linuxfoundation.org> References: <20210808072217.322468704@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra [ Upstream commit 16ffa12d742534d4ff73e8b3a4e81c1de39196f0 ] There's a number of 'interesting' problems, all caused by holding hb->lock while doing the rt_mutex_unlock() equivalient. Notably: - a PI inversion on hb->lock; and, - a SCHED_DEADLINE crash because of pointer instability. The previous changes: - changed the locking rules to cover {uval,pi_state} with wait_lock. - allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in turn allows to rely on wait_lock atomicity completely. - simplified the waiter conundrum. It's now sufficient to hold rtmutex::wait_lock and a reference on the pi_state to protect the state consistency, so hb->lock can be dropped before calling rt_mutex_futex_unlock(). Signed-off-by: Peter Zijlstra (Intel) Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Zhen Lei Acked-by: Joe Korty Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 111 ++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 68 insertions(+), 43 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -966,10 +966,12 @@ static void exit_pi_state_list(struct ta pi_state->owner = NULL; raw_spin_unlock_irq(&curr->pi_lock); - rt_mutex_futex_unlock(&pi_state->pi_mutex); - + get_pi_state(pi_state); spin_unlock(&hb->lock); + rt_mutex_futex_unlock(&pi_state->pi_mutex); + put_pi_state(pi_state); + raw_spin_lock_irq(&curr->pi_lock); } raw_spin_unlock_irq(&curr->pi_lock); @@ -1083,6 +1085,11 @@ static int attach_to_pi_state(u32 __user * has dropped the hb->lock in between queue_me() and unqueue_me_pi(), * which in turn means that futex_lock_pi() still has a reference on * our pi_state. + * + * The waiter holding a reference on @pi_state also protects against + * the unlocked put_pi_state() in futex_unlock_pi(), futex_lock_pi() + * and futex_wait_requeue_pi() as it cannot go to 0 and consequently + * free pi_state before we can take a reference ourselves. */ WARN_ON(!atomic_read(&pi_state->refcount)); @@ -1537,48 +1544,40 @@ static void mark_wake_futex(struct wake_ q->lock_ptr = NULL; } -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, - struct futex_hash_bucket *hb) +/* + * Caller must hold a reference on @pi_state. + */ +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state) { - struct task_struct *new_owner; - struct futex_pi_state *pi_state = this->pi_state; u32 uninitialized_var(curval), newval; + struct task_struct *new_owner; + bool deboost = false; WAKE_Q(wake_q); - bool deboost; int ret = 0; - if (!pi_state) - return -EINVAL; - - /* - * If current does not own the pi_state then the futex is - * inconsistent and user space fiddled with the futex value. - */ - if (pi_state->owner != current) - return -EINVAL; - raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); new_owner = rt_mutex_next_owner(&pi_state->pi_mutex); - - /* - * When we interleave with futex_lock_pi() where it does - * rt_mutex_timed_futex_lock(), we might observe @this futex_q waiter, - * but the rt_mutex's wait_list can be empty (either still, or again, - * depending on which side we land). - * - * When this happens, give up our locks and try again, giving the - * futex_lock_pi() instance time to complete, either by waiting on the - * rtmutex or removing itself from the futex queue. - */ if (!new_owner) { - raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); - return -EAGAIN; + /* + * Since we held neither hb->lock nor wait_lock when coming + * into this function, we could have raced with futex_lock_pi() + * such that we might observe @this futex_q waiter, but the + * rt_mutex's wait_list can be empty (either still, or again, + * depending on which side we land). + * + * When this happens, give up our locks and try again, giving + * the futex_lock_pi() instance time to complete, either by + * waiting on the rtmutex or removing itself from the futex + * queue. + */ + ret = -EAGAIN; + goto out_unlock; } /* - * We pass it to the next owner. The WAITERS bit is always - * kept enabled while there is PI state around. We cleanup the - * owner died bit, because we are the owner. + * We pass it to the next owner. The WAITERS bit is always kept + * enabled while there is PI state around. We cleanup the owner + * died bit, because we are the owner. */ newval = FUTEX_WAITERS | task_pid_vnr(new_owner); @@ -1611,15 +1610,15 @@ static int wake_futex_pi(u32 __user *uad deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); } +out_unlock: raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); - spin_unlock(&hb->lock); if (deboost) { wake_up_q(&wake_q); rt_mutex_adjust_prio(current); } - return 0; + return ret; } /* @@ -2462,7 +2461,7 @@ retry: if (get_futex_value_locked(&uval, uaddr)) goto handle_fault; - while (1) { + for (;;) { newval = (uval & FUTEX_OWNER_DIED) | newtid; if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) @@ -2975,10 +2974,36 @@ retry: */ match = futex_top_waiter(hb, &key); if (match) { - ret = wake_futex_pi(uaddr, uval, match, hb); + struct futex_pi_state *pi_state = match->pi_state; + + ret = -EINVAL; + if (!pi_state) + goto out_unlock; + /* - * In case of success wake_futex_pi dropped the hash - * bucket lock. + * If current does not own the pi_state then the futex is + * inconsistent and user space fiddled with the futex value. + */ + if (pi_state->owner != current) + goto out_unlock; + + /* + * Grab a reference on the pi_state and drop hb->lock. + * + * The reference ensures pi_state lives, dropping the hb->lock + * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to + * close the races against futex_lock_pi(), but in case of + * _any_ fail we'll abort and retry the whole deal. + */ + get_pi_state(pi_state); + spin_unlock(&hb->lock); + + ret = wake_futex_pi(uaddr, uval, pi_state); + + put_pi_state(pi_state); + + /* + * Success, we're done! No tricky corner cases. */ if (!ret) goto out_putkey; @@ -2993,7 +3018,6 @@ retry: * setting the FUTEX_WAITERS bit. Try again. */ if (ret == -EAGAIN) { - spin_unlock(&hb->lock); put_futex_key(&key); goto retry; } @@ -3001,7 +3025,7 @@ retry: * wake_futex_pi has detected invalid state. Tell user * space. */ - goto out_unlock; + goto out_putkey; } /* @@ -3011,8 +3035,10 @@ retry: * preserve the WAITERS bit not the OWNER_DIED one. We are the * owner. */ - if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) + if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) { + spin_unlock(&hb->lock); goto pi_faulted; + } /* * If uval has changed, let user space handle it. @@ -3026,7 +3052,6 @@ out_putkey: return ret; pi_faulted: - spin_unlock(&hb->lock); put_futex_key(&key); ret = fault_in_user_writeable(uaddr); From patchwork Sun Aug 8 07:22:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 493829 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E39BBC4320A for ; Sun, 8 Aug 2021 07:23:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA60C61058 for ; Sun, 8 Aug 2021 07:23:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231452AbhHHHXk (ORCPT ); Sun, 8 Aug 2021 03:23:40 -0400 Received: from mail.kernel.org ([198.145.29.99]:55366 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231465AbhHHHXa (ORCPT ); Sun, 8 Aug 2021 03:23:30 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 6933361075; Sun, 8 Aug 2021 07:23:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1628407391; bh=i4ccSHgXh8UVKMhodjaAs4LvBOiYvMqy9Tq21ZkD7Vs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NDQ57qCDLy/foI3R89cuC/2Nrmew3owbaBrmdkC8qaeoYlcQpY7EF+DCxAxu+ScbT xyGOUFCwBSZruoZ25vpmxSbqU+LL+EF17o1BJAmE1JG1jx+zdafBQL3U8A4bn+9iGb k8psKJ0HcJR4B0e+FNRWd1LgKRg7yjrzrsu+3E+E= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Peter Zijlstra (Intel)" , juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jdesfossez@efficios.com, dvhart@infradead.org, bristot@redhat.com, Thomas Gleixner , Zhen Lei , Joe Korty Subject: [PATCH 4.4 06/11] futex: Futex_unlock_pi() determinism Date: Sun, 8 Aug 2021 09:22:41 +0200 Message-Id: <20210808072217.541927115@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210808072217.322468704@linuxfoundation.org> References: <20210808072217.322468704@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra [ Upstream commit bebe5b514345f09be2c15e414d076b02ecb9cce8 ] The problem with returning -EAGAIN when the waiter state mismatches is that it becomes very hard to proof a bounded execution time on the operation. And seeing that this is a RT operation, this is somewhat important. While in practise; given the previous patch; it will be very unlikely to ever really take more than one or two rounds, proving so becomes rather hard. However, now that modifying wait_list is done while holding both hb->lock and wait_lock, the scenario can be avoided entirely by acquiring wait_lock while still holding hb-lock. Doing a hand-over, without leaving a hole. Signed-off-by: Peter Zijlstra (Intel) Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Zhen Lei Acked-by: Joe Korty Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1555,15 +1555,10 @@ static int wake_futex_pi(u32 __user *uad WAKE_Q(wake_q); int ret = 0; - raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); new_owner = rt_mutex_next_owner(&pi_state->pi_mutex); - if (!new_owner) { + if (WARN_ON_ONCE(!new_owner)) { /* - * Since we held neither hb->lock nor wait_lock when coming - * into this function, we could have raced with futex_lock_pi() - * such that we might observe @this futex_q waiter, but the - * rt_mutex's wait_list can be empty (either still, or again, - * depending on which side we land). + * As per the comment in futex_unlock_pi() this should not happen. * * When this happens, give up our locks and try again, giving * the futex_lock_pi() instance time to complete, either by @@ -3020,15 +3015,18 @@ retry: if (pi_state->owner != current) goto out_unlock; + get_pi_state(pi_state); /* - * Grab a reference on the pi_state and drop hb->lock. + * Since modifying the wait_list is done while holding both + * hb->lock and wait_lock, holding either is sufficient to + * observe it. * - * The reference ensures pi_state lives, dropping the hb->lock - * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to - * close the races against futex_lock_pi(), but in case of - * _any_ fail we'll abort and retry the whole deal. + * By taking wait_lock while still holding hb->lock, we ensure + * there is no point where we hold neither; and therefore + * wake_futex_pi() must observe a state consistent with what we + * observed. */ - get_pi_state(pi_state); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); ret = wake_futex_pi(uaddr, uval, pi_state); From patchwork Sun Aug 8 07:22:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 493828 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02548C4338F for ; Sun, 8 Aug 2021 07:23:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D62B660C40 for ; Sun, 8 Aug 2021 07:23:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231586AbhHHHXn (ORCPT ); Sun, 8 Aug 2021 03:23:43 -0400 Received: from mail.kernel.org ([198.145.29.99]:55452 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231576AbhHHHXf (ORCPT ); Sun, 8 Aug 2021 03:23:35 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 1E8DE60C40; Sun, 8 Aug 2021 07:23:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1628407396; bh=t2x4MMJGjpeLVqHrlnTStvWC0xWniWrXx/JRaH3tJuY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dZYBZM22HNC8iHtWlupw5xzk0yNDySCzbQl3XVmSyy6XlWQl3x0+e93hBb13T9zFk Hs3Nc+NUr3kTlPTHnC93q38j189PY4lVPxThKjqkFQ9jKvrGv2f41Mk9crgRHwoe+A fUkfqCu5senawOPhvfTIL3RGk5RsbNX8aZZ2NiZU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Gratian Crisan , Mike Galbraith , Thomas Gleixner , Zhen Lei , Joe Korty Subject: [PATCH 4.4 08/11] futex: Handle transient "ownerless" rtmutex state correctly Date: Sun, 8 Aug 2021 09:22:43 +0200 Message-Id: <20210808072217.606471864@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210808072217.322468704@linuxfoundation.org> References: <20210808072217.322468704@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Mike Galbraith [ Upstream commit 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627 ] Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner(). This is one possible chain of events leading to this: Task Prio Operation T1 120 lock(F) T2 120 lock(F) -> blocks (top waiter) T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter) XX timeout/ -> wakes T2 signal T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set) T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter and the lower priority T2 cannot steal the lock. -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON() The comment states that this is invalid and rt_mutex_real_owner() must return a non NULL owner when the trylock failed, but in case of a queued and woken up waiter rt_mutex_real_owner() == NULL is a valid transient state. The higher priority waiter has simply not yet managed to take over the rtmutex. The BUG_ON() is therefore wrong and this is just another retry condition in fixup_pi_state_owner(). Drop the locks, so that T3 can make progress, and then try the fixup again. Gratian provided a great analysis, traces and a reproducer. The analysis is to the point, but it confused the hell out of that tglx dude who had to page in all the futex horrors again. Condensed version is above. [ tglx: Wrote comment and changelog ] Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Reported-by: Gratian Crisan Signed-off-by: Mike Galbraith Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de Signed-off-by: Zhen Lei Acked-by: Joe Korty Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2437,10 +2437,22 @@ retry: } /* - * Since we just failed the trylock; there must be an owner. + * The trylock just failed, so either there is an owner or + * there is a higher priority waiter than this one. */ newowner = rt_mutex_owner(&pi_state->pi_mutex); - BUG_ON(!newowner); + /* + * If the higher priority waiter has not yet taken over the + * rtmutex then newowner is NULL. We can't return here with + * that state because it's inconsistent vs. the user space + * state. So drop the locks and try again. It's a valid + * situation and not any different from the other retry + * conditions. + */ + if (unlikely(!newowner)) { + err = -EAGAIN; + goto handle_fault; + } } else { WARN_ON_ONCE(argowner != current); if (oldowner == current) { From patchwork Sun Aug 8 07:22:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 493832 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 663B2C4338F for ; Sun, 8 Aug 2021 07:23:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E17F61040 for ; Sun, 8 Aug 2021 07:23:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231247AbhHHHXU (ORCPT ); Sun, 8 Aug 2021 03:23:20 -0400 Received: from mail.kernel.org ([198.145.29.99]:54944 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230492AbhHHHXT (ORCPT ); Sun, 8 Aug 2021 03:23:19 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 62A6160C40; Sun, 8 Aug 2021 07:22:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1628407379; bh=uhLAo5FfyVe5NiACB2W/CyEoXjolokTW49NauGIkC0w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=uQ+6c+w8KrSmz/WDTHGYr/YENGI2CdfxWr+Ow+EMbqp/QfdYFuLnDze2wHox7yNAx umnnxmvFcIvpzxT3i+Ov316Hg+kng6MiIX9iWv6M776hC0VNs087TrS7MB1k4wDoY5 vnzbV7lUEDIXucG7JUfqvWe8JcXFy2yntGvrjYnM= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Eric W. Biederman" , Anna-Maria Gleixner , Thomas Gleixner , "Paul E. McKenney" , bigeasy@linutronix.de, Zhen Lei , Joe Korty Subject: [PATCH 4.4 11/11] rcu: Update documentation of rcu_read_unlock() Date: Sun, 8 Aug 2021 09:22:46 +0200 Message-Id: <20210808072217.704630357@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210808072217.322468704@linuxfoundation.org> References: <20210808072217.322468704@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Anna-Maria Gleixner [ Upstream commit ec84b27f9b3b569f9235413d1945a2006b97b0aa ] Since commit b4abf91047cf ("rtmutex: Make wait_lock irq safe") the explanation in rcu_read_unlock() documentation about irq unsafe rtmutex wait_lock is no longer valid. Remove it to prevent kernel developers reading the documentation to rely on it. Suggested-by: Eric W. Biederman Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Acked-by: "Eric W. Biederman" Cc: bigeasy@linutronix.de Link: https://lkml.kernel.org/r/20180525090507.22248-2-anna-maria@linutronix.de Signed-off-by: Zhen Lei Acked-by: Joe Korty Signed-off-by: Greg Kroah-Hartman --- include/linux/rcupdate.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -880,9 +880,7 @@ static __always_inline void rcu_read_loc * Unfortunately, this function acquires the scheduler's runqueue and * priority-inheritance spinlocks. This means that deadlock could result * if the caller of rcu_read_unlock() already holds one of these locks or - * any lock that is ever acquired while holding them; or any lock which - * can be taken from interrupt context because rcu_boost()->rt_mutex_lock() - * does not disable irqs while taking ->wait_lock. + * any lock that is ever acquired while holding them. * * That said, RCU readers are never priority boosted unless they were * preempted. Therefore, one way to avoid deadlock is to make sure