From patchwork Mon Mar 1 17:31:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Hutchings X-Patchwork-Id: 389906 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CDB71C433E6 for ; Mon, 1 Mar 2021 17:40:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8FC07652F6 for ; Mon, 1 Mar 2021 17:40:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234839AbhCARkZ (ORCPT ); Mon, 1 Mar 2021 12:40:25 -0500 Received: from maynard.decadent.org.uk ([95.217.213.242]:32842 "EHLO maynard.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238317AbhCARcY (ORCPT ); Mon, 1 Mar 2021 12:32:24 -0500 Received: from [2a02:1811:d34:3700:3b8d:b310:d327:e418] (helo=deadeye) by maynard with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lGmOS-0002aJ-6S; Mon, 01 Mar 2021 18:31:40 +0100 Received: from ben by deadeye with local (Exim 4.94) (envelope-from ) id 1lGmOR-004iX7-BR; Mon, 01 Mar 2021 18:31:39 +0100 Date: Mon, 1 Mar 2021 18:31:39 +0100 From: Ben Hutchings To: stable@vger.kernel.org Cc: Lee Jones , "Luis Claudio R. Goncalves" Subject: [PATCH 4.9 4/7] futex: Futex_unlock_pi() determinism Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: 2a02:1811:d34:3700:3b8d:b310:d327:e418 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on maynard); SAEximRunCond expanded to false Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra commit bebe5b514345f09be2c15e414d076b02ecb9cce8 upstream. The problem with returning -EAGAIN when the waiter state mismatches is that it becomes very hard to proof a bounded execution time on the operation. And seeing that this is a RT operation, this is somewhat important. While in practise; given the previous patch; it will be very unlikely to ever really take more than one or two rounds, proving so becomes rather hard. However, now that modifying wait_list is done while holding both hb->lock and wait_lock, the scenario can be avoided entirely by acquiring wait_lock while still holding hb-lock. Doing a hand-over, without leaving a hole. Signed-off-by: Peter Zijlstra (Intel) Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org Signed-off-by: Thomas Gleixner Signed-off-by: Ben Hutchings --- kernel/futex.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index c8ec4e7f3609..77cec33ea112 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1555,15 +1555,10 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_ WAKE_Q(wake_q); int ret = 0; - raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); new_owner = rt_mutex_next_owner(&pi_state->pi_mutex); - if (!new_owner) { + if (WARN_ON_ONCE(!new_owner)) { /* - * Since we held neither hb->lock nor wait_lock when coming - * into this function, we could have raced with futex_lock_pi() - * such that we might observe @this futex_q waiter, but the - * rt_mutex's wait_list can be empty (either still, or again, - * depending on which side we land). + * As per the comment in futex_unlock_pi() this should not happen. * * When this happens, give up our locks and try again, giving * the futex_lock_pi() instance time to complete, either by @@ -3018,15 +3013,18 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) if (pi_state->owner != current) goto out_unlock; + get_pi_state(pi_state); /* - * Grab a reference on the pi_state and drop hb->lock. + * Since modifying the wait_list is done while holding both + * hb->lock and wait_lock, holding either is sufficient to + * observe it. * - * The reference ensures pi_state lives, dropping the hb->lock - * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to - * close the races against futex_lock_pi(), but in case of - * _any_ fail we'll abort and retry the whole deal. + * By taking wait_lock while still holding hb->lock, we ensure + * there is no point where we hold neither; and therefore + * wake_futex_pi() must observe a state consistent with what we + * observed. */ - get_pi_state(pi_state); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); ret = wake_futex_pi(uaddr, uval, pi_state); From patchwork Mon Mar 1 17:31:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Hutchings X-Patchwork-Id: 389905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16CE7C433E6 for ; Mon, 1 Mar 2021 17:41:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE64265306 for ; Mon, 1 Mar 2021 17:41:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238103AbhCARkt (ORCPT ); Mon, 1 Mar 2021 12:40:49 -0500 Received: from maynard.decadent.org.uk ([95.217.213.242]:32848 "EHLO maynard.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232318AbhCARcj (ORCPT ); Mon, 1 Mar 2021 12:32:39 -0500 Received: from [2a02:1811:d34:3700:3b8d:b310:d327:e418] (helo=deadeye) by maynard with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lGmOb-0002aR-LY; Mon, 01 Mar 2021 18:31:49 +0100 Received: from ben by deadeye with local (Exim 4.94) (envelope-from ) id 1lGmOa-004iXh-Qp; Mon, 01 Mar 2021 18:31:48 +0100 Date: Mon, 1 Mar 2021 18:31:48 +0100 From: Ben Hutchings To: stable@vger.kernel.org Cc: Lee Jones , "Luis Claudio R. Goncalves" Subject: [PATCH 4.9 5/7] futex: Fix pi_state->owner serialization Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: 2a02:1811:d34:3700:3b8d:b310:d327:e418 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on maynard); SAEximRunCond expanded to false Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra commit c74aef2d06a9f59cece89093eecc552933cba72a upstream. There was a reported suspicion about a race between exit_pi_state_list() and put_pi_state(). The same report mentioned the comment with put_pi_state() said it should be called with hb->lock held, and it no longer is in all places. As it turns out, the pi_state->owner serialization is indeed broken. As per the new rules: 734009e96d19 ("futex: Change locking rules") pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock. For the sites setting pi_state->owner we already hold wait_lock (where required) but exit_pi_state_list() and put_pi_state() were not and raced on clearing it. Fixes: 734009e96d19 ("futex: Change locking rules") Reported-by: Gratian Crisan Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: dvhart@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4o4y@hirez.programming.kicks-ass.net [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings --- kernel/futex.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 77cec33ea112..a07f6080c8b0 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -868,8 +868,6 @@ static void get_pi_state(struct futex_pi_state *pi_state) /* * Drops a reference to the pi_state object and frees or caches it * when the last reference is gone. - * - * Must be called with the hb lock held. */ static void put_pi_state(struct futex_pi_state *pi_state) { @@ -884,13 +882,15 @@ static void put_pi_state(struct futex_pi_state *pi_state) * and has cleaned up the pi_state already */ if (pi_state->owner) { + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); pi_state_update_owner(pi_state, NULL); rt_mutex_proxy_unlock(&pi_state->pi_mutex); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); } - if (current->pi_state_cache) + if (current->pi_state_cache) { kfree(pi_state); - else { + } else { /* * pi_state->list is already empty. * clear pi_state->owner. @@ -949,13 +949,14 @@ static void exit_pi_state_list(struct task_struct *curr) raw_spin_unlock_irq(&curr->pi_lock); spin_lock(&hb->lock); - - raw_spin_lock_irq(&curr->pi_lock); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + raw_spin_lock(&curr->pi_lock); /* * We dropped the pi-lock, so re-check whether this * task still owns the PI-state: */ if (head->next != next) { + raw_spin_unlock(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); continue; } @@ -964,9 +965,10 @@ static void exit_pi_state_list(struct task_struct *curr) WARN_ON(list_empty(&pi_state->list)); list_del_init(&pi_state->list); pi_state->owner = NULL; - raw_spin_unlock_irq(&curr->pi_lock); + raw_spin_unlock(&curr->pi_lock); get_pi_state(pi_state); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); rt_mutex_futex_unlock(&pi_state->pi_mutex); @@ -1349,6 +1351,10 @@ static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key, WARN_ON(!list_empty(&pi_state->list)); list_add(&pi_state->list, &p->pi_state_list); + /* + * Assignment without holding pi_state->pi_mutex.wait_lock is safe + * because there is no concurrency as the object is not published yet. + */ pi_state->owner = p; raw_spin_unlock_irq(&p->pi_lock); @@ -3027,6 +3033,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); + /* drops pi_state->pi_mutex.wait_lock */ ret = wake_futex_pi(uaddr, uval, pi_state); put_pi_state(pi_state); From patchwork Mon Mar 1 17:31:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Hutchings X-Patchwork-Id: 389904 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17444C433DB for ; Mon, 1 Mar 2021 17:41:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D4538650BB for ; Mon, 1 Mar 2021 17:41:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232318AbhCARlA (ORCPT ); Mon, 1 Mar 2021 12:41:00 -0500 Received: from maynard.decadent.org.uk ([95.217.213.242]:32858 "EHLO maynard.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237892AbhCARco (ORCPT ); Mon, 1 Mar 2021 12:32:44 -0500 Received: from [2a02:1811:d34:3700:3b8d:b310:d327:e418] (helo=deadeye) by maynard with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lGmOi-0002ae-Sy; Mon, 01 Mar 2021 18:31:56 +0100 Received: from ben by deadeye with local (Exim 4.94) (envelope-from ) id 1lGmOh-004iYH-Rj; Mon, 01 Mar 2021 18:31:55 +0100 Date: Mon, 1 Mar 2021 18:31:55 +0100 From: Ben Hutchings To: stable@vger.kernel.org Cc: Lee Jones , "Luis Claudio R. Goncalves" Subject: [PATCH 4.9 6/7] futex: Fix more put_pi_state() vs. exit_pi_state_list() races Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: 2a02:1811:d34:3700:3b8d:b310:d327:e418 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on maynard); SAEximRunCond expanded to false Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra commit 153fbd1226fb30b8630802aa5047b8af5ef53c9f upstream. Dmitry (through syzbot) reported being able to trigger the WARN in get_pi_state() and a use-after-free on: raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); Both are due to this race: exit_pi_state_list() put_pi_state() lock(&curr->pi_lock) while() { pi_state = list_first_entry(head); hb = hash_futex(&pi_state->key); unlock(&curr->pi_lock); dec_and_test(&pi_state->refcount); lock(&hb->lock) lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd lock(&curr->pi_lock); .... unlock(&curr->pi_lock); get_pi_state(); // WARN; refcount==0 The problem is we take the reference count too late, and don't allow it being 0. Fix it by using inc_not_zero() and simply retrying the loop when we fail to get a refcount. In that case put_pi_state() should remove the entry from the list. Reported-by: Dmitry Vyukov Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Cc: Gratian Crisan Cc: Linus Torvalds Cc: Peter Zijlstra Cc: dvhart@infradead.org Cc: syzbot Cc: syzkaller-bugs@googlegroups.com Cc: Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization") Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdfhjs@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings --- kernel/futex.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index a07f6080c8b0..855dae277f83 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -941,11 +941,27 @@ static void exit_pi_state_list(struct task_struct *curr) */ raw_spin_lock_irq(&curr->pi_lock); while (!list_empty(head)) { - next = head->next; pi_state = list_entry(next, struct futex_pi_state, list); key = pi_state->key; hb = hash_futex(&key); + + /* + * We can race against put_pi_state() removing itself from the + * list (a waiter going away). put_pi_state() will first + * decrement the reference count and then modify the list, so + * its possible to see the list entry but fail this reference + * acquire. + * + * In that case; drop the locks to let put_pi_state() make + * progress and retry the loop. + */ + if (!atomic_inc_not_zero(&pi_state->refcount)) { + raw_spin_unlock_irq(&curr->pi_lock); + cpu_relax(); + raw_spin_lock_irq(&curr->pi_lock); + continue; + } raw_spin_unlock_irq(&curr->pi_lock); spin_lock(&hb->lock); @@ -956,8 +972,10 @@ static void exit_pi_state_list(struct task_struct *curr) * task still owns the PI-state: */ if (head->next != next) { + /* retain curr->pi_lock for the loop invariant */ raw_spin_unlock(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); + put_pi_state(pi_state); continue; } @@ -965,9 +983,8 @@ static void exit_pi_state_list(struct task_struct *curr) WARN_ON(list_empty(&pi_state->list)); list_del_init(&pi_state->list); pi_state->owner = NULL; - raw_spin_unlock(&curr->pi_lock); - get_pi_state(pi_state); + raw_spin_unlock(&curr->pi_lock); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock);