diff mbox series

futex: Handle transient "ownerless" rtmutex state correctly

Message ID 87sg9pkvf7.fsf@nanos.tec.linutronix.de
State New
Headers show
Series futex: Handle transient "ownerless" rtmutex state correctly | expand

Commit Message

Thomas Gleixner Nov. 4, 2020, 3:12 p.m. UTC
From: Mike Galbraith <efault@gmx.de>

Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio       Operation
T1   120	lock(F)
T2   120	lock(F)   -> blocks (top waiter)
T3   50 (RT)	lock(F)   -> boosts T3 and blocks (new top waiter)
XX   		timeout/  -> wakes T2
		signal
T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
     			     and the lower priority T2 cannot steal the lock.
     			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above. 

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
---
 kernel/futex.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Thomas Gleixner Nov. 4, 2020, 3:22 p.m. UTC | #1
On Wed, Nov 04 2020 at 16:12, Thomas Gleixner wrote:

> From: Mike Galbraith <efault@gmx.de>

>

> Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().

> This is one possible chain of events leading to this:

>

> Task Prio       Operation

> T1   120	lock(F)

> T2   120	lock(F)   -> blocks (top waiter)

> T3   50 (RT)	lock(F)   -> boosts T3 and blocks (new top waiter)


boosts T1 obviously as Sebastian just pointed out to me. /me pulls the
futex induced brain damage excuse ...
Mike Galbraith Nov. 4, 2020, 4:50 p.m. UTC | #2
On Wed, 2020-11-04 at 16:12 +0100, Thomas Gleixner wrote:
> From: Mike Galbraith <efault@gmx.de>


Hrmph.  How about a suggested-by, or just take it.  That dinky diag hit
the mark (not _entirely_ by accident, but..;) is irrelevant, you did
all of the work to make it a patch.

	-Mike

> Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().

> This is one possible chain of events leading to this:

>

> Task Prio       Operation

> T1   120	lock(F)

> T2   120	lock(F)   -> blocks (top waiter)

> T3   50 (RT)	lock(F)   -> boosts T3 and blocks (new top waiter)

> XX   		timeout/  -> wakes T2

> 		signal

> T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)

> T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter

>      			     and the lower priority T2 cannot steal the lock.

>      			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

>

> The comment states that this is invalid and rt_mutex_real_owner() must

> return a non NULL owner when the trylock failed, but in case of a queued

> and woken up waiter rt_mutex_real_owner() == NULL is a valid transient

> state. The higher priority waiter has simply not yet managed to take over

> the rtmutex.

>

> The BUG_ON() is therefore wrong and this is just another retry condition in

> fixup_pi_state_owner().

>

> Drop the locks, so that T3 can make progress, and then try the fixup again.

>

> Gratian provided a great analysis, traces and a reproducer. The analysis is

> to the point, but it confused the hell out of that tglx dude who had to

> page in all the futex horrors again. Condensed version is above.

>

> [ tglx: Wrote comment and changelog ]

>

> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")

> Reported-by: Gratian Crisan <gratian.crisan@ni.com>

> Signed-off-by: Mike Galbraith <efault@gmx.de>

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

> Cc: stable@vger.kernel.org

> Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com

> ---

>  kernel/futex.c |   16 ++++++++++++++--

>  1 file changed, 14 insertions(+), 2 deletions(-)

>

> --- a/kernel/futex.c

> +++ b/kernel/futex.c

> @@ -2380,10 +2380,22 @@ static int fixup_pi_state_owner(u32 __us

>  		}

>

>  		/*

> -		 * Since we just failed the trylock; there must be an owner.

> +		 * The trylock just failed, so either there is an owner or

> +		 * there is a higher priority waiter than this one.

>  		 */

>  		newowner = rt_mutex_owner(&pi_state->pi_mutex);

> -		BUG_ON(!newowner);

> +		/*

> +		 * If the higher priority waiter has not yet taken over the

> +		 * rtmutex then newowner is NULL. We can't return here with

> +		 * that state because it's inconsistent vs. the user space

> +		 * state. So drop the locks and try again. It's a valid

> +		 * situation and not any different from the other retry

> +		 * conditions.

> +		 */

> +		if (unlikely(!newowner)) {

> +			ret = -EAGAIN;

> +			goto handle_err;

> +		}

>  	} else {

>  		WARN_ON_ONCE(argowner != current);

>  		if (oldowner == current) {
Gratian Crisan Nov. 5, 2020, 6:32 a.m. UTC | #3
Thomas Gleixner writes:

> From: Mike Galbraith <efault@gmx.de>

>

> Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().

> This is one possible chain of events leading to this:

>

> Task Prio       Operation

> T1   120	lock(F)

> T2   120	lock(F)   -> blocks (top waiter)

> T3   50 (RT)	lock(F)   -> boosts T3 and blocks (new top waiter)

> XX   		timeout/  -> wakes T2

> 		signal

> T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)

> T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter

>      			     and the lower priority T2 cannot steal the lock.

>      			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

>

> The comment states that this is invalid and rt_mutex_real_owner() must

> return a non NULL owner when the trylock failed, but in case of a queued

> and woken up waiter rt_mutex_real_owner() == NULL is a valid transient

> state. The higher priority waiter has simply not yet managed to take over

> the rtmutex.

>

> The BUG_ON() is therefore wrong and this is just another retry condition in

> fixup_pi_state_owner().

>

> Drop the locks, so that T3 can make progress, and then try the fixup again.

>

> Gratian provided a great analysis, traces and a reproducer. The analysis is

> to the point, but it confused the hell out of that tglx dude who had to

> page in all the futex horrors again. Condensed version is above. 

>

> [ tglx: Wrote comment and changelog ]

>

> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")

> Reported-by: Gratian Crisan <gratian.crisan@ni.com>

> Signed-off-by: Mike Galbraith <efault@gmx.de>

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

> Cc: stable@vger.kernel.org

> Link: https://urldefense.com/v3/__https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com__;!!FbZ0ZwI3Qg!5INAsNbAVSp3jaNkkjFazSRC86BpcZnVM3-oTDYl0KijU6jA5pWYk4KI79_L5F4$ 


LGTM, no crashes in my testing today.

-Gratian

> ---

>  kernel/futex.c |   16 ++++++++++++++--

>  1 file changed, 14 insertions(+), 2 deletions(-)

>

> --- a/kernel/futex.c

> +++ b/kernel/futex.c

> @@ -2380,10 +2380,22 @@ static int fixup_pi_state_owner(u32 __us

>  		}

>  

>  		/*

> -		 * Since we just failed the trylock; there must be an owner.

> +		 * The trylock just failed, so either there is an owner or

> +		 * there is a higher priority waiter than this one.

>  		 */

>  		newowner = rt_mutex_owner(&pi_state->pi_mutex);

> -		BUG_ON(!newowner);

> +		/*

> +		 * If the higher priority waiter has not yet taken over the

> +		 * rtmutex then newowner is NULL. We can't return here with

> +		 * that state because it's inconsistent vs. the user space

> +		 * state. So drop the locks and try again. It's a valid

> +		 * situation and not any different from the other retry

> +		 * conditions.

> +		 */

> +		if (unlikely(!newowner)) {

> +			ret = -EAGAIN;

> +			goto handle_err;

> +		}

>  	} else {

>  		WARN_ON_ONCE(argowner != current);

>  		if (oldowner == current) {
diff mbox series

Patch

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@  static int fixup_pi_state_owner(u32 __us
 		}
 
 		/*
-		 * Since we just failed the trylock; there must be an owner.
+		 * The trylock just failed, so either there is an owner or
+		 * there is a higher priority waiter than this one.
 		 */
 		newowner = rt_mutex_owner(&pi_state->pi_mutex);
-		BUG_ON(!newowner);
+		/*
+		 * If the higher priority waiter has not yet taken over the
+		 * rtmutex then newowner is NULL. We can't return here with
+		 * that state because it's inconsistent vs. the user space
+		 * state. So drop the locks and try again. It's a valid
+		 * situation and not any different from the other retry
+		 * conditions.
+		 */
+		if (unlikely(!newowner)) {
+			ret = -EAGAIN;
+			goto handle_err;
+		}
 	} else {
 		WARN_ON_ONCE(argowner != current);
 		if (oldowner == current) {