From patchwork Mon Aug 2 13:46:18 2021
X-Patchwork-Submitter: Zhen Lei
X-Patchwork-Id: 490343
From: Zhen Lei
To: Greg Kroah-Hartman, stable
CC: Zhen Lei, Anna-Maria Gleixner, Mike Galbraith, Sasha Levin,
    Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel
Subject: [PATCH 4.4 05/11] futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
Date: Mon, 2 Aug 2021 21:46:18 +0800
Message-ID: <20210802134624.1934-6-thunder.leizhen@huawei.com>
In-Reply-To: <20210802134624.1934-1-thunder.leizhen@huawei.com>
References: <20210802134624.1934-1-thunder.leizhen@huawei.com>
X-Mailing-List: stable@vger.kernel.org

From: Peter Zijlstra

[ Upstream commit cfafcd117da0216520568c195cb2f6cd1980c4bb ]

By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
modifications are done under both hb->lock and wait_lock.

This closes the obvious interleave pattern between futex_lock_pi() and
futex_unlock_pi(), but not entirely so.
See below:

Before:

futex_lock_pi()                   futex_unlock_pi()
  unlock hb->lock

                                    lock hb->lock
                                    unlock hb->lock

                                    lock rt_mutex->wait_lock
                                    unlock rt_mutex->wait_lock
                                      -EAGAIN

  lock rt_mutex->wait_lock
  list_add
  unlock rt_mutex->wait_lock

  schedule()

  lock rt_mutex->wait_lock
  list_del
  unlock rt_mutex->wait_lock

                                    <idem>
                                      -EAGAIN

  lock hb->lock


After:

futex_lock_pi()                   futex_unlock_pi()

  lock hb->lock
  lock rt_mutex->wait_lock
  list_add
  unlock rt_mutex->wait_lock
  unlock hb->lock

  schedule()
                                    lock hb->lock
                                    unlock hb->lock
  lock hb->lock
  lock rt_mutex->wait_lock
  list_del
  unlock rt_mutex->wait_lock

                                    lock rt_mutex->wait_lock
                                    unlock rt_mutex->wait_lock
                                      -EAGAIN

  unlock hb->lock

It does, however, solve the earlier starvation/live-lock scenario which got
introduced with the -EAGAIN, since unlike the before scenario, where the
-EAGAIN happens while futex_unlock_pi() doesn't hold any locks, in the after
scenario it happens while futex_unlock_pi() actually holds a lock, and it is
then serialized on that lock.

Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Zhen Lei
---
 kernel/futex.c                  | 77 +++++++++++++++++++++++----------
 kernel/locking/rtmutex.c        | 26 +++--------
 kernel/locking/rtmutex_common.h |  1 -
 3 files changed, 62 insertions(+), 42 deletions(-)

-- 
2.26.0.106.g9fadedd

diff --git a/kernel/futex.c b/kernel/futex.c
index dcea7b214e94d75..45f00a2fb59c554 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2284,20 +2284,7 @@ queue_unlock(struct futex_hash_bucket *hb)
 	hb_waiters_dec(hb);
 }
 
-/**
- * queue_me() - Enqueue the futex_q on the futex_hash_bucket
- * @q:	The futex_q to enqueue
- * @hb:	The destination hash bucket
- *
- * The hb->lock must be held by the caller, and is released here. A call to
- * queue_me() is typically paired with exactly one call to unqueue_me(). The
- * exceptions involve the PI related operations, which may use unqueue_me_pi()
- * or nothing if the unqueue is done as part of the wake process and the unqueue
- * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
- * an example).
- */
-static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
-	__releases(&hb->lock)
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
 {
 	int prio;
 
@@ -2314,6 +2301,24 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
 	plist_node_init(&q->list, prio);
 	plist_add(&q->list, &hb->chain);
 	q->task = current;
+}
+
+/**
+ * queue_me() - Enqueue the futex_q on the futex_hash_bucket
+ * @q:	The futex_q to enqueue
+ * @hb:	The destination hash bucket
+ *
+ * The hb->lock must be held by the caller, and is released here. A call to
+ * queue_me() is typically paired with exactly one call to unqueue_me(). The
+ * exceptions involve the PI related operations, which may use unqueue_me_pi()
+ * or nothing if the unqueue is done as part of the wake process and the unqueue
+ * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
+ * an example).
+ */
+static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
+	__releases(&hb->lock)
+{
+	__queue_me(q, hb);
 	spin_unlock(&hb->lock);
 }
 
@@ -2819,6 +2824,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct task_struct *exiting = NULL;
+	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
 	int res, ret;
@@ -2879,24 +2885,51 @@ retry_private:
 		}
 	}
 
+	WARN_ON(!q.pi_state);
+
 	/*
 	 * Only actually queue now that the atomic ops are done:
 	 */
-	queue_me(&q, hb);
+	__queue_me(&q, hb);
 
-	WARN_ON(!q.pi_state);
-	/*
-	 * Block on the PI mutex:
-	 */
-	if (!trylock) {
-		ret = rt_mutex_timed_futex_lock(&q.pi_state->pi_mutex, to);
-	} else {
+	if (trylock) {
 		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
 		/* Fixup the trylock return value: */
 		ret = ret ? 0 : -EWOULDBLOCK;
+		goto no_block;
+	}
+
+	/*
+	 * We must add ourselves to the rt_mutex waitlist while holding hb->lock
+	 * such that the hb and rt_mutex wait lists match.
+	 */
+	rt_mutex_init_waiter(&rt_waiter);
+	ret = rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	if (ret) {
+		if (ret == 1)
+			ret = 0;
+
+		goto no_block;
 	}
 
+	spin_unlock(q.lock_ptr);
+
+	if (unlikely(to))
+		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+
+	ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
+
 	spin_lock(q.lock_ptr);
+	/*
+	 * If we failed to acquire the lock (signal/timeout), we must
+	 * first acquire the hb->lock before removing the lock from the
+	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
+	 * wait lists consistent.
+	 */
+	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
+		ret = 0;
+
+no_block:
 	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
 	 * haven't already.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index fa24df4bc7a8ff0..98e45a2e1236e30 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1488,19 +1488,6 @@ int __sched rt_mutex_lock_interruptible(struct rt_mutex *lock)
 }
 EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible);
 
-/*
- * Futex variant with full deadlock detection.
- * Futex variants must not use the fast-path, see __rt_mutex_futex_unlock().
- */
-int __sched rt_mutex_timed_futex_lock(struct rt_mutex *lock,
-				      struct hrtimer_sleeper *timeout)
-{
-	might_sleep();
-
-	return rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE,
-				 timeout, RT_MUTEX_FULL_CHAINWALK);
-}
-
 /*
  * Futex variant, must not use fastpath.
  */
@@ -1774,12 +1761,6 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 	/* sleep on the mutex */
 	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
 
-	/*
-	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
-	 * have to fix that up.
-	 */
-	fixup_rt_mutex_waiters(lock);
-
 	raw_spin_unlock(&lock->wait_lock);
 
 	return ret;
@@ -1819,6 +1800,13 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 		fixup_rt_mutex_waiters(lock);
 		cleanup = true;
 	}
+
+	/*
+	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+	 * have to fix that up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return cleanup;
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 0424ff97bb69cad..97c048c494f00d8 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -111,7 +111,6 @@ extern int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 			       struct hrtimer_sleeper *to,
 			       struct rt_mutex_waiter *waiter);
 extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
-extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
 extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
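
For readers following the backport, the blocking path this patch introduces
into futex_lock_pi() can be read in isolation as the sequence below. This is
an illustrative condensation only, not part of the patch: the helper name
futex_lock_pi_block() is hypothetical, the retry and fixup_owner() handling
of the real function is elided, and the calls used (rt_mutex_init_waiter(),
rt_mutex_start_proxy_lock(), rt_mutex_wait_proxy_lock(),
rt_mutex_cleanup_proxy_lock()) are exactly those visible in the diff above.

/*
 * Hypothetical condensation of the blocking path added to futex_lock_pi().
 * Assumes q->pi_state is already set up and the hash-bucket lock
 * (q->lock_ptr) is held on entry; it is held again on return.
 */
static int futex_lock_pi_block(struct futex_q *q, struct hrtimer_sleeper *to)
{
	struct rt_mutex_waiter rt_waiter;
	int ret;

	/*
	 * Enqueue on the rt_mutex wait list while hb->lock is still held,
	 * so the hb and rt_mutex wait lists cannot get out of sync.
	 */
	rt_mutex_init_waiter(&rt_waiter);
	ret = rt_mutex_start_proxy_lock(&q->pi_state->pi_mutex, &rt_waiter, current);
	if (ret)
		return ret == 1 ? 0 : ret;	/* 1 means the lock was acquired */

	/* Only now drop hb->lock and (optionally) arm the timeout. */
	spin_unlock(q->lock_ptr);
	if (to)
		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);

	/* Sleep until we own the rt_mutex, time out, or catch a signal. */
	ret = rt_mutex_wait_proxy_lock(&q->pi_state->pi_mutex, to, &rt_waiter);

	/*
	 * Re-take hb->lock before dequeueing the waiter, again keeping the
	 * two wait lists consistent.  Cleanup returns false if we raced and
	 * acquired the lock after all, in which case the error is dropped.
	 */
	spin_lock(q->lock_ptr);
	if (ret && !rt_mutex_cleanup_proxy_lock(&q->pi_state->pi_mutex, &rt_waiter))
		ret = 0;

	return ret;
}

Compare this with the "After" diagram in the changelog: both the list_add and
the list_del happen while hb->lock and rt_mutex->wait_lock are held together,
which is what serializes the -EAGAIN path in futex_unlock_pi() on hb->lock.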