From patchwork Tue Mar 31 06:55:16 2015
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 46568
From: Viresh Kumar
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
Cc: linaro-kernel@lists.linaro.org, linux-kernel@vger.kernel.org, Viresh Kumar, Steven Miao
Subject: [PATCH 1/2] timer: Avoid waking up an idle-core by migrate running timer
Date: Tue, 31 Mar 2015 12:25:16 +0530
Message-Id: <80182e47a7103608d2ddab7f62c0c3dffc99fdcc.1427782893.git.viresh.kumar@linaro.org>

While queuing a timer, we try to migrate it to a non-idle core if the
local core is idle, but we don't try that when the timer is re-armed
from its own handler. A few unsolved problems kept us from doing so
until now, but there are cases where solving them is worthwhile: a
timer that is always re-armed from its handler never migrates to
another core, and it often ends up waking an idle core just to service
a timer that a non-idle core could have handled.

Problems to solve if we allow such migrations:

- Serialization of the timer with itself: the timer may fire on the
  new base before its handler has finished running on the old base.

- del_timer_sync() can't detect that the timer's handler has not
  finished.
__mod_timer() migrates the timer with the following code:

	spin_lock(&old_base->lock);
	timer_set_base(timer, NULL);
	spin_unlock(&old_base->lock);

	spin_lock(&new_base->lock);
	timer_set_base(timer, new_base);
	spin_unlock(&new_base->lock);

Once new_base->lock is released, del_timer_sync() can take
new_base->lock and will see new_base->running_timer != timer.
del_timer_sync() will then remove the timer and return while the
timer's handler hasn't finished yet on the old base.

To fix these issues, let's use another bit of the base pointer to mark
whether a timer's handler is currently running. This can be used to
verify the timer's state in del_timer_sync(). Before running a timer's
handler we must ensure the timer isn't running on any other CPU. If it
is, we first process other expired timers, if any, and then wait until
the handler finishes.

Cc: Steven Miao
Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Viresh Kumar
---
 include/linux/timer.h |  3 +-
 kernel/time/timer.c   | 93 ++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 76 insertions(+), 20 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE	0x1LU
 #define TIMER_IRQSAFE		0x2LU
+#define TIMER_RUNNING		0x4LU
 
-#define TIMER_FLAG_MASK		0x3LU
+#define TIMER_FLAG_MASK		0x7LU
 
 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..364644811485 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
 	return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }
 
+static inline unsigned int timer_running(struct timer_list *timer)
+{
+	return ((unsigned int)(unsigned long)timer->base & TIMER_RUNNING);
+}
+
+static inline void timer_set_running(struct timer_list *timer)
+{
+	timer->base = (struct tvec_base *)((unsigned long)timer->base | TIMER_RUNNING);
+}
+
+static inline void timer_clear_running(struct timer_list *timer)
+{
+	timer->base = (struct tvec_base *)((unsigned long)timer->base & ~TIMER_RUNNING);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 	new_base = per_cpu(tvec_bases, cpu);
 
 	if (base != new_base) {
-		/*
-		 * We are trying to schedule the timer on the local CPU.
-		 * However we can't change timer's base while it is running,
-		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
-		 */
-		if (likely(base->running_timer != timer)) {
-			/* See the comment in lock_timer_base() */
-			timer_set_base(timer, NULL);
-			spin_unlock(&base->lock);
-			base = new_base;
-			spin_lock(&base->lock);
-			timer_set_base(timer, base);
-		}
+		/* See the comment in lock_timer_base() */
+		timer_set_base(timer, NULL);
+		spin_unlock(&base->lock);
+		base = new_base;
+		spin_lock(&base->lock);
+		timer_set_base(timer, base);
 	}
 
 	timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)
 
 	base = lock_timer_base(timer, &flags);
 
-	if (base->running_timer != timer) {
+	if (!timer_running(timer)) {
 		timer_stats_timer_clear_start_info(timer);
 		ret = detach_if_pending(timer, base, true);
 	}
@@ -1050,12 +1056,12 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  *    ----				----
  *
  *    call_timer_fn();
- *      base->running_timer = mytimer;
+ *      timer_set_running(mytimer);
  *  spin_lock_irq(somelock);
 *
 *    spin_lock(somelock);
 *  del_timer_sync(mytimer);
- *   while (base->running_timer == mytimer);
+ *   while (timer_running(mytimer));
 *
 * Now del_timer_sync() will never return and never release somelock.
 * The interrupt on the other CPU is waiting to grab somelock but
@@ -1189,12 +1195,41 @@ static inline void __run_timers(struct tvec_base *base)
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, head);
+
+again:
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
 			bool irqsafe;
 
-			timer = list_first_entry(head, struct timer_list,entry);
+			timer = list_first_entry(head, struct timer_list, entry);
+
+			if (unlikely(timer_running(timer))) {
+				/*
+				 * The only way to get here is if the handler,
+				 * running on another base, re-queued itself on
+				 * this base, and the handler hasn't finished
+				 * yet.
+				 */
+
+				if (list_is_last(&timer->entry, head)) {
+					/*
+					 * Drop lock, so that TIMER_RUNNING can
+					 * be cleared on another base.
+					 */
+					spin_unlock(&base->lock);
+
+					while (timer_running(timer))
+						cpu_relax();
+
+					spin_lock(&base->lock);
+				} else {
+					/* Rotate the list and try someone else */
+					list_move_tail(&timer->entry, head);
+				}
+				goto again;
+			}
+
 			fn = timer->function;
 			data = timer->data;
 			irqsafe = tbase_get_irqsafe(timer->base);
@@ -1202,6 +1237,7 @@ static inline void __run_timers(struct tvec_base *base)
 			timer_stats_account_timer(timer);
 
 			base->running_timer = timer;
+			timer_set_running(timer);
 			detach_expired_timer(timer, base);
 
 			if (irqsafe) {
@@ -1213,6 +1249,25 @@ static inline void __run_timers(struct tvec_base *base)
 				call_timer_fn(timer, fn, data);
 				spin_lock_irq(&base->lock);
 			}
+
+			/*
+			 * Handler running on this base, re-queued itself on
+			 * another base ?
+			 */
+			if (unlikely(timer->base != base)) {
+				unsigned long flags;
+				struct tvec_base *tbase;
+
+				spin_unlock(&base->lock);
+
+				tbase = lock_timer_base(timer, &flags);
+				timer_clear_running(timer);
+				spin_unlock(&tbase->lock);
+
+				spin_lock(&base->lock);
+			} else {
+				timer_clear_running(timer);
+			}
 		}
 	}
 	base->running_timer = NULL;