diff mbox

[RFC] vmstat: Avoid waking up idle-cpu to service shepherd work

Message ID 55169723.3070006@linaro.org
State New
Headers show

Commit Message

Viresh Kumar March 28, 2015, 11:57 a.m. UTC
On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:

> Well, for one your patch is indeed disgusting.

Yeah, I agree :)

> But yes I'm aware Thomas
> wants to rewrite the timer thing. But Thomas is away for a little while
> and if this really needs to happen then it does.

Sometime back I was trying to use another bit from base pointer for
marking a timer as PINNED:

------------x--------------------x----------------------

Right?


Now there are few issues I see here (Sorry if they are all imaginary):
- In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
  happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
  hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
  us to a lot of other problems? It wouldn't be serialized to itself anymore ?

- Because the timer has migrated to another CPU, the locking in __run_timers()
  needs to be fixed. And that will make it complicated ..

  - __run_timer() doesn't lock bases of other CPUs, and it has to do it now..
  - We probably need to take locks of both local CPU and the one to which timer migrated.

- Its possible now that there can be more than one running timer for a base, which wasn't
  true earlier. Not sure if it will break something.


Thanks for your continuous support to reply to my (sometimes stupid) queries.

--
viresh

[1] https://lists.01.org/pipermail/kbuild-all/2014-April/003982.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Viresh Kumar March 28, 2015, 12:04 p.m. UTC | #1
On 28 March 2015 at 17:27, viresh kumar <viresh.kumar@linaro.org> wrote:
> On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> Well, for one your patch is indeed disgusting.
>
> Yeah, I agree :)

Sigh..

Sorry for the series of *nonsense* mails before the last one.

Its some thunderbird *BUG* which did that, I was accessing my
mail from both gmail's interface and thunderbird and somehow
this happened. I have hit the send button only once.

Really sorry for the noise.

(The last mail has few inquiries towards the end and a thanks note,
just to identify it..)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Viresh Kumar March 30, 2015, 12:02 p.m. UTC | #2
On 29 March 2015 at 15:54, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Mar 28, 2015 at 02:44:57PM +0100, Peter Zijlstra wrote:
>> > Now there are few issues I see here (Sorry if they are all imaginary):
>> > - In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
>> >   happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
>> >   hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
>> >   us to a lot of other problems? It wouldn't be serialized to itself anymore ?
>>
>> What I said above.
>
> What I didn't say, but had thought of is that __run_timer() should skip
> any timer that has RUNNING set -- for obvious reasons :-)

Below is copied from your first reply, and so you probably already
said that ? :)

> Also, once you have tbase_running, we can take base->running_timer out
> altogether.

I wanted to clarify if I understood it correctly..

Are you saying that:

Case 1.) if we find tbase_running on cpuY (because it was rearmed
from its handler on cpuX and has got migrated to cpuY), then we should drop the
timer from the list without calling its handler (as that is already
running in parallel) ?

Or

Case 2.) we keep retrying for it, until the time the other handler finishes?


I have few queries for both the cases.

Case 1.) Will that be fair to the timer user as the timer may get lost
completely.
If we skip the timer on cpuY here, it wouldn't be enqueued again and
so will be lost.

Case 2.) We kept waiting for the first handler to finish ..
- cpuY may waste some cycles as it kept waiting for handler to finish on cpuX ..
- We may need to perform base unlock/lock on cpuY, so that cpuX can take cpuY's
lock to reset tbase_running. And that might be racy, not sure.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Viresh Kumar March 30, 2015, 1:14 p.m. UTC | #3
On 30 March 2015 at 18:17, Peter Zijlstra <peterz@infradead.org> wrote:
> No, I means something else with that. We can remove the
> tvec_base::running_timer field. Everything that uses that can use
> tbase_running() AFAICT.

Okay, there is one instance which still needs it.

migrate_timers():

        BUG_ON(old_base->running_timer);

What I wasn't sure about it is if we get can drop this statement or not.
If we decide not to drop it, then we can convert running_timer into a bool.

> Drop yes, racy not so much I think.
>
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 2d3f5c504939..1394f9540348 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1189,12 +1189,39 @@ static inline void __run_timers(struct tvec_base *base)
>                         cascade(base, &base->tv5, INDEX(3));
>                 ++base->timer_jiffies;
>                 list_replace_init(base->tv1.vec + index, head);
> +
> +again:
>                 while (!list_empty(head)) {
>                         void (*fn)(unsigned long);
>                         unsigned long data;
>                         bool irqsafe;
>
> -                       timer = list_first_entry(head, struct timer_list,entry);
> +                       timer = list_first_entry(head, struct timer_list, entry);
> +                       if (unlikely(tbase_running(timer))) {
> +                               /* Only one timer on the list, force wait. */
> +                               if (unlikely(head->next == head->prev)) {
> +                                       spin_unlock(&base->lock);
> +
> +                                       /*
> +                                        * The only way to get here is if the
> +                                        * handler requeued itself on another
> +                                        * base, this guarantees the timer will
> +                                        * not go away.
> +                                        */
> +                                       while (tbase_running(timer))
> +                                               cpu_relax();
> +
> +                                       spin_lock(&base->lock);
> +                               } else  {
> +                                       /*
> +                                        * Otherwise, rotate the list and try
> +                                        * someone else.
> +                                        */
> +                                       list_move_tail(&timer->entry, head);
> +                               }
> +                               goto again;
> +                       }
> +
>                         fn = timer->function;
>                         data = timer->data;
>                         irqsafe = tbase_get_irqsafe(timer->base);

Yeah, so I have written something similar only. Wasn't sure about what you wrote
earlier. Thanks for the clarification.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..e7184f57449c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@  extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_PINNED                   0x4LU
 -#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU


And Fenguang's build-bot showed the problem (only) on blackfin [1].

        config: make ARCH=blackfin allyesconfig

        All error/warnings:

           kernel/timer.c: In function 'init_timers':
        >> kernel/timer.c:1683:2: error: call to '__compiletime_assert_1683'
        >> declared with attribute error: BUILD_BUG_ON failed:
        >> __alignof__(struct tvec_base) & TIMER_FLAG_MASK


So probably we need to make 'base' aligned to 8 bytes ?



So, what you are suggesting is something like this (untested):

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@  extern struct tvec_base boot_tvec_bases;
  */
 #define TIMER_DEFERRABLE               0x1LU
 #define TIMER_IRQSAFE                  0x2LU
+#define TIMER_RUNNING                  0x4LU

-#define TIMER_FLAG_MASK                        0x3LU
+#define TIMER_FLAG_MASK                        0x7LU

 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
                .entry = { .prev = TIMER_ENTRY_STATIC },        \
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..8f9efa64bd34 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@  static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
        return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
 }

+static inline unsigned int tbase_get_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_set_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base | TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_clear_running(struct tvec_base *base)
+{
+       return ((unsigned int)(unsigned long)base & ~TIMER_RUNNING);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
        return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@  __mod_timer(struct timer_list *timer, unsigned long expires,
        new_base = per_cpu(tvec_bases, cpu);

        if (base != new_base) {
-               /*
-                * We are trying to schedule the timer on the local CPU.
-                * However we can't change timer's base while it is running,
-                * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
-                */
-               if (likely(base->running_timer != timer)) {
-                       /* See the comment in lock_timer_base() */
-                       timer_set_base(timer, NULL);
-                       spin_unlock(&base->lock);
-                       base = new_base;
-                       spin_lock(&base->lock);
-                       timer_set_base(timer, base);
-               }
+               /* See the comment in lock_timer_base() */
+               timer_set_base(timer, NULL);
+               spin_unlock(&base->lock);
+               base = new_base;
+               spin_lock(&base->lock);
+               timer_set_base(timer, base);
        }

        timer->expires = expires;
@@ -1016,7 +1022,7 @@  int try_to_del_timer_sync(struct timer_list *timer)

        base = lock_timer_base(timer, &flags);

-       if (base->running_timer != timer) {
+       if (tbase_get_running(timer->base)) {
                timer_stats_timer_clear_start_info(timer);
                ret = detach_if_pending(timer, base, true);
        }
@@ -1202,6 +1208,7 @@  static inline void __run_timers(struct tvec_base *base)
                        timer_stats_account_timer(timer);

                        base->running_timer = timer;
+                       tbase_set_running(timer->base);
                        detach_expired_timer(timer, base);

                        if (irqsafe) {
@@ -1216,6 +1223,7 @@  static inline void __run_timers(struct tvec_base *base)
                }
        }
        base->running_timer = NULL;
+       tbase_clear_running(timer->base);
        spin_unlock_irq(&base->lock);
 }