mbox series

[V3,00/17] timers: Provide timer_shutdown[_sync]()

Message ID 20221123201306.823305113@linutronix.de
Headers show
Series timers: Provide timer_shutdown[_sync]() | expand

Message

Thomas Gleixner Nov. 23, 2022, 8:18 p.m. UTC
This is the third version of the timer shutdown work. The second version
can be found here:

  https://lore.kernel.org/all/20221122171312.191765396@linutronix.de

Tearing down timers can be tedious when there are circular dependencies to
other things which need to be torn down. A prime example is timer and
workqueue where the timer schedules work and the work arms the timer.

Steven and the Google Chromebook team ran into such an issue in the
Bluetooth HCI code.

Steven suggested to create a new function del_timer_free() which marks the
timer as shutdown. Rearm attempts of shutdown timers are discarded and he
wanted to emit a warning for that case:

   https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home

This resulted in a lengthy discussion and suggestions how this should be
implemented. The patch series went through several iterations and during
the review of the last version it turned out that this approach is
suboptimal:

   https://lore.kernel.org/all/20221110064101.429013735@goodmis.org

The warning is not really helpful because it's entirely unclear how it
should be acted upon. The only way to address such a case is to add 'if
(in_shutdown)' conditionals all over the place. This is error prone and in
most cases of teardown like the HCI one which started this discussion not
required all.

What needs to prevented is that pending work which is drained via
destroy_workqueue() does not rearm the previously shutdown timer. Nothing
in that shutdown sequence relies on the timer being functional.

The conclusion was that the semantics of timer_shutdown_sync() should be:

    - timer is not enqueued
    - timer callback is not running
    - timer cannot be rearmed

Preventing the rearming of shutdown timers is done by discarding rearm
attempts silently.

As Steven is short of cycles, I made some spare cycles available and
reworked the patch series to follow the new semantics and plugged the races
which were discovered during review.

The patches have been split up into small pieces to make review easier and
I took the liberty to throw a bunch of overdue cleanups into the picture
instead of proliferating the existing state further.

The last patch in the series addresses the HCI teardown issue for real.

The series is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers

Changes vs. V2:

  - Fixed up change logs and comments (Anna-Maria)

  - Picked up Reviewed-by tags

Thanks,

	tglx
---
 Documentation/RCU/Design/Requirements/Requirements.rst      |    2 
 Documentation/core-api/local_ops.rst                        |    2 
 Documentation/kernel-hacking/locking.rst                    |   17 
 Documentation/timers/hrtimers.rst                           |    2 
 Documentation/translations/it_IT/kernel-hacking/locking.rst |   14 
 Documentation/translations/zh_CN/core-api/local_ops.rst     |    2 
 arch/arm/mach-spear/time.c                                  |    8 
 drivers/bluetooth/hci_qca.c                                 |   10 
 drivers/char/tpm/tpm-dev-common.c                           |    4 
 drivers/clocksource/arm_arch_timer.c                        |   12 
 drivers/clocksource/timer-sp804.c                           |    6 
 drivers/staging/wlan-ng/hfa384x_usb.c                       |    4 
 drivers/staging/wlan-ng/prism2usb.c                         |    6 
 include/linux/timer.h                                       |   35 
 kernel/time/timer.c                                         |  425 +++++++++---
 net/sunrpc/xprt.c                                           |    2 
 16 files changed, 405 insertions(+), 146 deletions(-)

Comments

Anna-Maria Behnsen Nov. 24, 2022, 7:37 a.m. UTC | #1
On Wed, 23 Nov 2022, Thomas Gleixner wrote:

> Tearing down timers which have circular dependencies to other
> functionality, e.g. workqueues, where the timer can schedule work and work
> can arm timers, is not trivial.
> 
> In those cases it is desired to shutdown the timer in a way which prevents
> rearming of the timer. The mechanism to do so is to set timer->function to
> NULL and use this as an indicator for the timer arming functions to ignore
> the (re)arm request.
> 
> In preparation for that replace the warnings in the relevant code paths
> with checks for timer->function == NULL. If the pointer is NULLL, then

s/NULLL/NULL

> discard the rearm request silently.
> 
> Add debug_assert_init() instead of the WARN_ON_ONCE(!timer->function)
> checks so that debug objects can warn about non-initialized timers.
> 
> The warning of debug objects does warn if timer->function == NULL.  It

does NOT warn

> warns when timer was not initialized using timer_setup[_on_stack]() or via
> DEFINE_TIMER(). If developers fail to enable debug objects and then waste
> lots of time to figure out why their non-initialized timer is not firing,
> they deserve it. Same for initializing a timer with a NULL function.
> 
> Co-developed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home
> Link: https://lore.kernel.org/all/20221110064101.429013735@goodmis.org
> ---
> V2: Use continue instead of return and amend the return value docs (Steven)
> V3: Changelog and comment updates (Anna-Maria)
> ---
>  kernel/time/timer.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 52 insertions(+), 5 deletions(-)
> 
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1128,8 +1144,12 @@ static inline int
>   * mod_timer_pending() is the same for pending timers as mod_timer(), but
>   * will not activate inactive timers.
>   *
> + * If @timer->function == NULL then the start operation is silently
> + * discarded.
> + *
>   * Return:
> - * * %0 - The timer was inactive and not modified
> + * * %0 - The timer was inactive and not modified or was is in
> + *	  shutdown state and the operation was discarded

You forgot to update this "was is" mistake. All other places are fine.

>   * * %1 - The timer was active and requeued to expire at @expires
>   */
>  int mod_timer_pending(struct timer_list *timer, unsigned long expires)

Thanks,

	Anna-Maria
Thomas Gleixner Nov. 24, 2022, 8:18 a.m. UTC | #2
On Thu, Nov 24 2022 at 08:37, Anna-Maria Behnsen wrote:

> On Wed, 23 Nov 2022, Thomas Gleixner wrote:
>
>> Tearing down timers which have circular dependencies to other
>> functionality, e.g. workqueues, where the timer can schedule work and work
>> can arm timers, is not trivial.
>> 
>> In those cases it is desired to shutdown the timer in a way which prevents
>> rearming of the timer. The mechanism to do so is to set timer->function to
>> NULL and use this as an indicator for the timer arming functions to ignore
>> the (re)arm request.
>> 
>> In preparation for that replace the warnings in the relevant code paths
>> with checks for timer->function == NULL. If the pointer is NULLL, then
>
> s/NULLL/NULL

Bah. I should have went to the bar instead of trying to fix this.
Anna-Maria Behnsen Nov. 24, 2022, 2 p.m. UTC | #3
On Wed, 23 Nov 2022, Thomas Gleixner wrote:

> This is the third version of the timer shutdown work. The second version
> can be found here:
> 
>   https://lore.kernel.org/all/20221122171312.191765396@linutronix.de
> 
> Tearing down timers can be tedious when there are circular dependencies to
> other things which need to be torn down. A prime example is timer and
> workqueue where the timer schedules work and the work arms the timer.
> 
> Steven and the Google Chromebook team ran into such an issue in the
> Bluetooth HCI code.
> 
> Steven suggested to create a new function del_timer_free() which marks the
> timer as shutdown. Rearm attempts of shutdown timers are discarded and he
> wanted to emit a warning for that case:
> 
>    https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home
> 
> This resulted in a lengthy discussion and suggestions how this should be
> implemented. The patch series went through several iterations and during
> the review of the last version it turned out that this approach is
> suboptimal:
> 
>    https://lore.kernel.org/all/20221110064101.429013735@goodmis.org
> 
> The warning is not really helpful because it's entirely unclear how it
> should be acted upon. The only way to address such a case is to add 'if
> (in_shutdown)' conditionals all over the place. This is error prone and in
> most cases of teardown like the HCI one which started this discussion not
> required all.
> 
> What needs to prevented is that pending work which is drained via
> destroy_workqueue() does not rearm the previously shutdown timer. Nothing
> in that shutdown sequence relies on the timer being functional.
> 
> The conclusion was that the semantics of timer_shutdown_sync() should be:
> 
>     - timer is not enqueued
>     - timer callback is not running
>     - timer cannot be rearmed
> 
> Preventing the rearming of shutdown timers is done by discarding rearm
> attempts silently.
> 
> As Steven is short of cycles, I made some spare cycles available and
> reworked the patch series to follow the new semantics and plugged the races
> which were discovered during review.
> 
> The patches have been split up into small pieces to make review easier and
> I took the liberty to throw a bunch of overdue cleanups into the picture
> instead of proliferating the existing state further.
> 
> The last patch in the series addresses the HCI teardown issue for real.
> 
> The series is also available from git:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers
> 

With fix of the last NITs:

Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>

Thanks,
	Anna-Maria