
[v4,0/4] Extending NMI watchdog during LPM

Message ID 20220712143202.23144-1-ldufour@linux.ibm.com

Message

Laurent Dufour July 12, 2022, 2:31 p.m. UTC
When a partition is migrated, it becomes active as soon as it arrives on the
destination node, even though much of its memory still has to be transferred
from the source node.

How much memory remains to be transferred depends on the activity in the
partition, but the more CPUs the partition has, the more memory is likely to
be pending. Accessing pages that have not been transferred yet causes
latency, and for large partitions this often triggers the NMI watchdog.

When it fires, the NMI watchdog dumps the stack of the CPU that appears to
be stuck. In this case, that dump brings little information, since the stall
can happen during any memory access made by the kernel.

In addition, the NMI interrupt mechanism is not safe: it can trigger a
system dump if the interrupt is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be useful to extend the NMI
watchdog timeout during an LPM.

This is configurable through sysctl with the newly introduced,
powerpc-specific variable nmi_watchdog_factor. This value represents the
percentage added to watchdog_thresh to set the NMI watchdog timeout during
an LPM.
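The timeout arithmetic described above can be illustrated with a small C
sketch. The helper name lpm_nmi_timeout_ms() is hypothetical and does not
come from the patches; it simply computes watchdog_thresh plus
nmi_watchdog_factor percent of it, in milliseconds, using integer arithmetic
as kernel code would.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the series: NMI watchdog timeout
 * during an LPM, in milliseconds, computed as
 *     watchdog_thresh + factor_pct% of watchdog_thresh.
 * A factor of 0 leaves the timeout unchanged.
 */
static uint64_t lpm_nmi_timeout_ms(unsigned int watchdog_thresh_s,
                                   unsigned int factor_pct)
{
	/* Work in milliseconds so small percentages are not truncated away. */
	uint64_t base_ms = (uint64_t)watchdog_thresh_s * 1000;

	return base_ms + base_ms * factor_pct / 100;
}
```

With watchdog_thresh at 10 and a factor of 200 (the default proposed in
patch 4/4), this yields 30000 ms, i.e. the 30s timeout quoted in the patch
description; a factor of 0 yields the unchanged 10000 ms.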

Changes in v4 (no functional changes in this version):
 - Patch 1/4: fix typo and add a comment in pseries_migrate_partition()
 - Patch 3/4: rename new variables and functions as Nick requested.
 - Patch 4/4: rename the called new function

v2:
https://lore.kernel.org/linuxppc-dev/121217bb-6a34-8ccb-9819-f82806d6f47c@linux.ibm.com/

Laurent Dufour (4):
  powerpc/mobility: wait for memory transfer to complete
  watchdog: export lockup_detector_reconfigure
  powerpc/watchdog: introduce a NMI watchdog's factor
  pseries/mobility: set NMI watchdog factor during LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h              |  2 +
 arch/powerpc/kernel/watchdog.c              | 21 ++++-
 arch/powerpc/platforms/pseries/mobility.c   | 91 ++++++++++++++++++++-
 include/linux/nmi.h                         |  2 +
 kernel/watchdog.c                           | 21 +++--
 6 files changed, 141 insertions(+), 8 deletions(-)
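The ordering that patches 1/4 and 4/4 describe (stretch the NMI timeout just
before the CPUs are stopped for the switchover, then reset the factor to 0
once the memory transfer has completed) can be modelled in plain C. This is
a sketch under stated assumptions: struct wd_model, wd_reconfigure() and
model_migrate() are hypothetical names, and wd_reconfigure() only loosely
stands in for the watchdog reconfiguration step that patch 2/4 exposes; the
real code lives in arch/powerpc/platforms/pseries/mobility.c and
arch/powerpc/kernel/watchdog.c.

```c
#include <stdint.h>

/* Hypothetical model of the watchdog state; not kernel structures. */
struct wd_model {
	unsigned int watchdog_thresh_s; /* watchdog_thresh, in seconds */
	unsigned int lpm_factor_pct;    /* nmi_watchdog_factor sysctl value */
	uint64_t     nmi_timeout_ms;    /* NMI timeout currently in force */
};

/* Hypothetical: recompute the timeout with the given factor applied. */
static void wd_reconfigure(struct wd_model *wd, unsigned int factor_pct)
{
	uint64_t base_ms = (uint64_t)wd->watchdog_thresh_s * 1000;

	wd->nmi_timeout_ms = base_ms + base_ms * factor_pct / 100;
}

/*
 * Hypothetical migration driver showing only the ordering; the timeout in
 * force while memory is still being transferred is reported back.
 */
static void model_migrate(struct wd_model *wd, uint64_t *timeout_during_lpm)
{
	/* 1. Just before the CPUs are stopped for switchover, stretch it. */
	if (wd->lpm_factor_pct)
		wd_reconfigure(wd, wd->lpm_factor_pct);

	/* 2. Partition resumes on the destination; memory still in flight. */
	*timeout_during_lpm = wd->nmi_timeout_ms;

	/* 3. After waiting for the memory transfer (patch 1/4), reset to 0. */
	if (wd->lpm_factor_pct)
		wd_reconfigure(wd, 0);
}
```

Guarding both reconfiguration steps on a non-zero factor mirrors the stated
rule that a value of 0 has no effect.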

Comments

Laurent Dufour July 13, 2022, 10:56 a.m. UTC | #1
On 12/07/2022 at 18:25, Randy Dunlap wrote:
> Hi--
> 
> On 7/12/22 07:32, Laurent Dufour wrote:
>> During a LPM, while the memory transfer is in progress on the arrival side,
>> some latencies is generated when accessing not yet transferred pages on the
> 
>                  are
> 
>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
> 
>                             an NMI
> 
>> leading to a kernel panic.
>>
>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>> strong work around, some users would want this timeout to be eventually
>> triggered if the system is hanging even during LPM.
>>
>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
> 
>                                               an LPM.            the CPU is
> 
>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>  watchdog_tresh + factor%
> 
>    watchdog_thresh
> 
>>
>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
> 
>                                                     watchdog_thresh
> 
>> Once the memory transfer is achieved, the factor is reset to 0.
>>
>> Setting this value to a high number is like disabling the NMI watchdog
>> during a LPM.
> 
>          an LPM.
> 
>>
>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>  2 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index ddccd1077462..0bb0b7f27e96 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>  Documentation/admin-guide/kernel-parameters.rst).
>>  
> 
> This entire block should be in kernel-parameters.txt, not .rst,
> and it should be formatted like everything else in the .txt file.

Thanks for reviewing this patch.

I'll apply your requests in the next version.

However, regarding the change in kernel-parameters.txt, I'm confused. The
newly introduced parameter is only exposed through sysctl, not as a kernel
boot option. In that case, should it be mentioned in kernel-parameters.txt?

Documentation/process/4.Coding.rst says:
The file :ref:`Documentation/admin-guide/kernel-parameters.rst
<kernelparameters>` describes all of the kernel's boot-time parameters.
Any patch which adds new parameters should add the appropriate entries to
this file.

And Documentation/process/submit-checklist.rst says:
16) All new kernel boot parameters are documented in
    ``Documentation/admin-guide/kernel-parameters.rst``.

What are the rules about editing .txt or .rst files?

>>  
>> +nmi_watchdog_factor (PPC only)
>> +==================================
>> +
>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
> 
>    Factor to apply to the NMI
> 
>> +set to 1). This factor represents the percentage added to
>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
> 
>                                                                  during an
> 
>> +LPM. The soft lockup timeout is not impacted.
>> +
>> +A value of 0 means no change. The default value is 200 meaning the NMI
>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>> +
>> +
>>  numa_balancing
>>  ==============
>>  
> 
>
Randy Dunlap July 13, 2022, 2:42 p.m. UTC | #2
Hi,

On 7/13/22 03:56, Laurent Dufour wrote:
> On 12/07/2022 at 18:25, Randy Dunlap wrote:
>> Hi--
>>
>> On 7/12/22 07:32, Laurent Dufour wrote:

>>>
>>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>>> ---
>>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>>  2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>>  Documentation/admin-guide/kernel-parameters.rst).
>>>  
>>
>> This entire block should be in kernel-parameters.txt, not .rst,
>> and it should be formatted like everything else in the .txt file.

My apologies. I misread the file name.
I don't see a problem with this part of the patch or its location.

> Thanks for reviewing this patch.
> 
> I'll apply your requests in the next version.
> 
> However, regarding the change in kernel-parameters.txt, I'm confused. The
> newly introduced parameter is only exposed through sysctl. Not as a kernel
> boot option. In that case, should it be mentioned in kernel-parameters.txt?
> 
> Documentation/process/4.Coding.rst says:
> The file :ref:`Documentation/admin-guide/kernel-parameters.rst
> <kernelparameters>` describes all of the kernel's boot-time parameters.
> Any patch which adds new parameters should add the appropriate entries to
> this file.
> 
> And Documentation/process/submit-checklist.rst says:
> 16) All new kernel boot parameters are documented in
>     ``Documentation/admin-guide/kernel-parameters.rst``.
> 
> What are the rules about editing .txt or .rst files?

Yeah, that's a little confusing.
kernel-parameters.txt is included in kernel-parameters.rst when
'make htmldocs' is run, so the produced output looks like it comes from
the .rst file.

Kernel boot parameters should be added to the .txt file.
The .rst file is just intro material.

Thanks.