Message ID | 20190722103330.255312-1-marc.zyngier@arm.com |
---|---|
Headers | show |
Series | arm64: Allow early timestamping of kernel log | expand |
On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote: > > So far, we've let the arm64 kernel start its meaningful time stamping > of the kernel log pretty late, which is caused by sched_clock() being > initialised rather late compared to other architectures. > > Pavel Tatashin proposed[1] to move the initialisation of sched_clock > much earlier, which I had objections to. The reason for initialising > sched_clock late is that a number of systems have broken counters, and > we need to apply all kind of terrifying workarounds to avoid time > going backward on the affected platforms. Being able to identify the > right workaround comes pretty late in the kernel boot, and providing > an unreliable sched_clock, even for a short period of time, isn't an > appealing prospect. > > To address this, I'm proposing that we allow an architecture to chose > to (1) divorce time stamping and sched_clock during the early phase of > booting, and (2) inherit the time stamping clock as the new epoch the > first time a sched_sched clock gets registered. Could we have a stable clock config for arm64: if it is known that this Linux build is going to run on non-broken firmware, and with a known stable cntvct_el0 register it can be optionally configured to use that stable sched_clock instead of generic clock that arm64 is using? This way, the early printk are going to be available on those systems without adding a different method for printk's only. It would also mean that other users of early sched_clock() such as ftrace could benefit from it. > > (1) would allow arm64 to provide a time stamping clock, however > unreliable it might be, while (2) would allow sched_clock to provide > time stamps that are continuous with the time-stamping clock. > > The last patch in the series adds the necessary logic to arm64, > allowing the (potentially unreliable) time stamping of early kernel > messages. > > Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot > tested on a x86 guest. > > [1] https://lore.kernel.org/patchwork/patch/1015110/ > > Marc Zyngier (3): > printk: Allow architecture-specific timestamping function > sched/clock: Allow sched_clock to inherit timestamp_clock epoch > arm64: Allow early time stamping > > arch/arm64/Kconfig | 3 +++ > arch/arm64/kernel/setup.c | 44 +++++++++++++++++++++++++++++++++++++ > include/linux/sched/clock.h | 13 +++++++++++ > kernel/printk/printk.c | 4 ++-- > kernel/time/sched_clock.c | 10 +++++++++ > 5 files changed, 72 insertions(+), 2 deletions(-) > > -- > 2.20.1 >
On 2019/7/22 18:33, Marc Zyngier wrote: > So far, we've let the arm64 kernel start its meaningful time stamping > of the kernel log pretty late, which is caused by sched_clock() being > initialised rather late compared to other architectures. > > Pavel Tatashin proposed[1] to move the initialisation of sched_clock > much earlier, which I had objections to. The reason for initialising > sched_clock late is that a number of systems have broken counters, and > we need to apply all kind of terrifying workarounds to avoid time > going backward on the affected platforms. Being able to identify the > right workaround comes pretty late in the kernel boot, and providing > an unreliable sched_clock, even for a short period of time, isn't an > appealing prospect. > > To address this, I'm proposing that we allow an architecture to chose > to (1) divorce time stamping and sched_clock during the early phase of > booting, and (2) inherit the time stamping clock as the new epoch the > first time a sched_sched clock gets registered. > > (1) would allow arm64 to provide a time stamping clock, however > unreliable it might be, while (2) would allow sched_clock to provide > time stamps that are continuous with the time-stamping clock. > > The last patch in the series adds the necessary logic to arm64, > allowing the (potentially unreliable) time stamping of early kernel > messages. > > Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot > tested on a x86 guest. This makes the boot log more useful and I can debug some time consuming issue easier before the arch timer initialization, tested on my ARM64 server, I can see the timestamping from the start [1], Tested-by: Hanjun Guo <guohanjun@huawei.com> Thanks Hanjun [1]: [ 0.000000] Booting Linux on physical CPU 0x0000080000 [0x481fd010] [ 0.000000] Linux version 5.2.0+ (root@localhost.localdomain) (gcc version 9.0.1 20190312 (Red Hat 9.0.1-0.10) (GCC)) #45 SMP Tue Jul 23 09:17:48 CST 2019 [ 0.000000] Using timestamp clock @100MHz [ 0.000074] efi: Getting EFI parameters from FDT: [ 0.000082] efi: EFI v2.70 by EDK II [ 0.000083] efi: ACPI 2.0=0x3a300000 SMBIOS 3.0=0x39f80000 MEMATTR=0x30996018 MEMRESERVE=0x30997e18 [ 0.000122] crashkernel reserved: 0x000000000ba00000 - 0x000000002ba00000 (512 MB) [ 0.000126] cma: Failed to reserve 512 MiB [ 0.185111] ACPI: Early table checksum verification disabled [ 0.185115] ACPI: RSDP 0x000000003A300000 000024 (v02 HISI ) [ 0.185120] ACPI: XSDT 0x000000003A270000 00009C (v01 HISI HIP08 00000000 01000013) [ 0.185127] ACPI: FACP 0x0000000039B10000 000114 (v06 HISI HIP08 00000000 HISI 20151124) [ 0.185134] ACPI: DSDT 0x0000000039AB0000 0084E4 (v02 HISI HIP08 00000000 INTL 20181213) [ 0.185139] ACPI: PCCT 0x0000000039FB0000 00008A (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185143] ACPI: SSDT 0x0000000039F90000 01021A (v02 HISI HIP07 00000000 INTL 20181213) [ 0.185147] ACPI: BERT 0x0000000039F50000 000030 (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185150] ACPI: HEST 0x0000000039F30000 000308 (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185154] ACPI: ERST 0x0000000039EF0000 000230 (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185158] ACPI: EINJ 0x0000000039EE0000 000170 (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185162] ACPI: SLIT 0x0000000039B30000 00003C (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185166] ACPI: GTDT 0x0000000039B00000 00007C (v02 HISI HIP08 00000000 HISI 20151124) [ 0.185169] ACPI: MCFG 0x0000000039AF0000 00003C (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185173] ACPI: SPCR 0x0000000039AE0000 000050 (v02 HISI HIP08 00000000 HISI 20151124) [ 0.185177] ACPI: SRAT 0x0000000039AD0000 0007D0 (v03 HISI HIP08 00000000 HISI 20151124) [ 0.185181] ACPI: APIC 0x0000000039AC0000 001E6C (v04 HISI HIP08 00000000 HISI 20151124) [ 0.185185] ACPI: IORT 0x0000000039AA0000 001310 (v00 HISI HIP08 00000000 INTL 20181213) [ 0.185189] ACPI: PPTT 0x0000000030970000 0031B0 (v01 HISI HIP08 00000000 HISI 20151124) [ 0.185196] ACPI: SPCR: console: pl011,mmio32,0x94080000,115200 [ 0.185208] ACPI: SRAT: Node 0 PXM 0 [mem 0x2080000000-0x2fffffffff] [ 0.185210] ACPI: SRAT: Node 1 PXM 1 [mem 0x3000000000-0x3fffffffff] [ 0.185212] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] [ 0.185213] ACPI: SRAT: Node 2 PXM 2 [mem 0x202000000000-0x202fffffffff] [ 0.185215] ACPI: SRAT: Node 3 PXM 3 [mem 0x203000000000-0x203fffffffff] [ 0.185221] NUMA: NODE_DATA [mem 0x2fffffe3c0-0x2fffffffff] [ 0.185224] NUMA: NODE_DATA [mem 0x3fffffe3c0-0x3fffffffff] [ 0.185226] NUMA: NODE_DATA [mem 0x202fffffe3c0-0x202fffffffff] [ 0.185229] NUMA: NODE_DATA [mem 0x203ffdfde3c0-0x203ffdfdffff]
On Mon, 22 Jul 2019 21:52:42 +0100, Pavel Tatashin <pasha.tatashin@soleen.com> wrote: > > On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote: > > > > So far, we've let the arm64 kernel start its meaningful time stamping > > of the kernel log pretty late, which is caused by sched_clock() being > > initialised rather late compared to other architectures. > > > > Pavel Tatashin proposed[1] to move the initialisation of sched_clock > > much earlier, which I had objections to. The reason for initialising > > sched_clock late is that a number of systems have broken counters, and > > we need to apply all kind of terrifying workarounds to avoid time > > going backward on the affected platforms. Being able to identify the > > right workaround comes pretty late in the kernel boot, and providing > > an unreliable sched_clock, even for a short period of time, isn't an > > appealing prospect. > > > > To address this, I'm proposing that we allow an architecture to chose > > to (1) divorce time stamping and sched_clock during the early phase of > > booting, and (2) inherit the time stamping clock as the new epoch the > > first time a sched_sched clock gets registered. > > Could we have a stable clock config for arm64: if it is known that > this Linux build is going to run on non-broken firmware, and with a > known stable cntvct_el0 register it can be optionally configured to > use that stable sched_clock instead of generic clock that arm64 is > using? Hmmm. Then I guess a prerequisite is this patch: https://lwn.net/Articles/490040/ as the number of systems this will reliably run on is a close approximation to zero. Yes, counting is hard... More seriously, we've been there before (cue the 32bit ARM port 8 years ago), and really don't want to go back to a time where we had multiple config options for everything. There is one kernel, which should run *reliably* on everything. > This way, the early printk are going to be available on those > systems without adding a different method for printk's only. It would > also mean that other users of early sched_clock() such as ftrace could > benefit from it. See above. But maybe the real thing to do is to allow local_clock() to be overridden. Same effect, same complexity, just hidden away from any given subsystem. Thanks, M. -- Jazz is not dead, it just smells funny.
On Tue, Jul 23, 2019 at 3:17 AM Marc Zyngier <marc.zyngier@arm.com> wrote: > > On Mon, 22 Jul 2019 21:52:42 +0100, > Pavel Tatashin <pasha.tatashin@soleen.com> wrote: > > > > On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote: > > > > > > So far, we've let the arm64 kernel start its meaningful time stamping > > > of the kernel log pretty late, which is caused by sched_clock() being > > > initialised rather late compared to other architectures. > > > > > > Pavel Tatashin proposed[1] to move the initialisation of sched_clock > > > much earlier, which I had objections to. The reason for initialising > > > sched_clock late is that a number of systems have broken counters, and > > > we need to apply all kind of terrifying workarounds to avoid time > > > going backward on the affected platforms. Being able to identify the > > > right workaround comes pretty late in the kernel boot, and providing > > > an unreliable sched_clock, even for a short period of time, isn't an > > > appealing prospect. > > > > > > To address this, I'm proposing that we allow an architecture to chose > > > to (1) divorce time stamping and sched_clock during the early phase of > > > booting, and (2) inherit the time stamping clock as the new epoch the > > > first time a sched_sched clock gets registered. Hi Marc, I know we briefly discussed this at plumbers, but I want to bring it up again, because I am still puzzled why it is not possible to stabilize unstable clock early in boot. Here is an example where clock is stabilized: https://soleen.com/source/xref/linux/kernel/sched/clock.c?r=457c8996#265 It uses a value that is read at last ticks to normalize clock, and because ticks are not available early in boot instead we can make sure that early in boot sched_clock() never returns value smaller than previously returned value, and if we want to be extra careful, we can also make sure that sched_clock() early in boot does not jump ahead by more than some fixed amount of time i.e. more than one hour. If sched_clock() is available early we will get the benefit of having other tracers that use it to debug early boot information. Pasha