Message ID | 20150904144241.GJ29194@linux |
---|---|
State | New |
On 2015.09.04 07:43 Viresh Kumar wrote:
> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>> CPU offline (7 in my case), the system will not suspend.

> I wanted to give him some patch to debug it a bit more, but couldn't
> do that whole day.

> @Doug: Can you please enable DEBUG for cpufreq with this:
>
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 9fde14544ead..c09945aa7f17 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -1,3 +1,4 @@
> +subdir-ccflags-y := -DDEBUG
>  # CPUfreq core
>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>
>
> And give us the outputs of both successful and unsuccessful logs?

Edited /var/log/kern.log attached (might get stripped for
on-list e-mail deliveries)

> + the values of both affected_cpus and related_cpus fields for all
> CPUs.

Step 1: CPU 6 offline (sudo pm-suspend works):

root@s15:/home/doug# echo -n 0 > /sys/devices/system/cpu/cpu6/online
root@s15:/home/doug# cat /sys/devices/system/cpu/cpu*/online
1 1 1 1 1 0 1
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0 1 2 3 4 5 7
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0 1 2 3 4 5 6 7

Step 2: CPU 7 offline (sudo pm-suspend does not work):

root@s15:/sys/devices/system/cpu# cat /sys/devices/system/cpu/cpu*/online
1 1 1 1 1 1 0
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0 1 2 3 4 5 6
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0 1 2 3 4 5 6 7

>>>>>> Smythies 2015.09.04 Edited /var/log/kern.log file for Viresh, with Makefile modified.
>>>>>> Linux s15 4.2.0viresh #46 SMP Fri Sep 4 09:00:41 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> Take CPU 6 offline.
Sep 4 09:21:02 s15 kernel: [ 145.256813] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 6
Sep 4 09:21:02 s15 kernel: [ 145.256819] intel_pstate: CPU 6 exiting
Sep 4 09:21:02 s15 kernel: [ 145.274158] smpboot: CPU 6 is now offline
>>>>>> Do a "sudo pm-suspend" that will work properly. Subsequently turn computer on again.
Sep 4 09:26:11 s15 kernel: [ 454.524941] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.524948] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.524949] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.526917] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.526923] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.526924] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.528952] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.528958] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.528959] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.530887] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.530892] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.530893] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.532526] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.532531] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.532533] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.534520] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.534525] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.534527] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.537764] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.537766] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.537767] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 454.675889] PM: Syncing filesystems ... done.
Sep 4 09:26:28 s15 kernel: [ 454.782119] PM: Preparing system for sleep (mem)
Sep 4 09:26:28 s15 kernel: [ 454.782272] Freezing user space processes ... (elapsed 0.001 seconds) done.
Sep 4 09:26:28 s15 kernel: [ 454.783462] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Sep 4 09:26:28 s15 kernel: [ 454.784612] PM: Suspending system (mem)
Sep 4 09:26:28 s15 kernel: [ 454.784628] Suspending console(s) (use no_console_suspend to debug)
... deleted some lines ...
Sep 4 09:26:28 s15 kernel: [ 456.490899] PM: suspend of devices complete after 1707.250 msecs
Sep 4 09:26:28 s15 kernel: [ 456.506867] PM: late suspend of devices complete after 15.976 msecs
Sep 4 09:26:28 s15 kernel: [ 456.507316] pcieport 0000:00:01.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507328] xhci_hcd 0000:07:00.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507374] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507375] r8169 0000:03:00.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507527] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.522894] PM: noirq suspend of devices complete after 16.036 msecs
Sep 4 09:26:28 s15 kernel: [ 456.523192] ACPI: Preparing to enter system sleep state S3
Sep 4 09:26:28 s15 kernel: [ 456.523380] PM: Saving platform NVS memory
Sep 4 09:26:28 s15 kernel: [ 456.523389] Disabling non-boot CPUs ...
Sep 4 09:26:28 s15 kernel: [ 456.523415] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 1
Sep 4 09:26:28 s15 kernel: [ 456.523416] intel_pstate: CPU 1 exiting
Sep 4 09:26:28 s15 kernel: [ 456.524616] smpboot: CPU 1 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.547041] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 2
Sep 4 09:26:28 s15 kernel: [ 456.547043] intel_pstate: CPU 2 exiting
Sep 4 09:26:28 s15 kernel: [ 456.547173] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.548215] smpboot: CPU 2 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.567001] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 3
Sep 4 09:26:28 s15 kernel: [ 456.567003] intel_pstate: CPU 3 exiting
Sep 4 09:26:28 s15 kernel: [ 456.567129] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.567163] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.568172] smpboot: CPU 3 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.582960] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 4
Sep 4 09:26:28 s15 kernel: [ 456.582961] intel_pstate: CPU 4 exiting
Sep 4 09:26:28 s15 kernel: [ 456.583069] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.583104] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.584113] smpboot: CPU 4 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.598918] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 5
Sep 4 09:26:28 s15 kernel: [ 456.598920] intel_pstate: CPU 5 exiting
Sep 4 09:26:28 s15 kernel: [ 456.599016] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.599023] Broke affinity for irq 25
Sep 4 09:26:28 s15 kernel: [ 456.599049] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.600058] smpboot: CPU 5 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.614896] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 09:26:28 s15 kernel: [ 456.614898] intel_pstate: CPU 7 exiting
Sep 4 09:26:28 s15 kernel: [ 456.614988] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.614991] Broke affinity for irq 23
Sep 4 09:26:28 s15 kernel: [ 456.614995] Broke affinity for irq 25
Sep 4 09:26:28 s15 kernel: [ 456.615021] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.616030] smpboot: CPU 7 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.631985] ACPI: Low-level resume complete
Sep 4 09:26:28 s15 kernel: [ 456.632020] PM: Restoring platform NVS memory
Sep 4 09:26:28 s15 kernel: [ 456.632338] Enabling non-boot CPUs ...
Sep 4 09:26:28 s15 kernel: [ 456.632374] x86: Booting SMP configuration:
Sep 4 09:26:28 s15 kernel: [ 456.632374] smpboot: Booting Node 0 Processor 1 APIC 0x2
Sep 4 09:26:28 s15 kernel: [ 456.644260] cache: parent cpu1 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.644318] cpufreq: adding CPU 1
Sep 4 09:26:28 s15 kernel: [ 456.644323] intel_pstate: controlling: cpu 1
Sep 4 09:26:28 s15 kernel: [ 456.644325] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.644327] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.644327] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.644328] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.644364] CPU1 is up
Sep 4 09:26:28 s15 kernel: [ 456.644381] smpboot: Booting Node 0 Processor 2 APIC 0x4
Sep 4 09:26:28 s15 kernel: [ 456.652292] cache: parent cpu2 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.652348] cpufreq: adding CPU 2
Sep 4 09:26:28 s15 kernel: [ 456.652352] intel_pstate: controlling: cpu 2
Sep 4 09:26:28 s15 kernel: [ 456.652355] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.652356] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.652357] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.652357] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.652390] CPU2 is up
Sep 4 09:26:28 s15 kernel: [ 456.652406] smpboot: Booting Node 0 Processor 3 APIC 0x6
Sep 4 09:26:28 s15 kernel: [ 456.660308] cache: parent cpu3 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.660366] cpufreq: adding CPU 3
Sep 4 09:26:28 s15 kernel: [ 456.660370] intel_pstate: controlling: cpu 3
Sep 4 09:26:28 s15 kernel: [ 456.660373] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.660374] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.660375] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.660375] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.660409] CPU3 is up
Sep 4 09:26:28 s15 kernel: [ 456.660426] smpboot: Booting Node 0 Processor 4 APIC 0x1
Sep 4 09:26:28 s15 kernel: [ 456.668263] cache: parent cpu4 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.668300] cpufreq: adding CPU 4
Sep 4 09:26:28 s15 kernel: [ 456.668304] intel_pstate: controlling: cpu 4
Sep 4 09:26:28 s15 kernel: [ 456.668307] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.668308] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.668308] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.668309] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.668336] CPU4 is up
Sep 4 09:26:28 s15 kernel: [ 456.668349] smpboot: Booting Node 0 Processor 5 APIC 0x3
Sep 4 09:26:28 s15 kernel: [ 456.680270] cache: parent cpu5 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.680307] cpufreq: adding CPU 5
Sep 4 09:26:28 s15 kernel: [ 456.680311] intel_pstate: controlling: cpu 5
Sep 4 09:26:28 s15 kernel: [ 456.680313] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.680314] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.680314] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.680315] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.680341] CPU5 is up
Sep 4 09:26:28 s15 kernel: [ 456.680355] smpboot: Booting Node 0 Processor 7 APIC 0x7
Sep 4 09:26:28 s15 kernel: [ 456.688288] cache: parent cpu7 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.688326] cpufreq: adding CPU 7
Sep 4 09:26:28 s15 kernel: [ 456.688330] intel_pstate: controlling: cpu 7
Sep 4 09:26:28 s15 kernel: [ 456.688333] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.688333] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.688334] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.688334] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.688360] CPU7 is up
Sep 4 09:26:28 s15 kernel: [ 456.693743] ACPI: Waking up from system sleep state S3
Sep 4 09:26:28 s15 kernel: [ 456.708292] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708418] xhci_hcd 0000:07:00.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708583] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708607] PM: noirq resume of devices complete after 14.593 msecs
Sep 4 09:26:28 s15 kernel: [ 456.708929] PM: early resume of devices complete after 0.295 msecs
Sep 4 09:26:28 s15 kernel: [ 456.709055] pcieport 0000:00:01.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709059] r8169 0000:03:00.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709106] rtc_cmos 00:02: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709123] usb usb1: root hub lost power or was reset
Sep 4 09:26:28 s15 kernel: [ 456.709124] usb usb2: root hub lost power or was reset
Sep 4 09:26:28 s15 kernel: [ 456.709537] parport_pc 00:05: activated
Sep 4 09:26:28 s15 kernel: [ 456.709572] i8042 kbd 00:07: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.710359] serial 00:08: activated
Sep 4 09:26:28 s15 kernel: [ 456.713382] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.713386] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV0._GTF] (Node ffff88040ecc8438), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.713407] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.713410] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV1._GTF] (Node ffff88040ecc84b0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.716059] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.716068] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.718140] sd 0:0:0:0: [sda] Starting disk
Sep 4 09:26:28 s15 kernel: [ 456.718142] sd 0:0:1:0: [sdb] Starting disk
Sep 4 09:26:28 s15 kernel: [ 456.830264] r8169 0000:03:00.0 eth0: link down
Sep 4 09:26:28 s15 kernel: [ 456.830272] br0: port 1(eth0) entered disabled state
Sep 4 09:26:28 s15 kernel: [ 457.035873] ata5: SATA link down (SStatus 0 SControl 300)
Sep 4 09:26:28 s15 kernel: [ 457.035921] ata6: SATA link down (SStatus 0 SControl 300)
Sep 4 09:26:28 s15 kernel: [ 457.191764] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 330)
Sep 4 09:26:28 s15 kernel: [ 457.195886] usb 1-2: reset low-speed USB device number 2 using xhci_hcd
Sep 4 09:26:28 s15 kernel: [ 457.199927] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 457.199930] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 457.215848] ata3.00: configured for UDMA/100
Sep 4 09:26:28 s15 kernel: [ 457.473072] usb 1-2: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
Sep 4 09:26:28 s15 kernel: [ 457.476193] PM: resume of devices complete after 767.759 msecs
Sep 4 09:26:28 s15 kernel: [ 457.476362] PM: Finishing wakeup.
Sep 4 09:26:28 s15 kernel: [ 457.476363] Restarting tasks ... done.
... deleted some lines...
Sep 4 09:26:36 s15 kernel: [ 464.735231] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735235] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735236] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.735678] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735680] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735681] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736112] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736114] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736115] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736547] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736549] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736550] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736968] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736971] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736972] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.737388] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.737390] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.737391] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.738325] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.738328] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.738329] cpufreq: setting range
Sep 4 09:26:39 s15 kernel: [ 468.184616] br0: port 1(eth0) entered forwarding state
>>>>>> Now put CPU 6 back online and take CPU 7 offline.
Sep 4 09:30:52 s15 kernel: [ 720.581828] smpboot: Booting Node 0 Processor 6 APIC 0x5
Sep 4 09:30:52 s15 kernel: [ 720.602104] cpufreq: adding CPU 6
Sep 4 09:30:52 s15 kernel: [ 720.602142] intel_pstate: controlling: cpu 6
Sep 4 09:30:52 s15 kernel: [ 720.602147] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 09:30:52 s15 kernel: [ 720.602150] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:30:52 s15 kernel: [ 720.602151] cpufreq: setting range
Sep 4 09:30:52 s15 kernel: [ 720.602153] cpufreq: initialization complete
Sep 4 09:31:03 s15 kernel: [ 732.221929] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 09:31:03 s15 kernel: [ 732.221932] intel_pstate: CPU 7 exiting
Sep 4 09:31:03 s15 kernel: [ 732.235076] smpboot: CPU 7 is now offline
>>>>>> About to do a "sudo pm-suspend" that will fail.
Sep 4 09:32:43 s15 kernel: [ 831.613558] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.613562] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.613563] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.614373] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.614375] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.614376] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.615165] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615166] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615167] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.615956] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615957] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615958] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.616740] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.616741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.616742] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.617594] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.617596] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.617596] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.618386] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.618387] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.618388] cpufreq: setting range
>>>>>> During write up, realize that the last entries above must have been during the "sudo pm-suspend" that fails.
>>>>>> Verify that these log entries are all that occur for the failed "sudo pm-suspend" by doing another:
Sep 4 11:15:38 s15 kernel: [ 7002.787685] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.787689] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.787690] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.788544] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.788546] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.788547] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.789389] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.790210] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790212] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790213] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.790992] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.791891] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.791893] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.791894] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.792739] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: setting range
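The `cat cpu?/cpufreq/affected_cpus` and `related_cpus` commands used for Step 1 and Step 2 print bare values, so it is hard to tell which value belongs to which CPU, particularly when an offline CPU's file is empty or missing. A small helper along these lines could print one labelled line per CPU instead. It is only a sketch: `dump_cpufreq_topology` is a hypothetical name, and its optional directory argument exists only so it can be tried on a copy of the tree.

```sh
#!/bin/sh
# Hypothetical helper: one labelled line per CPU showing the online state,
# affected_cpus and related_cpus. The optional first argument overrides
# the sysfs directory (useful for testing on a copy of the tree).
dump_cpufreq_topology() {
    base=${1:-/sys/devices/system/cpu}
    for x in "$base"/cpu[0-9]*; do
        # Skip the unexpanded pattern if nothing matched.
        [ -d "$x" ] || continue
        # cpu0 normally has no "online" file; report it as online.
        online=1
        [ -f "$x/online" ] && online=$(cat "$x/online")
        # "-" means the file is missing; "[]" means it exists but is empty.
        affected=$(cat "$x/cpufreq/affected_cpus" 2>/dev/null || echo -)
        related=$(cat "$x/cpufreq/related_cpus" 2>/dev/null || echo -)
        printf '%s online=%s affected=[%s] related=[%s]\n' \
            "${x##*/}" "$online" "$affected" "$related"
    done
}
```

Called with no arguments, it reads the live /sys/devices/system/cpu tree, so each offline CPU's (possibly empty) entries show up next to its own name.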
On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2015.09.04 07:43 Viresh Kumar wrote:
>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>> CPU offline (7 in my case), the system will not suspend.
>
>> I wanted to give him some patch to debug it a bit more, but couldn't
>> do that whole day.
>
>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>
>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> index 9fde14544ead..c09945aa7f17 100644
>> --- a/drivers/cpufreq/Makefile
>> +++ b/drivers/cpufreq/Makefile
>> @@ -1,3 +1,4 @@
>> +subdir-ccflags-y := -DDEBUG
>>  # CPUfreq core
>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>
>>
>> And give us the outputs of both successful and unsuccessful logs?
>
> Edited /var/log/kern.log attached (might get stripped for
> on-list e-mail deliveries)

Hmm.

I suspect that your user space does something that fails during the pm-suspend.

Instead of invoking the pm-suspend command, can you simply do (as root)

# echo mem > /sys/power/state

and see if that behaves in the same way?

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>> CPU offline (7 in my case), the system will not suspend.
>
>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>
>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>> index 9fde14544ead..c09945aa7f17 100644
>>> --- a/drivers/cpufreq/Makefile
>>> +++ b/drivers/cpufreq/Makefile
>>> @@ -1,3 +1,4 @@
>>> +subdir-ccflags-y := -DDEBUG
>>>  # CPUfreq core
>>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>>
>>>
>>> And give us the outputs of both successful and unsuccessful logs?
>>
>> Edited /var/log/kern.log attached (might get stripped for
>> on-list e-mail deliveries)

> Hmm.
> I suspect that your user space does something that fails during the pm-suspend.

Are you saying that the patch might be O.K., but reveals
an issue with pm-suspend that was always there?

> Instead of invoking the pm-suspend command, can you simply do (as root)
> # echo mem > /sys/power/state
> and see if that behaves in the same way?

With CPU 7 offline, that method seems to suspend just fine.
I did not check any other scenarios.
On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>> CPU offline (7 in my case), the system will not suspend.
>>
>>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>>
>>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>>> index 9fde14544ead..c09945aa7f17 100644
>>>> --- a/drivers/cpufreq/Makefile
>>>> +++ b/drivers/cpufreq/Makefile
>>>> @@ -1,3 +1,4 @@
>>>> +subdir-ccflags-y := -DDEBUG
>>>>  # CPUfreq core
>>>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>>>
>>>>
>>>> And give us the outputs of both successful and unsuccessful logs?
>>>
>>> Edited /var/log/kern.log attached (might get stripped for
>>> on-list e-mail deliveries)
>
>> Hmm.
>> I suspect that your user space does something that fails during the pm-suspend.
>
> Are you saying that the patch might be O.K., but reveals
> an issue with pm-suspend that was always there?

Or it breaks something that pm-suspend does before suspending.

It would be good to know what it is. :-)

The "setting new policy for" messages in your log are from
cpufreq_set_policy() and the last thing printed before pm-suspend
exits in the failing case is before calling
cpufreq_driver->setpolicy().

I guess we need to focus on that one, will send you a debug patch shortly.

>> Instead of invoking the pm-suspend command, can you simply do (as root)
>> # echo mem > /sys/power/state
>> and see if that behaves in the same way?
>
> With CPU 7 offline, that method seems to suspend just fine.
> I did not check any other scenarios.
OK

I'm now suspecting that the change in question might break something
in intel_pstate which causes ->setpolicy() to fail for the last online
CPU.

Can you try to offline CPU7 and try to play with min and max for CPU6?

Thanks,
Rafael
On 2015.09.04 17:23 Rafael J. Wysocki wrote:
> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>>> CPU offline (7 in my case), the system will not suspend.
>>>
>>>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>>>
>>>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>>>> index 9fde14544ead..c09945aa7f17 100644
>>>>> --- a/drivers/cpufreq/Makefile
>>>>> +++ b/drivers/cpufreq/Makefile
>>>>> @@ -1,3 +1,4 @@
>>>>> +subdir-ccflags-y := -DDEBUG
>>>>>  # CPUfreq core
>>>>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>>>>
>>>>>
>>>>> And give us the outputs of both successful and unsuccessful logs?
>>>>
>>>> Edited /var/log/kern.log attached (might get stripped for
>>>> on-list e-mail deliveries)
>>
>>> Hmm.
>>> I suspect that your user space does something that fails during the pm-suspend.
>>
>> Are you saying that the patch might be O.K., but reveals
>> an issue with pm-suspend that was always there?
>
> Or it breaks something that pm-suspend does before suspending.
>
> It would be good to know what it is. :-)

While researching pm-utils bugs, I found reference to
/var/log/pm-suspend.log, which I had not noticed before.
Relevant extract attached.
It is not clear to me why that echo line (there is only one)
would fail.

> The "setting new policy for" messages in your log are from
> cpufreq_set_policy() and the last thing printed before pm-suspend
> exits in the failing case is before calling
> cpufreq_driver->setpolicy().
>
> I guess we need to focus on that one, will send you a debug patch shortly.
>
>>> Instead of invoking the pm-suspend command, can you simply do (as root)
>>> # echo mem > /sys/power/state
>>> and see if that behaves in the same way?
>>
>> With CPU 7 offline, that method seems to suspend just fine.
>> I did not check any other scenarios.
>
> OK
>
> I'm now suspecting that the change in question might break something
> in intel_pstate which causes ->setpolicy() to fail for the last online
> CPU.

While, by far, most of my work on this has been done using the
intel_pstate scaling driver, I have also tested using the acpi-cpufreq
scaling driver, with the same results.

> Can you try to offline CPU7 and try to play with min and max for CPU6?

Yes. I did, on a kernel with your intel_pstate.c patch from your
subsequent e-mail. I didn't notice any problem, but maybe I didn't
do it in a way that would manifest the issue.
A small /var/log/kern.log segment is attached.

... Doug

Running hook /usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend:
Failed to connect to non-global ctrl_ifname: (null) error: No such file or directory
/usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend: success.
Running hook /usr/lib/pm-utils/sleep.d/75modules suspend suspend:
/usr/lib/pm-utils/sleep.d/75modules suspend suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/90clock suspend suspend:
/usr/lib/pm-utils/sleep.d/90clock suspend suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend:
sh: echo: I/O error
/usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend: Returned exit code 1.
Fri Sep 4 18:25:00 PDT 2015: Inhibit found, will not perform suspend
Fri Sep 4 18:25:00 PDT 2015: Running hooks for resume
Running hook /usr/lib/pm-utils/sleep.d/90clock resume suspend:
/usr/lib/pm-utils/sleep.d/90clock resume suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/75modules resume suspend:
Reloaded unloaded modules.
/usr/lib/pm-utils/sleep.d/75modules resume suspend: success.
Sep 4 19:14:31 s15 kernel: [ 115.529407] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 19:14:31 s15 kernel: [ 115.529413] intel_pstate: CPU 7 exiting
Sep 4 19:14:31 s15 kernel: [ 115.542754] smpboot: CPU 7 is now offline
Sep 4 19:20:19 s15 kernel: [ 463.426743] cpufreq: setting new policy for CPU 6: 1600000 - 2400000 kHz
Sep 4 19:20:19 s15 kernel: [ 463.426750] cpufreq: new min and max freqs are 1600000 - 2400000 kHz
Sep 4 19:20:19 s15 kernel: [ 463.426752] cpufreq: setting range
Sep 4 19:23:05 s15 kernel: [ 628.598506] cpufreq: setting new policy for CPU 6: 1600000 - 2200000 kHz
Sep 4 19:23:05 s15 kernel: [ 628.598511] cpufreq: new min and max freqs are 1600000 - 2200000 kHz
Sep 4 19:23:05 s15 kernel: [ 628.598513] cpufreq: setting range
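The hook that aborted the suspend can also be picked out of /var/log/pm-suspend.log mechanically rather than by reading it. The sketch below assumes the "<hook> <args>: Returned exit code N." wording seen in the pm-suspend.log extract above; `failed_hooks` is a hypothetical helper, and other pm-utils versions may word their log differently.

```sh
#!/bin/sh
# Hypothetical helper: print "<hook path> <exit code>" for every line of
# the form "<hook> <args>: Returned exit code N." in a pm-utils log.
# Defaults to /var/log/pm-suspend.log when no file is given.
failed_hooks() {
    sed -n 's/^\([^ ]*\) .*: Returned exit code \([0-9][0-9]*\)\.$/\1 \2/p' \
        "${1:-/var/log/pm-suspend.log}"
}
```

For the extract above, this would report /usr/lib/pm-utils/sleep.d/94cpufreq with exit code 1, which is the hook that leads to "Inhibit found, will not perform suspend".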
On 2015.09.05 19:35 Doug Smythies wrote:
> On 2015.09.04 17:23 Rafael J. Wysocki wrote:
>> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>>>> CPU offline (7 in my case), the system will not suspend.
>>>> Hmm.
>>>> I suspect that your user space does something that fails during the pm-suspend.
>>>
>>> Are you saying that the patch might be O.K., but reveals
>>> and issue with pm-suspend that was always there?
>>
>> Or it breaks something that pm-suspend does before suspending.
>>
>> It would be good to know what it is. :-)
> While researching pm-utils bugs, I found reference to
> /var/log/pm-suspend.log, which I had not noticed before.
> Relevant extract attached.
> It is not clear to me why that echo line (there is only one)
> would fail.

The echo line fails because the related CPU is offline.
If the failed echo is the last pass through the loop,
then the script interprets the overall execution of
94cpufreq as a failure and aborts the suspend. If the
failed echo is not the last pass through the loop, then
the bad exit code gets overwritten with a good one before
the loop exits.

Since the loop is merely setting a temporary governor,
to test I just used performance mode anyway, and commented
out the echo. pm-suspend with CPU 7 offline then worked fine.

I have not yet gone back to a pre-patch kernel
to determine why it used to work (it is late in my time zone).
However, I would have to assume that before the commit in
question, the echo worked even if the CPU was offline.

Could someone please confirm or deny the above conclusion.
The relevant code segment, with some added debug echo stuff, from
/usr/lib/pm-utils/sleep.d/94cpufreq:

hibernate_cpufreq()
{
    ( cd /sys/devices/system/cpu/
    for x in cpu[0-9]*; do
        # if cpufreq is a symlink, it is handled by another cpu. Skip.
        [ -L "$x/cpufreq" ] && continue
        gov="$x/cpufreq/scaling_governor"
        # if we do not have a scaling_governor file, skip.
        [ -f "$gov" ] || continue
        # if our temporary governor is not available, skip.
        grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
            "$x/cpufreq/scaling_available_governors" || continue
        savestate "${x}_governor" < "$gov"
        # I added the next 3 lines
        echo "$x"
        echo "$TEMPORARY_CPUFREQ_GOVERNOR"
        echo "$gov"
        # For a test, do not do the echo, as I already set performance mode
        # echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
    done )
}

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
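The exit-status behaviour Doug describes is ordinary POSIX shell semantics: a `for` loop, and a function without an explicit `return`, report the status of the last command they ran. A minimal model follows; the function name and CPU list are hypothetical, and `false`/`true` merely stand in for the failing and successful writes to scaling_governor.

```sh
#!/bin/sh
# Model of the 94cpufreq loop: every iteration "writes" a governor, and the
# write for one hypothetical CPU ($1) fails the way the echo into an offline
# CPU's scaling_governor does.
write_governors() {
    failing_cpu=$1
    for cpu in 0 1 2 3 4 5 6 7; do
        if [ "$cpu" = "$failing_cpu" ]; then
            false    # stands in for "sh: echo: I/O error"
        else
            true     # stands in for a successful write
        fi
    done
    # No explicit return: the function's status is that of the write
    # attempted in the final iteration.
}

if write_governors 6; then
    echo "failure on CPU 6 (not last): status 0"
fi
if ! write_governors 7; then
    echo "failure on CPU 7 (last): non-zero status"
fi
```

Both messages are printed, mirroring the report that offlining CPU 6 was harmless while offlining CPU 7 made 94cpufreq return exit code 1.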
On 05-09-15, 00:46, Doug Smythies wrote:
> > It is not clear to me why that echo line (there is only one)
> > would fail.

To me it is clear now :)

> The echo line fails because the related CPU is offline.
> If the failed echo is the last pass through the loop,
> then the script interprets the overall execution of
> 94cpufreq as a failure and aborts the suspend. If the
> failed echo is not the last pass through the loop, then
> the bad exit code gets overwritten with a good one before
> the loop exits.
>
> Since the loop is merely setting a temporary governor,
> to test I just used performance mode anyway, and commented
> out the echo. pm-suspend with CPU 7 offline then worked fine.
>
> I have not yet gone back to any before the patch kernel
> to determine why it used to work (it is late in my time zone).
> However, I would have to assume that before the commit in
> question, the echo worked even if the CPU was offline.

So here is the story behind it.

- In your system all CPUs are independent, that is, there are no links
  to the cpufreq directory, so that check in the script is useless for
  you.
- The $COMMIT in question made a significant change. Earlier, while
  offlining a CPU, we used to remove its cpufreq directory from
  sysfs, which is not the case any more.
- So to be precise, the following lines came to your rescue earlier:

  # if we do not have a scaling_governor file, skip.
  # [ -f "$gov" ] || continue

- But they don't after the patch, as the file and directory are
  present even if the CPU is offline.
- And because the CPU is offline, writing to those files isn't allowed,
  so the echo failed.

The solution is to also check whether the CPU is offline in the
beginning of the script, and skip it if so.
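Viresh's explanation can be reproduced against a fake sysfs tree. The sketch below applies only the two existence checks from 94cpufreq (the symlink test and the scaling_governor test) and prints the CPUs whose governor file would be written. The `make_tree` and `cpus_passing_filters` names and the two-CPU layout are hypothetical illustrations of the pre- and post-commit behaviour, not real sysfs.

```sh
#!/bin/sh
# Print the CPUs under a sysfs-like root ($1) that pass the two existence
# checks from 94cpufreq, i.e. the CPUs whose scaling_governor would be written.
cpus_passing_filters() {
    ( cd "$1" || exit 1
      for x in cpu[0-9]*; do
          # Symlinked cpufreq directory: handled by another CPU.
          [ -L "$x/cpufreq" ] && continue
          # No scaling_governor file: nothing to write.
          [ -f "$x/cpufreq/scaling_governor" ] || continue
          echo "$x"
      done )
}

# Build a hypothetical two-CPU tree in $1 with cpu1 offline. With $2 = keep,
# cpu1 keeps its cpufreq directory (post-commit behaviour); otherwise it has
# none (pre-commit behaviour).
make_tree() {
    mkdir -p "$1/cpu0/cpufreq" "$1/cpu1"
    echo performance > "$1/cpu0/cpufreq/scaling_governor"
    echo 0 > "$1/cpu1/online"
    if [ "$2" = keep ]; then
        mkdir -p "$1/cpu1/cpufreq"
        echo performance > "$1/cpu1/cpufreq/scaling_governor"
    fi
}

tmp=$(mktemp -d)
make_tree "$tmp/old" drop
make_tree "$tmp/new" keep
echo "old layout:" $(cpus_passing_filters "$tmp/old")
echo "new layout:" $(cpus_passing_filters "$tmp/new")
rm -rf "$tmp"
```

This prints `old layout: cpu0` and `new layout: cpu0 cpu1`: with the old layout the offline cpu1 is filtered out by the scaling_governor test, while with the new one it reaches the write, which on a real system is where the I/O error comes from.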
On Saturday, September 05, 2015 12:46:40 AM Doug Smythies wrote: > On 2015.09.05 19:35 Doug Smythies wrote: > > On 2015.09.04 17:23 Rafael J. Wysocki wrote: > >> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote: > >>> On 2015.09.04 15:26 Rafael J. Wysocki wrote: > >>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote: > >>>>> On 2015.09.04 07:43 Viresh Kumar wrote: > >>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote: > >>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote: > >>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered > >>>>>>>> CPU offline (7 in my case), the system will not suspend. > > >>>> Hmm. > >>>> I suspect that your user space does something that fails during the pm-suspend. > >>> > >>> Are you saying that the patch might be O.K., but reveals > >>> and issue with pm-suspend that was always there? > >> > >> Or it breaks something that pm-suspend does before suspending. > >> > >> It would be good to know what it is. :-) > > > While researching pm-utils bugs, I found reference to > > /var/log/pm-suspend.log, which I had not noticed before. > > Relevant extract attached. > > > It is not clear to me why that echo line (there is only one) > > would fail. > > The echo line fails because the related CPU is offline. > If the failed echo is the last pass through the loop, > then the script interprets the overall execution of > 94cpufreq as a failure and aborts the suspend. That's correct AFAICS. > If the failed echo is not the last pass through the loop, then > the bad exit code gets overwritten with a good one before > the loop exits. Right. > Since the loop is merely setting a temporary governor, > to test I just used performance mode anyway, and commented > out the echo. pm-suspend with CPU 7 offline then worked fine. > > I have not yet gone back to any before the patch kernel > to determine why it used to work (it is late in my time zone). 
> However, I would have to assume that before the commit in
> question, the echo worked even if the CPU was offline.

It didn't have to, because the cpufreq directory was not present, so the

    [ -L "$x/cpufreq" ] && continue

line would trigger.

> Could someone please confirm or deny the above conclusion.
>
> The relevant code segment, with some added debug
> echo stuff, from /usr/lib/pm-utils/sleep.d/94cpufreq
>
> hibernate_cpufreq()
> {
>     ( cd /sys/devices/system/cpu/
>     for x in cpu[0-9]*; do
>         # if cpufreq is a symlink, it is handled by another cpu. Skip.
>         [ -L "$x/cpufreq" ] && continue
>         gov="$x/cpufreq/scaling_governor"
>         # if we do not have a scaling_governor file, skip.
>         [ -f "$gov" ] || continue
>         # if our temporary governor is not available, skip.
>         grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
>             "$x/cpufreq/scaling_available_governors" || continue
>         savestate "${x}_governor" < "$gov"
>         # I added the next 3 lines
>         echo "$x"
>         echo "$TEMPORARY_CPUFREQ_GOVERNOR"
>         echo "$gov"
>         # For a test, do not do the echo, as I already set performance mode
>         # echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
>     done )
> }

Thanks,
Rafael
On Saturday, September 05, 2015 01:44:07 PM Viresh Kumar wrote: > On 05-09-15, 00:46, Doug Smythies wrote: > > > It is not clear to me why that echo line (there is only one) > > > would fail. > > To me it is clear now :) > > > The echo line fails because the related CPU is offline. > > If the failed echo is the last pass through the loop, > > then the script interprets the overall execution of > > 94cpufreq as a failure and aborts the suspend. If the > > failed echo is not the last pass through the loop, then > > the bad exit code gets overwritten with a good one before > > the loop exits. > > > > Since the loop is merely setting a temporary governor, > > to test I just used performance mode anyway, and commented > > out the echo. pm-suspend with CPU 7 offline then worked fine. > > > > I have not yet gone back to any before the patch kernel > > to determine why it used to work (it is late in my time zone). > > However, I would have to assume that before the commit in > > question, the echo worked even if the CPU was offline. > > So here is the story behind it. > - In your system all CPUs are independent, that is there are no links > to cpufreq directory, so that check in the script is useless for > you. > - The $COMMIT in question did a significant change. Earlier, while > offlining the CPU, we used to remove the cpufreq directory from > sysfs, which is not the case any more. > > - So to be precise, following lines came to your rescue earlier: > > # if we do not have a scaling_governor file, skip. > # [ -f "$gov" ] || continue > > - But they don't after the patch, as the file and directory are > present even if the CPU is offline. > - But because the CPU is offline, writing to those files isn't allowed > and so the echo failed. > > Solution to that is that we check for CPU offline as well in the > beginning of the script, and skip if the CPU is offline. That's a bug in the script. 
It should discard all errors from the entire inner loop, but it doesn't
discard errors from the last iteration of it.

That said, what store() in cpufreq.c does is questionable too.

First, if policy->cpu is offline, the policy will be inactive to my eyes, so
we don't need the second check.

But if the policy is active (and policy->cpu is online), it will not generally
fail for an offline CPU. So, if the policy applies to more than 1 CPU, you
can use any of them to manipulate it, even if one of them is offline, as long
as there are any online CPUs in the set.

This isn't entirely consistent. We should either fail store() for any offline
CPU or make the changes for offline CPUs too. And in the particular case of
the governor, I'm wondering what the problem would be with changing
last_governor for an inactive policy.

Thanks,
Rafael
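Under the behaviour Rafael describes (a store succeeds as long as the policy still has an online CPU), whether a write through a given CPU's directory should be accepted can be estimated from user space using the online and related_cpus attributes. This is a hypothetical sketch: the helper names are made up, and a missing online file is treated as "online", as Doug assumes for CPU0 later in the thread.

```sh
#!/bin/sh
# Is the CPU whose sysfs directory is $1 online? A missing "online" file
# (e.g. a cpu0 that cannot be hot-unplugged) is treated as online.
is_online() {
    [ ! -f "$1/online" ] || [ "$(cat "$1/online")" = 1 ]
}

# Does the policy that cpu$2 (under the sysfs-like root $1) belongs to still
# have an online CPU? related_cpus lists every CPU the policy covers.
policy_has_online_cpu() {
    for n in $(cat "$1/cpu$2/cpufreq/related_cpus"); do
        is_online "$1/cpu$n" && return 0
    done
    return 1
}
```

On Doug's machine, where each CPU's related_cpus lists only itself, `policy_has_online_cpu /sys/devices/system/cpu 7` should fail with CPU 7 offline, matching the rejected governor write; if CPUs 6 and 7 shared a policy, it should succeed while CPU 6 is online.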
To wrap this up, I was thinking of filing a bug report
on the pm-utils bug system and then filing a bug against
the distribution that I use (Ubuntu Server), linking to the
upstream bug report.

I don't have it working correctly yet, but I was hoping
to suggest a fix with the bug reports.

Something like (still has all my debug stuff also):

hibernate_cpufreq()
{
    ( cd /sys/devices/system/cpu/
    for x in cpu[0-9]*; do
        # if cpufreq is a symlink, it is handled by another cpu. Skip.
        [ -L "$x/cpufreq" ] && continue
        gov="$x/cpufreq/scaling_governor"
        # if we do not have a scaling_governor file, skip.
        [ -f "$gov" ] || continue
        echo "before $x online check"

+       # if the CPU is offline, skip, unless no file, i.e. CPU0.
+       [ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
Or
+       if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
+           continue;
+       fi
Or something similar that actually works.

        echo "after $x online check"
        # if our temporary governor is not available, skip.
        grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
            "$x/cpufreq/scaling_available_governors" || continue
        savestate "${x}_governor" < "$gov"
        echo "$x"
        echo "$TEMPORARY_CPUFREQ_GOVERNOR"
        echo "$gov"
        echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
    done )
}

The proposed fix does not depend on CPU0 at all: if the online file
exists, the CPU must be online; if it doesn't exist, the CPU is assumed
to be online. As you both pointed out, there is an earlier check and skip
for the no-governor, no-CPU and older-kernel conditions.
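Neither candidate check above quite expresses the stated intent: the first runs `cat` on the online file before testing whether the file exists, and the second has its condition inverted, so it would skip the online CPUs rather than the offline ones. One way to write the intended test (skip only if the online file exists and does not contain 1) is sketched below. Assumptions: the sysfs root is passed as `$1` so the loop can run against a fake tree, `savestate` and the scaling_available_governors check from the real hook are left out, and the `performance` default for `TEMPORARY_CPUFREQ_GOVERNOR` is only illustrative.

```sh
#!/bin/sh
# Sketch of the 94cpufreq loop with a working offline check.
# The "performance" default below is an illustrative assumption.
TEMPORARY_CPUFREQ_GOVERNOR=${TEMPORARY_CPUFREQ_GOVERNOR:-performance}

set_temporary_governor() {
    ( cd "$1" || exit 1
      for x in cpu[0-9]*; do
          # if cpufreq is a symlink, it is handled by another cpu. Skip.
          [ -L "$x/cpufreq" ] && continue
          gov="$x/cpufreq/scaling_governor"
          # if we do not have a scaling_governor file, skip.
          [ -f "$gov" ] || continue
          # Skip offline CPUs. No "online" file (e.g. CPU0) means online.
          if [ -f "$x/online" ] && [ "$(cat "$x/online")" != 1 ]; then
              continue
          fi
          echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
      done )
}
```

Because `continue` exits with status 0, an offline last CPU no longer leaves a failing status behind, while a failed write on an online CPU in the last iteration is still reported.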
On Monday, September 07, 2015 07:03:16 AM Doug Smythies wrote:
> To wrap this up, I was thinking to file a bug report
> on the pm-utils bug system and then to file a bug against
> the distribution that I use (Ubuntu Server), linking to the
> upstream bug report.
>
> I don't have it working correctly yet, but I was hoping
> to suggest a fix with the bug reports.
>
> Something like (still has all my debug stuff also):
>
> hibernate_cpufreq()
> {
>     ( cd /sys/devices/system/cpu/
>     for x in cpu[0-9]*; do
>         # if cpufreq is a symlink, it is handled by another cpu. Skip.
>         [ -L "$x/cpufreq" ] && continue
>         gov="$x/cpufreq/scaling_governor"
>         # if we do not have a scaling_governor file, skip.
>         [ -f "$gov" ] || continue
>         echo "before $x online check"
>
> +       # if the CPU is offline, skip, unless no file, i.e. CPU0.
> +       [ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
> Or
> +       if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
> +           continue;
> +       fi
> Or something similar that actually works.
>
>         echo "after $x online check"
>         # if our temporary governor is not available, skip.
>         grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
>             "$x/cpufreq/scaling_available_governors" || continue
>         savestate "${x}_governor" < "$gov"
>         echo "$x"
>         echo "$TEMPORARY_CPUFREQ_GOVERNOR"
>         echo "$gov"
>         echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
>     done )
> }
>
> With the proposed fix not dependent on CPU0 at all, just the condition that if
> the file exists, that the CPU be online, and if it doesn't exist then assume the
> CPU is online. As you both pointed out, there is a previous check and skip for
> the no governor or no CPU condition or older kernel conditions.

I guess it also would work if you added

    return 0

at the end of hibernate_cpufreq().

Thanks,
Rafael
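Rafael's alternative leaves the loop alone and makes the function's status unconditional. A minimal sketch of that shape follows; the function name is made up, and `false` stands in for the write that fails on the offline last CPU.

```sh
#!/bin/sh
# Same shape as hibernate_cpufreq(): the loop runs in a subshell and its
# last iteration fails. The explicit "return 0" discards that status.
hibernate_cpufreq_sketch() {
    ( for x in cpu0 cpu1 cpu7; do
          # Only the (offline) last CPU fails, like the echo in 94cpufreq.
          if [ "$x" = cpu7 ]; then false; fi
      done )
    return 0
}

if hibernate_cpufreq_sketch; then
    echo "hook status 0: suspend would proceed"
fi
```

The trade-off is that `return 0` also hides a failed write on an online CPU in the last iteration, which the per-CPU online check would still report.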
On 07-09-15, 15:32, Rafael J. Wysocki wrote:
> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
> we don't need the second check.

Hmm, or maybe just drop the first check.

> But if the policy is active (and policy->cpu is online), it will not generally
> fail for an offline CPU.

Right.

> So, if the policy applies to more than 1 CPU, you
> can use any of them to manipulate it, even if one of them is offline as long
> as there are any online CPUs in the set.

Right.

> This isn't entirely consistent. We should either fail store() for any offline
> CPU

At that point we have no idea of the CPU for which the sysfs operation
is called. And so we have to go ahead without failing, if policy is
active.

> or make the changes for offline CPUs to.

What does that mean? Most of the stuff we do is for the policy, rather
than per-cpu. And if there is per-cpu stuff, then we *only* should be
doing that for the online ones.

Not sure if I understood what you meant here. :(

> And in the particular case of
> the governor, I'm wondering what will be the problem with changing last_governor
> for an inactive policy?

I don't think we should be adding special cases for updating sysfs
attributes of an inactive policy. It's not just about the last_governor
thing, but other sysfs attributes as well.
Sorry about the late reply and not helping out earlier. I didn't check
this email account for some time.

On 09/07/2015 07:40 PM, Viresh Kumar wrote:
> On 07-09-15, 15:32, Rafael J. Wysocki wrote:
>> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
>> we don't need the second check.
>
> Hmm, or maybe just drop the first check.
>
>> But if the policy is active (and policy->cpu is online), it will not generally
>> fail for an offline CPU.
>
> Right.
>
>> So, if the policy applies to more than 1 CPU, you
>> can use any of them to manipulate it, even if one of them is offline as long
>> as there are any online CPUs in the set.
>
> Right.
>
>> This isn't entirely consistent. We should either fail store() for any offline
>> CPU
>
> At that point we have no idea of the CPU for which the sysfs operation
> is called. And so we have to go ahead without failing, if policy is
> active.
>
>> or make the changes for offline CPUs to.
>
> What does that mean? Most of the stuff we do is for the policy, rather
> than per-cpu. And if there is per-cpu stuff, then we *only* should be
> doing that for the online ones.
>
> Not sure if I understood what you meant here. :(
>
>> And in the particular case of
>> the governor, I'm wondering what will be the problem with changing last_governor
>> for an inactive policy?
>
> I don't think we should be adding special cases for updating sysfs
> attributes of an inactive policy. Its not just about the last_governor
> thing, but other sysfs attributes as well.
>

The way I see it, having the cpufreq policy control sysfs "bits" under
every CPU directory is what's causing some semantic confusion/inconsistency.

Every single node under a cpufreq folder is for policy control and not
CPU control. But by putting the policy control bits under the cpuX
directory, we give the wrong semantic impression that it's a per-CPU
attribute when it's really per-policy.
Ideally (in terms of semantics) we would have put all the policy control
bits in a per-policy directory under
/sys/devices/system/cpu/cpufreq/policyX/, where X would/could be tied
to the first CPU in related CPUs, so that it's easy to correlate and
also to avoid having the policy numbering differ depending on
the order in which CPUs get hotplugged.

But we can't go about breaking userspace ABI by moving the cpufreq
directories out of the cpu directories just because of the semantic
confusion.

Well, we COULD still put the policy directories under cpu/cpufreq/ and
then make every cpuX/cpufreq directory a symlink to the actual policy
directory. But that is not going to help with this specific
issue/discussion.

Having said all that, I still think that stores to all these sysfs files
should work. I'm not saying it's a trivial change (for example, setting a
governor's polling time would need some checks to cache the value
and not start a timer immediately), but I think it's a more
consistent and user-friendly API.

If the user wants to set a min CPU freq, why should they care if the CPU
is online at that very instant? It gets especially painful if you have a
thermal daemon that's plugging in/out CPUs while the user or a script is
trying to set the parameters.

Thanks,
Saravana
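If the layout described above were adopted (per-policy directories under cpu/cpufreq/, with every cpuX/cpufreq a symlink to one of them), the `[ -L "$x/cpufreq" ] && continue` test in 94cpufreq would match every CPU. A script could instead resolve each cpuX/cpufreq and visit each distinct policy directory once. This is a sketch assuming that layout; the function name is hypothetical and the `policy0`/`policy2` names in the test tree are only illustrative.

```sh
#!/bin/sh
# Print each distinct cpufreq policy directory under a sysfs-like root ($1),
# whether cpuX/cpufreq is a real directory or a symlink to a policy directory.
list_policies() {
    ( cd "$1" || exit 1
      for x in cpu[0-9]*; do
          [ -d "$x/cpufreq" ] || continue
          # cd -P follows the symlink; pwd -P prints the physical path.
          ( cd -P "$x/cpufreq" && pwd -P )
      done ) | sort -u
}
```

For example, with cpu0 and cpu1 linked to policy0 and cpu2 linked to policy2, it prints two paths, so per-policy work happens once per policy instead of once per CPU.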
On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote: > Sorry about the late reply and not helping out earlier. Didn't check > this email account for sometime. > > On 09/07/2015 07:40 PM, Viresh Kumar wrote: > > On 07-09-15, 15:32, Rafael J. Wysocki wrote: > >> First, if policy->cpu is offline, the policy will be inactive to my eyes, so > >> we don't need the second check. > > > > Hmm, or maybe just drop the first check. > > > >> But if the policy is active (and policy->cpu is online), it will not generally > >> fail for an offline CPU. > > > > Right. > > > >> So, if the policy applies to more than 1 CPU, you > >> can use any of them to manipulate it, even if one of them is offline as long > >> as there are any online CPUs in the set. > > > > Right. > > > >> This isn't entirely consistent. We should either fail store() for any offline > >> CPU > > > > At that point we have no idea of the CPU for which the sysfs operation > > is called. And so we have to go ahead without failing, if policy is > > active. > > > >> or make the changes for offline CPUs to. > > > > What does that mean? Most of the stuff we do is for the policy, rather > > than per-cpu. And if there is per-cpu stuff, then we *only* should be > > doing that for the online ones. > > > > Not sure if I understood what you meant here. :( > > > >> And in the particular case of > >> the governor, I'm wondering what will be the problem with changing last_governor > >> for an inactive policy? > > > > I don't think we should be adding special cases for updating sysfs > > attributes of an inactive policy. Its not just about the last_governor > > thing, but other sysfs attributes as well. > > > > The way I see it, having the cpufreq policy control sysfs "bits" under > every CPU directory is what's causing some semantic confusion/inconsistency. > > Every single node under a cpufreq folder is for policy control and not > CPU control. 
> But by putting the policy control bits under the cpuX
> directory, we give the wrong semantic impression that it's a per CPU
> attribute when it's really per-policy.
>
> Ideally (in terms of semantics) we would have put all the policy control
> bits in a per policy directory under
> /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be the tied
> to the first CPU in related CPUs -- so that it's easy to correlate and
> also to avoid having the policy numbering being different depending on
> the order in which CPUs get hotplugged.
>
> But we can't go about breaking userspace ABI by removing the cpufreq
> directories out of the cpu directories just because of the semantic
> confusion.
>
> Well, we COULD still put the policy directories under cpu/cpufreq/ and
> then make every cpuX/cpufreq directory a symlink to the actual policy
> directory. But that is not going to help with this specific
> issue/discussion.

It isn't, but it'd be a good change in my view.

> Having said all that, I still think that stores to all these sysfs files
> should work. I'm not saying it's a trivial change (like setting a
> governor's polling time, etc would need some checks to cache the value
> and not start a timer immediately, etc), but I think it's a more
> consistent and user friendly API.

I agree.

It also is backwards compatible with scripts that walk the cpufreq directories
for all CPUs without checking the online attribute and expect things to work.

> If the user wants to set a min CPU freq, why should they care if the CPU
> is online at that very instant? It gets especially painful if you have a
> thermal daemon that's plugging in/out CPUs while the user or a script is
> trying to set the parameters.

You mean putting them offline/online I suppose?

Thanks,
Rafael
On 09/11/2015 02:30 PM, Rafael J. Wysocki wrote: > On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote: >> Sorry about the late reply and not helping out earlier. Didn't check >> this email account for sometime. >> >> On 09/07/2015 07:40 PM, Viresh Kumar wrote: >>> On 07-09-15, 15:32, Rafael J. Wysocki wrote: >>>> First, if policy->cpu is offline, the policy will be inactive to my eyes, so >>>> we don't need the second check. >>> >>> Hmm, or maybe just drop the first check. >>> >>>> But if the policy is active (and policy->cpu is online), it will not generally >>>> fail for an offline CPU. >>> >>> Right. >>> >>>> So, if the policy applies to more than 1 CPU, you >>>> can use any of them to manipulate it, even if one of them is offline as long >>>> as there are any online CPUs in the set. >>> >>> Right. >>> >>>> This isn't entirely consistent. We should either fail store() for any offline >>>> CPU >>> >>> At that point we have no idea of the CPU for which the sysfs operation >>> is called. And so we have to go ahead without failing, if policy is >>> active. >>> >>>> or make the changes for offline CPUs to. >>> >>> What does that mean? Most of the stuff we do is for the policy, rather >>> than per-cpu. And if there is per-cpu stuff, then we *only* should be >>> doing that for the online ones. >>> >>> Not sure if I understood what you meant here. :( >>> >>>> And in the particular case of >>>> the governor, I'm wondering what will be the problem with changing last_governor >>>> for an inactive policy? >>> >>> I don't think we should be adding special cases for updating sysfs >>> attributes of an inactive policy. Its not just about the last_governor >>> thing, but other sysfs attributes as well. >>> >> >> The way I see it, having the cpufreq policy control sysfs "bits" under >> every CPU directory is what's causing some semantic confusion/inconsistency. >> >> Every single node under a cpufreq folder is for policy control and not >> CPU control. 
But by putting the policy control bits under the cpuX >> directory, we give the wrong semantic impression that it's a per CPU >> attribute when it's really per-policy. >> >> Ideally (in terms of semantics) we would have put all the policy control >> bits in a per policy directory under >> /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be the tied >> to the first CPU in related CPUs -- so that it's easy to correlate and >> also to avoid having the policy numbering being different depending on >> the order in which CPUs get hotplugged. >> >> But we can't go about breaking userspace ABI by removing the cpufreq >> directories out of the cpu directories just because of the semantic >> confusion. >> >> Well, we COULD still put the policy directories under cpu/cpufreq/ and >> then make every cpuX/cpufreq directory a symlink to the actual policy >> directory. But that is not going to help with this specific >> issue/discussion. > > It isn't, but it'd be a good change in my view. >> Having said all that, I still think that stores to all these sysfs files >> should work. I'm not saying it's a trivial change (like setting a >> governor's polling time, etc would need some checks to cache the value >> and not start a timer immediately, etc), but I think it's a more >> consistent and user friendly API. > > I agree. > > It also is backwards compatible with scripts that walk the cpufreq directories > for all CPUs without checking the online attribute and expect things to work. Good to see some support. I do know Viresh doesn't like this :) Thinking more about it, it'll also make the code simpler since we don't have to decide which CPU has the real files vs which ones have symlinks. We probably won't need policy->cpu or kobj_cpu anymore. I'd love to do all these changes, but I doubt I'll find the time with the official job responsibilities I have. We'll see. >> If the user wants to set a min CPU freq, why should they care if the CPU >> is online at that very instant? 
It gets especially painful if you have a >> thermal daemon that's plugging in/out CPUs while the user or a script is >> trying to set the parameters. > > You mean putting them offline/online I suppose? Yup, I meant that a thermal daemon is putting them online/offline -- so used to using the terms plugging in/out internally since we don't have to deal with physical removals on mobile devices (YET?!). It'll be quite an achievement to see a daemon actually plugging a CPU in/out :) Thanks, Saravana
On 11-09-15, 15:07, Saravana Kannan wrote:
> Good to see some support. I do know Viresh doesn't like this :)
Sorry for catching up late, but looks like you guys did convince me on
this.
Let me get some patches out :)
On 10/11/2015 02:47 AM, Viresh Kumar wrote: > On 11-09-15, 15:07, Saravana Kannan wrote: >> Good to see some support. I do know Viresh doesn't like this :) > > Sorry for catching up late, but looks like you guys did convince me on > this. Did I also convince you about allowing changing of parameters for offline CPUs? Which would also include inactive policies? > > Let me get some patches out :) > Thanks! -Saravana
On 12-10-15, 12:43, Saravana Kannan wrote:
> Did I also convince you about allowing changing of parameters for
> offline CPUs? Which would also include inactive policies?

Yes, but that requires some careful modification of the code, as there
are various branches in the sysfs write path. And we need to make sure we
don't do anything more than just updating the files. So, I left it for
now.
On 10/12/2015 08:47 PM, Viresh Kumar wrote: > On 12-10-15, 12:43, Saravana Kannan wrote: >> Did I also convince you about allowing changing of parameters for >> offline CPUs? Which would also include inactive policies? > > Yes, but that requires some careful modification of the code as there > are various paths of the sysfs write path. And we need to see that we > don't do anything more than just updating the files. So, I left it for > now. > Agreed. It's non-trivial and we can do that separately. -Saravana
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 9fde14544ead..c09945aa7f17 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -1,3 +1,4 @@
+subdir-ccflags-y := -DDEBUG
 # CPUfreq core
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o