Message ID | 1670308998-12313-1-git-send-email-lirongqing@baidu.com
State      | New
Series     | cpuidle-haltpoll: Disable kvm guest polling when mwait_idle is used
> Before the change, "sockperf ping-pong -i 127.0.0.1 -p 20001 --tcp" latency is:
> sockperf: Summary: Latency is 6.245 usec
>
> With this patch (cpuidle-haltpoll disabled):
> sockperf: Summary: Latency is 4.671 usec
>
> Using arch_cpu_idle() in default_enter_idle():
> sockperf: Summary: Latency is 4.285 usec
>

When I ran the above tests, I used taskset to pin the sockperf server and
client to different cpus, so using arch_cpu_idle() in default_enter_idle()
gets a better result.

I also tested unixbench and found that not loading cpuidle-haltpoll gets
more performance on an 8-core (2 threads per core) Intel CPU with C-states
disabled in the host.

Don't load cpuidle-haltpoll:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   45665777.0   3913.1
Double-Precision Whetstone                       55.0       6808.3   1237.9
Execl Throughput                                 43.0       4498.0   1046.1
File Copy 1024 bufsize 2000 maxblocks          3960.0    1285429.0   3246.0
File Copy 256 bufsize 500 maxblocks            1655.0     343057.0   2072.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    3750374.0   6466.2
Pipe Throughput                               12440.0    2271618.8   1826.1
Pipe-based Context Switching                   4000.0     166393.6    416.0
Process Creation                                126.0      12845.8   1019.5
Shell Scripts (1 concurrent)                     42.4       9470.2   2233.5
Shell Scripts (8 concurrent)                      6.0       4526.6   7544.4
System Call Overhead                          15000.0    2082221.1   1388.1
                                                                   ========
System Benchmarks Index Score                                        1995.8

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  388552307.2  33295.0
Double-Precision Whetstone                       55.0      98968.4  17994.2
Execl Throughput                                 43.0      42346.5   9848.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     945510.0   2387.7
File Copy 256 bufsize 500 maxblocks            1655.0     246909.0   1491.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    3394311.0   5852.3
Pipe Throughput                               12440.0   21325271.3  17142.5
Pipe-based Context Switching                   4000.0    2746301.9   6865.8
Process Creation                                126.0      94459.6   7496.8
Shell Scripts (1 concurrent)                     42.4      62378.6  14711.9
Shell Scripts (8 concurrent)                      6.0       8841.4  14735.7
System Call Overhead                          15000.0    8850338.7   5900.2
                                                                   ========
System Benchmarks Index Score                                        8482.8

Replacing default_idle() with arch_cpu_idle() and loading cpuidle-haltpoll,
with a change like the one below:

@@ -32,7 +33,7 @@ static int default_enter_idle(struct cpuidle_device *dev,
 		local_irq_enable();
 		return index;
 	}
-	default_idle();
+	arch_cpu_idle();
 	return index;
 }

The result is below:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   42647356.0   3654.4
Double-Precision Whetstone                       55.0       6812.9   1238.7
Execl Throughput                                 43.0       4424.4   1028.9
File Copy 1024 bufsize 2000 maxblocks          3960.0    1297513.0   3276.5
File Copy 256 bufsize 500 maxblocks            1655.0     346228.0   2092.0
File Copy 4096 bufsize 8000 maxblocks          5800.0    3902320.0   6728.1
Pipe Throughput                               12440.0    2292292.4   1842.7
Pipe-based Context Switching                   4000.0     163775.9    409.4
Process Creation                                126.0      12216.8    969.6
Shell Scripts (1 concurrent)                     42.4       9375.4   2211.2
Shell Scripts (8 concurrent)                      6.0       4395.2   7325.3
System Call Overhead                          15000.0    2030942.7   1354.0
                                                                   ========
System Benchmarks Index Score                                        1971.4

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  390316671.8  33446.2
Double-Precision Whetstone                       55.0      98945.1  17990.0
Execl Throughput                                 43.0      39600.0   9209.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     939526.0   2372.5
File Copy 256 bufsize 500 maxblocks            1655.0     256813.0   1551.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    3431020.0   5915.6
Pipe Throughput                               12440.0   21320118.9  17138.4
Pipe-based Context Switching                   4000.0    2743588.5   6859.0
Process Creation                                126.0      94142.2   7471.6
Shell Scripts (1 concurrent)                     42.4      62425.0  14722.9
Shell Scripts (8 concurrent)                      6.0       8841.3  14735.5
System Call Overhead                          15000.0    8864145.2   5909.4
                                                                   ========
System Benchmarks Index Score                                        8467.7

So I think we should not load the cpuidle-haltpoll driver when the guest
has MWAIT. What's your suggestion?

Thanks

-Li
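For reference, the experimental hunk quoted above sits in default_enter_idle()
in drivers/cpuidle/cpuidle-haltpoll.c. A minimal sketch of the whole function
with that change applied is shown below; the surrounding lines are
reconstructed from the upstream driver from memory, so treat them as an
assumption rather than verbatim source:

/*
 * Sketch of default_enter_idle() with the experimental change applied
 * (context lines reconstructed, not verbatim upstream code).
 */
static int default_enter_idle(struct cpuidle_device *dev,
			      struct cpuidle_driver *drv, int index)
{
	if (current_clr_polling_and_test()) {
		local_irq_enable();
		return index;
	}
	/*
	 * Was default_idle() (plain HLT); arch_cpu_idle() invokes the
	 * architecture-selected idle routine, i.e. MWAIT on a guest that
	 * sees the MWAIT feature.
	 */
	arch_cpu_idle();
	return index;
}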
On Wed, Dec 07 2022 at 10:49, Rongqing Li wrote:
>> Before the change, "sockperf ping-pong -i 127.0.0.1 -p 20001 --tcp" latency is:
>> sockperf: Summary: Latency is 6.245 usec
>>
>> With this patch (cpuidle-haltpoll disabled):
>> sockperf: Summary: Latency is 4.671 usec
>>
>> Using arch_cpu_idle() in default_enter_idle():
>> sockperf: Summary: Latency is 4.285 usec
>>
>
> When I ran the above tests, I used taskset to pin the sockperf server and
> client to different cpus, so using arch_cpu_idle() in default_enter_idle()
> gets a better result.
>
> I also tested unixbench and found that not loading cpuidle-haltpoll gets
> more performance on an 8-core (2 threads per core) Intel CPU with C-states
> disabled in the host.
>
> Don't load cpuidle-haltpoll:
> System Benchmarks Index Score                                        1995.8
> System Benchmarks Index Score                                        8482.8
>
> Replacing default_idle() with arch_cpu_idle() and loading cpuidle-haltpoll,
> with a change like the one below:
> System Benchmarks Index Score                                        1971.4
> System Benchmarks Index Score                                        8467.7
>
> So I think we should not load the cpuidle-haltpoll driver when the guest
> has MWAIT.

So in the above you got:

    Driver loaded, not modified:   6.245
    Driver not loaded:             4.671   ~25%
    Driver loaded, modified:       4.285   ~30%

Now with unixbench:

    Driver not loaded:            8482.8
    Driver loaded, modified:      8467.7   ~0.2%

So because of a 0.2% delta you justify throwing away a 5% win?

If you really care about the 0.2%, then blacklist the module for your use case.

Thanks,

        tglx
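As an illustration of the blacklist suggestion above, a guest image that wants
to avoid the driver could ship a modprobe configuration fragment along the
following lines. The file name is arbitrary, and depending on how the module
gets pulled in an "install" line may be needed instead of "blacklist" alone,
so this is only a sketch:

# /etc/modprobe.d/cpuidle-haltpoll.conf  (illustrative name)
# Keep the haltpoll cpuidle driver from being auto-loaded in this guest.
blacklist cpuidle_haltpoll
# Stronger form, if something modprobes the module explicitly:
# install cpuidle_haltpoll /bin/false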
> -----Original Message-----
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Wednesday, December 7, 2022 7:41 PM
> To: Li,Rongqing <lirongqing@baidu.com>; Rafael J. Wysocki <rafael@kernel.org>
> Cc: mingo@redhat.com; bp@alien8.de; dave.hansen@linux.intel.com;
> x86@kernel.org; rafael@kernel.org; daniel.lezcano@linaro.org;
> peterz@infradead.org; akpm@linux-foundation.org; tony.luck@intel.com;
> jpoimboe@kernel.org; linux-kernel@vger.kernel.org; linux-pm@vger.kernel.org
> Subject: RE: [PATCH] cpuidle-haltpoll: Disable kvm guest polling when
> mwait_idle is used
>
> On Wed, Dec 07 2022 at 10:49, Rongqing Li wrote:
> >> Before the change, "sockperf ping-pong -i 127.0.0.1 -p 20001 --tcp" latency is:
> >> sockperf: Summary: Latency is 6.245 usec
> >>
> >> With this patch (cpuidle-haltpoll disabled):
> >> sockperf: Summary: Latency is 4.671 usec
> >>
> >> Using arch_cpu_idle() in default_enter_idle():
> >> sockperf: Summary: Latency is 4.285 usec
> >>
> >
> > When I ran the above tests, I used taskset to pin the sockperf server and
> > client to different cpus, so using arch_cpu_idle() in default_enter_idle()
> > gets a better result.
> >
> > I also tested unixbench and found that not loading cpuidle-haltpoll gets
> > more performance on an 8-core (2 threads per core) Intel CPU with C-states
> > disabled in the host.
> >
> > Don't load cpuidle-haltpoll:
> > System Benchmarks Index Score                                        1995.8
> > System Benchmarks Index Score                                        8482.8
> >
> > Replacing default_idle() with arch_cpu_idle() and loading cpuidle-haltpoll,
> > with a change like the one below:
> > System Benchmarks Index Score                                        1971.4
> > System Benchmarks Index Score                                        8467.7
> >
> > So I think we should not load the cpuidle-haltpoll driver when the guest
> > has MWAIT.
>
> So in the above you got:
>
>     Driver loaded, not modified:   6.245
>     Driver not loaded:             4.671   ~25%
>     Driver loaded, modified:       4.285   ~30%
>
> Now with unixbench:
>
>     Driver not loaded:            8482.8
>     Driver loaded, modified:      8467.7   ~0.2%
>
> So because of a 0.2% delta you justify throwing away a 5% win?
>
> If you really care about the 0.2%, then blacklist the module for your use case.
>

Ok, I will build it as a module by default.

Thanks

-Li
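Building it as a module by default amounts to a one-line change of the
default in drivers/cpuidle/Kconfig, roughly as sketched below. The option
text is quoted from memory and may not match the tree exactly, so treat it
as an assumption rather than the actual follow-up patch:

config HALTPOLL_CPUIDLE
	tristate "Halt poll cpuidle driver"
	depends on X86 && KVM_GUEST
	# previously "default y"; defaulting to "m" lets a distribution or
	# user blacklist or simply not load the driver on a given guest
	default m
	help
	  This option enables halt poll cpuidle driver, which allows to poll
	  before halting in the guest (more efficient than polling in the
	  host via halt_poll_ns for some scenarios).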
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73..159ef33 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -862,4 +862,6 @@ bool arch_is_platform_page(u64 paddr);
 #define arch_is_platform_page arch_is_platform_page
 #endif
 
+bool is_mwait_idle(void);
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b734..330972c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -896,6 +896,12 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
 		x86_idle = default_idle;
 }
 
+bool is_mwait_idle(void)
+{
+	return x86_idle == mwait_idle;
+}
+EXPORT_SYMBOL_GPL(is_mwait_idle);
+
 void amd_e400_c1e_apic_setup(void)
 {
 	if (boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index 3a39a7f..8cf1ddf 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -17,6 +17,7 @@
 #include <linux/sched/idle.h>
 #include <linux/kvm_para.h>
 #include <linux/cpuidle_haltpoll.h>
+#include <linux/processor.h>
 
 static bool force __read_mostly;
 module_param(force, bool, 0444);
@@ -111,6 +112,9 @@ static int __init haltpoll_init(void)
 	if (!kvm_para_available() || !haltpoll_want())
 		return -ENODEV;
 
+	if (is_mwait_idle())
+		return -ENODEV;
+
 	cpuidle_poll_state_init(drv);
 
 	ret = cpuidle_register_driver(drv);