Message ID | 20210730145957.7927-26-chang.seok.bae@intel.com |
---|---|
State | New |
Headers | show |
Series | None | expand |
> #endif /* _ASM_X86_SPECIAL_INSNS_H */ > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > index e6c543b5ee1d..fe1ba26cc797 100644 > --- a/drivers/idle/intel_idle.c > +++ b/drivers/idle/intel_idle.c > @@ -54,6 +54,8 @@ > #include <asm/intel-family.h> > #include <asm/mwait.h> > #include <asm/msr.h> > +#include <asm/fpu/internal.h> > +#include <asm/special_insns.h> > > #define INTEL_IDLE_VERSION "0.5.1" > > @@ -155,6 +157,55 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev, > return 0; > } > > +/** > + * idle_tile - Initialize TILE registers in INIT-state > + * > + * Leaving state in the dirty TILE registers may prevent the processor from > + * entering lower-power idle states. Use TILERELEASE to initialize the > + * state. Destroying fpregs state is safe after the fpstate update. > + */ > +static inline void idle_tile(void) > +{ > + if (boot_cpu_has(X86_FEATURE_XGETBV1) && (xgetbv(1) & XFEATURE_MASK_XTILE)) { > + tile_release(); > + fpregs_deactivate(¤t->thread.fpu); > + } > +} This isn't obviously safe code. There's a window in there when we have bogus, destroyed FPU register state but where we might be rescheduled. I would assume that preempt is off *somewhere* in this, but it would be nice to make sure of that, or at least mention the requirement for it to be off before this code is safe. I'm also not sure TILERELEASE is *technically* what you want here. xgetbv(1) tells you whether a feature is being tracked by the processor as being in its init state. tile_release() gets a feature into its init state, but does not guarantee that the processor will *track* it as being in the init state. TILERELEASE is not documented to have an effect on XINUSE (init tracking). XRSTOR, on the other hand, is at least documented to affect XINUSE. It sounds like we either need a documentation update, or a clear explanation why TILERELEASE is being used over XRSTOR.
On 7/30/21 7:59 AM, Chang S. Bae wrote: > SPR supports AMX, and so this custom table uses idle entry points that know > how to initialize AMX TMM state, if necessary. That's pretty direct with where this is showing up. But, the cover letter is quite a bit more cagey: > Intel Advanced Matrix Extensions (AMX)[1][2] will be shipping on servers > soon. If you do another version of these, it might be handy to make sure those are as consistent as possible.
On Jul 30, 2021, at 11:41, Hansen, Dave <dave.hansen@intel.com> wrote: >> +/** >> + * idle_tile - Initialize TILE registers in INIT-state >> + * >> + * Leaving state in the dirty TILE registers may prevent the processor from >> + * entering lower-power idle states. Use TILERELEASE to initialize the >> + * state. Destroying fpregs state is safe after the fpstate update. >> + */ >> +static inline void idle_tile(void) >> +{ >> + if (boot_cpu_has(X86_FEATURE_XGETBV1) && (xgetbv(1) & XFEATURE_MASK_XTILE)) { >> + tile_release(); >> + fpregs_deactivate(¤t->thread.fpu); >> + } >> +} > > This isn't obviously safe code. There's a window in there when we have > bogus, destroyed FPU register state but where we might be rescheduled. > > I would assume that preempt is off *somewhere* in this, but it would be > nice to make sure of that, or at least mention the requirement for it to > be off before this code is safe. I can see preempt_disable() in this path: $kernel/sched/idle.c::play_idle_precise() --> preempt_disable() ... --> do_idle() --> cpuidle_idle_call() --> call_cpuidle() --> $drivers/cpuidle/cpuidle.c::cpuidle_enter() --> cpuidle_enter_state() --> target_state->enter() --> $drivers/idle/intel_idle.c::intel_idle_tile() --> idle_tile() ... --> preempt_enable() > I'm also not sure TILERELEASE is *technically* what you want here. > xgetbv(1) tells you whether a feature is being tracked by the processor > as being in its init state. tile_release() gets a feature into its init > state, but does not guarantee that the processor will *track* it as > being in the init state. > > TILERELEASE is not documented to have an effect on XINUSE (init tracking). > > XRSTOR, on the other hand, is at least documented to affect XINUSE. > > It sounds like we either need a documentation update, or a clear > explanation why TILERELEASE is being used over XRSTOR. TILERELEASE impacts INIT tracking at least on the first AMX implementation. I agree that the documentation needs some update. Thanks, Chang
On 8/3/21 2:32 PM, Bae, Chang Seok wrote: >>> +static inline void idle_tile(void) >>> +{ >>> + if (boot_cpu_has(X86_FEATURE_XGETBV1) && (xgetbv(1) & XFEATURE_MASK_XTILE)) { >>> + tile_release(); >>> + fpregs_deactivate(¤t->thread.fpu); >>> + } >>> +} >> This isn't obviously safe code. There's a window in there when we have >> bogus, destroyed FPU register state but where we might be rescheduled. >> >> I would assume that preempt is off *somewhere* in this, but it would be >> nice to make sure of that, or at least mention the requirement for it to >> be off before this code is safe. > I can see preempt_disable() in this path: > > $kernel/sched/idle.c::play_idle_precise() > --> preempt_disable() > ... > --> do_idle() > --> cpuidle_idle_call() > --> call_cpuidle() > --> $drivers/cpuidle/cpuidle.c::cpuidle_enter() > --> cpuidle_enter_state() > --> target_state->enter() > --> $drivers/idle/intel_idle.c::intel_idle_tile() > --> idle_tile() > ... > --> preempt_enable() OK, that's good. Can we comment about the preempt requirement somewhere? Or, maybe add a !in_atomic() warning? Also, should this have something like a fpregs_state_valid() check? If the registers are invalid, should this be calling tile_release()?
> Also, should this have something like a fpregs_state_valid() check? If the registers are invalid, should this be calling tile_release()?
From a correctness point of view, it is valid to always call tile_release() here.
From a performance point of view, tile_release() is very fast.
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index f3fbb84ff8a7..fada1bb82c7b 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -294,6 +294,12 @@ static inline int enqcmds(void __iomem *dst, const void *src) return 0; } +static inline void tile_release(void) +{ + /* Instruction opcode for TILERELEASE; supported in binutils >= 2.36. */ + asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0"); +} + #endif /* __KERNEL__ */ #endif /* _ASM_X86_SPECIAL_INSNS_H */ diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index e6c543b5ee1d..fe1ba26cc797 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -54,6 +54,8 @@ #include <asm/intel-family.h> #include <asm/mwait.h> #include <asm/msr.h> +#include <asm/fpu/internal.h> +#include <asm/special_insns.h> #define INTEL_IDLE_VERSION "0.5.1" @@ -155,6 +157,55 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev, return 0; } +/** + * idle_tile - Initialize TILE registers in INIT-state + * + * Leaving state in the dirty TILE registers may prevent the processor from + * entering lower-power idle states. Use TILERELEASE to initialize the + * state. Destroying fpregs state is safe after the fpstate update. + */ +static inline void idle_tile(void) +{ + if (boot_cpu_has(X86_FEATURE_XGETBV1) && (xgetbv(1) & XFEATURE_MASK_XTILE)) { + tile_release(); + fpregs_deactivate(¤t->thread.fpu); + } +} + +/** + * intel_idle_tile - Ask the processor to enter the given idle state. + * @dev: cpuidle device of the target CPU. + * @drv: cpuidle driver (assumed to point to intel_idle_driver). + * @index: Target idle state index. + * + * Ensure TILE registers in INIT-state before using intel_idle() to + * enter the idle state. + */ +static __cpuidle int intel_idle_tile(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + idle_tile(); + + return intel_idle(dev, drv, index); +} + +/** + * intel_idle_s2idle_tile - Ask the processor to enter the given idle state. + * @dev: cpuidle device of the target CPU. + * @drv: cpuidle driver (assumed to point to intel_idle_driver). + * @index: Target idle state index. + * + * Ensure TILE registers in INIT-state before using intel_idle_s2idle() to + * enter the idle state. + */ +static __cpuidle int intel_idle_s2idle_tile(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + idle_tile(); + + return intel_idle_s2idle(dev, drv, index); +} + /* * States are indexed by the cstate number, * which is also the index into the MWAIT hint array. @@ -752,6 +803,27 @@ static struct cpuidle_state icx_cstates[] __initdata = { .enter = NULL } }; +static struct cpuidle_state spr_cstates[] __initdata = { + { + .name = "C1", + .desc = "MWAIT 0x00", + .flags = MWAIT2flg(0x00), + .exit_latency = 1, + .target_residency = 1, + .enter = &intel_idle, + .enter_s2idle = intel_idle_s2idle, }, + { + .name = "C6", + .desc = "MWAIT 0x20", + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, + .exit_latency = 128, + .target_residency = 384, + .enter = &intel_idle_tile, + .enter_s2idle = intel_idle_s2idle_tile, }, + { + .enter = NULL } +}; + static struct cpuidle_state atom_cstates[] __initdata = { { .name = "C1E", @@ -1095,6 +1167,12 @@ static const struct idle_cpu idle_cpu_icx __initconst = { .use_acpi = true, }; +static const struct idle_cpu idle_cpu_spr __initconst = { + .state_table = spr_cstates, + .disable_promotion_to_c1e = true, + .use_acpi = true, +}; + static const struct idle_cpu idle_cpu_avn __initconst = { .state_table = avn_cstates, .disable_promotion_to_c1e = true, @@ -1157,6 +1235,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx), X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx), X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx), + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr), X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl), X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl), X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &idle_cpu_bxt),