Message ID | 20220622144820.751402-5-mlevitsk@redhat.com |
---|---|
State | New |
Headers | show |
Series | x86: cpuid: improve support for broken CPUID configurations | expand |
On 6/22/22 07:48, Maxim Levitsky wrote: > Due to configuration bugs, sometimes a CPU feature is disabled in CPUID, > but not features that depend on it. > > While the above is not supported, the kernel should try to not crash, > and clearing the dependent cpu caps is the best way to do it. That's a rather paltry changelog. If I remember correctly, there's a crystal clear problem: If a CPU enumerates support for AVX2 but AVX via CPUID, the kernel crashes. There's also a follow-on problem. The kernel has all the data it needs to fix this, but just doesn't consult it: To make matters worse, the kernel _knows_ that this is an ill- advised situation: The kernel prevents itself from clearing the software representation of the AVX CPUID bit without also clearing AVX2. But, the kernel only consults this knowledge when it is clearing cpu_cap bits. It does not consult this information when it is populating those cpu_cap bits. > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 4cc79971d2d847..c83a8f447d6aed 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -1469,7 +1469,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) > this_cpu->c_early_init(c); > > c->cpu_index = 0; > - filter_cpuid_features(c, false); > + filter_cpuid_features(c, true); > > if (this_cpu->c_bsp_init) > this_cpu->c_bsp_init(c); > @@ -1757,7 +1757,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) > */ > > /* Filter out anything that depends on CPUID levels we don't have */ > - filter_cpuid_features(c, true); > + filter_cpuid_features(c, false); > > /* If the model name is still unset, do table lookup. */ > if (!c->x86_model_id[0]) { While we're at it, could we please rid ourselves of this unreadable mystery true/false gunk? > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c > index bcb091d02a754b..6d9c0e39851805 100644 > --- a/arch/x86/kernel/cpu/cpuid-deps.c > +++ b/arch/x86/kernel/cpu/cpuid-deps.c > @@ -94,6 +94,11 @@ static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) > set_bit(feature, (unsigned long *)cpu_caps_cleared); > } > > +static inline bool test_feature(struct cpuinfo_x86 *c, unsigned int feature) > +{ > + return test_bit(feature, (unsigned long *)c->x86_capability); > +} > + > /* Take the capabilities and the BUG bits into account */ > #define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) > > @@ -127,6 +132,7 @@ void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) > } while (changed); > } > > + > void setup_clear_cpu_cap(unsigned int feature) More superfluous whitespace. > { > clear_cpu_cap(&boot_cpu_data, feature); > @@ -137,6 +143,10 @@ void setup_clear_cpu_cap(unsigned int feature) > * Some CPU features depend on higher CPUID levels, which may not always > * be available due to CPUID level capping or broken virtualization > * software. Add those features to this table to auto-disable them. > + * > + * Also due to configuration bugs, some CPUID features might be present > + * while CPUID features that they depend on are not present, > + * e.g a AVX2 present but AVX is not present. > */ > struct cpuid_dependent_feature { > u32 feature; > @@ -151,9 +161,10 @@ cpuid_dependent_features[] = { > { 0, 0 } > }; > > -void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) > +void filter_cpuid_features(struct cpuinfo_x86 *c, bool early) > { I have at least an inkling what 'warn' could mean. But, 'early'? One man's 'early' is another one's 'late'. > const struct cpuid_dependent_feature *df; > + const struct cpuid_dep *d; > > for (df = cpuid_dependent_features; df->feature; df++) { > > @@ -172,10 +183,22 @@ void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) > continue; > > clear_cpu_cap(c, df->feature); > - if (!warn) > + if (early) > continue; Why is it that 'early' calls don't want warnings? > pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, no CPUID level 0x%x\n", > x86_cap_flag(df->feature), df->level); > } > + > + for (d = cpuid_deps; d->feature; d++) { > + > + if (!test_feature(c, d->feature) || test_feature(c, d->depends)) > + continue; > + > + clear_feature(c, d->feature); > + > + pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, because it depends on " > + X86_CAP_FMT " which is not supported in CPUID\n", > + x86_cap_flag(d->feature), x86_cap_flag(d->depends)); > + } > } The do_clear_cpu_cap() does this with a loop, presumably because a later (higher index in the array) feature in cpuid_deps[] could theoretically clear an earlier (lower index) feature. Also, is that message strictly correct? There might have been a clearcpuid= argument or even another dependency that ended up clearing a bit. It might have nothing to do with CPUID itself.
On Wed, 2022-06-22 at 08:32 -0700, Dave Hansen wrote: > On 6/22/22 07:48, Maxim Levitsky wrote: > > Due to configuration bugs, sometimes a CPU feature is disabled in CPUID, > > but not features that depend on it. > > > > While the above is not supported, the kernel should try to not crash, > > and clearing the dependent cpu caps is the best way to do it. > > That's a rather paltry changelog. > > If I remember correctly, there's a crystal clear problem: > > If a CPU enumerates support for AVX2 but AVX via CPUID, the > kernel crashes. > > There's also a follow-on problem. The kernel has all the data it needs > to fix this, but just doesn't consult it: > > To make matters worse, the kernel _knows_ that this is an ill- > advised situation: The kernel prevents itself from clearing the > software representation of the AVX CPUID bit without also > clearing AVX2. > > But, the kernel only consults this knowledge when it is clearing > cpu_cap bits. It does not consult this information when it is > populating those cpu_cap bits. Yes, I agree. I'll update the changelog with something more in depth. > > > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > > index 4cc79971d2d847..c83a8f447d6aed 100644 > > --- a/arch/x86/kernel/cpu/common.c > > +++ b/arch/x86/kernel/cpu/common.c > > @@ -1469,7 +1469,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) > > this_cpu->c_early_init(c); > > > > c->cpu_index = 0; > > - filter_cpuid_features(c, false); > > + filter_cpuid_features(c, true); > > > > if (this_cpu->c_bsp_init) > > this_cpu->c_bsp_init(c); > > @@ -1757,7 +1757,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) > > */ > > > > /* Filter out anything that depends on CPUID levels we don't have */ > > - filter_cpuid_features(c, true); > > + filter_cpuid_features(c, false); > > > > /* If the model name is still unset, do table lookup. */ > > if (!c->x86_model_id[0]) { > > While we're at it, could we please rid ourselves of this unreadable > mystery true/false gunk? It is present if I understand the code correctly to avoid printing a warning twice. It used to be 'warn' parameter, and I changed it to 'early' parameter, inverting its boolean value, because I have seen that warning is not printed at all, and I assumed that it is because the first early call already clears the cpuid cap and the second call doesn't get the warning. Now however, looking at that I think that the same will happen with the cpuid level fitering as well, and thus we can just remove that 'warn' parameter. > > > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c > > index bcb091d02a754b..6d9c0e39851805 100644 > > --- a/arch/x86/kernel/cpu/cpuid-deps.c > > +++ b/arch/x86/kernel/cpu/cpuid-deps.c > > @@ -94,6 +94,11 @@ static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) > > set_bit(feature, (unsigned long *)cpu_caps_cleared); > > } > > > > +static inline bool test_feature(struct cpuinfo_x86 *c, unsigned int feature) > > +{ > > + return test_bit(feature, (unsigned long *)c->x86_capability); > > +} > > + > > /* Take the capabilities and the BUG bits into account */ > > #define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) > > > > @@ -127,6 +132,7 @@ void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) > > } while (changed); > > } > > > > + > > void setup_clear_cpu_cap(unsigned int feature) > > More superfluous whitespace. Sorry about that, will check better next time. > > > { > > clear_cpu_cap(&boot_cpu_data, feature); > > @@ -137,6 +143,10 @@ void setup_clear_cpu_cap(unsigned int feature) > > * Some CPU features depend on higher CPUID levels, which may not always > > * be available due to CPUID level capping or broken virtualization > > * software. Add those features to this table to auto-disable them. > > + * > > + * Also due to configuration bugs, some CPUID features might be present > > + * while CPUID features that they depend on are not present, > > + * e.g a AVX2 present but AVX is not present. > > */ > > struct cpuid_dependent_feature { > > u32 feature; > > @@ -151,9 +161,10 @@ cpuid_dependent_features[] = { > > { 0, 0 } > > }; > > > > -void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) > > +void filter_cpuid_features(struct cpuinfo_x86 *c, bool early) > > { > > I have at least an inkling what 'warn' could mean. But, 'early'? One > man's 'early' is another one's 'late'. I understand what you mean, as I said above, I will try to reproduce the original issue of cpuid level mismatch and see if I can remove the warn parameter at all. > > > const struct cpuid_dependent_feature *df; > > + const struct cpuid_dep *d; > > > > for (df = cpuid_dependent_features; df->feature; df++) { > > > > @@ -172,10 +183,22 @@ void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) > > continue; > > > > clear_cpu_cap(c, df->feature); > > - if (!warn) > > + if (early) > > continue; > > Why is it that 'early' calls don't want warnings? I don't know to be honest, except that I assumed that this allows to not print the warning twice, but as I said above, I might be able to just remove that code. > > > pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, no CPUID level 0x%x\n", > > x86_cap_flag(df->feature), df->level); > > } > > + > > + for (d = cpuid_deps; d->feature; d++) { > > + > > + if (!test_feature(c, d->feature) || test_feature(c, d->depends)) > > + continue; > > + > > + clear_feature(c, d->feature); > > + > > + pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, because it depends on " > > + X86_CAP_FMT " which is not supported in CPUID\n", > > + x86_cap_flag(d->feature), x86_cap_flag(d->depends)); > > + } > > } > > The do_clear_cpu_cap() does this with a loop, presumably because a later > (higher index in the array) feature in cpuid_deps[] could theoretically > clear an earlier (lower index) feature. Sorry this is my silly mistake. I intended to call clear_cpu_cap here, which will if needed disable all the depedencies, so a loop doesn't seem to be needed here. It's not very efficient but this is only done once per vCPU so shouldn't matter. > > Also, is that message strictly correct? There might have been a > clearcpuid= argument or even another dependency that ended up clearing a > bit. It might have nothing to do with CPUID itself. I think it should work, because clearcpuid= will end up calling clear_cpu_cap which will disable both the requested feature and everything that depends on it, thus filter_cpuid_features should not notice any inconsistencies. Other way around, if the clear_cpu_cap is called before filter_cpuid_features, it might 'fix' the inconsistency, and silence the warning but that isn't an issue IMHO. Thanks a lot for the review, Best regards, Maxim Levitsky >
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4cc79971d2d847..c83a8f447d6aed 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1469,7 +1469,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) this_cpu->c_early_init(c); c->cpu_index = 0; - filter_cpuid_features(c, false); + filter_cpuid_features(c, true); if (this_cpu->c_bsp_init) this_cpu->c_bsp_init(c); @@ -1757,7 +1757,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) */ /* Filter out anything that depends on CPUID levels we don't have */ - filter_cpuid_features(c, true); + filter_cpuid_features(c, false); /* If the model name is still unset, do table lookup. */ if (!c->x86_model_id[0]) { diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index bcb091d02a754b..6d9c0e39851805 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -94,6 +94,11 @@ static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) set_bit(feature, (unsigned long *)cpu_caps_cleared); } +static inline bool test_feature(struct cpuinfo_x86 *c, unsigned int feature) +{ + return test_bit(feature, (unsigned long *)c->x86_capability); +} + /* Take the capabilities and the BUG bits into account */ #define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) @@ -127,6 +132,7 @@ void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) } while (changed); } + void setup_clear_cpu_cap(unsigned int feature) { clear_cpu_cap(&boot_cpu_data, feature); @@ -137,6 +143,10 @@ void setup_clear_cpu_cap(unsigned int feature) * Some CPU features depend on higher CPUID levels, which may not always * be available due to CPUID level capping or broken virtualization * software. Add those features to this table to auto-disable them. + * + * Also due to configuration bugs, some CPUID features might be present + * while CPUID features that they depend on are not present, + * e.g a AVX2 present but AVX is not present. */ struct cpuid_dependent_feature { u32 feature; @@ -151,9 +161,10 @@ cpuid_dependent_features[] = { { 0, 0 } }; -void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) +void filter_cpuid_features(struct cpuinfo_x86 *c, bool early) { const struct cpuid_dependent_feature *df; + const struct cpuid_dep *d; for (df = cpuid_dependent_features; df->feature; df++) { @@ -172,10 +183,22 @@ void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn) continue; clear_cpu_cap(c, df->feature); - if (!warn) + if (early) continue; pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, no CPUID level 0x%x\n", x86_cap_flag(df->feature), df->level); } + + for (d = cpuid_deps; d->feature; d++) { + + if (!test_feature(c, d->feature) || test_feature(c, d->depends)) + continue; + + clear_feature(c, d->feature); + + pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, because it depends on " + X86_CAP_FMT " which is not supported in CPUID\n", + x86_cap_flag(d->feature), x86_cap_flag(d->depends)); + } }
Due to configuration bugs, sometimes a CPU feature is disabled in CPUID, but not features that depend on it. While the above is not supported, the kernel should try to not crash, and clearing the dependent cpu caps is the best way to do it. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> --- arch/x86/kernel/cpu/common.c | 4 ++-- arch/x86/kernel/cpu/cpuid-deps.c | 27 +++++++++++++++++++++++++-- 2 files changed, 27 insertions(+), 4 deletions(-)