Message ID | 20220718141123.136106-2-mlevitsk@redhat.com |
---|---|
State | Accepted |
Commit | b329f5ddc9ce4b622d9c7aaf5c6df4de52caf91a |
Series | x86: cpuid: improve support for broken CPUID configurations |
On Mon, Jul 18, 2022 at 05:11:19PM +0300, Maxim Levitsky wrote:
> clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap
> except that the latter also sets a bit in 'cpu_caps_cleared' which
> later clears the same cap in secondary cpus, which is likely
> what is meant here.
>
> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/events/intel/lbr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 13179f31fe10fa..b08715172309a7 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -1860,7 +1860,7 @@ void __init intel_pmu_arch_lbr_init(void)
>  	return;
>
>  clear_arch_lbr:
> -	clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR);
> +	setup_clear_cpu_cap(X86_FEATURE_ARCH_LBR);

setup_clear_cpu_cap() has a very specific purpose - see
apply_forced_caps().

This whole call sequence is an early_initcall() which is way after the
whole CPU features picking apart happens.

So what is actually this fixing?

On Mon, 2022-09-19 at 16:31 +0200, Borislav Petkov wrote:
> On Mon, Jul 18, 2022 at 05:11:19PM +0300, Maxim Levitsky wrote:
> > clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap
> > except that the latter also sets a bit in 'cpu_caps_cleared' which
> > later clears the same cap in secondary cpus, which is likely
> > what is meant here.
> >
> > Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> >
> > Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/events/intel/lbr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> > index 13179f31fe10fa..b08715172309a7 100644
> > --- a/arch/x86/events/intel/lbr.c
> > +++ b/arch/x86/events/intel/lbr.c
> > @@ -1860,7 +1860,7 @@ void __init intel_pmu_arch_lbr_init(void)
> >  	return;
> >
> >  clear_arch_lbr:
> > -	clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR);
> > +	setup_clear_cpu_cap(X86_FEATURE_ARCH_LBR);
>
> setup_clear_cpu_cap() has a very specific purpose - see
> apply_forced_caps().
>
> This whole call sequence is an early_initcall() which is way after the
> whole CPU features picking apart happens.
>
> So what is actually this fixing?
>

If I understand that correctly, the difference between clear_cpu_cap and setup_clear_cpu_cap
is that setup_clear_cpu_cap should be called early when only the boot cpu is running and it

1. works on 'boot_cpu_data' which represents the boot cpu.
2. sets a bit in 'cpu_caps_cleared' which is later applied to all CPUs, including those that are hotplugged.

On the other hand, clear_cpu_cap just affects the given 'struct cpuinfo_x86'.

The call 'clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR)' is weird since it still affects 'boot_cpu_data'
but doesn't affect 'cpu_caps_cleared'.

I assumed that this was a mistake and the intention was to disable the feature on all CPUs.

I need this patch because in the next patch, I change clear_cpu_cap so that it detects being
called on boot_cpu_data and in this case also sets the bit in 'cpu_caps_cleared'. Thus,
while this patch does introduce a functional change, the next patch doesn't, since this is the only
place where clear_cpu_cap is called explicitly on 'boot_cpu_data'.

I do now notice that initcalls are run after smp is initialized, which
means that this code doesn't really disable the CPUID feature on all
CPUs at all.

Maybe we can drop the call instead, which does seem to be wrong?

Best regards,
	Maxim Levitsky

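The difference described in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct cpu_model`, the `model_*` helpers and the bit number chosen for `MODEL_FEATURE_ARCH_LBR` are hypothetical names made up for the illustration, and `model_identify_cpu()` merely stands in for the point where the kernel applies `cpu_caps_cleared` to a CPU that comes up later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code: one 64-bit word of feature bits per CPU. */
struct cpu_model {
	uint64_t caps;
};

/* Arbitrary bit number, chosen only for this illustration. */
#define MODEL_FEATURE_ARCH_LBR 3

static struct cpu_model boot_cpu;	/* stands in for boot_cpu_data */
static uint64_t caps_cleared;		/* stands in for cpu_caps_cleared */

/* Like clear_cpu_cap(): touches only the structure it is given. */
static inline void model_clear_cpu_cap(struct cpu_model *c, unsigned int bit)
{
	c->caps &= ~(UINT64_C(1) << bit);
}

/* Like setup_clear_cpu_cap(): clears on the boot CPU and records the bit. */
static inline void model_setup_clear_cpu_cap(unsigned int bit)
{
	model_clear_cpu_cap(&boot_cpu, bit);
	caps_cleared |= UINT64_C(1) << bit;
}

/* A CPU identified later gets the hardware bits minus the recorded ones. */
static inline void model_identify_cpu(struct cpu_model *c, uint64_t hw_caps)
{
	c->caps = hw_caps & ~caps_cleared;
}

static inline bool model_cpu_has(const struct cpu_model *c, unsigned int bit)
{
	return (c->caps >> bit) & 1;
}
```

In this model, a CPU identified after `model_clear_cpu_cap(&boot_cpu, ...)` still reports the feature, while one identified after `model_setup_clear_cpu_cap(...)` does not. Whether the secondary CPUs are in fact identified after the call site in question is exactly what the thread goes on to discuss.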
On Tue, Sep 20, 2022 at 11:20:47AM +0300, Maxim Levitsky wrote:
> If I understand that correctly, the difference between clear_cpu_cap and setup_clear_cpu_cap
> is that setup_clear_cpu_cap should be called early when only the boot cpu is running and it
>
> 1. works on 'boot_cpu_data' which represents the boot cpu.
> 2. sets a bit in 'cpu_caps_cleared' which is later applied to all CPUs, including those that are hotplugged.

Yes.

> On the other hand, clear_cpu_cap just affects the given 'struct cpuinfo_x86'.

Yes.

> The call 'clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR)' is weird since it still affects 'boot_cpu_data'
> but doesn't affect 'cpu_caps_cleared'.

Yes.

> I assumed that this was a mistake and the intention was to disable the feature on all CPUs.

peterz says yes.

> I need this patch because in the next patch, I change clear_cpu_cap so that it detects being
> called on boot_cpu_data and in this case also sets the bit in 'cpu_caps_cleared'. Thus,
> while this patch does introduce a functional change, the next patch doesn't, since this is the only
> place where clear_cpu_cap is called explicitly on 'boot_cpu_data'.

This is not needed - this patch doing setup_clear_cpu_cap() should
suffice.

But, there must be something you're fixing with this. Which is it? Some
weird virt config?

> I do now notice that initcalls are run after smp is initialized, which
> means that this code doesn't really disable the CPUID feature on all
> CPUs at all.

Well, not exactly. There's do_pre_smp_initcalls() which is where the
early_initcall() thing is run. So setup_clear_cpu_cap() will make sure
that the feature bit is cleared when the APs come online.

Do you have a virt configuration where you can test this case where the
feature flag is clear on all CPUs when it fails?

I.e., "arch_lbr" will disappear in /proc/cpuinfo completely.

Thx.

On Wed, Sep 28, 2022 at 01:49:34PM +0300, Maxim Levitsky wrote:
> Patch 5 is the main fix - it makes the kernel to be tolerant to a
> broken CPUID config (coming hopefully from hypervisor), where you have
> a feature (AVX2 in my case) but not a feature on which this feature
> depends (AVX).

I really really don't like it when people are fixing the wrong thing.

Why does the kernel need to get fixed when something else can't get its
CPUID dependencies straight? I don't even want to know why something
would set AVX2 without AVX?!?!

Srsly.

Some of your other bits look sensible and I'd take a deeper look but
this does not make any sense. This is a hypervisor problem - not a
kernel one.

Thx.

On Thu, Oct 20, 2022 at 10:59:48AM +0200, Borislav Petkov wrote:
>
> I really really don't like it when people are fixing the wrong thing.
>
> Why does the kernel need to get fixed when something else can't get its
> CPUID dependencies straight? I don't even want to know why something
> would set AVX2 without AVX?!?!

That's exactly what I said when this was first reported to me as
a crypto bug :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

On Thu, 2022-10-20 at 17:05 +0800, Herbert Xu wrote:
> On Thu, Oct 20, 2022 at 10:59:48AM +0200, Borislav Petkov wrote:
> > I really really don't like it when people are fixing the wrong thing.
> >
> > Why does the kernel need to get fixed when something else can't get its
> > CPUID dependencies straight? I don't even want to know why something
> > would set AVX2 without AVX?!?!
>
> That's exactly what I said when this was first reported to me as
> a crypto bug :)

I agree with you, however this patch series is just
refactoring/hardening of the kernel - if the kernel can avoid crashing
- why not.

Of course the hypervisor should not present such broken configurations
to the guest - in fact the guest kernel can't fix this - guest userspace
will still see the wrong CPUID and can still crash.

TL;DR - this patch series is not intended to work around a broken
hypervisor and such, it is just a hardening against misconfiguration.

Best regards,
	Maxim Levitsky

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

On Thu, Oct 20, 2022 at 01:21:30PM +0300, Maxim Levitsky wrote:
> I agree with you, however this patch series is just
> refactoring/hardening of the kernel - if the kernel can avoid crashing
> - why not.

Because we're already drowning in patches which are trying to fix real
problems. If we open the floodgates on alleged hardening just because
some other part of the stack is misbehaving, there'll be no end of it.

Thx.

On October 20, 2022 1:59:48 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Wed, Sep 28, 2022 at 01:49:34PM +0300, Maxim Levitsky wrote:
>> Patch 5 is the main fix - it makes the kernel to be tolerant to a
>> broken CPUID config (coming hopefully from hypervisor), where you have
>> a feature (AVX2 in my case) but not a feature on which this feature
>> depends (AVX).
>
>I really really don't like it when people are fixing the wrong thing.
>
>Why does the kernel need to get fixed when something else can't get its
>CPUID dependencies straight? I don't even want to know why something
>would set AVX2 without AVX?!?!
>
>Srsly.
>
>Some of your other bits look sensible and I'd take a deeper look but
>this does not make any sense. This is a hypervisor problem - not a
>kernel one.
>
>Thx.
>

Yes, this is utterly nonsensical and it will break user space applications left, right, and center.

On 10/20/22 10:59, Borislav Petkov wrote:
> On Wed, Sep 28, 2022 at 01:49:34PM +0300, Maxim Levitsky wrote:
>> Patch 5 is the main fix - it makes the kernel to be tolerant to a
>> broken CPUID config (coming hopefully from hypervisor), where you have
>> a feature (AVX2 in my case) but not a feature on which this feature
>> depends (AVX).
>
> I really really don't like it when people are fixing the wrong thing.
>
> Why does the kernel need to get fixed when something else can't get its
> CPUID dependencies straight? I don't even want to know why something
> would set AVX2 without AVX?!?!

Users do so because they just "disable AVX" (e.g. in QEMU
-cpu host,-avx) and that removes the AVX bit. Userspace didn't bother to
implement the whole set of CPUID bit dependencies for AVX because:

1) Intel is adding AVX features every other week and probably half the
time people would forget to add the dependency

2) anyway you absolutely need to check XCR0 before using AVX, which in
the kernel is done using cpu_has_xfeatures(XFEATURE_MASK_YMM), and
userspace *does* remove the XSAVE state from 0Dh leaf if you remove AVX.

(2) in particular holds even on bare metal. The kernel bug here is that
X86_FEATURE_AVX only tells you if the instructions are _present_, not if
they are _usable_. Indeed, the XCR0 check is present for all other
files in arch/x86/crypto, either instead or in addition to
boot_cpu_has(X86_FEATURE_AVX).

Maxim had sent a patch about a year ago to do it in aesni-intel-glue.c
but Dave told him to fix the dependencies instead
(https://lore.kernel.org/all/20211103124614.499580-1-mlevitsk@redhat.com/).
What do you think of applying that patch instead?

Thanks,

Paolo

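The "present" versus "usable" distinction made above can be sketched as a pure function over the raw register values. This is an illustration only, not the kernel's implementation: `avx_present()` and `avx_usable()` are hypothetical helper names, and the caller is assumed to have read CPUID leaf 1 (ECX) and XCR0 by some other means. The bit positions used are the architectural ones: CPUID.1:ECX bit 27 (OSXSAVE) and bit 28 (AVX), XCR0 bit 1 (SSE state) and bit 2 (AVX/YMM state).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=1):ECX feature bits. */
#define CPUID1_ECX_OSXSAVE	(UINT32_C(1) << 27)
#define CPUID1_ECX_AVX		(UINT32_C(1) << 28)

/* XCR0 state-component bits. */
#define XCR0_SSE		(UINT64_C(1) << 1)
#define XCR0_YMM		(UINT64_C(1) << 2)

/* "Present": the CPUID bit alone says the instructions exist. */
static inline bool avx_present(uint32_t cpuid1_ecx)
{
	return (cpuid1_ecx & CPUID1_ECX_AVX) != 0;
}

/*
 * "Usable": the instructions exist, XGETBV is enabled by the OS, and
 * both the SSE and YMM state components are enabled in XCR0.
 */
static inline bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
	const uint64_t need = XCR0_SSE | XCR0_YMM;

	if (!avx_present(cpuid1_ecx))
		return false;
	if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
		return false;
	return (xcr0 & need) == need;
}
```

With the AVX and OSXSAVE bits set but only the SSE component enabled in XCR0, `avx_present()` is true while `avx_usable()` is false, which is the case the message says a bare CPUID feature check misses.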
> From: Paolo Bonzini <pbonzini@redhat.com>
...
> (2) in particular holds even on bare metal. The kernel bug here is that
> X86_FEATURE_AVX only tells you if the instructions are _present_, not if
> they are _usable_. Indeed, the XCR0 check is present for all other
> files in arch/x86/crypto, either instead or in addition to
> boot_cpu_has(X86_FEATURE_AVX).
>
> Maxim had sent a patch about a year ago to do it in aesni-intel-glue.c
> but Dave told him to fix the dependencies instead
> (https://lore.kernel.org/all/20211103124614.499580-1-mlevitsk@redhat.com/).
> What do you think of applying that patch instead?

Most of the x86 crypto modules using X86_FEATURE_AVX do check
  cpu_has_xfeatures(XFEATURE_MASK_YMM, ...

so it's probably prudent to add it to the rest (or remove it everywhere
if it is not needed).

1. Currently checking XSAVE YMM:
  aria_aesni_avx_glue
  blake2s-glue
  camellia_aesni_avx2_glue camellia_aesni_avx_glue
  cast5_avx_glue cast6_avx_glue
  chacha_glue
  poly1305_glue
  serpent_avx2_glue serpent_avx_glue
  sha1_ssse3_glue sha256_ssse3_glue sha512_ssse3_glue
  sm3_avx_glue
  sm4_aesni_avx2_glue sm4_aesni_avx_glue
  twofish_avx_glue

Currently not checking XSAVE YMM:
  aesni-intel_glue
  curve25519-x86_64
  nhpoly1305-avx2-glue
  polyval-clmulni_glue

2. Similarly, modules using X86_FEATURE_AVX512F, X86_FEATURE_AVX512VL
and/or X86_FEATURE_AVX512BW probably need to check XFEATURE_MASK_AVX512:

Currently checking XSAVE AVX512:
  blake2s-glue
  poly1305_glue

Currently not checking XSAVE AVX512:
  chacha_glue

3. Similarly, modules using X86_FEATURE_XMM2 probably need to
check XFEATURE_MASK_SSE:

Currently checking XSAVE SSE:
  aegis128-aesni-glue

Currently not checking XSAVE SSE:
  nhpoly1305-sse2_glue
  serpent_sse2_glue

On November 2, 2022 7:27:52 AM PDT, "Elliott, Robert (Servers)" <elliott@hpe.com> wrote:
>
>> From: Paolo Bonzini <pbonzini@redhat.com>
>...
>> (2) in particular holds even on bare metal. The kernel bug here is that
>> X86_FEATURE_AVX only tells you if the instructions are _present_, not if
>> they are _usable_. Indeed, the XCR0 check is present for all other
>> files in arch/x86/crypto, either instead or in addition to
>> boot_cpu_has(X86_FEATURE_AVX).
>>
>> Maxim had sent a patch about a year ago to do it in aesni-intel-glue.c
>> but Dave told him to fix the dependencies instead
>> (https://lore.kernel.org/all/20211103124614.499580-1-mlevitsk@redhat.com/).
>> What do you think of applying that patch instead?
>
>Most of the x86 crypto modules using X86_FEATURE_AVX do check
>  cpu_has_xfeatures(XFEATURE_MASK_YMM, ...
>
>so it's probably prudent to add it to the rest (or remove it everywhere
>if it is not needed).
>
>1. Currently checking XSAVE YMM:
>  aria_aesni_avx_glue
>  blake2s-glue
>  camellia_aesni_avx2_glue camellia_aesni_avx_glue
>  cast5_avx_glue cast6_avx_glue
>  chacha_glue
>  poly1305_glue
>  serpent_avx2_glue serpent_avx_glue
>  sha1_ssse3_glue sha256_ssse3_glue sha512_ssse3_glue
>  sm3_avx_glue
>  sm4_aesni_avx2_glue sm4_aesni_avx_glue
>  twofish_avx_glue
>
>Currently not checking XSAVE YMM:
>  aesni-intel_glue
>  curve25519-x86_64
>  nhpoly1305-avx2-glue
>  polyval-clmulni_glue
>
>2. Similarly, modules using X86_FEATURE_AVX512F, X86_FEATURE_AVX512VL
>and/or X86_FEATURE_AVX512BW probably need to check XFEATURE_MASK_AVX512:
>
>Currently checking XSAVE AVX512:
>  blake2s-glue
>  poly1305_glue
>
>Currently not checking XSAVE AVX512:
>  chacha_glue
>
>3. Similarly, modules using X86_FEATURE_XMM2 probably need to
>check XFEATURE_MASK_SSE:
>
>Currently checking XSAVE SSE:
>  aegis128-aesni-glue
>
>Currently not checking XSAVE SSE:
>  nhpoly1305-sse2_glue
>  serpent_sse2_glue
>
>
>

We have a dependency system for CPUID features. If you are going to do this (as opposed to "fixing" this in Qemu or just saying "don't do that, it isn't a valid hardware configuration."

On November 2, 2022 9:23:00 AM PDT, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On November 2, 2022 7:27:52 AM PDT, "Elliott, Robert (Servers)" <elliott@hpe.com> wrote:
>>
>>> From: Paolo Bonzini <pbonzini@redhat.com>
>>...
>>> (2) in particular holds even on bare metal. The kernel bug here is that
>>> X86_FEATURE_AVX only tells you if the instructions are _present_, not if
>>> they are _usable_. Indeed, the XCR0 check is present for all other
>>> files in arch/x86/crypto, either instead or in addition to
>>> boot_cpu_has(X86_FEATURE_AVX).
>>>
>>> Maxim had sent a patch about a year ago to do it in aesni-intel-glue.c
>>> but Dave told him to fix the dependencies instead
>>> (https://lore.kernel.org/all/20211103124614.499580-1-mlevitsk@redhat.com/).
>>> What do you think of applying that patch instead?
>>
>>Most of the x86 crypto modules using X86_FEATURE_AVX do check
>>  cpu_has_xfeatures(XFEATURE_MASK_YMM, ...
>>
>>so it's probably prudent to add it to the rest (or remove it everywhere
>>if it is not needed).
>>
>>1. Currently checking XSAVE YMM:
>>  aria_aesni_avx_glue
>>  blake2s-glue
>>  camellia_aesni_avx2_glue camellia_aesni_avx_glue
>>  cast5_avx_glue cast6_avx_glue
>>  chacha_glue
>>  poly1305_glue
>>  serpent_avx2_glue serpent_avx_glue
>>  sha1_ssse3_glue sha256_ssse3_glue sha512_ssse3_glue
>>  sm3_avx_glue
>>  sm4_aesni_avx2_glue sm4_aesni_avx_glue
>>  twofish_avx_glue
>>
>>Currently not checking XSAVE YMM:
>>  aesni-intel_glue
>>  curve25519-x86_64
>>  nhpoly1305-avx2-glue
>>  polyval-clmulni_glue
>>
>>2. Similarly, modules using X86_FEATURE_AVX512F, X86_FEATURE_AVX512VL
>>and/or X86_FEATURE_AVX512BW probably need to check XFEATURE_MASK_AVX512:
>>
>>Currently checking XSAVE AVX512:
>>  blake2s-glue
>>  poly1305_glue
>>
>>Currently not checking XSAVE AVX512:
>>  chacha_glue
>>
>>3. Similarly, modules using X86_FEATURE_XMM2 probably need to
>>check XFEATURE_MASK_SSE:
>>
>>Currently checking XSAVE SSE:
>>  aegis128-aesni-glue
>>
>>Currently not checking XSAVE SSE:
>>  nhpoly1305-sse2_glue
>>  serpent_sse2_glue
>>
>>
>>
>
>We have a dependency system for CPUID features. If you are going to do this (as opposed to "fixing" this in Qemu or just saying "don't do that, it isn't a valid hardware configuration."

One more thing: for obvious reasons, this doesn't fix user space if user space calls CPUID directly as opposed to reading /proc/cpuinfo or looking in sysfs. Unfortunately this is the rule rather than the exception, although for some features like AVX user space is also supposed to check XCR0, in which case it will work properly anyway.

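To make the idea of a dependency system for CPUID features concrete, here is a minimal userspace sketch of dependency filtering. It is not the kernel's implementation: the `MODEL_F_*` bit numbers, the `model_deps` table and `model_filter_broken_deps()` are hypothetical names for this illustration, and the table holds only the two dependencies named in this thread (AVX2 on AVX, AVX on XSAVE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit numbers for this model only. */
enum {
	MODEL_F_XSAVE,
	MODEL_F_AVX,
	MODEL_F_AVX2,
};

struct model_dep {
	unsigned int feature;	/* this feature ... */
	unsigned int depends;	/* ... requires this one */
};

static const struct model_dep model_deps[] = {
	{ MODEL_F_AVX,  MODEL_F_XSAVE },
	{ MODEL_F_AVX2, MODEL_F_AVX   },
};

/*
 * Drop every feature whose dependency is missing. Repeat until nothing
 * changes, so that removing one feature also removes its dependents.
 */
static inline uint32_t model_filter_broken_deps(uint32_t caps)
{
	bool changed;

	do {
		changed = false;
		for (size_t i = 0; i < sizeof(model_deps) / sizeof(model_deps[0]); i++) {
			uint32_t f = UINT32_C(1) << model_deps[i].feature;
			uint32_t d = UINT32_C(1) << model_deps[i].depends;

			if ((caps & f) && !(caps & d)) {
				caps &= ~f;
				changed = true;
			}
		}
	} while (changed);

	return caps;
}
```

Given AVX2 and XSAVE without AVX, the filter drops AVX2 and keeps XSAVE. As the message above points out, such filtering only affects what the kernel itself reports. User space that executes CPUID directly still sees the original bits.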
> >We have a dependency system for CPUID features. If you are going to do
> this (as opposed to "fixing" this in Qemu or just saying "don't do that,
> it isn't a valid hardware configuration."
> One more thing: for obvious reasons, this doesn't fix user space if user
> space calls CPUID directly as opposed to reading /proc/cpuinfo or looking
> in sysfs. Unfortunately this is the rule rather than the exception,
> although for some features like AVX user space is also supposed to check
> XCR0, in which case it will work properly anyway.

The x86 crypto modules use boot_cpu_has() to check features before
using them. If that (or some other function that we change them to use)
returned false if the necessary XSAVE bits were not set, then they could
drop the cpu_has_xfeatures() calls.

arch/x86/kernel/fpu/xstate.c, which provides cpu_has_xfeatures(), also
has an xsave_cpu_features table listing the features needed by each
xfeature.

Perhaps that should provide a cpu_feature_usable() function that calls
boot_cpu_has() and confirms the associated xfeatures are present.
That way the logic would be in one place rather than replicated in
20+ crypto modules.

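The proposed helper could look roughly like the following userspace model. This is a sketch of the suggestion only, not an existing kernel API: `model_cpu_feature_usable()`, `struct model_cpu`, the feature numbering and the contents of `model_required_xfeatures` are assumptions made for the illustration. Only the XCR0 bit positions (SSE at bit 1, YMM at bit 2, the three AVX-512 components at bits 5 to 7) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature numbers for this model only. */
enum model_feature {
	MODEL_FEATURE_AVX,
	MODEL_FEATURE_AVX2,
	MODEL_FEATURE_AVX512F,
	MODEL_NFEATURES
};

/* XCR0 state-component bits. */
#define MODEL_XFEATURE_SSE	(UINT64_C(1) << 1)
#define MODEL_XFEATURE_YMM	(UINT64_C(1) << 2)
#define MODEL_XFEATURE_AVX512	(UINT64_C(7) << 5)	/* opmask, ZMM_Hi256, Hi16_ZMM */

struct model_cpu {
	uint32_t features;	/* one bit per enum model_feature */
	uint64_t xfeatures;	/* enabled XSAVE state components */
};

/* State components that must be enabled before a feature may be used. */
static const uint64_t model_required_xfeatures[MODEL_NFEATURES] = {
	[MODEL_FEATURE_AVX]	= MODEL_XFEATURE_SSE | MODEL_XFEATURE_YMM,
	[MODEL_FEATURE_AVX2]	= MODEL_XFEATURE_SSE | MODEL_XFEATURE_YMM,
	[MODEL_FEATURE_AVX512F]	= MODEL_XFEATURE_SSE | MODEL_XFEATURE_YMM |
				  MODEL_XFEATURE_AVX512,
};

/* One place that combines the feature bit with its xfeature requirements. */
static inline bool model_cpu_feature_usable(const struct model_cpu *c,
					    enum model_feature f)
{
	uint64_t need = model_required_xfeatures[f];

	if (!(c->features & (UINT32_C(1) << f)))
		return false;
	return (c->xfeatures & need) == need;
}
```

A caller would then make a single check per feature instead of pairing a feature test with a separate xfeatures test in every module, which is the consolidation the message argues for.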
On 11/2/22 17:23, H. Peter Anvin wrote:
> We have a dependency system for CPUID features. If you are going to
> do this (as opposed to "fixing" this in Qemu or just saying "don't do
> that, it isn't a valid hardware configuration."

I didn't check Robert's full list, but at least in the case of
aesni-intel_glue this is not about AVX2-depends-on-AVX or
AVX-depends-on-XSAVE (and it is not about QEMU at all).

It's just that checking AVX or AVX2 only tells you about presence and
is not enough to say whether the instructions are _usable_. Likewise
for AVX512.

What would the dependency be?

Paolo

>
>
> 1. Currently checking XSAVE YMM:
>   aria_aesni_avx_glue
>   blake2s-glue
>   camellia_aesni_avx2_glue camellia_aesni_avx_glue
>   cast5_avx_glue cast6_avx_glue
>   chacha_glue
>   poly1305_glue
>   serpent_avx2_glue serpent_avx_glue
>   sha1_ssse3_glue sha256_ssse3_glue sha512_ssse3_glue
>   sm3_avx_glue
>   sm4_aesni_avx2_glue sm4_aesni_avx_glue
>   twofish_avx_glue
>
> Currently not checking XSAVE YMM:
>   aesni-intel_glue
>   curve25519-x86_64
>   nhpoly1305-avx2-glue
>   polyval-clmulni_glue
>
> 2. Similarly, modules using X86_FEATURE_AVX512F, X86_FEATURE_AVX512VL
> and/or X86_FEATURE_AVX512BW probably need to check XFEATURE_MASK_AVX512:
>
> Currently checking XSAVE AVX512:
>   blake2s-glue
>   poly1305_glue
>
> Currently not checking XSAVE AVX512:
>   chacha_glue
>
> 3. Similarly, modules using X86_FEATURE_XMM2 probably need to
> check XFEATURE_MASK_SSE:
>
> Currently checking XSAVE SSE:
>   aegis128-aesni-glue
>
> Currently not checking XSAVE SSE:
>   nhpoly1305-sse2_glue
>   serpent_sse2_glue
>

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 13179f31fe10fa..b08715172309a7 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1860,7 +1860,7 @@ void __init intel_pmu_arch_lbr_init(void)
 	return;
 
 clear_arch_lbr:
-	clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR);
+	setup_clear_cpu_cap(X86_FEATURE_ARCH_LBR);
 }
 
 /**