Message ID | 20170718120645.15880-1-ard.biesheuvel@linaro.org |
---|---|
Headers | show |
Series | crypto: aes - retire table based generic AES | expand |
On 18 July 2017 at 13:06, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > The generic AES driver uses 16 lookup tables of 1 KB each, and has > encryption and decryption routines that are fully unrolled. Given how > the dependencies between this code and other drivers are declared in > Kconfig files, this code is always pulled into the core kernel, even > if it is usually superseded at runtime by accelerated drivers that > exist for many architectures. > > This leaves us with 25 KB of dead code in the kernel, which is negligible > in typical environments, but which is actually a big deal for the IoT > domain, where every kilobyte counts. > > Also, the scalar, table based AES routines that exist for ARM, arm64, i586 > and x86_64 share the lookup tables with AES generic, and may be invoked > occasionally when the time-invariant AES-NI or other special instruction > drivers are called in interrupt context, at which time the SIMD register > file cannot be used. Pulling 16 KB of code and 9 KB of instructions into > the L1s (and evicting what was already there) when a softirq happens to > be handled in the context of an interrupt taken from kernel mode (which > means no SIMD on x86) is also something that we may like to avoid, by > falling back to a much smaller and moderately less performant driver. > (Note that arm64 will be updated shortly to supply fallbacks for all > SIMD based AES implementations, which will be based on the core routines) > > For the reasons above, this series refactors the way the various AES > implementations are wired up, to allow the generic version in > crypto/aes_generic.c to be omitted from the build entirely. > > Patch #1 removes some bogus 'select CRYPTO_AES' statement. > > Patch #2 factors out aes-generic's lookup tables, which are shared with > arch-specific implementations in arch/x86, arch/arm and arch/arm64. > > Patch #3 replaces the table based aes-generic.o with a new aes.o based on > the fixed time cipher, and uses it to fulfil dependencies on CRYPTO_AES. > > Patch #4 switches the fallback in the AES-NI code to the new, generic encrypt > and decrypt routines so it no longer depends on the x86 scalar code or > [transitively] on AES-generic. > > Patch #5 tweaks the ARM table based code to only use 2 KB + 256 bytes worth > of lookup tables instead of 4 KB. > > Patch #6 does the same for arm64 > > Patch #7 removes the local copy of the AES sboxes from the arm64 NEON driver, > and switches to the ones exposed by the new AES core module instead. > > Patch #8 updates the Kconfig help text to be more descriptive of what they > actually control, rather than duplicating AES's wikipedia entry a number of > times. > > v4: - remove aes-generic altogether instead of allow a preference to be set Actually, after benchmarking the x86_64 asm AES code, I am not so sure we should remove AES_GENERIC at all, since it turns out to be faster. Interestingly, I found a remark by Eric in the git log stating the same, so if we want to cut down on AES variants, we should probably start by deleting the x86 code instead. So please disregard this for now: I will rework the other stuff I have so it no longer depends on this, and repost, because it is much more important for me that that makes it into v4.14. This can wait for v4.15, as far as I am concerned, and I will benchmark a bit more to see if we can get rid of the i586 code as well. -- Ard.
On Mon, Jul 24, 2017 at 07:59:43AM +0100, Ard Biesheuvel wrote: > On 18 July 2017 at 13:06, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > The generic AES driver uses 16 lookup tables of 1 KB each, and has > > encryption and decryption routines that are fully unrolled. Given how > > the dependencies between this code and other drivers are declared in > > Kconfig files, this code is always pulled into the core kernel, even > > if it is usually superseded at runtime by accelerated drivers that > > exist for many architectures. > > > > This leaves us with 25 KB of dead code in the kernel, which is negligible > > in typical environments, but which is actually a big deal for the IoT > > domain, where every kilobyte counts. > > > > Also, the scalar, table based AES routines that exist for ARM, arm64, i586 > > and x86_64 share the lookup tables with AES generic, and may be invoked > > occasionally when the time-invariant AES-NI or other special instruction > > drivers are called in interrupt context, at which time the SIMD register > > file cannot be used. Pulling 16 KB of code and 9 KB of instructions into > > the L1s (and evicting what was already there) when a softirq happens to > > be handled in the context of an interrupt taken from kernel mode (which > > means no SIMD on x86) is also something that we may like to avoid, by > > falling back to a much smaller and moderately less performant driver. > > (Note that arm64 will be updated shortly to supply fallbacks for all > > SIMD based AES implementations, which will be based on the core routines) > > > > For the reasons above, this series refactors the way the various AES > > implementations are wired up, to allow the generic version in > > crypto/aes_generic.c to be omitted from the build entirely. > > > > Patch #1 removes some bogus 'select CRYPTO_AES' statement. > > > > Patch #2 factors out aes-generic's lookup tables, which are shared with > > arch-specific implementations in arch/x86, arch/arm and arch/arm64. > > > > Patch #3 replaces the table based aes-generic.o with a new aes.o based on > > the fixed time cipher, and uses it to fulfil dependencies on CRYPTO_AES. > > > > Patch #4 switches the fallback in the AES-NI code to the new, generic encrypt > > and decrypt routines so it no longer depends on the x86 scalar code or > > [transitively] on AES-generic. > > > > Patch #5 tweaks the ARM table based code to only use 2 KB + 256 bytes worth > > of lookup tables instead of 4 KB. > > > > Patch #6 does the same for arm64 > > > > Patch #7 removes the local copy of the AES sboxes from the arm64 NEON driver, > > and switches to the ones exposed by the new AES core module instead. > > > > Patch #8 updates the Kconfig help text to be more descriptive of what they > > actually control, rather than duplicating AES's wikipedia entry a number of > > times. > > > > v4: - remove aes-generic altogether instead of allow a preference to be set > > Actually, after benchmarking the x86_64 asm AES code, I am not so sure > we should remove AES_GENERIC at all, since it turns out to be faster. > Interestingly, I found a remark by Eric in the git log stating the > same, so if we want to cut down on AES variants, we should probably > start by deleting the x86 code instead. > > So please disregard this for now: I will rework the other stuff I have > so it no longer depends on this, and repost, because it is much more > important for me that that makes it into v4.14. This can wait for > v4.15, as far as I am concerned, and I will benchmark a bit more to > see if we can get rid of the i586 code as well. > Yes I did notice that aes-generic was actually faster. Probably the x86_64-asm implementation should be removed, but it may be worthwhile to try a few different gcc versions to see how well they compile aes-generic. I expect that x86_64-asm used to be faster but gcc has gotten smarter. Also x86_64-asm is only really useful on older CPUs, so ideally it should be benchmarked on an older CPU. Eric
On 24 July 2017 at 17:57, Eric Biggers <ebiggers3@gmail.com> wrote: > On Mon, Jul 24, 2017 at 07:59:43AM +0100, Ard Biesheuvel wrote: >> On 18 July 2017 at 13:06, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> > The generic AES driver uses 16 lookup tables of 1 KB each, and has >> > encryption and decryption routines that are fully unrolled. Given how >> > the dependencies between this code and other drivers are declared in >> > Kconfig files, this code is always pulled into the core kernel, even >> > if it is usually superseded at runtime by accelerated drivers that >> > exist for many architectures. >> > >> > This leaves us with 25 KB of dead code in the kernel, which is negligible >> > in typical environments, but which is actually a big deal for the IoT >> > domain, where every kilobyte counts. >> > >> > Also, the scalar, table based AES routines that exist for ARM, arm64, i586 >> > and x86_64 share the lookup tables with AES generic, and may be invoked >> > occasionally when the time-invariant AES-NI or other special instruction >> > drivers are called in interrupt context, at which time the SIMD register >> > file cannot be used. Pulling 16 KB of code and 9 KB of instructions into >> > the L1s (and evicting what was already there) when a softirq happens to >> > be handled in the context of an interrupt taken from kernel mode (which >> > means no SIMD on x86) is also something that we may like to avoid, by >> > falling back to a much smaller and moderately less performant driver. >> > (Note that arm64 will be updated shortly to supply fallbacks for all >> > SIMD based AES implementations, which will be based on the core routines) >> > >> > For the reasons above, this series refactors the way the various AES >> > implementations are wired up, to allow the generic version in >> > crypto/aes_generic.c to be omitted from the build entirely. >> > >> > Patch #1 removes some bogus 'select CRYPTO_AES' statement. >> > >> > Patch #2 factors out aes-generic's lookup tables, which are shared with >> > arch-specific implementations in arch/x86, arch/arm and arch/arm64. >> > >> > Patch #3 replaces the table based aes-generic.o with a new aes.o based on >> > the fixed time cipher, and uses it to fulfil dependencies on CRYPTO_AES. >> > >> > Patch #4 switches the fallback in the AES-NI code to the new, generic encrypt >> > and decrypt routines so it no longer depends on the x86 scalar code or >> > [transitively] on AES-generic. >> > >> > Patch #5 tweaks the ARM table based code to only use 2 KB + 256 bytes worth >> > of lookup tables instead of 4 KB. >> > >> > Patch #6 does the same for arm64 >> > >> > Patch #7 removes the local copy of the AES sboxes from the arm64 NEON driver, >> > and switches to the ones exposed by the new AES core module instead. >> > >> > Patch #8 updates the Kconfig help text to be more descriptive of what they >> > actually control, rather than duplicating AES's wikipedia entry a number of >> > times. >> > >> > v4: - remove aes-generic altogether instead of allow a preference to be set >> >> Actually, after benchmarking the x86_64 asm AES code, I am not so sure >> we should remove AES_GENERIC at all, since it turns out to be faster. >> Interestingly, I found a remark by Eric in the git log stating the >> same, so if we want to cut down on AES variants, we should probably >> start by deleting the x86 code instead. >> >> So please disregard this for now: I will rework the other stuff I have >> so it no longer depends on this, and repost, because it is much more >> important for me that that makes it into v4.14. This can wait for >> v4.15, as far as I am concerned, and I will benchmark a bit more to >> see if we can get rid of the i586 code as well. >> > > Yes I did notice that aes-generic was actually faster. Probably the x86_64-asm > implementation should be removed, but it may be worthwhile to try a few > different gcc versions to see how well they compile aes-generic. I expect that > x86_64-asm used to be faster but gcc has gotten smarter. Also x86_64-asm is > only really useful on older CPUs, so ideally it should be benchmarked on an > older CPU. > I tried it on a ~10 year old E2200 chip, and aes-generic was slightly faster, i.e., 5 - 10 % as you noticed as well. If current GCC creates faster code, we should just remove the asm code IMO. No point in keeping it around for the sake of people who insist on building current linux with an outdated compiler.