mbox series

[v9,00/14] x86: Support Key Locker

Message ID 20240329015346.635933-1-chang.seok.bae@intel.com
Headers show
Series x86: Support Key Locker | expand

Message

Chang S. Bae March 29, 2024, 1:53 a.m. UTC
Hi all,

As posting this version, I wanted to make sure these code changes were
acknowledgeable at first:

The previous enabling process has been paused to address vulnerabilities
[1][2] that could compromise Key Locker's ability to protect AES keys.
Now, with the mainlining of mitigations [3][4], patches (Patch 10-11)
were added to ensure the application of these mitigations.

During this period, there was a significant change in the mainline commit
b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"). This affected Key Locker's initialization
code, which clobbers XMM registers for loading a wrapping key, as it
depends on FPU initialization.

In this revision, the setup code was adjusted to separate the
initialization part to be invoked during arch_initcall(). The remaining
code for copying the wrapping key from the backup resides in the
identify_cpu() -> setup_keylocker() path. This separation simplifies the
code and resolves an issue with hotplug.

The remaining changes mainly focus on the AES crypto driver, addressing
feedback from Eric. Notably, while doing so, it was realized better to
disallow a module build. Key Locker's AES instructions do not support
192-bit keys. Supporting a module build would require exporting some
AES-NI functions, leading to performance-impacting indirect calls. I
think we can revisit module support later if necessary.

Then, the following is a summary of changes per patch since v8 [6]:

PATCH7-8:
* Invoke the setup code via arch_initcall() due to upstream changes
  delaying the FPU setup.

PATCH9-11:
* Add new patches for security and hotplug support clarification

PATCH12:
* Drop the "nokeylocker" option. (Borislav Petkov)

PATCH13:
* Introduce 'union x86_aes_ctx'. (Eric Biggers)
* Ensure 'inline' for wrapper functions.

PATCH14:
* Combine the XTS enc/dec assembly code in a macro.  (Eric Biggers)
* Define setkey() as void instead of returning 'int'.  (Eric Biggers)
* Rearrange the assembly code to reduce jumps, especially for success
  cases.  (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

This series is based on my AES-NI setkey() cleanup [7], which has been
recently merged into the crypto repository [8], and I thought it was
better to go first. You can also find this series here:
    git://github.com/intel-staging/keylocker.git kl-v9

Thanks,
Chang

[1] Gather Data Sampling (GDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
[2] Register File Data Sampling (RFDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
[3] Mainlining of GDS mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64094e7e3118aff4b0be8ff713c242303e139834
[4] Mainlining of RFDS Mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e33cf955f07e3991e45109cb3e29fbc9ca51d06
[5]  Initialize FPU late
    https://lore.kernel.org/lkml/168778151512.3634408.11432553576702911909.tglx@vps.praguecc.cz/
[6] V8: https://lore.kernel.org/lkml/20230603152227.12335-1-chang.seok.bae@intel.com/
[7] https://lore.kernel.org/lkml/20240322230459.456606-1-chang.seok.bae@intel.com/
[8] git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git

Chang S. Bae (14):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/hotplug/keylocker: Ensure wrapping key backup capability
  x86/cpu/keylocker: Check Gather Data Sampling mitigation
  x86/cpu/keylocker: Check Register File Data Sampling mitigation
  x86/Kconfig: Add a configuration for Key Locker
  crypto: x86/aes - Prepare for new AES-XTS implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 Documentation/arch/x86/index.rst            |   1 +
 Documentation/arch/x86/keylocker.rst        |  96 +++++
 arch/x86/Kconfig                            |   3 +
 arch/x86/Kconfig.assembler                  |   5 +
 arch/x86/crypto/Kconfig                     |  17 +
 arch/x86/crypto/Makefile                    |   3 +
 arch/x86/crypto/aes-helper_asm.S            |  22 ++
 arch/x86/crypto/aes-helper_glue.h           | 168 ++++++++
 arch/x86/crypto/aeskl-intel_asm.S           | 412 ++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c          | 187 +++++++++
 arch/x86/crypto/aeskl-intel_glue.h          |  35 ++
 arch/x86/crypto/aesni-intel_asm.S           |  47 +--
 arch/x86/crypto/aesni-intel_glue.c          | 193 ++-------
 arch/x86/crypto/aesni-intel_glue.h          |  40 ++
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/disabled-features.h    |   8 +-
 arch/x86/include/asm/keylocker.h            |  42 ++
 arch/x86/include/asm/msr-index.h            |   6 +
 arch/x86/include/asm/special_insns.h        |  28 ++
 arch/x86/include/uapi/asm/processor-flags.h |   2 +
 arch/x86/kernel/Makefile                    |   1 +
 arch/x86/kernel/cpu/common.c                |   4 +-
 arch/x86/kernel/cpu/cpuid-deps.c            |   1 +
 arch/x86/kernel/keylocker.c                 | 219 +++++++++++
 arch/x86/lib/x86-opcode-map.txt             |  11 +-
 arch/x86/power/cpu.c                        |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt       |  11 +-
 27 files changed, 1363 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a447c31d337bdec7fbc605a7a1e00aff4c492d0

Comments

Chang S. Bae April 7, 2024, 11:24 p.m. UTC | #1
On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> 
> Then, the following is a summary of changes per patch since v8 [6]:
> 
> PATCH7-8:
> * Invoke the setup code via arch_initcall() due to upstream changes
>    delaying the FPU setup.
> 
> PATCH9-11:
> * Add new patches for security and hotplug support clarification

I've recently made updates to a few patches, primarily related to the
mitigation parts. While the series is still under review, Eric's VAES
patches have been merged into the crypto tree and are currently being
sorted out. Once things settle down, I will make a few adjustments on the
crypto side. Then, another revision will be necessary thereafter.

Thanks,
Chang
Eric Biggers April 8, 2024, 1:48 a.m. UTC | #2
Hi,

On Sun, Apr 07, 2024 at 04:24:18PM -0700, Chang S. Bae wrote:
> On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> > 
> > Then, the following is a summary of changes per patch since v8 [6]:
> > 
> > PATCH7-8:
> > * Invoke the setup code via arch_initcall() due to upstream changes
> >    delaying the FPU setup.
> > 
> > PATCH9-11:
> > * Add new patches for security and hotplug support clarification
> 
> I've recently made updates to a few patches, primarily related to the
> mitigation parts. While the series is still under review, Eric's VAES
> patches have been merged into the crypto tree and are currently being
> sorted out. Once things settle down, I will make a few adjustments on the
> crypto side. Then, another revision will be necessary thereafter.
> 
> Thanks,
> Chang

Thanks for the updated patchset!

Do you have a plan for how this will be merged?  Which trees will the patches go
through?  I think that the actual AES-XTS implementation could still use a bit
more polishing; see my comments below.  However, patches 1-12 don't need to wait
for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
based implementation of AES-XTS:

First, surely it's the case that in practice, all CPUs that support Key Locker
also support AVX?  If so, then there's no need for the Key Locker assembly to
use legacy SSE instructions.  It should instead target AVX and use VEX-coded
instructions.  This would save some instructions and improve performance.

Since the Key Locker assembly only supports 64-bit mode, it should also feel
free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
This would improve performance.

Since the Key Locker assembly advances a large number of XTS tweaks at a time
(8), I'm also wondering if it would be faster to multiply by x^8 directly
instead of multiplying by x sequentially eight times.  This can be done using
the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.

I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
code.  I see that you're proposing to share the glue code with the existing
AES-XTS implementations.  Unfortunately I don't think this ends up working very
well, due to the facts that the Key Locker code can return errors and uses a
different key type.  I think that for now, I'd prefer that you simply copied the
XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
to use the new version of the glue code, which is faster.)

For falling back to AES-NI, I think the cleanest solution is to call the
top-level setkey, encrypt, and decrypt functions (the ones that are set in the
xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
current patchset does.

If you could keep the Key Locker assembly roughly stylistically consistent with
the new aes-xts-avx-x86_64.S, that would be great too.

Do you happen to know if there's any way to test the Key Locker AES-XTS code
without having access to a bare metal machine with a CPU that supports Key
Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
Locker.  So I don't currently have an easy way to test this patchset.

Finally, a high level question.  Key Locker has been reported to be
substantially slower than AES-NI.  At the same time, VAES has recently doubled
performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
Given that, how useful is this patchset?  I'm a bit concerned that this might be
something that sounds good in theory but won't be used in practice.  Are
performance improvements for Key Locker on the horizon?  (Well, there are the
improvements I suggested above, which should help, but it sounds like main issue
is the Key Locker instructions themselves which are just fundamentally slower.)

- Eric
Chang S. Bae April 15, 2024, 10:16 p.m. UTC | #3
Hi Eric,

Sorry for my delay. I just found your message in my cluttered junk 
folder today after returning from my week off.

On 4/7/2024 6:48 PM, Eric Biggers wrote:
> 
> Do you have a plan for how this will be merged?  Which trees will the patches go
> through?  I think that the actual AES-XTS implementation could still use a bit
> more polishing; see my comments below.  However, patches 1-12 don't need to wait
> for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
> v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

Yeah, this series spans both x86 code and the crypto driver. I believe 
the decision should be made by maintainers. But, I suspect they'll want 
to see a well-established use case code before moving forward.

> As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
> VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
> based implementation of AES-XTS:

Thanks for the effort! I agree that the code could be beneficial for 
daily disk encryption needs.

> First, surely it's the case that in practice, all CPUs that support Key Locker
> also support AVX?  If so, then there's no need for the Key Locker assembly to
> use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> instructions.  This would save some instructions and improve performance.

Unfortunately, the Key Locker instructions using the AVX states were 
never implemented.

> Since the Key Locker assembly only supports 64-bit mode, it should also feel
> free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
> This would improve performance.
> 
> Since the Key Locker assembly advances a large number of XTS tweaks at a time
> (8), I'm also wondering if it would be faster to multiply by x^8 directly
> instead of multiplying by x sequentially eight times.  This can be done using
> the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
> optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.

I'll revisit the assembly code to incorporate your suggestions.

> I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
> code.  I see that you're proposing to share the glue code with the existing
> AES-XTS implementations.  Unfortunately I don't think this ends up working very
> well, due to the facts that the Key Locker code can return errors and uses a
> different key type.  I think that for now, I'd prefer that you simply copied the
> XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
> to use the new version of the glue code, which is faster.)

I agree; the proposed glue code looks messy due to the different return 
errors and key types. Ard made a point earlier about establishing a 
shared common code as they're logically quite similar. But I suppose it 
is more practical to pursue a separate glue code at this point.

> For falling back to AES-NI, I think the cleanest solution is to call the
> top-level setkey, encrypt, and decrypt functions (the ones that are set in the
> xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
> current patchset does.

Yes, falling back is indeed one of the ugly parts of this series. Let me 
retry this as you suggested.

> If you could keep the Key Locker assembly roughly stylistically consistent with
> the new aes-xts-avx-x86_64.S, that would be great too.

Okay.

> Do you happen to know if there's any way to test the Key Locker AES-XTS code
> without having access to a bare metal machine with a CPU that supports Key
> Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
> doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
> Locker.  So I don't currently have an easy way to test this patchset.

No, there isn't currently an emulation option available to the public 
that I'm aware of. this feature has been available on client systems 
since the Tiger Lake generation.

> Finally, a high level question.  Key Locker has been reported to be
> substantially slower than AES-NI.  At the same time, VAES has recently doubled
> performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
> Given that, how useful is this patchset?  I'm a bit concerned that this might be
> something that sounds good in theory but won't be used in practice.  Are
> performance improvements for Key Locker on the horizon?  (Well, there are the
> improvements I suggested above, which should help, but it sounds like main issue
> is the Key Locker instructions themselves which are just fundamentally slower.)

On our latest implementations, we've observed the Key Locker performance 
on cryptsetup seems to be roughly the same as what we posted earlier 
[2]. Yes, this sounds like a fair analogy to me, especially given your 
vAES code.

Thanks,
Chang

[1] 
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
[2] 
https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/
Eric Biggers April 15, 2024, 10:54 p.m. UTC | #4
On Mon, Apr 15, 2024 at 03:16:18PM -0700, Chang S. Bae wrote:
> > First, surely it's the case that in practice, all CPUs that support Key Locker
> > also support AVX?  If so, then there's no need for the Key Locker assembly to
> > use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> > instructions.  This would save some instructions and improve performance.
> 
> Unfortunately, the Key Locker instructions using the AVX states were never
> implemented.

Sure, you could still use VEX-coded 128-bit instructions for everything other
than the actual AES (for example, the XTS tweak computation) though, right?
They're a bit more convenient to work with since they are non-destructive.

- Eric
Chang S. Bae April 15, 2024, 10:58 p.m. UTC | #5
On 4/15/2024 3:54 PM, Eric Biggers wrote:
> 
> Sure, you could still use VEX-coded 128-bit instructions for everything other
> than the actual AES (for example, the XTS tweak computation) though, right?
> They're a bit more convenient to work with since they are non-destructive.

Right.

Thanks,
Chang