Message ID | 20180122172643.29742-2-ard.biesheuvel@linaro.org |
---|---|
State | Superseded |
Headers | show |
Series | target-arm: add SHA-3, SM3 and SHA512 instruction support | expand |
On 22 January 2018 at 17:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > This implements emulation of the new SHA-512 instructions that have > been added as an optional extensions to the ARMv8 Crypto Extensions > in ARM v8.2. > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > +void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm) > +{ > + uint64_t *rd = vd; > + uint64_t *rn = vn; > + uint64_t *rm = vm; > + > + rd[1] += S1_512(rm[1]) + cho512(rm[1], rn[0], rn[1]); > + rd[0] += S1_512(rd[1] + rm[0]) + cho512(rd[1] + rm[0], rm[1], rn[0]); This gives the wrong answer if the destination register happens to be the same as one of the inputs, because the assignment to rd[1] will overwrite the input before the calculation of rd[0] uses it. Some extra temporaries should fix this. I'll try fixing that up locally and see if it passes tests then. thanks -- PMM
On 6 February 2018 at 18:45, Peter Maydell <peter.maydell@linaro.org> wrote: > On 22 January 2018 at 17:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> This implements emulation of the new SHA-512 instructions that have >> been added as an optional extensions to the ARMv8 Crypto Extensions >> in ARM v8.2. >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > >> +void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm) >> +{ >> + uint64_t *rd = vd; >> + uint64_t *rn = vn; >> + uint64_t *rm = vm; >> + >> + rd[1] += S1_512(rm[1]) + cho512(rm[1], rn[0], rn[1]); >> + rd[0] += S1_512(rd[1] + rm[0]) + cho512(rd[1] + rm[0], rm[1], rn[0]); > > This gives the wrong answer if the destination register > happens to be the same as one of the inputs, because the > assignment to rd[1] will overwrite the input before the > calculation of rd[0] uses it. > It is supposed to use the new value of rd[1], so this is expected. > Some extra temporaries should fix this. I'll try fixing > that up locally and see if it passes tests then. > > thanks > -- PMM
On 6 February 2018 at 18:56, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 6 February 2018 at 18:45, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 22 January 2018 at 17:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >>> This implements emulation of the new SHA-512 instructions that have >>> been added as an optional extensions to the ARMv8 Crypto Extensions >>> in ARM v8.2. >>> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >> >> >>> +void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm) >>> +{ >>> + uint64_t *rd = vd; >>> + uint64_t *rn = vn; >>> + uint64_t *rm = vm; >>> + >>> + rd[1] += S1_512(rm[1]) + cho512(rm[1], rn[0], rn[1]); >>> + rd[0] += S1_512(rd[1] + rm[0]) + cho512(rd[1] + rm[0], rm[1], rn[0]); >> >> This gives the wrong answer if the destination register >> happens to be the same as one of the inputs, because the >> assignment to rd[1] will overwrite the input before the >> calculation of rd[0] uses it. >> > > It is supposed to use the new value of rd[1], so this is expected. > Ah hold on, I hit send too quickly, apologies. Yes, if rd == rm, then it will assume the wrong value. I missed this when doing the rewrite to use the new interface. >> Some extra temporaries should fix this. I'll try fixing >> that up locally and see if it passes tests then. >> >> thanks >> -- PMM
On 6 February 2018 at 18:57, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 6 February 2018 at 18:56, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> On 6 February 2018 at 18:45, Peter Maydell <peter.maydell@linaro.org> wrote: >>> On 22 January 2018 at 17:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >>>> This implements emulation of the new SHA-512 instructions that have >>>> been added as an optional extensions to the ARMv8 Crypto Extensions >>>> in ARM v8.2. >>>> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >>> >>> >>>> +void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm) >>>> +{ >>>> + uint64_t *rd = vd; >>>> + uint64_t *rn = vn; >>>> + uint64_t *rm = vm; >>>> + >>>> + rd[1] += S1_512(rm[1]) + cho512(rm[1], rn[0], rn[1]); >>>> + rd[0] += S1_512(rd[1] + rm[0]) + cho512(rd[1] + rm[0], rm[1], rn[0]); >>> >>> This gives the wrong answer if the destination register >>> happens to be the same as one of the inputs, because the >>> assignment to rd[1] will overwrite the input before the >>> calculation of rd[0] uses it. >>> >> >> It is supposed to use the new value of rd[1], so this is expected. >> > > Ah hold on, I hit send too quickly, apologies. > > Yes, if rd == rm, then it will assume the wrong value. I missed this > when doing the rewrite to use the new interface. OK. It sounds like the fix is more complicated than I thought it was, though, so I'll leave this up to you. My tests show that these insns seem OK: SM3PARTW1, SM3PARTW2, SM3SS1, SM3TT1A, SM3TT1B, SM3TT2A, SM3TT2B EOR3, BCAX, XAR SHA512SU1 These ones fail: SHA512H, SHA512H2, SHA512SU0 SM4EKEY, SM4E You also forgot to enable the SM4 CPU feature in the 'any' CPU and set it in the guest elf hwcaps. thanks -- PMM
On 6 February 2018 at 19:06, Peter Maydell <peter.maydell@linaro.org> wrote:
> SM4EKEY, SM4E
Sample SM4EKEY failure:
insn 0xce78cbdd (SM4EKEY V29.4S, V30.4S, V24.4S)
V24 : 6ee7a2520059bd15bac75e4436b3a1bd
V30 : a67d04e738f68da895ffd0c3e154e3e7
V29 actual: a67d04e7b98aaef47bf01b8158da5407
V29 expected: 8d492252b98aaef47bf01b8158da5407
(top 32 bits are wrong)
Sample SM4E failure:
insn 0xcec087dd (SM4E V29.4S, V30.4S)
V30 : a67d04e738f68da895ffd0c3e154e3e7
V29 actual : e272e88588a781b7e77a90dd5641e34b
V29 expected: a39884af88a781b7e77a90dd5641e34b
(top 32 bits again)
My test setup doesn't capture register values from
before the insn executes, which is awkward for SM4E since
it uses Vd as input as well as output. Probably the bug
is the same as SM4EKEY, though.
thanks
-- PMM
On 6 February 2018 at 19:15, Peter Maydell <peter.maydell@linaro.org> wrote: > On 6 February 2018 at 19:06, Peter Maydell <peter.maydell@linaro.org> wrote: >> SM4EKEY, SM4E > > Sample SM4EKEY failure: > insn 0xce78cbdd (SM4EKEY V29.4S, V30.4S, V24.4S) > V24 : 6ee7a2520059bd15bac75e4436b3a1bd > V30 : a67d04e738f68da895ffd0c3e154e3e7 > > V29 actual: a67d04e7b98aaef47bf01b8158da5407 > V29 expected: 8d492252b98aaef47bf01b8158da5407 > > (top 32 bits are wrong) > > Sample SM4E failure: > insn 0xcec087dd (SM4E V29.4S, V30.4S) > V30 : a67d04e738f68da895ffd0c3e154e3e7 > V29 actual : e272e88588a781b7e77a90dd5641e34b > V29 expected: a39884af88a781b7e77a90dd5641e34b > > (top 32 bits again) > > My test setup doesn't capture register values from > before the insn executes, which is awkward for SM4E since > it uses Vd as input as well as output. Probably the bug > is the same as SM4EKEY, though. ...and it's a pretty simple fix; we just weren't actually doing the 4th iteration of the algorithm: diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c index b42c7e046b..2c3af64a52 100644 --- a/target/arm/crypto_helper.c +++ b/target/arm/crypto_helper.c @@ -653,7 +653,7 @@ void HELPER(crypto_sm4e)(void *vd, void *vn) union CRYPTO_STATE n = { .l = { rn[0], rn[1] } }; uint32_t t, i; - for (i = 0; i < 3; i++) { + for (i = 0; i < 4; i++) { t = CR_ST_WORD(d, (i + 1) % 4) ^ CR_ST_WORD(d, (i + 2) % 4) ^ CR_ST_WORD(d, (i + 3) % 4) ^ @@ -683,7 +683,7 @@ void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm) uint32_t t, i; d = n; - for (i = 0; i < 3; i++) { + for (i = 0; i < 4; i++) { t = CR_ST_WORD(d, (i + 1) % 4) ^ CR_ST_WORD(d, (i + 2) % 4) ^ CR_ST_WORD(d, (i + 3) % 4) ^ That change is sufficient for SM4E and SM4EKEY to pass my tests. thanks -- PMM
On 6 February 2018 at 20:41, Peter Maydell <peter.maydell@linaro.org> wrote: > On 6 February 2018 at 19:15, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 6 February 2018 at 19:06, Peter Maydell <peter.maydell@linaro.org> wrote: >>> SM4EKEY, SM4E >> >> Sample SM4EKEY failure: >> insn 0xce78cbdd (SM4EKEY V29.4S, V30.4S, V24.4S) >> V24 : 6ee7a2520059bd15bac75e4436b3a1bd >> V30 : a67d04e738f68da895ffd0c3e154e3e7 >> >> V29 actual: a67d04e7b98aaef47bf01b8158da5407 >> V29 expected: 8d492252b98aaef47bf01b8158da5407 >> >> (top 32 bits are wrong) >> >> Sample SM4E failure: >> insn 0xcec087dd (SM4E V29.4S, V30.4S) >> V30 : a67d04e738f68da895ffd0c3e154e3e7 >> V29 actual : e272e88588a781b7e77a90dd5641e34b >> V29 expected: a39884af88a781b7e77a90dd5641e34b >> >> (top 32 bits again) >> >> My test setup doesn't capture register values from >> before the insn executes, which is awkward for SM4E since >> it uses Vd as input as well as output. Probably the bug >> is the same as SM4EKEY, though. > > ...and it's a pretty simple fix; we just weren't actually > doing the 4th iteration of the algorithm: > > diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c > index b42c7e046b..2c3af64a52 100644 > --- a/target/arm/crypto_helper.c > +++ b/target/arm/crypto_helper.c > @@ -653,7 +653,7 @@ void HELPER(crypto_sm4e)(void *vd, void *vn) > union CRYPTO_STATE n = { .l = { rn[0], rn[1] } }; > uint32_t t, i; > > - for (i = 0; i < 3; i++) { > + for (i = 0; i < 4; i++) { > t = CR_ST_WORD(d, (i + 1) % 4) ^ > CR_ST_WORD(d, (i + 2) % 4) ^ > CR_ST_WORD(d, (i + 3) % 4) ^ > @@ -683,7 +683,7 @@ void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm) > uint32_t t, i; > > d = n; > - for (i = 0; i < 3; i++) { > + for (i = 0; i < 4; i++) { > t = CR_ST_WORD(d, (i + 1) % 4) ^ > CR_ST_WORD(d, (i + 2) % 4) ^ > CR_ST_WORD(d, (i + 3) % 4) ^ > > That change is sufficient for SM4E and SM4EKEY to pass my tests. > Thanks a lot for debugging that. As I said, I don't have test vectors, or I would have tested it myself, and most likely would have found this as well.
On 6 February 2018 at 19:59, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > Thanks a lot for debugging that. As I said, I don't have test vectors, > or I would have tested it myself, and most likely would have found > this as well. No problem. I spent a surprisingly long time looking at the inside of the loop trying to check whether it matched the pseudocode and why the fourth iteration only would misbehave, before I spotted what was actually happening :-) -- PMM
On 02/06/2018 11:15 AM, Peter Maydell wrote: > My test setup doesn't capture register values from > before the insn executes... I have patches for RISU to dump each record written to the trace file, which does allow one to go back and examine previous register values. r~
diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 0a923e42d8bf..32a18510e70b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1372,6 +1372,7 @@ enum arm_features { ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */ ARM_FEATURE_SVE, /* has Scalable Vector Extension */ + ARM_FEATURE_V8_SHA512, /* implements SHA512 part of v8 Crypto Extensions */ }; static inline int arm_feature(CPUARMState *env, int feature) diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c index 9ca0bdead7bb..fb45948e9f13 100644 --- a/target/arm/crypto_helper.c +++ b/target/arm/crypto_helper.c @@ -1,7 +1,7 @@ /* * crypto_helper.c - emulate v8 Crypto Extensions instructions * - * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org> + * Copyright (C) 2013 - 2018 Linaro Ltd <ard.biesheuvel@linaro.org> * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -419,3 +419,76 @@ void HELPER(crypto_sha256su1)(void *vd, void *vn, void *vm) rd[0] = d.l[0]; rd[1] = d.l[1]; } + +/* + * The SHA-512 logical functions (same as above but using 64-bit operands) + */ + +static uint64_t cho512(uint64_t x, uint64_t y, uint64_t z) +{ + return (x & (y ^ z)) ^ z; +} + +static uint64_t maj512(uint64_t x, uint64_t y, uint64_t z) +{ + return (x & y) | ((x | y) & z); +} + +static uint64_t S0_512(uint64_t x) +{ + return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39); +} + +static uint64_t S1_512(uint64_t x) +{ + return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41); +} + +static uint64_t s0_512(uint64_t x) +{ + return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7); +} + +static uint64_t s1_512(uint64_t x) +{ + return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6); +} + +void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm) +{ + uint64_t *rd = vd; + uint64_t *rn = vn; + uint64_t *rm = vm; + + rd[1] += S1_512(rm[1]) + cho512(rm[1], rn[0], rn[1]); + rd[0] += S1_512(rd[1] + rm[0]) + cho512(rd[1] + rm[0], rm[1], rn[0]); +} + +void HELPER(crypto_sha512h2)(void *vd, void *vn, void *vm) +{ + uint64_t *rd = vd; + uint64_t *rn = vn; + uint64_t *rm = vm; + + rd[1] += S0_512(rm[0]) + maj512(rn[0], rm[1], rm[0]); + rd[0] += S0_512(rd[1]) + maj512(rd[1], rm[0], rm[1]); +} + +void HELPER(crypto_sha512su0)(void *vd, void *vn) +{ + uint64_t *rd = vd; + uint64_t *rn = vn; + + rd[0] += s0_512(rd[1]); + rd[1] += s0_512(rn[0]); +} + +void HELPER(crypto_sha512su1)(void *vd, void *vn, void *vm) +{ + uint64_t *rd = vd; + uint64_t *rn = vn; + uint64_t *rm = vm; + + rd[0] += s1_512(rn[0]) + rm[0]; + rd[1] += s1_512(rn[1]) + rm[1]; +} diff --git a/target/arm/helper.h b/target/arm/helper.h index 5dec2e62626b..81d460702867 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -534,6 +534,11 @@ DEF_HELPER_FLAGS_3(crypto_sha256h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr) DEF_HELPER_FLAGS_2(crypto_sha256su0, TCG_CALL_NO_RWG, void, ptr, ptr) DEF_HELPER_FLAGS_3(crypto_sha256su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr) +DEF_HELPER_FLAGS_3(crypto_sha512h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr) +DEF_HELPER_FLAGS_3(crypto_sha512h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr) +DEF_HELPER_FLAGS_2(crypto_sha512su0, TCG_CALL_NO_RWG, void, ptr, ptr) +DEF_HELPER_FLAGS_3(crypto_sha512su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr) + DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32) DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32) DEF_HELPER_2(dc_zva, void, env, i64) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 10eef870fee2..888f5a39a283 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -11132,6 +11132,114 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn) tcg_temp_free_ptr(tcg_rn_ptr); } +/* Crypto three-reg SHA512 + * 31 21 20 16 15 14 13 12 11 10 9 5 4 0 + * +-----------------------+------+---+---+-----+--------+------+------+ + * | 1 1 0 0 1 1 1 0 0 1 1 | Rm | 1 | O | 0 0 | opcode | Rn | Rd | + * +-----------------------+------+---+---+-----+--------+------+------+ + */ +static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn) +{ + int opcode = extract32(insn, 10, 2); + int o = extract32(insn, 14, 1); + int rm = extract32(insn, 16, 5); + int rn = extract32(insn, 5, 5); + int rd = extract32(insn, 0, 5); + int feature; + CryptoThreeOpFn *genfn; + + if (o == 0) { + switch (opcode) { + case 0: /* SHA512H */ + feature = ARM_FEATURE_V8_SHA512; + genfn = gen_helper_crypto_sha512h; + break; + case 1: /* SHA512H2 */ + feature = ARM_FEATURE_V8_SHA512; + genfn = gen_helper_crypto_sha512h2; + break; + case 2: /* SHA512SU1 */ + feature = ARM_FEATURE_V8_SHA512; + genfn = gen_helper_crypto_sha512su1; + break; + default: + unallocated_encoding(s); + return; + } + } else { + unallocated_encoding(s); + return; + } + + if (!arm_dc_feature(s, feature)) { + unallocated_encoding(s); + return; + } + + if (!fp_access_check(s)) { + return; + } + + if (genfn) { + TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr; + + tcg_rd_ptr = vec_full_reg_ptr(s, rd); + tcg_rn_ptr = vec_full_reg_ptr(s, rn); + tcg_rm_ptr = vec_full_reg_ptr(s, rm); + + genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr); + + tcg_temp_free_ptr(tcg_rd_ptr); + tcg_temp_free_ptr(tcg_rn_ptr); + tcg_temp_free_ptr(tcg_rm_ptr); + } else { + g_assert_not_reached(); + } +} + +/* Crypto two-reg SHA512 + * 31 12 11 10 9 5 4 0 + * +-----------------------------------------+--------+------+------+ + * | 1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 | opcode | Rn | Rd | + * +-----------------------------------------+--------+------+------+ + */ +static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn) +{ + int opcode = extract32(insn, 10, 2); + int rn = extract32(insn, 5, 5); + int rd = extract32(insn, 0, 5); + TCGv_ptr tcg_rd_ptr, tcg_rn_ptr; + int feature; + CryptoTwoOpFn *genfn; + + switch (opcode) { + case 0: /* SHA512SU0 */ + feature = ARM_FEATURE_V8_SHA512; + genfn = gen_helper_crypto_sha512su0; + break; + default: + unallocated_encoding(s); + return; + } + + if (!arm_dc_feature(s, feature)) { + unallocated_encoding(s); + return; + } + + if (!fp_access_check(s)) { + return; + } + + tcg_rd_ptr = vec_full_reg_ptr(s, rd); + tcg_rn_ptr = vec_full_reg_ptr(s, rn); + + genfn(tcg_rd_ptr, tcg_rn_ptr); + + tcg_temp_free_ptr(tcg_rd_ptr); + tcg_temp_free_ptr(tcg_rn_ptr); +} + /* C3.6 Data processing - SIMD, inc Crypto * * As the decode gets a little complex we are using a table based @@ -11161,6 +11269,8 @@ static const AArch64DecodeTable data_proc_simd[] = { { 0x4e280800, 0xff3e0c00, disas_crypto_aes }, { 0x5e000000, 0xff208c00, disas_crypto_three_reg_sha }, { 0x5e280800, 0xff3e0c00, disas_crypto_two_reg_sha }, + { 0xce608000, 0xffe0b000, disas_crypto_three_reg_sha512 }, + { 0xcec08000, 0xfffff000, disas_crypto_two_reg_sha512 }, { 0x00000000, 0x00000000, NULL } };
This implements emulation of the new SHA-512 instructions that have been added as an optional extensions to the ARMv8 Crypto Extensions in ARM v8.2. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- target/arm/cpu.h | 1 + target/arm/crypto_helper.c | 75 ++++++++++++- target/arm/helper.h | 5 + target/arm/translate-a64.c | 110 ++++++++++++++++++++ 4 files changed, 190 insertions(+), 1 deletion(-) -- 2.11.0