Message ID | alpine.LRH.2.02.2204250723120.26714@file01.intranet.prod.int.rdu2.redhat.com |
---|---|
State | New |
Headers | show |
Series | [v2] hex2bin: make the function hex_to_bin constant-time | expand |
On Mon, Apr 25, 2022 at 08:07:48AM -0400, Mikulas Patocka wrote: > From: Mikulas Patocka <mpatocka@redhat.com> > > The function hex2bin is used to load cryptographic keys into device mapper > targets dm-crypt and dm-integrity. It should take constant time > independent on the processed data, so that concurrently running > unprivileged code can't infer any information about the keys via > microarchitectural convert channels. > > This patch changes the function hex_to_bin so that it contains no branches > and no memory accesses. > > Note that this shouldn't cause performance degradation because the size of > the new function is the same as the size of the old function (on x86-64) - > and the new function causes no branch misprediction penalties. > > I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 > i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 > sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 > powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no > branches in the generated code. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Cc: stable@vger.kernel.org > > --- > include/linux/kernel.h | 2 +- > lib/hexdump.c | 32 +++++++++++++++++++++++++------- > 2 files changed, 26 insertions(+), 8 deletions(-) > > Index: linux-2.6/lib/hexdump.c > =================================================================== > --- linux-2.6.orig/lib/hexdump.c 2022-04-24 18:51:20.000000000 +0200 > +++ linux-2.6/lib/hexdump.c 2022-04-25 13:15:26.000000000 +0200 > @@ -22,15 +22,33 @@ EXPORT_SYMBOL(hex_asc_upper); > * > * hex_to_bin() converts one hex digit to its actual value or -1 in case of bad > * input. > + * > + * This function is used to load cryptographic keys, so it is coded in such a > + * way that there are no conditions or memory accesses that depend on data. > + * > + * Explanation of the logic: > + * (ch - '9' - 1) is negative if ch <= '9' > + * ('0' - 1 - ch) is negative if ch >= '0' > + * we "and" these two values, so the result is negative if ch is in the range > + * '0' ... '9' > + * we are only interested in the sign, so we do a shift ">> 8"; note that right > + * shift of a negative value is implementation-defined, so we cast the > + * value to (unsigned) before the shift --- we have 0xffffff if ch is in > + * the range '0' ... '9', 0 otherwise > + * we "and" this value with (ch - '0' + 1) --- we have a value 1 ... 10 if ch is > + * in the range '0' ... '9', 0 otherwise > + * we add this value to -1 --- we have a value 0 ... 9 if ch is in the range '0' > + * ... '9', -1 otherwise > + * the next line is similar to the previous one, but we need to decode both > + * uppercase and lowercase letters, so we use (ch & 0xdf), which converts > + * lowercase to uppercase > */ > -int hex_to_bin(char ch) > +int hex_to_bin(unsigned char ch) > { > - if ((ch >= '0') && (ch <= '9')) > - return ch - '0'; > - ch = tolower(ch); > - if ((ch >= 'a') && (ch <= 'f')) > - return ch - 'a' + 10; > - return -1; > + unsigned char cu = ch & 0xdf; > + return -1 + > + ((ch - '0' + 1) & (unsigned)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8) + > + ((cu - 'A' + 11) & (unsigned)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8); > } > EXPORT_SYMBOL(hex_to_bin); Hello, Just a heads up it seems this patch is causing some instability with crypto self tests on OpenRISC when using a PREEMPT kernel (no SMP). This was reported by Jason A. Donenfeld as it came up in wireguard testing. I am trying to figure out if this is an OpenRISC PREEMPT issue or something else. The warning I am seeing is a bit random but looks something like the following: [ 0.164000] Freeing initrd memory: 1696K [ 0.188000] xor: measuring software checksum speed [ 0.196000] 8regs : 1343 MB/sec [ 0.204000] 8regs_prefetch : 1347 MB/sec [ 0.212000] 32regs : 1335 MB/sec [ 0.220000] 32regs_prefetch : 1277 MB/sec [ 0.220000] xor: using function: 8regs_prefetch (1347 MB/sec) [ 0.252000] SARU running 25519 tests [ 0.424000] curve25519 self-test 5: FAIL [ 0.496000] curve25519 self-test 7: FAIL [ 1.744000] curve25519 self-test 45: FAIL [ 3.448000] ------------[ cut here ]------------ [ 3.448000] WARNING: CPU: 0 PID: 1 at lib/crypto/curve25519.c:19 curve25519_init+0x38/0x50 [ 3.448000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0-rc4+ #758 [ 3.448000] Call trace: [ 3.448000] [<(ptrval)>] ? __warn+0xe0/0x114 [ 3.448000] [<(ptrval)>] ? curve25519_init+0x38/0x50 [ 3.448000] [<(ptrval)>] ? warn_slowpath_fmt+0x5c/0x94 [ 3.448000] [<(ptrval)>] ? curve25519_init+0x0/0x50 [ 3.452000] [<(ptrval)>] ? curve25519_init+0x38/0x50 [ 3.452000] [<(ptrval)>] ? do_one_initcall+0x98/0x328 [ 3.452000] [<(ptrval)>] ? proc_register+0x4c/0x284 [ 3.452000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x8 [ 3.452000] [<(ptrval)>] ? kernel_init_freeable+0x1fc/0x2a8 [ 3.452000] [<(ptrval)>] ? ignore_unknown_bootoption+0x0/0x8 [ 3.452000] [<(ptrval)>] ? kernel_init+0x0/0x164 [ 3.452000] [<(ptrval)>] ? kernel_init+0x28/0x164 [ 3.452000] [<(ptrval)>] ? schedule_tail+0x18/0xac [ 3.452000] [<(ptrval)>] ? ret_from_fork+0x1c/0x9c [ 3.452000] ---[ end trace 0000000000000000 ]--- [ 3.452000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 3.464000] printk: console [ttyS0] disabled [ 3.464000] 90000000.serial: ttyS0 at MMIO 0x90000000 (irq = 2, base_baud = 1250000) is a 16550A Example config: https://xn--4db.cc/cCRlQ1AE The self-test iteration number that fails is always a bit different. I am still in progress of investigating this and will not have a lot of time new the next few days. If anything ring's a bell let me know. -Stafford
On Wed, May 04, 2022 at 04:57:35AM -0400, Mikulas Patocka wrote: > On Wed, 4 May 2022, Stafford Horne wrote: > > On Mon, Apr 25, 2022 at 08:07:48AM -0400, Mikulas Patocka wrote: ... > > Just a heads up it seems this patch is causing some instability with crypto self > > tests on OpenRISC when using a PREEMPT kernel (no SMP). > > > > This was reported by Jason A. Donenfeld as it came up in wireguard testing. > > > > I am trying to figure out if this is an OpenRISC PREEMPT issue or something > > else. > That patch is so simple that I can't imagine how could it break the > curve25519 test. Are you sure that you bisected it correctly? Can you provide a test cases for hex_to_bin()?
On Wed, May 04, 2022 at 11:42:27AM +0200, Jason A. Donenfeld wrote: > (which might be semantically better anyway) and then > let the function itself do the sign change (see below). Actually, probably worse, not better. Didn't realize cu was being used after the masking.
On Wed, May 4, 2022 at 11:47 AM Milan Broz <gmazyland@gmail.com> wrote: > BTW we use exactly the same code from Mikulas in cryptsetup now (actually the report > was initiated from here :) and I added some tests for this code, > you can probably adapt it (we just use generic wrapper around it): I use something pretty similar in wireguard-tools: https://git.zx2c4.com/wireguard-tools/tree/src/encoding.c#n74 The code is fine. This is looking like a different issue somewhere else in the OpenRISC stack...
On Wed, May 04, 2022 at 11:57:29AM +0200, Jason A. Donenfeld wrote: > On Wed, May 04, 2022 at 11:42:27AM +0200, Jason A. Donenfeld wrote: > > So more likely is that this patch just helps unmask a real issue > > elsewhere -- linker, compiler, or register restoration after preemption. > > I don't think there's anything to do with regards to the patch of this > > thread, as it's clearly fine. > > The problem even goes away if I just add a nop... Alignment? Compiler bug? HW issue? > diff --git a/lib/hexdump.c b/lib/hexdump.c > index 06833d404398..ace74f9b3d5a 100644 > --- a/lib/hexdump.c > +++ b/lib/hexdump.c > @@ -46,6 +46,7 @@ EXPORT_SYMBOL(hex_asc_upper); > int hex_to_bin(unsigned char ch) > { > unsigned char cu = ch & 0xdf; > + __asm__("l.nop 0"); > return -1 + > ((ch - '0' + 1) & (unsigned)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8) + > ((cu - 'A' + 11) & (unsigned)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8); >
On Wed, May 4, 2022 at 3:15 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > Alignment? Compiler bug? HW issue? > > Probably one of those, yea. Removing the instruction addresses, the only > difference between the two compiles is: https://xn--4db.cc/Rrn8usaX/diff#line-440 Well, that address doesn't work for me at all. It turns into א.cc. I'd love to see the compiler problem, since I find those fascinating (mainly because they scare the hell out of me), but those web addresses you use are not working for me. It most definitely looks like an OpenRISC compiler bug - that code doesn't look like it does anything remotely undefined (and with the "unsigned char", nothing implementation-defined either). Linus
Hi Linus, On Wed, May 4, 2022 at 8:00 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, May 4, 2022 at 3:15 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > > Alignment? Compiler bug? HW issue? > > > > Probably one of those, yea. Removing the instruction addresses, the only > > difference between the two compiles is: https://xn--4db.cc/Rrn8usaX/diff#line-440 > > Well, that address doesn't work for me at all. It turns into א.cc. > > I'd love to see the compiler problem, since I find those fascinating > (mainly because they scare the hell out of me), but those web > addresses you use are not working for me. א.cc is correct. If you can't load it, your browser or something in your stack is broken. Choosing a non-ASCII domain like that clearly a bad decision because people with broken stacks can't load it? Yea, maybe. But maybe it's like the arch/alpha/ reordering of dependent loads applied to the web... A bit of stretch. > It most definitely looks like an OpenRISC compiler bug - that code > doesn't look like it does anything remotely undefined (and with the > "unsigned char", nothing implementation-defined either). I'm not so certain it's in the compiler anymore, actually. The bug exhibits itself even when that code isn't actually called. Adding nops to unrelated code also makes the problem go away. And removing these nops [1] makes the problem go away too. So maybe it's looking more like a linker bug (or linker script bug) related to alignment. Or whatever is jumping between contexts in the preemption code and restoring registers and such is assuming certain things about code layout that doesn't always hold. More fiddling is necessary still. Jason [1] https://lore.kernel.org/lkml/20220504110911.283525-1-Jason@zx2c4.com/
On Thu, May 5, 2022 at 4:43 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hi Linus, > > On Wed, May 4, 2022 at 8:00 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > On Wed, May 4, 2022 at 3:15 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > > > > Alignment? Compiler bug? HW issue? > > > > > > Probably one of those, yea. Removing the instruction addresses, the only > > > difference between the two compiles is: https://xn--4db.cc/Rrn8usaX/diff#line-440 > > > > Well, that address doesn't work for me at all. It turns into א.cc. > > > > I'd love to see the compiler problem, since I find those fascinating > > (mainly because they scare the hell out of me), but those web > > addresses you use are not working for me. > > א.cc is correct. If you can't load it, your browser or something in > your stack is broken. Choosing a non-ASCII domain like that clearly a > bad decision because people with broken stacks can't load it? Yea, > maybe. But maybe it's like the arch/alpha/ reordering of dependent > loads applied to the web... A bit of stretch. I have uploaded a diff I created here: https://gist.github.com/54334556f2907104cd12374872a0597c It shows the same output. > > It most definitely looks like an OpenRISC compiler bug - that code > > doesn't look like it does anything remotely undefined (and with the > > "unsigned char", nothing implementation-defined either). > > I'm not so certain it's in the compiler anymore, actually. The bug > exhibits itself even when that code isn't actually called. Adding nops > to unrelated code also makes the problem go away. And removing these > nops [1] makes the problem go away too. So maybe it's looking more > like a linker bug (or linker script bug) related to alignment. Or > whatever is jumping between contexts in the preemption code and > restoring registers and such is assuming certain things about code > layout that doesn't always hold. More fiddling is necessary still. Bisecting definitely came to this patch which is strange. Then reverting e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time") did also fix the problem for me. But it could be any small patch that changes layout could make this go away. I have things to try: - more close look at the produced asembly diff - newer compiler (I fixed a few bugs in gcc 12 for openrisc, and this testing came up in gcc 11) - trying on FPGA's I'll report as I find things. -Stafford
On Wed, May 4, 2022 at 12:58 PM Stafford Horne <shorne@gmail.com> wrote: > > I have uploaded a diff I created here: > https://gist.github.com/54334556f2907104cd12374872a0597c > > It shows the same output. In hex_to_bin itself it seems to only be a difference due to some register allocation (r19 and r3 switched around). But then it gets inlined into hex2bin and there changes there seem to be about instruction and basic block scheduling, so it's a lot harder to see what's going on. And a lot of constant changes, which honestly look just like code code moved around by 16 bytes and offsets changed due to that. So I doubt it's hex_to_bin() that is causing problems, I think it's purely code movement. Which explains why adding a nop or a fake printk fixes things. Some alignment assumption that got broken? Linus
On Wed, May 4, 2022 at 1:12 PM Stafford Horne <shorne@gmail.com> wrote: > > Which version of Fedora? F35 here. But looking further, it's not dnsmasq either. Yes, dnsmasq is built with no-i18n, but as mentioned IDN2 does seem to be enabled, so I think it's just "no i18n messages". It seems to be the upstream dns server. Using 8.8.8.8 explicitly makes it work for me, and that obviously bypasses not just the local dns cache, but also the next dns server hop. Could be anywhere. Xfinity, Nest WiFi, or the cable router. They all are doing their own dns thing. Probably my cable box, since it's likely the oldest thing in the chain. Linus
On Thu, May 05, 2022 at 05:38:58AM +0900, Stafford Horne wrote: > On Wed, May 04, 2022 at 01:10:03PM -0700, Linus Torvalds wrote: > > On Wed, May 4, 2022 at 12:58 PM Stafford Horne <shorne@gmail.com> wrote: > > > > > > I have uploaded a diff I created here: > > > https://gist.github.com/54334556f2907104cd12374872a0597c > > > > > > It shows the same output. > > > > In hex_to_bin itself it seems to only be a difference due to some > > register allocation (r19 and r3 switched around). > > > > But then it gets inlined into hex2bin and there changes there seem to > > be about instruction and basic block scheduling, so it's a lot harder > > to see what's going on. > > > > And a lot of constant changes, which honestly look just like code code > > moved around by 16 bytes and offsets changed due to that. > > > > So I doubt it's hex_to_bin() that is causing problems, I think it's > > purely code movement. Which explains why adding a nop or a fake printk > > fixes things. > > > > Some alignment assumption that got broken? > > This is what it looks like to me too. I will have to do a deep dive on what is > going on with this particular build combination as I can't figure out what it is > off the top of my head. > > This test is using a gcc 11 compiler, I tried with my gcc 12 toolchain and the > issue cannot be reproduced. > > - musl gcc 11 - https://musl.cc/or1k-linux-musl-cross.tgz > - openrisc gcc 12 - https://github.com/openrisc/or1k-gcc/releases/tag/or1k-12.0.1-20220210-20220304 > > But again the difference between the two compiler outputs is a lot of register > allocation and offsets changes. Its not easy to see anything that stands out. > I checked the change log for the openrisc specific changes from gcc 11 to gcc > 12. Nothing seems to stand out, mcount profiler fix for PIC, a new large binary > link flag. Hello, Just an update on this. The issue so far has been traced to the alignment of the crypto multiplication function fe_mul_impl in lib/crypto/curve25519-fiat32.c. This patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time") allowed for the alignment to be just right to cause the failure. Without this patch and forcing the alignment we can reproduce the issue. So though the bisect is correct, this patch is not the root cause of the issue. Using some l.nop sliding techniques and some strategically placed .align statements I have been able to reproduce the issue on: - gcc 11 and gcc 12 - preempt and non-preempt kernels I have not been able to reproduce this on my FPGA, so far only QEMU. My hunch now is that since the fe_mul_impl function contains some rather long basic blocks it appears that timer interrupts that interrupt qemu mid basic block execution somehow is causing this. The hypothesis is it may be basic block resuming behavior in qemu that causes incosistent behavior. It will take a bit more time to trace this. Since I maintain OpenRISC QEMU the issue is on me. Again, It's safe to say that patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time") is not an issue. -Stafford
On Sun, May 08, 2022 at 09:37:26AM +0900, Stafford Horne wrote: > On Thu, May 05, 2022 at 05:38:58AM +0900, Stafford Horne wrote: > > On Wed, May 04, 2022 at 01:10:03PM -0700, Linus Torvalds wrote: > > > On Wed, May 4, 2022 at 12:58 PM Stafford Horne <shorne@gmail.com> wrote: > > > > > > > > I have uploaded a diff I created here: > > > > https://gist.github.com/54334556f2907104cd12374872a0597c > > > > > > > > It shows the same output. > > > > > > In hex_to_bin itself it seems to only be a difference due to some > > > register allocation (r19 and r3 switched around). > > > > > > But then it gets inlined into hex2bin and there changes there seem to > > > be about instruction and basic block scheduling, so it's a lot harder > > > to see what's going on. > > > > > > And a lot of constant changes, which honestly look just like code code > > > moved around by 16 bytes and offsets changed due to that. > > > > > > So I doubt it's hex_to_bin() that is causing problems, I think it's > > > purely code movement. Which explains why adding a nop or a fake printk > > > fixes things. > > > > > > Some alignment assumption that got broken? > > > > This is what it looks like to me too. I will have to do a deep dive on what is > > going on with this particular build combination as I can't figure out what it is > > off the top of my head. > > > > This test is using a gcc 11 compiler, I tried with my gcc 12 toolchain and the > > issue cannot be reproduced. > > > > - musl gcc 11 - https://musl.cc/or1k-linux-musl-cross.tgz > > - openrisc gcc 12 - https://github.com/openrisc/or1k-gcc/releases/tag/or1k-12.0.1-20220210-20220304 > > > > But again the difference between the two compiler outputs is a lot of register > > allocation and offsets changes. Its not easy to see anything that stands out. > > I checked the change log for the openrisc specific changes from gcc 11 to gcc > > 12. Nothing seems to stand out, mcount profiler fix for PIC, a new large binary > > link flag. > > Hello, > > Just an update on this. The issue so far has been traced to the alignment of > the crypto multiplication function fe_mul_impl in lib/crypto/curve25519-fiat32.c. > > This patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time") > allowed for the alignment to be just right to cause the failure. Without > this patch and forcing the alignment we can reproduce the issue. So though the > bisect is correct, this patch is not the root cause of the issue. > > Using some l.nop sliding techniques and some strategically placed .align > statements I have been able to reproduce the issue on: > > - gcc 11 and gcc 12 > - preempt and non-preempt kernels > > I have not been able to reproduce this on my FPGA, so far only QEMU. My > hunch now is that since the fe_mul_impl function contains some rather long basic > blocks it appears that timer interrupts that interrupt qemu mid basic block > execution somehow is causing this. The hypothesis is it may be basic block > resuming behavior in qemu that causes incosistent behavior. > > It will take a bit more time to trace this. Since I maintain OpenRISC QEMU the > issue is on me. > > Again, It's safe to say that patch e5be15767e7e ("hex2bin: make the function > hex_to_bin constant-time") is not an issue. This issue has been fixed. I sent a patch to QEMU for it: - https://lore.kernel.org/qemu-devel/20220511120541.2242797-1-shorne@gmail.com/T/#u The issue was a bug in the OpenRISC emulation in QEMU which was triggered when receiving a TICK TIMER interrupt, in a delay slot, on a page boundary. The fix was simple enough, but investigation took quite some work. Thanks, -Stafford
Index: linux-2.6/lib/hexdump.c =================================================================== --- linux-2.6.orig/lib/hexdump.c 2022-04-24 18:51:20.000000000 +0200 +++ linux-2.6/lib/hexdump.c 2022-04-25 13:15:26.000000000 +0200 @@ -22,15 +22,33 @@ EXPORT_SYMBOL(hex_asc_upper); * * hex_to_bin() converts one hex digit to its actual value or -1 in case of bad * input. + * + * This function is used to load cryptographic keys, so it is coded in such a + * way that there are no conditions or memory accesses that depend on data. + * + * Explanation of the logic: + * (ch - '9' - 1) is negative if ch <= '9' + * ('0' - 1 - ch) is negative if ch >= '0' + * we "and" these two values, so the result is negative if ch is in the range + * '0' ... '9' + * we are only interested in the sign, so we do a shift ">> 8"; note that right + * shift of a negative value is implementation-defined, so we cast the + * value to (unsigned) before the shift --- we have 0xffffff if ch is in + * the range '0' ... '9', 0 otherwise + * we "and" this value with (ch - '0' + 1) --- we have a value 1 ... 10 if ch is + * in the range '0' ... '9', 0 otherwise + * we add this value to -1 --- we have a value 0 ... 9 if ch is in the range '0' + * ... '9', -1 otherwise + * the next line is similar to the previous one, but we need to decode both + * uppercase and lowercase letters, so we use (ch & 0xdf), which converts + * lowercase to uppercase */ -int hex_to_bin(char ch) +int hex_to_bin(unsigned char ch) { - if ((ch >= '0') && (ch <= '9')) - return ch - '0'; - ch = tolower(ch); - if ((ch >= 'a') && (ch <= 'f')) - return ch - 'a' + 10; - return -1; + unsigned char cu = ch & 0xdf; + return -1 + + ((ch - '0' + 1) & (unsigned)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8) + + ((cu - 'A' + 11) & (unsigned)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8); } EXPORT_SYMBOL(hex_to_bin); Index: linux-2.6/include/linux/kernel.h =================================================================== --- linux-2.6.orig/include/linux/kernel.h 2022-04-21 17:31:39.000000000 +0200 +++ linux-2.6/include/linux/kernel.h 2022-04-25 07:33:43.000000000 +0200 @@ -285,7 +285,7 @@ static inline char *hex_byte_pack_upper( return buf; } -extern int hex_to_bin(char ch); +extern int hex_to_bin(unsigned char ch); extern int __must_check hex2bin(u8 *dst, const char *src, size_t count); extern char *bin2hex(char *dst, const void *src, size_t count);