Message ID: 20201201194556.5220-1-ardb@kernel.org
State: New
Series: [v2] crypto: aesni - add ccm(aes) algorithm implementation
On 12/1/20 1:57 PM, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 08:45:56PM +0100, Ard Biesheuvel wrote:
>> Add ccm(aes) implementation from linux-wireless mailing list (see
>> http://permalink.gmane.org/gmane.linux.kernel.wireless.general/126679).
>>
>> This eliminates the FPU context store/restore overhead that exists in the
>> more general ccm_base(ctr(aes-aesni),aes-aesni) case during MAC calculation.
>>
>> Suggested-by: Ben Greear <greearb@candelatech.com>
>> Co-developed-by: Steve deRosier <derosier@cal-sierra.com>
>> Signed-off-by: Steve deRosier <derosier@cal-sierra.com>
>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> ---
>> v2: avoid the SIMD helper, as it produces a CRYPTO_ALG_ASYNC aead, which
>> is not usable by the 802.11 ccmp driver
>
> Sorry, but this is not the way to go. Please fix wireless to
> use the async interface instead.

No one has wanted to do this for the last 6+ years, so I don't think it
is likely to happen any time soon. If the patch is better than the
existing behaviour, please let it into the kernel. And it is certainly
better in my test case.

Thanks,
Ben
On Tue, 1 Dec 2020 at 22:57, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Dec 01, 2020 at 08:45:56PM +0100, Ard Biesheuvel wrote:
> > Add ccm(aes) implementation from linux-wireless mailing list (see
> > http://permalink.gmane.org/gmane.linux.kernel.wireless.general/126679).
> >
> > This eliminates the FPU context store/restore overhead that exists in the
> > more general ccm_base(ctr(aes-aesni),aes-aesni) case during MAC calculation.
> >
> > Suggested-by: Ben Greear <greearb@candelatech.com>
> > Co-developed-by: Steve deRosier <derosier@cal-sierra.com>
> > Signed-off-by: Steve deRosier <derosier@cal-sierra.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > v2: avoid the SIMD helper, as it produces a CRYPTO_ALG_ASYNC aead, which
> > is not usable by the 802.11 ccmp driver
>
> Sorry, but this is not the way to go. Please fix wireless to
> use the async interface instead.
>

This is not the first time this has come up. The point is that CCMP in
the wireless stack is not used in 99% of the cases, given that any wifi
hardware built in the last ~10 years can do it in hardware. Only in
exceptional cases, such as Ben's, is there a need for exercising this
interface.

Also, care to explain why we have synchronous AEADs in the first place
if they are not supposed to be used?
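For context on the v2 note above: mac80211 requests its CCMP transform with CRYPTO_ALG_ASYNC set in the allocation mask and cleared in the type bits, which is how a caller demands a synchronous implementation; an aead wrapped by the SIMD helper carries the ASYNC flag and therefore can never match. A minimal sketch of that allocation pattern follows; it mirrors net/mac80211/aes_ccm.c, but the function name and the simplified error handling here are illustrative.

	#include <crypto/aead.h>
	#include <linux/err.h>

	/*
	 * Request a *synchronous* ccm(aes): type = 0, mask = CRYPTO_ALG_ASYNC
	 * means "the ASYNC bit must be clear in whatever implementation the
	 * crypto API resolves to".
	 */
	static struct crypto_aead *ccmp_alloc_sync(const u8 *key, size_t key_len,
						   size_t mic_len)
	{
		struct crypto_aead *tfm;
		int err;

		tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
		if (IS_ERR(tfm))
			return tfm;

		err = crypto_aead_setkey(tfm, key, key_len);
		if (!err)
			err = crypto_aead_setauthsize(tfm, mic_len);
		if (err) {
			crypto_free_aead(tfm);
			return ERR_PTR(err);
		}
		return tfm;
	}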
On 12/1/20 2:04 PM, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 11:01:57PM +0100, Ard Biesheuvel wrote:
>>
>> This is not the first time this has come up. The point is that CCMP in
>> the wireless stack is not used in 99% of the cases, given that any
>> wifi hardware built in the last ~10 years can do it in hardware. Only
>> in exceptional cases, such as Ben's, is there a need for exercising
>> this interface.
>
> Either it matters or it doesn't. If it doesn't matter, why are we
> having this discussion at all? If it does, then fixing just one
> direction makes no sense.
>
>> Also, care to explain why we have synchronous AEADs in the first place
>> if they are not supposed to be used?
>
> Sync AEADs would make sense if you were dealing with a very small
> amount of data, e.g., one block.

Sure, I bet some part of the kernel does this. So let the patch in to
handle that case. It will just be a happy accident that it improves
some other problems as well.

Thanks,
Ben
On Tue, Dec 01, 2020 at 11:12:32PM +0100, Ard Biesheuvel wrote:
>
> What do you mean by just one direction? Ben just confirmed a

The TX direction generally executes in process context, which would
benefit from an accelerated sync implementation. The RX side on the
other hand is always in softirq context.

> substantial speedup for his use case, which requires the use of
> software encryption even on hardware that could support doing it in
> hardware, and that software encryption currently only supports the
> synchronous interface.

The problem is that the degradation would come at the worst time,
when the system is loaded. IOW when you get an interrupt during your
TX path and get RX traffic, that's when you'll take the fallback path.

Cheers,
On Tue, Dec 01, 2020 at 11:27:52PM +0100, Ard Biesheuvel wrote: > > > The problem is that the degradation would come at the worst time, > > when the system is loaded. IOW when you get an interrupt during > > your TX path and get RX traffic that's when you'll take the fallback > > path. > > I can see how in the general case, this is something you would prefer > to avoid. However, on SMP x86_64 systems that implement AES-NI (which > runs at ~1 cycle per byte), I don't see this as a real problem for > this driver. AES-NI is 1 cycle per byte but the fallback is not. > What we could do is expose both versions, where the async version has > a slightly higher priority, so that all users that do support the > async interface will get it, and the wifi stack can use the sync > interface instead. No we've already tried this with IPsec and it doesn't work. That's why the async path exists in aesni. Wireless is no different to IPsec in this respect. Cheers,
On Wed, Dec 02, 2020 at 12:24:47AM +0100, Ard Biesheuvel wrote:
>
> True. But the fallback only gets executed if the scheduler is stupid
> enough to schedule the TX task onto the core that is overloaded doing
> RX softirqs. So in the general case, both TX and RX will be using
> AES-NI instructions (unless the CCMP is done in hardware, which is the
> most common case by far)

I don't think this makes sense. TX is typically done in response to RX,
so the natural alignment is for it to be on the same CPU.

> Wireless is very different. Wifi uses a medium that is fundamentally
> shared, and so the load it can induce is bounded. There is no way a
> wifi interface is going to saturate a 64-bit AES-NI core doing CCMP in
> software.

This sounds pretty tenuous. In any case, even if wireless itself
doesn't get you, there could be loads running on top of it, for
example, IPsec.

> Given the above, can't we be pragmatic here? This code addresses a
> niche use case, which is not affected by the general concerns
> regarding async crypto.

We already have a framework for acceleration that works properly in
aesni, and I don't want to see us introduce another broken model within
the same driver. So either just leave the whole thing alone, or do it
properly by making wireless async.

Cheers,
On 12/1/20 3:48 PM, Herbert Xu wrote: > On Wed, Dec 02, 2020 at 12:41:36AM +0100, Ard Biesheuvel wrote: >> >> You just explained that TX typically runs in process context, whereas >> RX is handled in softirq context. So how exactly are these going to >> end up on the same core? > > When you receive a TCP packet via RX, this should wake up your user- > space thread on the same CPU as otherwise you'll pay extra cost > on pulling the data into memory again. > >> Yes, but IPsec will not use the synchronous interface. > > That doesn't matter when the underlying wireless code is using > the sync interface. An async user is completely capable of making > the aesni code-path unavailable to the sync user. > >> Fair enough. But it is unfortunate that we cannot support Ben's use >> case without a lot of additional work that serves no purpose >> otherwise. > > To the contrary, I think to fully support Ben's use case you must > use the async code path. Otherwise sure you'll see good numbers > when you do simple benchmarks like netperf, but when you really > load up the system it just falls apart. I can generate some very complicated traffic to test this, including bi-directional traffic, mix of tcp/udp/https, etc. If numbers will sway your opinion, let me know what traffic tests will make you happy and I'll do the testing. I know for sure that in download traffic (which is normal dominant direction for wifi stations), Ard's patch gives me about 3x increase of throughput. Without the patch, softirqd is pegged 100% futzing around with enabling and disabling the fpu. The wifi stack people do not want any extra complexity in their code, and async processing of this would be a lot of complexity. So, even if I wanted to implement it, likely it would never make it upstream anyway. I also suspect that general users may benefit from this aesni patch since many older wifi chipsets don't support wpa3 in hardware and wpa3 is the new hotness in wifi. Thanks, Ben
On Wed, 2 Dec 2020 at 00:12, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Tue, Dec 01, 2020 at 11:27:52PM +0100, Ard Biesheuvel wrote: > > > > > The problem is that the degradation would come at the worst time, > > > when the system is loaded. IOW when you get an interrupt during > > > your TX path and get RX traffic that's when you'll take the fallback > > > path. > > > > I can see how in the general case, this is something you would prefer > > to avoid. However, on SMP x86_64 systems that implement AES-NI (which > > runs at ~1 cycle per byte), I don't see this as a real problem for > > this driver. > > AES-NI is 1 cycle per byte but the fallback is not. > One thing I realized just now is that in the current situation, all the synchronous skciphers already degrade like this. I.e., in Ben's case, without the special ccm implementation, ccm(aes) will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated as a sync skcipher using the ctr and ccm/cbcmac templates built on top of the AES-NI cipher (not skcipher). This cipher will also fall back to suboptimal scalar code if the SIMD is in use in process context. > > What we could do is expose both versions, where the async version has > > a slightly higher priority, so that all users that do support the > > async interface will get it, and the wifi stack can use the sync > > interface instead. > > No we've already tried this with IPsec and it doesn't work. That's > why the async path exists in aesni. > > Wireless is no different to IPsec in this respect. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
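The fallback behaviour under discussion hinges on crypto_simd_usable(): kernel-mode SIMD is only available when the current context is allowed to clobber the FPU registers, and a synchronous implementation must otherwise complete with scalar code. A minimal sketch of the pattern, assuming the driver-internal aesni_enc() asm helper is in scope (as it is in the glue code; the wrapper name is illustrative):

	#include <crypto/aes.h>
	#include <crypto/internal/simd.h>
	#include <asm/fpu/api.h>

	static void aes_enc_block(struct crypto_aes_ctx *ctx, u8 *dst,
				  const u8 *src)
	{
		if (crypto_simd_usable()) {
			/* fast path: AES-NI, roughly 1 cycle per byte */
			kernel_fpu_begin();
			aesni_enc(ctx, dst, src);
			kernel_fpu_end();
		} else {
			/* fallback: generic scalar AES from lib/crypto */
			aes_encrypt(ctx, dst, src);
		}
	}

Whenever the else branch is taken, every 16-byte block pays the scalar price; the argument above is about how often that actually happens in practice.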
On Thu, Dec 10, 2020 at 01:18:12AM +0100, Ard Biesheuvel wrote: > > One thing I realized just now is that in the current situation, all > the synchronous skciphers already degrade like this. > > I.e., in Ben's case, without the special ccm implementation, ccm(aes) > will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated > as a sync skcipher using the ctr and ccm/cbcmac templates built on top > of the AES-NI cipher (not skcipher). This cipher will also fall back > to suboptimal scalar code if the SIMD is in use in process context. Sure, your patch is not making it any worse. But I don't think the extra code is worth it considering that you're still going to be running into that slow fallback path all the time. Much better to fix the wireless code to actually go async. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 12/9/20 6:43 PM, Herbert Xu wrote:
> On Thu, Dec 10, 2020 at 01:18:12AM +0100, Ard Biesheuvel wrote:
>>
>> One thing I realized just now is that in the current situation, all
>> the synchronous skciphers already degrade like this.
>>
>> I.e., in Ben's case, without the special ccm implementation, ccm(aes)
>> will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated
>> as a sync skcipher using the ctr and ccm/cbcmac templates built on top
>> of the AES-NI cipher (not skcipher). This cipher will also fall back
>> to suboptimal scalar code if the SIMD is in use in process context.
>
> Sure, your patch is not making it any worse. But I don't think
> the extra code is worth it considering that you're still going to
> be running into that slow fallback path all the time.

How can we test this assumption? I see a 3x performance gain, so it is
not hitting the fallback path much in my case. What traffic pattern and
protocol do you think will cause the slow fallback path to happen often
enough to make this patch not helpful?

> Much better to fix the wireless code to actually go async.

This will not happen any time soon, so better to make incremental
improvements in the crypto code.

Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
On Thu, 10 Dec 2020 at 04:01, Ben Greear <greearb@candelatech.com> wrote: > > On 12/9/20 6:43 PM, Herbert Xu wrote: > > On Thu, Dec 10, 2020 at 01:18:12AM +0100, Ard Biesheuvel wrote: > >> > >> One thing I realized just now is that in the current situation, all > >> the synchronous skciphers already degrade like this. > >> > >> I.e., in Ben's case, without the special ccm implementation, ccm(aes) > >> will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated > >> as a sync skcipher using the ctr and ccm/cbcmac templates built on top > >> of the AES-NI cipher (not skcipher). This cipher will also fall back > >> to suboptimal scalar code if the SIMD is in use in process context. > > > > Sure, your patch is not making it any worse. But I don't think > > the extra code is worth it considering that you're still going to > > be running into that slow fallback path all the time. > > How can we test this assumption? I see 3x performance gain, so it is not hitting > the fallback path much in my case. What traffic pattern and protocol do you think > will cause the slow fallback path to happen often enough to make this patch not > helpful? > Is there a way to verify Herbert's assertion that TX and RX tend to be handled by the same core? I am not a networking guy, but that seems dubious to me. You could add a pr_warn_ratelimited() inside the fallback path and see if it ever gets called at all under various loads. > > Much better to fix the wireless code to actually go async. > > This will not happen any time soon, so better to make incremental > improvement in the crypt code. > I would argue that these are orthogonal. My patch improves both the accelerated and the fallback path, given that the latter does not have to walk the input data twice anymore, and go through 3 layers of templates and the associated indirect calls for each 16 bytes of input. Of course, it would be better to avoid using the fallback path altogether, but I don't think one should hold up the other.
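A sketch of the experiment Ard suggests, extending the wrapper shape shown earlier (the message text is illustrative; raw_smp_processor_id() is used to avoid the preemption warning when called from process context):

	static void aes_enc_block_traced(struct crypto_aes_ctx *ctx, u8 *dst,
					 const u8 *src)
	{
		if (likely(crypto_simd_usable())) {
			kernel_fpu_begin();
			aesni_enc(ctx, dst, src);
			kernel_fpu_end();
			return;
		}
		/* how often, and from which context, is the slow path taken? */
		pr_warn_ratelimited("ccm(aes): scalar fallback on cpu %d%s\n",
				    raw_smp_processor_id(),
				    in_softirq() ? " (softirq)" : "");
		aes_encrypt(ctx, dst, src);
	}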
On Thu, Dec 10, 2020 at 08:30:47AM +0100, Ard Biesheuvel wrote:
>
> I would argue that these are orthogonal. My patch improves both the
> accelerated and the fallback path, given that the latter does not have
> to walk the input data twice anymore, and go through 3 layers of
> templates and the associated indirect calls for each 16 bytes of
> input.

As I told you before, your patch introduces a new model into aesni
that is different to every other algorithm there, for the sole purpose
of catering for legacy hardware in a subsystem that refuses to do
the right thing.

That is not acceptable.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Thu, 10 Dec 2020 at 12:14, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Thu, Dec 10, 2020 at 08:30:47AM +0100, Ard Biesheuvel wrote: > > > > I would argue that these are orthogonal. My patch improves both the > > accelerated and the fallback path, given that the latter does not have > > to walk the input data twice anymore, and go through 3 layers of > > templates and the associated indirect calls for each 16 bytes of > > input. > > As I told your before, your patch introduces a new model into aesni > that is different to every other algorithm there for the sole purpose > of catering for legacy hardware in a subsystem that refuses to do > the right thing. > > That is not acceptable. > OK, I will stop whining about CCM, apologies if this is getting tedious. But we should probably start policing this a bit more. For instance, we now have drivers/net/macsec.c: /* Pick a sync gcm(aes) cipher to ensure order is preserved. */ tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); (btw the comment is bogus, right?) TLS_SW does the same thing in net/tls/tls_device_fallback.c. So it is not only CCM in the 802.11 layer, there are now other places where we end up using a sub-optimal algorithm (and less secure if table based AES or GHASH end up being used) just to avoid a potential fallback which is not even as bad as the fallback we will actually end up with when the crypto API synthesizes it from the GCM, CTR and GHASH templates/drivers. Async is obviously needed for h/w accelerators, but could we perhaps do better for s/w SIMD algorithms? Those are by far the most widely used ones.
On Thu, Dec 10, 2020 at 01:03:56PM +0100, Ard Biesheuvel wrote: > > But we should probably start policing this a bit more. For instance, we now have > > drivers/net/macsec.c: > > /* Pick a sync gcm(aes) cipher to ensure order is preserved. */ > tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); > > (btw the comment is bogus, right?) > > TLS_SW does the same thing in net/tls/tls_device_fallback.c. Short of us volunteering to write code for every user out there I don't see a way out. > Async is obviously needed for h/w accelerators, but could we perhaps > do better for s/w SIMD algorithms? Those are by far the most widely > used ones. If you can come up with a way that avoids the cryptd model without using a fallback obviously that would be the ultimate solution. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Thu, 10 Dec 2020 at 13:16, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Thu, Dec 10, 2020 at 01:03:56PM +0100, Ard Biesheuvel wrote: > > > > But we should probably start policing this a bit more. For instance, we now have > > > > drivers/net/macsec.c: > > > > /* Pick a sync gcm(aes) cipher to ensure order is preserved. */ > > tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); > > > > (btw the comment is bogus, right?) > > > > TLS_SW does the same thing in net/tls/tls_device_fallback.c. > > Short of us volunteering to write code for every user out there > I don't see a way out. > > > Async is obviously needed for h/w accelerators, but could we perhaps > > do better for s/w SIMD algorithms? Those are by far the most widely > > used ones. > > If you can come up with a way that avoids the cryptd model without > using a fallback obviously that would be the ultimate solution. > Could we disable softirq handling in these regions?
On 12/9/20 11:30 PM, Ard Biesheuvel wrote: > On Thu, 10 Dec 2020 at 04:01, Ben Greear <greearb@candelatech.com> wrote: >> >> On 12/9/20 6:43 PM, Herbert Xu wrote: >>> On Thu, Dec 10, 2020 at 01:18:12AM +0100, Ard Biesheuvel wrote: >>>> >>>> One thing I realized just now is that in the current situation, all >>>> the synchronous skciphers already degrade like this. >>>> >>>> I.e., in Ben's case, without the special ccm implementation, ccm(aes) >>>> will resolve to ccm(ctr(aesni),cbcmac(aesni)), which is instantiated >>>> as a sync skcipher using the ctr and ccm/cbcmac templates built on top >>>> of the AES-NI cipher (not skcipher). This cipher will also fall back >>>> to suboptimal scalar code if the SIMD is in use in process context. >>> >>> Sure, your patch is not making it any worse. But I don't think >>> the extra code is worth it considering that you're still going to >>> be running into that slow fallback path all the time. >> >> How can we test this assumption? I see 3x performance gain, so it is not hitting >> the fallback path much in my case. What traffic pattern and protocol do you think >> will cause the slow fallback path to happen often enough to make this patch not >> helpful? >> > > Is there a way to verify Herbert's assertion that TX and RX tend to be > handled by the same core? I am not a networking guy, but that seems > dubious to me. > > You could add a pr_warn_ratelimited() inside the fallback path and see > if it ever gets called at all under various loads. Even if it does sometimes use the same core, if performance is better and CPU usage is lower, why would it even matter? Anyway, looks like Herbert is dead set against this code in hopes that he can force other subsystems to re-write their code. If you come up with some other variant that Herbert will accept, let me know and I'll test it. Otherwise, I will just add your patch to my kernel and carry on. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com
(+ Eric) TL;DR can we find a way to use synchronous SIMD skciphers/aeads without cryptd or scalar fallbacks On Thu, 10 Dec 2020 at 13:19, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 10 Dec 2020 at 13:16, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > > > On Thu, Dec 10, 2020 at 01:03:56PM +0100, Ard Biesheuvel wrote: > > > > > > But we should probably start policing this a bit more. For instance, we now have > > > > > > drivers/net/macsec.c: > > > > > > /* Pick a sync gcm(aes) cipher to ensure order is preserved. */ > > > tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); > > > > > > (btw the comment is bogus, right?) > > > > > > TLS_SW does the same thing in net/tls/tls_device_fallback.c. > > > > Short of us volunteering to write code for every user out there > > I don't see a way out. > > > > > Async is obviously needed for h/w accelerators, but could we perhaps > > > do better for s/w SIMD algorithms? Those are by far the most widely > > > used ones. > > > > If you can come up with a way that avoids the cryptd model without > > using a fallback obviously that would be the ultimate solution. > > > > Could we disable softirq handling in these regions? I have been looking into this a bit, and I wonder if we might consider doing the following: - forbid synchronous skcipher/aead encrypt/decrypt calls from any other context than task or softirq (insofar this is not already the case) - limit kernel mode SIMD in general to task or softirq context - reduce the scope for simd begin/end blocks, which is better for PREEMPT in any case, and no longer results in a performance hit on x86 as it did before, now that we have lazy restore for the userland FPU state - disable softirq processing when enabling kernel mode SIMD This way, we don't need a scalar fallback at all, given that any SIMD use in softirq context is guaranteed to occur when the SIMD registers are dead from the task's pov. So the question is then how granular these kernel mode SIMD regions need to be to avoid excessive latencies in softirq handling. I think this could also be an opportunity for a bit more alignment between architectures on this topic. -- Ard.
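A minimal sketch of what the last two points of this proposal could look like on x86; the wrapper names are illustrative, and per the later messages in this thread this is the softirq-disabling approach that arm64's kernel_neon_begin()/kernel_neon_end() ended up taking:

	#include <linux/bottom_half.h>
	#include <asm/fpu/api.h>

	/*
	 * With softirqs masked for the whole SIMD region, a softirq can
	 * never interrupt a task half-way through an FPU section, so SIMD
	 * code running in softirq context is always free to use the
	 * registers and the scalar fallback becomes dead code.
	 */
	static inline void simd_region_begin(void)
	{
		local_bh_disable();
		kernel_fpu_begin();
	}

	static inline void simd_region_end(void)
	{
		kernel_fpu_end();
		local_bh_enable();
	}

The open question is then how long such a region may be before the added softirq latency becomes unacceptable.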
On Tue, Dec 15, 2020 at 09:55:37AM +0100, Ard Biesheuvel wrote: > > So the question is then how granular these kernel mode SIMD regions > need to be to avoid excessive latencies in softirq handling. Can you get some real world numbers on what the latency is like? Then we could take it to the scheduler folks and see if they're OK with it. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 12/15/20 1:19 AM, Herbert Xu wrote: > On Tue, Dec 15, 2020 at 09:55:37AM +0100, Ard Biesheuvel wrote: >> >> So the question is then how granular these kernel mode SIMD regions >> need to be to avoid excessive latencies in softirq handling. > > Can you get some real world numbers on what the latency is like? > > Then we could take it to the scheduler folks and see if they're > OK with it. > > Thanks, > Hello, While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, and I found this patch described below is applied now. Does this upstream patch mean that aesni is already supported upstream now? Or is it specific to whatever xctr is? If so, any chance the patch is wanted upstream now? commit fd94fcf09957a75e25941f7dbfc84d30a63817ac Author: Nathan Huckleberry <nhuck@google.com> Date: Fri May 20 18:14:56 2022 +0000 crypto: x86/aesni-xctr - Add accelerated implementation of XCTR Add hardware accelerated version of XCTR for x86-64 CPUs with AESNI support. More information on XCTR can be found in the HCTR2 paper: "Length-preserving encryption with HCTR2": https://eprint.iacr.org/2021/1441.pdf Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Ben
On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: > > While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, > and I found this patch described below is applied now. Does this upstream patch mean that aesni is already > supported upstream now? Or is it specific to whatever xctr is? If so, > any chance the patch is wanted upstream now? AFAICS the xctr patch has nothing to do with what you were trying to achieve with wireless. My objection still stands with regards to wireless, we should patch wireless to use the async crypto interface and not hack around it in the Crypto API. Cheers,
On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote:
> >
> > While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly,
> > and I found this patch described below is applied now. Does this upstream patch mean that aesni is already
> > supported upstream now? Or is it specific to whatever xctr is? If so,
> > any chance the patch is wanted upstream now?
>
> AFAICS the xctr patch has nothing to do with what you were trying
> to achieve with wireless. My objection still stands with regards
> to wireless, we should patch wireless to use the async crypto
> interface and not hack around it in the Crypto API.
>

Indeed. Those are just add/add conflicts because both patches
introduce new code into the same set of files. The resolution is
generally to keep both sides.

As for Herbert's objection: I will note here that in the meantime,
arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and
skcipher implementations, because those are only callable in task or
softirq context, and the arm64 SIMD wrappers now disable softirq
processing. This means that the condition that results in the fallback
being needed can no longer occur, making the SIMD helper dead code on
arm64.

I suppose we might do the same thing on x86, but since the kernel mode
SIMD handling is highly arch specific, you'd really need to raise this
with the x86 maintainers.
On 11/9/22 2:05 AM, Ard Biesheuvel wrote: > On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: >> >> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: >>> >>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, >>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already >>> supported upstream now? Or is it specific to whatever xctr is? If so, >>> any chance the patch is wanted upstream now? >> >> AFAICS the xctr patch has nothing to do with what you were trying >> to achieve with wireless. My objection still stands with regards >> to wireless, we should patch wireless to use the async crypto >> interface and not hack around it in the Crypto API. >> > > Indeed. Those are just add/add conflicts because both patches > introduce new code into the same set of files. The resolution is > generally to keep both sides. > > As for Herbert's objection: I will note here that in the meantime, > arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and > skipcher implementations, because those are only callable in task or > softirq context, and the arm64 SIMD wrappers now disable softirq > processing. This means that the condition that results in the fallback > being needed can no longer occur, making the SIMD helper dead code on > arm64. > > I suppose we might do the same thing on x86, but since the kernel mode > SIMD handling is highly arch specific, you'd really need to raise this > with the x86 maintainers. Wifi stack is unlikely to ever change in this regard, so I hope that it does not become too much more trouble to keep this patch alive. I made an attempt at the merge, but the error path clean up looks tangled to me, I'll post a patch for review once I get my tree cleaned up a bit from the rebase. Thanks, Ben
On 11/9/22 2:05 AM, Ard Biesheuvel wrote: > On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: >> >> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: >>> >>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, >>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already >>> supported upstream now? Or is it specific to whatever xctr is? If so, >>> any chance the patch is wanted upstream now? >> >> AFAICS the xctr patch has nothing to do with what you were trying >> to achieve with wireless. My objection still stands with regards >> to wireless, we should patch wireless to use the async crypto >> interface and not hack around it in the Crypto API. >> > > Indeed. Those are just add/add conflicts because both patches > introduce new code into the same set of files. The resolution is > generally to keep both sides. > > As for Herbert's objection: I will note here that in the meantime, > arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and > skipcher implementations, because those are only callable in task or > softirq context, and the arm64 SIMD wrappers now disable softirq > processing. This means that the condition that results in the fallback > being needed can no longer occur, making the SIMD helper dead code on > arm64. > > I suppose we might do the same thing on x86, but since the kernel mode > SIMD handling is highly arch specific, you'd really need to raise this > with the x86 maintainers. > Hello Ard, Could you please review the attached patch to make sure I merged it properly? My concern is the cleanup section and/or some problems I might have introduced related to the similarly named code that was added upstream. Thanks, Ben
On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote:
>
> On 11/9/22 2:05 AM, Ard Biesheuvel wrote:
> > On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >>
> >> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote:
> >>>
> >>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly,
> >>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already
> >>> supported upstream now? Or is it specific to whatever xctr is? If so,
> >>> any chance the patch is wanted upstream now?
> >>
> >> AFAICS the xctr patch has nothing to do with what you were trying
> >> to achieve with wireless. My objection still stands with regards
> >> to wireless, we should patch wireless to use the async crypto
> >> interface and not hack around it in the Crypto API.
> >>
> >
> > Indeed. Those are just add/add conflicts because both patches
> > introduce new code into the same set of files. The resolution is
> > generally to keep both sides.
> >
> > As for Herbert's objection: I will note here that in the meantime,
> > arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and
> > skcipher implementations, because those are only callable in task or
> > softirq context, and the arm64 SIMD wrappers now disable softirq
> > processing. This means that the condition that results in the fallback
> > being needed can no longer occur, making the SIMD helper dead code on
> > arm64.
> >
> > I suppose we might do the same thing on x86, but since the kernel mode
> > SIMD handling is highly arch specific, you'd really need to raise this
> > with the x86 maintainers.
> >
>
> Hello Ard,
>
> Could you please review the attached patch to make sure I merged it
> properly? My concern is the cleanup section and/or some problems I
> might have introduced related to the similarly named code that was
> added upstream.
>

I don't think the logic is quite right, although it rarely matters.

I've pushed my version here - it invokes the static call for CTR, so it
will use the faster AVX version if the CPU supports it.

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1
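The branch above wires CCM into the CTR static call that the driver already uses for boot-time implementation selection. Roughly, as a sketch (the identifiers follow the upstream aesni-intel_glue.c of that era; treat the exact helper names as an assumption):

	/* selected once at init, as in aesni-intel_glue.c */
	DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);

	static void __init aesni_select_ctr(void)
	{
		if (boot_cpu_has(X86_FEATURE_AVX))
			static_call_update(aesni_ctr_enc_tfm,
					   aesni_ctr_enc_avx_tfm);
	}

	/*
	 * The CCM en/decrypt paths then dispatch through the static call
	 * instead of hard-coding aesni_ctr_enc(), picking up the AVX
	 * variant for free on capable CPUs.
	 */
	static void ccm_ctr_crypt(struct crypto_aes_ctx *ctx, u8 *dst,
				  const u8 *src, unsigned int len, u8 *iv)
	{
		static_call(aesni_ctr_enc_tfm)(ctx, dst, src, len, iv);
	}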
On 11/12/22 06:59, Ard Biesheuvel wrote: > On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote: >> >> On 11/9/22 2:05 AM, Ard Biesheuvel wrote: >>> On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: >>>> >>>> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: >>>>> >>>>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, >>>>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already >>>>> supported upstream now? Or is it specific to whatever xctr is? If so, >>>>> any chance the patch is wanted upstream now? >>>> >>>> AFAICS the xctr patch has nothing to do with what you were trying >>>> to achieve with wireless. My objection still stands with regards >>>> to wireless, we should patch wireless to use the async crypto >>>> interface and not hack around it in the Crypto API. >>>> >>> >>> Indeed. Those are just add/add conflicts because both patches >>> introduce new code into the same set of files. The resolution is >>> generally to keep both sides. >>> >>> As for Herbert's objection: I will note here that in the meantime, >>> arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and >>> skipcher implementations, because those are only callable in task or >>> softirq context, and the arm64 SIMD wrappers now disable softirq >>> processing. This means that the condition that results in the fallback >>> being needed can no longer occur, making the SIMD helper dead code on >>> arm64. >>> >>> I suppose we might do the same thing on x86, but since the kernel mode >>> SIMD handling is highly arch specific, you'd really need to raise this >>> with the x86 maintainers. >>> >> >> Hello Ard, >> >> Could you please review the attached patch to make sure I merged it properly? My concern >> is the cleanup section and/or some problems I might have introduced related to the similarly >> named code that was added upstream. >> > > I don't think the logic is quite right, although it rarely matter. > > I've pushed my version here - it invokes the static call for CTR so it > will use the faster AVX version if the CPU supports it. > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1 Hello Ard, It looks like something changed again in the intel-aesni logic for 6.6 kernel. I was able to do a small change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in 6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked fine in 6.5 as far as I can tell. I'm attaching my slightly modified version of the patch you sent previous. If you have time to investigate this it would be much appreciated. Thanks, Ben
On Mon, Oct 16, 2023 at 01:50:05PM -0700, Ben Greear wrote: > On 11/12/22 06:59, Ard Biesheuvel wrote: > > On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote: > > > > > > On 11/9/22 2:05 AM, Ard Biesheuvel wrote: > > > > On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > > > > > > > > > On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: > > > > > > > > > > > > While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, > > > > > > and I found this patch described below is applied now. Does this upstream patch mean that aesni is already > > > > > > supported upstream now? Or is it specific to whatever xctr is? If so, > > > > > > any chance the patch is wanted upstream now? > > > > > > > > > > AFAICS the xctr patch has nothing to do with what you were trying > > > > > to achieve with wireless. My objection still stands with regards > > > > > to wireless, we should patch wireless to use the async crypto > > > > > interface and not hack around it in the Crypto API. > > > > > > > > > > > > > Indeed. Those are just add/add conflicts because both patches > > > > introduce new code into the same set of files. The resolution is > > > > generally to keep both sides. > > > > > > > > As for Herbert's objection: I will note here that in the meantime, > > > > arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and > > > > skipcher implementations, because those are only callable in task or > > > > softirq context, and the arm64 SIMD wrappers now disable softirq > > > > processing. This means that the condition that results in the fallback > > > > being needed can no longer occur, making the SIMD helper dead code on > > > > arm64. > > > > > > > > I suppose we might do the same thing on x86, but since the kernel mode > > > > SIMD handling is highly arch specific, you'd really need to raise this > > > > with the x86 maintainers. > > > > > > > > > > Hello Ard, > > > > > > Could you please review the attached patch to make sure I merged it properly? My concern > > > is the cleanup section and/or some problems I might have introduced related to the similarly > > > named code that was added upstream. > > > > > > > I don't think the logic is quite right, although it rarely matter. > > > > I've pushed my version here - it invokes the static call for CTR so it > > will use the faster AVX version if the CPU supports it. > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1 > > Hello Ard, > > It looks like something changed again in the intel-aesni logic for 6.6 kernel. I was able to do a small > change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in > 6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked > fine in 6.5 as far as I can tell. > > I'm attaching my slightly modified version of the patch you sent previous. If you have time to > investigate this it would be much appreciated. > > Thanks, > Ben If this patch is useful, shouldn't it be upstreamed? - Eric
On Tue, 17 Oct 2023 at 05:16, Eric Biggers <ebiggers@kernel.org> wrote: > > On Mon, Oct 16, 2023 at 01:50:05PM -0700, Ben Greear wrote: > > On 11/12/22 06:59, Ard Biesheuvel wrote: > > > On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote: > > > > > > > > On 11/9/22 2:05 AM, Ard Biesheuvel wrote: > > > > > On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > > > > > > > > > > > On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: > > > > > > > > > > > > > > While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, > > > > > > > and I found this patch described below is applied now. Does this upstream patch mean that aesni is already > > > > > > > supported upstream now? Or is it specific to whatever xctr is? If so, > > > > > > > any chance the patch is wanted upstream now? > > > > > > > > > > > > AFAICS the xctr patch has nothing to do with what you were trying > > > > > > to achieve with wireless. My objection still stands with regards > > > > > > to wireless, we should patch wireless to use the async crypto > > > > > > interface and not hack around it in the Crypto API. > > > > > > > > > > > > > > > > Indeed. Those are just add/add conflicts because both patches > > > > > introduce new code into the same set of files. The resolution is > > > > > generally to keep both sides. > > > > > > > > > > As for Herbert's objection: I will note here that in the meantime, > > > > > arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and > > > > > skipcher implementations, because those are only callable in task or > > > > > softirq context, and the arm64 SIMD wrappers now disable softirq > > > > > processing. This means that the condition that results in the fallback > > > > > being needed can no longer occur, making the SIMD helper dead code on > > > > > arm64. > > > > > > > > > > I suppose we might do the same thing on x86, but since the kernel mode > > > > > SIMD handling is highly arch specific, you'd really need to raise this > > > > > with the x86 maintainers. > > > > > > > > > > > > > Hello Ard, > > > > > > > > Could you please review the attached patch to make sure I merged it properly? My concern > > > > is the cleanup section and/or some problems I might have introduced related to the similarly > > > > named code that was added upstream. > > > > > > > > > > I don't think the logic is quite right, although it rarely matter. > > > > > > I've pushed my version here - it invokes the static call for CTR so it > > > will use the faster AVX version if the CPU supports it. > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1 > > > > Hello Ard, > > > > It looks like something changed again in the intel-aesni logic for 6.6 kernel. I was able to do a small > > change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in > > 6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked > > fine in 6.5 as far as I can tell. > > > > I'm attaching my slightly modified version of the patch you sent previous. If you have time to > > investigate this it would be much appreciated. > > > > Thanks, > > Ben > > If this patch is useful, shouldn't it be upstreamed? > It was rejected by Herbert on the basis that the wireless stack should be converted to use the async API instead.
On Tue, Oct 17, 2023 at 08:43:29AM +0200, Ard Biesheuvel wrote: > > It was rejected by Herbert on the basis that the wireless stack should > be converted to use the async API instead. Yes, there is no reason to special-case this one algorithm when everything else uses the simd model. Of course, if we can support simd in softirqs that would be the best outcome. Cheers,
On 10/16/23 23:43, Ard Biesheuvel wrote: > On Tue, 17 Oct 2023 at 05:16, Eric Biggers <ebiggers@kernel.org> wrote: >> >> On Mon, Oct 16, 2023 at 01:50:05PM -0700, Ben Greear wrote: >>> On 11/12/22 06:59, Ard Biesheuvel wrote: >>>> On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote: >>>>> >>>>> On 11/9/22 2:05 AM, Ard Biesheuvel wrote: >>>>>> On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: >>>>>>> >>>>>>> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: >>>>>>>> >>>>>>>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, >>>>>>>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already >>>>>>>> supported upstream now? Or is it specific to whatever xctr is? If so, >>>>>>>> any chance the patch is wanted upstream now? >>>>>>> >>>>>>> AFAICS the xctr patch has nothing to do with what you were trying >>>>>>> to achieve with wireless. My objection still stands with regards >>>>>>> to wireless, we should patch wireless to use the async crypto >>>>>>> interface and not hack around it in the Crypto API. >>>>>>> >>>>>> >>>>>> Indeed. Those are just add/add conflicts because both patches >>>>>> introduce new code into the same set of files. The resolution is >>>>>> generally to keep both sides. >>>>>> >>>>>> As for Herbert's objection: I will note here that in the meantime, >>>>>> arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and >>>>>> skipcher implementations, because those are only callable in task or >>>>>> softirq context, and the arm64 SIMD wrappers now disable softirq >>>>>> processing. This means that the condition that results in the fallback >>>>>> being needed can no longer occur, making the SIMD helper dead code on >>>>>> arm64. >>>>>> >>>>>> I suppose we might do the same thing on x86, but since the kernel mode >>>>>> SIMD handling is highly arch specific, you'd really need to raise this >>>>>> with the x86 maintainers. >>>>>> >>>>> >>>>> Hello Ard, >>>>> >>>>> Could you please review the attached patch to make sure I merged it properly? My concern >>>>> is the cleanup section and/or some problems I might have introduced related to the similarly >>>>> named code that was added upstream. >>>>> >>>> >>>> I don't think the logic is quite right, although it rarely matter. >>>> >>>> I've pushed my version here - it invokes the static call for CTR so it >>>> will use the faster AVX version if the CPU supports it. >>>> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1 >>> >>> Hello Ard, >>> >>> It looks like something changed again in the intel-aesni logic for 6.6 kernel. I was able to do a small >>> change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in >>> 6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked >>> fine in 6.5 as far as I can tell. >>> >>> I'm attaching my slightly modified version of the patch you sent previous. If you have time to >>> investigate this it would be much appreciated. >>> >>> Thanks, >>> Ben >> >> If this patch is useful, shouldn't it be upstreamed? >> > > It was rejected by Herbert on the basis that the wireless stack should > be converted to use the async API instead. Hello, I'm still dragging this patch along (see attached). I notice that there was a big re-write of the aesni logic in 6.11. 
Anyone know if this patch would need to be (or should be) modified to work well with the new upstream code? I managed to get it compiling with modest merge work, but have not done any actual testing yet... Thanks, Ben
On Sat, 3 Aug 2024 at 19:20, Ben Greear <greearb@candelatech.com> wrote: > > On 10/16/23 23:43, Ard Biesheuvel wrote: > > On Tue, 17 Oct 2023 at 05:16, Eric Biggers <ebiggers@kernel.org> wrote: > >> > >> On Mon, Oct 16, 2023 at 01:50:05PM -0700, Ben Greear wrote: > >>> On 11/12/22 06:59, Ard Biesheuvel wrote: > >>>> On Fri, 11 Nov 2022 at 23:29, Ben Greear <greearb@candelatech.com> wrote: > >>>>> > >>>>> On 11/9/22 2:05 AM, Ard Biesheuvel wrote: > >>>>>> On Wed, 9 Nov 2022 at 04:52, Herbert Xu <herbert@gondor.apana.org.au> wrote: > >>>>>>> > >>>>>>> On Tue, Nov 08, 2022 at 10:50:48AM -0800, Ben Greear wrote: > >>>>>>>> > >>>>>>>> While rebasing my patches onto 6.1-rc4, I noticed my aesni for ccm(aes) patch didn't apply cleanly, > >>>>>>>> and I found this patch described below is applied now. Does this upstream patch mean that aesni is already > >>>>>>>> supported upstream now? Or is it specific to whatever xctr is? If so, > >>>>>>>> any chance the patch is wanted upstream now? > >>>>>>> > >>>>>>> AFAICS the xctr patch has nothing to do with what you were trying > >>>>>>> to achieve with wireless. My objection still stands with regards > >>>>>>> to wireless, we should patch wireless to use the async crypto > >>>>>>> interface and not hack around it in the Crypto API. > >>>>>>> > >>>>>> > >>>>>> Indeed. Those are just add/add conflicts because both patches > >>>>>> introduce new code into the same set of files. The resolution is > >>>>>> generally to keep both sides. > >>>>>> > >>>>>> As for Herbert's objection: I will note here that in the meantime, > >>>>>> arm64 now has gotten rid of the scalar fallbacks entirely in AEAD and > >>>>>> skipcher implementations, because those are only callable in task or > >>>>>> softirq context, and the arm64 SIMD wrappers now disable softirq > >>>>>> processing. This means that the condition that results in the fallback > >>>>>> being needed can no longer occur, making the SIMD helper dead code on > >>>>>> arm64. > >>>>>> > >>>>>> I suppose we might do the same thing on x86, but since the kernel mode > >>>>>> SIMD handling is highly arch specific, you'd really need to raise this > >>>>>> with the x86 maintainers. > >>>>>> > >>>>> > >>>>> Hello Ard, > >>>>> > >>>>> Could you please review the attached patch to make sure I merged it properly? My concern > >>>>> is the cleanup section and/or some problems I might have introduced related to the similarly > >>>>> named code that was added upstream. > >>>>> > >>>> > >>>> I don't think the logic is quite right, although it rarely matter. > >>>> > >>>> I've pushed my version here - it invokes the static call for CTR so it > >>>> will use the faster AVX version if the CPU supports it. > >>>> > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=aesni-ccm-v6.1 > >>> > >>> Hello Ard, > >>> > >>> It looks like something changed again in the intel-aesni logic for 6.6 kernel. I was able to do a small > >>> change to the patch to get it to compile, but the kernel crashes when I bring up a wlan port in > >>> 6.6. When I remove the aesni patch, the station comes up without crashing. The aesni patch worked > >>> fine in 6.5 as far as I can tell. > >>> > >>> I'm attaching my slightly modified version of the patch you sent previous. If you have time to > >>> investigate this it would be much appreciated. > >>> > >>> Thanks, > >>> Ben > >> > >> If this patch is useful, shouldn't it be upstreamed? 
> >> > > > > It was rejected by Herbert on the basis that the wireless stack should > > be converted to use the async API instead. > > Hello, > > I'm still dragging this patch along (see attached). I notice that there was a big re-write of > the aesni logic in 6.11. Anyone know if this patch would need to be (or should be) > modified to work well with the new upstream code? > Those changes should be mostly orthogonal, so beyond resolving lexical conflicts, I wouldn't expect any additional work to be needed to forward port this.
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index ad8a7188a2bf..d90b03d9b420 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -97,12 +97,13 @@ asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, #define AVX_GEN2_OPTSIZE 640 #define AVX_GEN4_OPTSIZE 4096 +asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, + const u8 *in, unsigned int len, u8 *iv); + #ifdef CONFIG_X86_64 static void (*aesni_ctr_enc_tfm)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); -asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, - const u8 *in, unsigned int len, u8 *iv); asmlinkage void aesni_xts_crypt8(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, bool enc, le128 *iv); @@ -454,6 +455,377 @@ static int cbc_decrypt(struct skcipher_request *req) return err; } +static int aesni_ccm_setkey(struct crypto_aead *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct crypto_aes_ctx *ctx = crypto_aead_ctx(tfm); + + return aes_set_key_common(crypto_aead_tfm(tfm), ctx, in_key, key_len); +} + +static int aesni_ccm_setauthsize(struct crypto_aead *tfm, unsigned int authsize) +{ + if ((authsize & 1) || authsize < 4) + return -EINVAL; + return 0; +} + +static int ccm_set_msg_len(u8 *block, unsigned int msglen, int csize) +{ + __be32 data; + + memset(block, 0, csize); + block += csize; + + if (csize >= 4) + csize = 4; + else if (msglen > (1 << (8 * csize))) + return -EOVERFLOW; + + data = cpu_to_be32(msglen); + memcpy(block - csize, (u8 *)&data + 4 - csize, csize); + + return 0; +} + +static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen) +{ + struct crypto_aead *aead = crypto_aead_reqtfm(req); + __be32 *n = (__be32 *)&maciv[AES_BLOCK_SIZE - 8]; + u32 l = req->iv[0] + 1; + + /* verify that CCM dimension 'L' is set correctly in the IV */ + if (l < 2 || l > 8) + return -EINVAL; + + /* verify that msglen can in fact be represented in L bytes */ + if (l < 4 && msglen >> (8 * l)) + return -EOVERFLOW; + + /* + * Even if the CCM spec allows L values of up to 8, the Linux cryptoapi + * uses a u32 type to represent msglen so the top 4 bytes are always 0. + */ + n[0] = 0; + n[1] = cpu_to_be32(msglen); + + memcpy(maciv, req->iv, AES_BLOCK_SIZE - l); + + /* + * Meaning of byte 0 according to CCM spec (RFC 3610/NIST 800-38C) + * - bits 0..2 : max # of bytes required to represent msglen, minus 1 + * (already set by caller) + * - bits 3..5 : size of auth tag (1 => 4 bytes, 2 => 6 bytes, etc) + * - bit 6 : indicates presence of authenticate-only data + */ + maciv[0] |= (crypto_aead_authsize(aead) - 2) << 2; + if (req->assoclen) + maciv[0] |= 0x40; + + memset(&req->iv[AES_BLOCK_SIZE - l], 0, l); + return ccm_set_msg_len(maciv + AES_BLOCK_SIZE - l, msglen, l); +} + +static int compute_mac(struct crypto_aes_ctx *ctx, u8 mac[], u8 *data, int n, + unsigned int ilen, u8 *idata, bool do_simd) +{ + unsigned int bs = AES_BLOCK_SIZE; + u8 *odata = mac; + int datalen, getlen; + + datalen = n; + + /* first time in here, block may be partially filled. 
*/ + getlen = bs - ilen; + if (datalen >= getlen) { + memcpy(idata + ilen, data, getlen); + + if (likely(do_simd)) { + aesni_cbc_enc(ctx, odata, idata, AES_BLOCK_SIZE, odata); + } else { + crypto_xor(odata, idata, AES_BLOCK_SIZE); + aes_encrypt(ctx, odata, odata); + } + + datalen -= getlen; + data += getlen; + ilen = 0; + } + + /* now encrypt rest of data */ + while (datalen >= bs) { + if (likely(do_simd)) { + aesni_cbc_enc(ctx, odata, data, AES_BLOCK_SIZE, odata); + } else { + crypto_xor(odata, data, AES_BLOCK_SIZE); + aes_encrypt(ctx, odata, odata); + } + + datalen -= bs; + data += bs; + } + + /* check and see if there's leftover data that wasn't + * enough to fill a block. + */ + if (datalen) { + memcpy(idata + ilen, data, datalen); + ilen += datalen; + } + return ilen; +} + +static void ccm_calculate_auth_mac(struct aead_request *req, + struct crypto_aes_ctx *ctx, u8 mac[], + struct scatterlist *src, + bool do_simd) +{ + unsigned int len = req->assoclen; + struct scatter_walk walk; + u8 idata[AES_BLOCK_SIZE]; + unsigned int ilen; + struct { + __be16 l; + __be32 h; + } __packed *ltag = (void *)idata; + + /* prepend the AAD with a length tag */ + if (len < 0xff00) { + ltag->l = cpu_to_be16(len); + ilen = 2; + } else { + ltag->l = cpu_to_be16(0xfffe); + ltag->h = cpu_to_be32(len); + ilen = 6; + } + + scatterwalk_start(&walk, src); + + while (len) { + u8 *src; + int n; + + n = scatterwalk_clamp(&walk, len); + if (!n) { + scatterwalk_start(&walk, sg_next(walk.sg)); + n = scatterwalk_clamp(&walk, len); + } + src = scatterwalk_map(&walk); + + ilen = compute_mac(ctx, mac, src, n, ilen, idata, do_simd); + len -= n; + + scatterwalk_unmap(src); + scatterwalk_advance(&walk, n); + scatterwalk_done(&walk, 0, len); + } + + /* any leftover needs padding and then encrypted */ + if (ilen) { + crypto_xor(mac, idata, ilen); + if (likely(do_simd)) + aesni_enc(ctx, mac, mac); + else + aes_encrypt(ctx, mac, mac); + } +} + +static int aesni_ccm_encrypt(struct aead_request *req) +{ + struct crypto_aead *aead = crypto_aead_reqtfm(req); + struct crypto_aes_ctx *ctx = aes_ctx(crypto_aead_ctx(aead)); + bool const do_simd = crypto_simd_usable(); + u8 __aligned(8) mac[AES_BLOCK_SIZE]; + u8 __aligned(8) buf[AES_BLOCK_SIZE]; + struct skcipher_walk walk; + u32 l = req->iv[0] + 1; + int err; + + err = ccm_init_mac(req, mac, req->cryptlen); + if (err) + return err; + + if (likely(do_simd)) { + kernel_fpu_begin(); + aesni_enc(ctx, mac, mac); + } else { + aes_encrypt(ctx, mac, mac); + } + + if (req->assoclen) + ccm_calculate_auth_mac(req, ctx, mac, req->src, do_simd); + + req->iv[AES_BLOCK_SIZE - 1] = 0x1; + err = skcipher_walk_aead_encrypt(&walk, req, true); + + while (walk.nbytes >= AES_BLOCK_SIZE) { + int len = walk.nbytes & AES_BLOCK_MASK; + int n; + + for (n = 0; n < len; n += AES_BLOCK_SIZE) { + if (likely(do_simd)) { + aesni_cbc_enc(ctx, mac, walk.src.virt.addr + n, + AES_BLOCK_SIZE, mac); + } else { + crypto_xor(mac, walk.src.virt.addr + n, + AES_BLOCK_SIZE); + aes_encrypt(ctx, mac, mac); + + aes_encrypt(ctx, buf, walk.iv); + crypto_inc(walk.iv, AES_BLOCK_SIZE); + crypto_xor_cpy(walk.dst.virt.addr + n, + walk.src.virt.addr + n, + buf, AES_BLOCK_SIZE); + } + } + if (likely(do_simd)) + aesni_ctr_enc(ctx, walk.dst.virt.addr, + walk.src.virt.addr, len, walk.iv); + + err = skcipher_walk_done(&walk, walk.nbytes & ~AES_BLOCK_MASK); + } + if (walk.nbytes) { + if (likely(do_simd)) + aesni_enc(ctx, buf, walk.iv); + else + aes_encrypt(ctx, buf, walk.iv); + + crypto_xor(mac, walk.src.virt.addr, walk.nbytes); + 
crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr, + buf, walk.nbytes); + + if (likely(do_simd)) + aesni_enc(ctx, mac, mac); + else + aes_encrypt(ctx, mac, mac); + + err = skcipher_walk_done(&walk, 0); + } + + if (err) + goto fail; + + memset(walk.iv + AES_BLOCK_SIZE - l, 0, l); + + if (likely(do_simd)) { + aesni_ctr_enc(ctx, mac, mac, AES_BLOCK_SIZE, walk.iv); + } else { + aes_encrypt(ctx, buf, walk.iv); + crypto_xor(mac, buf, AES_BLOCK_SIZE); + } + + /* copy authtag to end of dst */ + scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen, + crypto_aead_authsize(aead), 1); + +fail: + if (likely(do_simd)) + kernel_fpu_end(); + return err; +} + +static int aesni_ccm_decrypt(struct aead_request *req) +{ + struct crypto_aead *aead = crypto_aead_reqtfm(req); + struct crypto_aes_ctx *ctx = aes_ctx(crypto_aead_ctx(aead)); + unsigned int authsize = crypto_aead_authsize(aead); + bool const do_simd = crypto_simd_usable(); + u8 __aligned(8) mac[AES_BLOCK_SIZE]; + u8 __aligned(8) tag[AES_BLOCK_SIZE]; + u8 __aligned(8) buf[AES_BLOCK_SIZE]; + struct skcipher_walk walk; + u32 l = req->iv[0] + 1; + int err; + + err = ccm_init_mac(req, mac, req->cryptlen - authsize); + if (err) + return err; + + /* copy authtag from end of src */ + scatterwalk_map_and_copy(tag, req->src, + req->assoclen + req->cryptlen - authsize, + authsize, 0); + + if (likely(do_simd)) { + kernel_fpu_begin(); + aesni_enc(ctx, mac, mac); + } else { + aes_encrypt(ctx, mac, mac); + } + + if (req->assoclen) + ccm_calculate_auth_mac(req, ctx, mac, req->src, do_simd); + + req->iv[AES_BLOCK_SIZE - 1] = 0x1; + err = skcipher_walk_aead_decrypt(&walk, req, true); + + while (walk.nbytes >= AES_BLOCK_SIZE) { + int len = walk.nbytes & AES_BLOCK_MASK; + int n; + + if (likely(do_simd)) + aesni_ctr_enc(ctx, walk.dst.virt.addr, + walk.src.virt.addr, len, walk.iv); + + for (n = 0; n < len; n += AES_BLOCK_SIZE) { + if (likely(do_simd)) { + aesni_cbc_enc(ctx, mac, walk.dst.virt.addr + n, + AES_BLOCK_SIZE, mac); + } else { + aes_encrypt(ctx, buf, walk.iv); + crypto_inc(walk.iv, AES_BLOCK_SIZE); + crypto_xor_cpy(walk.dst.virt.addr + n, + walk.src.virt.addr + n, + buf, AES_BLOCK_SIZE); + + crypto_xor(mac, walk.dst.virt.addr + n, + AES_BLOCK_SIZE); + aes_encrypt(ctx, mac, mac); + } + } + + err = skcipher_walk_done(&walk, walk.nbytes & ~AES_BLOCK_MASK); + } + if (walk.nbytes) { + if (likely(do_simd)) + aesni_enc(ctx, buf, walk.iv); + else + aes_encrypt(ctx, buf, walk.iv); + + crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr, + buf, walk.nbytes); + crypto_xor(mac, walk.dst.virt.addr, walk.nbytes); + + if (likely(do_simd)) + aesni_enc(ctx, mac, mac); + else + aes_encrypt(ctx, mac, mac); + + err = skcipher_walk_done(&walk, 0); + } + + if (err) + goto fail; + + memset(walk.iv + AES_BLOCK_SIZE - l, 0, l); + + if (likely(do_simd)) { + aesni_ctr_enc(ctx, mac, mac, AES_BLOCK_SIZE, walk.iv); + } else { + aes_encrypt(ctx, buf, walk.iv); + crypto_xor(mac, buf, AES_BLOCK_SIZE); + } + + /* compare calculated auth tag with the stored one */ + if (crypto_memneq(mac, tag, authsize)) + err = -EBADMSG; + +fail: + if (likely(do_simd)) + kernel_fpu_end(); + return err; +} + #ifdef CONFIG_X86_64 static void ctr_crypt_final(struct crypto_aes_ctx *ctx, struct skcipher_walk *walk) @@ -1045,10 +1417,28 @@ static struct aead_alg aesni_aeads[] = { { .cra_module = THIS_MODULE, }, } }; + #else static struct aead_alg aesni_aeads[0]; #endif +static struct aead_alg aesni_ccm_aead = { + .base.cra_name = "ccm(aes)", + .base.cra_driver_name = "ccm-aesni", + 
.base.cra_priority = 400, + .base.cra_blocksize = 1, + .base.cra_ctxsize = sizeof(struct crypto_aes_ctx), + .base.cra_module = THIS_MODULE, + + .setkey = aesni_ccm_setkey, + .setauthsize = aesni_ccm_setauthsize, + .encrypt = aesni_ccm_encrypt, + .decrypt = aesni_ccm_decrypt, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .maxauthsize = AES_BLOCK_SIZE, +}; + static struct simd_aead_alg *aesni_simd_aeads[ARRAY_SIZE(aesni_aeads)]; static const struct x86_cpu_id aesni_cpu_id[] = { @@ -1098,8 +1488,17 @@ static int __init aesni_init(void) if (err) goto unregister_skciphers; + if (IS_ENABLED(CONFIG_X86_64)) { + err = crypto_register_aead(&aesni_ccm_aead); + if (err) + goto unregister_aeads; + } + return 0; +unregister_aeads: + simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads), + aesni_simd_aeads); unregister_skciphers: simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers), aesni_simd_skciphers); @@ -1110,6 +1509,9 @@ static int __init aesni_init(void) static void __exit aesni_exit(void) { + if (IS_ENABLED(CONFIG_X86_64)) + crypto_unregister_aead(&aesni_ccm_aead); + simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads), aesni_simd_aeads); simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
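With cra_priority 400, the new "ccm-aesni" driver should outrank the template composition ccm_base(ctr(aes-aesni),cbcmac(aes-aesni)) during algorithm resolution. One way to check which implementation a sync-only user actually gets is a throwaway module like the sketch below (all names illustrative); the same information appears in /proc/crypto once the tfm has been instantiated:

	#include <crypto/aead.h>
	#include <linux/crypto.h>
	#include <linux/err.h>
	#include <linux/module.h>

	static int __init ccm_probe_init(void)
	{
		struct crypto_aead *tfm;

		/* the same sync-only request the 802.11 code makes */
		tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		pr_info("ccm(aes) resolved to %s\n",
			crypto_tfm_alg_driver_name(crypto_aead_tfm(tfm)));
		crypto_free_aead(tfm);
		return 0;
	}
	module_init(ccm_probe_init);

	static void __exit ccm_probe_exit(void) { }
	module_exit(ccm_probe_exit);

	MODULE_LICENSE("GPL");

With this patch applied, the expected output would be "ccm(aes) resolved to ccm-aesni" rather than the ccm_base composition.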