Message ID | 20230823090107.65749-1-bchalios@amazon.es |
---|---|
Series | Propagating reseed notifications to user space |
Hi Greg,

On 23/8/23 11:08, Greg KH wrote:
> On Wed, Aug 23, 2023 at 11:01:05AM +0200, Babis Chalios wrote:
>> Sometimes, PRNGs need to reseed. For example, on a regular timer
>> interval, to ensure nothing consumes a random value for longer than e.g.
>> 5 minutes, or when VMs get cloned, to ensure seeds don't leak in to
>> clones.
>>
>> The notification happens through a 32bit epoch value that changes every
>> time cached entropy is no longer valid, hence PRNGs need to reseed. User
>> space applications can get hold of a pointer to this value through
>> /dev/(u)random. We introduce a new ioctl() that returns an anonymous
>> file descriptor. From this file descriptor we can mmap() a single page
>> which includes the epoch at offset 0.
>>
>> random.c maintains the epoch value in a global shared page. It exposes
>> a registration API for kernel subsystems that are able to notify when
>> reseeding is needed. Notifiers register with random.c and receive a
>> unique 8bit ID and a pointer to the epoch. When they need to report a
>> reseeding event they write a new epoch value which includes the
>> notifier ID in the first 8 bits and an increasing counter value in the
>> remaining 24 bits:
>>
>>               RNG epoch
>> *-------------*---------------------*
>> | notifier id | epoch counter value |
>> *-------------*---------------------*
>>      8 bits           24 bits
> Why not just use 32/32 for a full 64bit value, or better yet, 2
> different variables?  Why is 32bits and packing things together here
> somehow simpler?

We made it 32 bits so that we can read/write it atomically in all 32bit
architectures.
Do you think that's not a problem?

Cheers,
Babis
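The 8/24 split in the quoted patch description can be illustrated with a few helpers. This is only a sketch, assuming the notifier ID sits in the most significant 8 bits as the diagram suggests; the names `epoch_pack`, `epoch_notifier_id`, `epoch_counter`, and `epoch_next` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_ID_SHIFT     24u
#define EPOCH_COUNTER_MASK 0x00FFFFFFu

/* Build an epoch: notifier ID in the top 8 bits, counter in the low 24. */
static inline uint32_t epoch_pack(uint8_t notifier_id, uint32_t counter)
{
    assert(counter <= EPOCH_COUNTER_MASK); /* counter must fit in 24 bits */
    return ((uint32_t)notifier_id << EPOCH_ID_SHIFT) | counter;
}

/* Extract the 8-bit notifier ID. */
static inline uint8_t epoch_notifier_id(uint32_t epoch)
{
    return (uint8_t)(epoch >> EPOCH_ID_SHIFT);
}

/* Extract the 24-bit counter. */
static inline uint32_t epoch_counter(uint32_t epoch)
{
    return epoch & EPOCH_COUNTER_MASK;
}

/* Next value the same notifier would publish: counter + 1, wrapping at 2^24. */
static inline uint32_t epoch_next(uint32_t epoch)
{
    uint32_t counter = (epoch_counter(epoch) + 1u) & EPOCH_COUNTER_MASK;

    return epoch_pack(epoch_notifier_id(epoch), counter);
}
```

Because the whole value fits in one 32-bit word, a reader only needs a single aligned load to observe both fields together, which is the property the reply above relies on.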
On Wed, Aug 23, 2023 at 11:27:11AM +0200, Babis Chalios wrote:
> Hi Greg,
>
> On 23/8/23 11:08, Greg KH wrote:
> > On Wed, Aug 23, 2023 at 11:01:05AM +0200, Babis Chalios wrote:
> > > [...]
> > >               RNG epoch
> > > *-------------*---------------------*
> > > | notifier id | epoch counter value |
> > > *-------------*---------------------*
> > >      8 bits           24 bits
> > Why not just use 32/32 for a full 64bit value, or better yet, 2
> > different variables?  Why is 32bits and packing things together here
> > somehow simpler?
>
> We made it 32 bits so that we can read/write it atomically in all 32bit
> architectures.
> Do you think that's not a problem?

What 32bit platforms care about this type of interface at all?

thanks,

greg k-h
On Wed, Aug 23, 2023 at 12:08:35PM +0200, Babis Chalios wrote:
> On 23/8/23 12:06, Greg KH wrote:
> > On Wed, Aug 23, 2023 at 11:27:11AM +0200, Babis Chalios wrote:
> > > Hi Greg,
> > >
> > > On 23/8/23 11:08, Greg KH wrote:
> > > > On Wed, Aug 23, 2023 at 11:01:05AM +0200, Babis Chalios wrote:
> > > > > [...]
> > > > Why not just use 32/32 for a full 64bit value, or better yet, 2
> > > > different variables?  Why is 32bits and packing things together here
> > > > somehow simpler?
> > > We made it 32 bits so that we can read/write it atomically in all 32bit
> > > architectures.
> > > Do you think that's not a problem?
> > What 32bit platforms care about this type of interface at all?
>
> I think, any 32bit platform that gets random bytes from the kernel.

You are making a new api, for some new functionality, for what I thought
was virtual machines (hence the virtio driver), none of which work in a
32bit system.

I thought this was an ioctl for userspace, which can handle 64bits at
once (or 2 32bit numbers).  For internal kernel stuff, a lock should be
fine, or better yet, a 64bit atomic value read (horrible on 32bit
platforms, I know...)

Just asking, it feels odd to pack bits in these days for when 90% of the
cpus really don't need it.

greg k-h
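The trade-off debated above (one 32-bit word that can be loaded and stored atomically versus a 64-bit value) can be probed from user space with C11 atomics. The sketch below is an illustration only, not code from the series: `epoch32` is a stand-in for the shared word, and the helper names are hypothetical. It assumes `int` is 32 bits wide and `long long` is 64 bits wide, which holds on Linux targets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the shared epoch word; in the proposal it lives in a kernel page. */
static _Atomic uint32_t epoch32;

/* Publish a new epoch value with a single atomic 32-bit store. */
static void epoch32_write(uint32_t value)
{
    atomic_store_explicit(&epoch32, value, memory_order_release);
}

/* Read the epoch with a single atomic 32-bit load. */
static uint32_t epoch32_read(void)
{
    return atomic_load_explicit(&epoch32, memory_order_acquire);
}

/*
 * True when atomics of the given width never need a lock on this target.
 * The ATOMIC_*_LOCK_FREE macros evaluate to 2 for "always lock-free".
 * Assumes int is 32 bits and long long is 64 bits.
 */
static bool epoch_width_always_lock_free(unsigned int bits)
{
    assert(bits == 32 || bits == 64);
    return bits == 32 ? ATOMIC_INT_LOCK_FREE == 2
                      : ATOMIC_LLONG_LOCK_FREE == 2;
}
```

On 64-bit targets both widths are normally lock-free. On some 32-bit targets the 64-bit case may fall back to a lock or a library call, which is the cost the remark about 64-bit atomic reads on 32-bit platforms alludes to.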
Hello all,

On 23/8/23 11:01, Babis Chalios wrote:
> This is an RFC, so that we can discuss whether the proposed ABI works.
> Also, I'd like to hear people's opinion on the internal registration
> API, 8/24 split etc. If we decide that this approach works, I'm happy
> to add documentation for it, with examples on how user space can make
> use of it.

Some time has passed since I sent this and I haven't received any
comments, so I assume people are happy with the proposed API. I will
work on adding documentation and examples on how user space can use this
and send a v1.

Cheers,
Babis
On Mon, Sep 04, 2023 at 03:44:48PM +0200, Babis Chalios wrote:
> Hello all,
>
> On 23/8/23 11:01, Babis Chalios wrote:
> > This is an RFC, so that we can discuss whether the proposed ABI works.
> > Also, I'd like to hear people's opinion on the internal registration
> > API, 8/24 split etc. If we decide that this approach works, I'm happy
> > to add documentation for it, with examples on how user space can make
> > use of it.
>
> Some time has passed since I sent this and I haven't received any
> comments, so I assume people

Nope. This still stands:
https://lore.kernel.org/all/CAHmME9pxc-nO_xa=4+1CnvbnuefbRTJHxM7n817c_TPeoxzu_g@mail.gmail.com/

And honestly the constant pushing from you has in part been
demotivating.
Hi Jason,

On 4/9/23 16:42, Jason A. Donenfeld wrote:
> On Mon, Sep 04, 2023 at 03:44:48PM +0200, Babis Chalios wrote:
>> Hello all,
>>
>> On 23/8/23 11:01, Babis Chalios wrote:
>>> This is an RFC, so that we can discuss whether the proposed ABI works.
>>> Also, I'd like to hear people's opinion on the internal registration
>>> API, 8/24 split etc. If we decide that this approach works, I'm happy
>>> to add documentation for it, with examples on how user space can make
>>> use of it.
>> Some time has passed since I sent this and I haven't received any
>> comments, so I assume people
> Nope. This still stands:
> https://lore.kernel.org/all/CAHmME9pxc-nO_xa=4+1CnvbnuefbRTJHxM7n817c_TPeoxzu_g@mail.gmail.com/

Could you elaborate on why the proposed RFC is not in line with your plan?

We need to let user space know that it needs to reseed its PRNGs. It is
not very clear to me how that interplays with having a getrandom vDSO.

IOW, say we did have a vDSO getrandom, don't you think we should have
such an API to notify when it needs to discard stale state, or do you
think this is not the right API?

> And honestly the constant pushing from you has in part been
> demotivating.

Cheers,
Babis
[Resending in plain text only. Let's hope it reaches everyone this time :)]

Hey Jason!

On 04.09.23 16:42, Jason A. Donenfeld wrote:
> On Mon, Sep 04, 2023 at 03:44:48PM +0200, Babis Chalios wrote:
>> Hello all,
>>
>> On 23/8/23 11:01, Babis Chalios wrote:
>>> This is an RFC, so that we can discuss whether the proposed ABI works.
>>> Also, I'd like to hear people's opinion on the internal registration
>>> API, 8/24 split etc. If we decide that this approach works, I'm happy
>>> to add documentation for it, with examples on how user space can make
>>> use of it.
>> Some time has passed since I sent this and I haven't received any
>> comments, so I assume people
> Nope. This still stands:
> https://lore.kernel.org/all/CAHmME9pxc-nO_xa=4+1CnvbnuefbRTJHxM7n817c_TPeoxzu_g@mail.gmail.com/

To recap, that email said:

> Just so you guys know, roughly the order of operations here are going to be:
>
> - vdso vgetrandom v+1
> - virtio fork driver
> - exposing fork events to userspace
>
> I'll keep you posted on those.

I don't quite understand either the relationship of vgetrandom to this or
how we could help. I understand how a vDSO vgetrandom could use
primitives that are very similar (or maybe even identical) to this patch
set. What I'm missing is why there is a dependency between them.

I don't expect user space PRNGs to disappear overnight, especially
given all the heavy lifting and architecture specific code that vDSOs
require. So if we want to build a solution that allows user space to
generically solve VM snapshots, we should strive to have a mechanism
that works in today's environment in addition to making the vgetrandom
call safe when it emerges.

The last revision of vgetrandom that I found was v14 from January. Is it
still in active development? And if so, what is the status? The last
fundamental comment I found in the archives was this one from Linus:

> This should all be in libc. Not in the kernel with special magic vdso
> support and special buffer allocations. The kernel should give good
> enough support that libc can do a good job, but the kernel should
> simply *not* take the approach of "libc will get this wrong, so let's
> just do all the work for it".

to which you replied

> That buffering cannot be done safely currently -- VM forks, reseeding
> semantics, and so forth. Again, discussed in the cover letter of the
> patch if you'd like to engage with those ideas.

My understanding is that this patch set solves exactly that problem in a
way that is fully compatible with existing user space PRNGs and easy to
consume as well as add support for in "Enterprise" systems for anyone
who wishes to do so.

So, where is v15 without VM changes standing? And why exactly should we
couple vgetrandom with atomic user space reseed notifications? :)

Thanks,

Alex

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing directors: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg District Court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879
Hi,

On 23/08/2023 at 11:01, Babis Chalios wrote:
> User space often implements PRNGs that use /dev/random as entropy
> source. We can not expect that these randomness sources stay completely
> unknown forever. For various reasons, the originating PRNG seed may
> become known at which point the PRNG becomes insecure for further random
> number generation. Events that can lead to that are for example fast
> computers reversing the PRNG function using a number of inputs or
> Virtual Machine clones which carry seed values into their clones.
>
> During LPC 2022 Jason, Alex, Michael and I brainstormed on how to
> atomically expose a notification to user space that it should reseed.
> Atomicity is key for the VM clone case. This patchset implements a
> potential path to do so.
>
> This patchset introduces an epoch value as the means of communicating to
> the guest the need to reseed. The epoch is a 32bit value with the
> following form:
>
>               RNG epoch
> *-------------*---------------------*
> | notifier id | epoch counter value |
> *-------------*---------------------*
>      8 bits           24 bits
>
> Changes in this value signal moments in time that PRNGs need to be
> re-seeded. As a result, the intended use of the epoch from user space
> PRNGs is to cache the epoch value every time they reseed using kernel
> entropy, then check that its value hasn't changed before giving out
> random numbers. If the value has changed the PRNG needs to reseed before
> producing any more random bits.
>
> The API for getting hold of this value is offered through
> /dev/(u)random. We introduce a new ioctl for these devices, which
> creates an anonymous file descriptor. User processes can call the
> ioctl() to get the anon fd and then mmap it to a single page. That page
> contains the value of the epoch at offset 0.
>
> Naturally, random.c is the component that maintains the RNG epoch.
> During initialization it allocates a single global page which holds the
> epoch value. Moreover, it exposes an API to kernel subsystems
> (notifiers) which can report events that require PRNG reseeding.
> Notifiers register with random.c and receive an 8-bit notifier id (up to
> 256 subscribers should be enough) and a pointer to the epoch. Notifying,
> then, is equivalent to writing in the epoch address a new epoch value.
>
> Notifiers write epoch values that include the notifier ID on the higher
> 8 bits and increasing counter values on the 24 remaining bits. This
> guarantees that two notifiers cannot ever write the same epoch value,
> since notifier IDs are unique.
>
> The first patch of this series implements the epoch mechanism. It adds
> the logic in the random.c to maintain the epoch page and expose the
> user space facing API. It also adds the internal API that allows kernel
> systems to register as notifiers.

From a user space point of view, having to open /dev/random, call
ioctl(), and mmap() is a no-go for a (CS)PRNG embedded in libc for
arc4random().

I'm biased, as I proposed exposing such a seed epoch value to user space
through getrandom() directly, relying on the vDSO for reasonable
performance, because current glibc's arc4random() is somewhat too slow
to be a general replacement for rand().

See
https://lore.kernel.org/all/cover.1673539719.git.ydroneaud@opteya.com/
https://lore.kernel.org/all/ae35afa5b824dc76c5ded98efcabc117e6dd3d70@opteya.com/

Regards.
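The notifier side described in the quoted cover letter (register, receive a unique 8-bit ID, then publish ID plus counter) might look roughly like the user space simulation below. Everything here is an assumption made for illustration: the names `epoch_notifier_register` and `epoch_notifier_notify`, the per-notifier counter, and the lack of locking around registration are not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_NOTIFIERS 256u
#define COUNTER_MASK  0x00FFFFFFu

/* Stand-in for the single global epoch word kept by random.c. */
static _Atomic uint32_t rng_epoch;

struct epoch_notifier {
    uint8_t  id;       /* unique ID handed out at registration */
    uint32_t counter;  /* 24-bit counter (assumed to be per notifier) */
};

static unsigned int next_id;  /* number of IDs handed out so far */

/* Hypothetical registration: 0 on success, -1 once all 256 IDs are taken.
 * Not thread-safe; a real implementation would need locking here. */
static int epoch_notifier_register(struct epoch_notifier *n)
{
    assert(n);
    if (next_id >= MAX_NOTIFIERS)
        return -1;
    n->id = (uint8_t)next_id++;
    n->counter = 0;
    return 0;
}

/* Report a reseed event: bump the counter and publish (id << 24) | counter. */
static void epoch_notifier_notify(struct epoch_notifier *n)
{
    assert(n);
    n->counter = (n->counter + 1u) & COUNTER_MASK;
    atomic_store_explicit(&rng_epoch,
                          ((uint32_t)n->id << 24) | n->counter,
                          memory_order_release);
}
```

With distinct IDs in the top byte, two notifiers can never publish the same 32-bit value, which matches the uniqueness argument in the cover letter.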
Hey Yann!

On 17.09.23 15:34, Yann Droneaud wrote:
> Hi,
>
> On 23/08/2023 at 11:01, Babis Chalios wrote:
>> [...]
>
> From userspace point of view, having to open /dev/random, ioctl, and
> mmap() is a no-go for a (CS)PRNG embedded in libc for arc4random().

Could you please elaborate on why it's a no-go? With any approach we
take, someone somewhere needs to map and expose data that tells user
space we are in a new "epoch". With this patch set, you do that
explicitly from user space through an fd that you keep open plus an mmap
that you keep active. With vgetrandom, the kernel does it implicitly for
you.

So with this patch set's approach, the first call to arc4random() would
need to establish the epoch mmap and leave it open. After that epoch
handling is (almost) free - it's just a 32bit value compare.

Are you saying that there is a problem with keeping track of that
additional state? As mentioned above, we need to keep track of some
state somewhere: either in the vDSO plus kernel page map logic or in the
library that consumes epochs. If this is the problem, maybe the
fundamental issue is that arc4random() assumes you always have
everything in place to receive randomness without a handle that could go
through an open/close (init/destroy) cycle? I suppose you could change
that?

> I'm biased, as I proposed to expose such seed epoch value to userspace
> through getrandom() directly, relying on vDSO for reasonable
> performances, because current glibc's arc4random() is somewhat too slow
> to be a general replacement for rand().
>
> See
> https://lore.kernel.org/all/cover.1673539719.git.ydroneaud@opteya.com/
> https://lore.kernel.org/all/ae35afa5b824dc76c5ded98efcabc117e6dd3d70@opteya.com/

There are more problems with coupling epochs to the vgetrandom approach:
Not everyone will want to or can use Linux's rng as the sole source of
entropy for various reasons (NIST, FIPS, TLS recommendations to not rely
on a single source, real time requirements, etc) but still require
knowledge of epoch changes. That means we need an alternative path for
these applications regardless. May as well start with that :).

If we then still conclude that vgetrandom is the best path forward to
accelerate access to /dev/urandom in user space, we can just always map
this patch set's epoch page into the vDSO range and then make vgetrandom
consume it, similar to how a user space library would.

I genuinely don't understand how vgetrandom and this patch set
contradict each other.

Alex
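The consumer pattern discussed above (set up the mapping on the first call, then pay only a 32-bit compare per request) can be sketched as follows. This is a simulation under stated assumptions: `fake_epoch_page` stands in for the page a real library would obtain through the proposed ioctl() and mmap(), the seeding step is a placeholder rather than kernel entropy, and the xorshift generator is a toy, not a CSPRNG. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the mmap()ed epoch page; a real consumer would map it once. */
static volatile uint32_t fake_epoch_page;

struct prng {
    const volatile uint32_t *epoch; /* NULL until the first call sets it up */
    uint32_t cached_epoch;          /* epoch observed at the last reseed */
    uint64_t state;                 /* toy generator state, not a CSPRNG */
    unsigned int reseeds;           /* how many times we reseeded */
};

static void prng_reseed(struct prng *p)
{
    /* Cache the epoch before seeding, so a change that lands while we
     * are seeding is still caught by the next comparison. */
    p->cached_epoch = *p->epoch;
    /* Placeholder for pulling fresh entropy from the kernel. */
    p->state = 0x9E3779B97F4A7C15ull ^ p->cached_epoch;
    p->reseeds++;
}

static uint64_t prng_next(struct prng *p)
{
    assert(p);
    if (!p->epoch) {
        /* First call: establish the (simulated) mapping and seed. */
        p->epoch = &fake_epoch_page;
        prng_reseed(p);
    } else if (*p->epoch != p->cached_epoch) {
        /* Epoch changed: cached state is stale, reseed before output. */
        prng_reseed(p);
    }
    /* xorshift64 step, used only to have something to return. */
    p->state ^= p->state << 13;
    p->state ^= p->state >> 7;
    p->state ^= p->state << 17;
    return p->state;
}
```

The extra state a library has to carry is one pointer and one cached 32-bit value, which is the bookkeeping the reply above refers to.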