
[RESEND] random: use correct memory barriers for crng_node_pool

Message ID 20211219025139.31085-1-ebiggers@kernel.org
State New

Commit Message

Eric Biggers Dec. 19, 2021, 2:51 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

When a CPU selects which CRNG to use, it accesses crng_node_pool without
a memory barrier.  That's wrong, because crng_node_pool can be set by
another CPU concurrently.  Without a memory barrier, the crng_state that
is used might not appear to be fully initialized.

There's an explicit mb() on the write side, but it's redundant with
cmpxchg() (or cmpxchg_release()) and does nothing to fix the read side.

Implement this correctly by using a cmpxchg_release() +
smp_load_acquire() pair.

Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is
harder to verify that it is correct, so I'd prefer not to use it here.

Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
---

I sent this fix about a year ago
(https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u),
and though it's a correct fix, it was derailed by a debate about whether
it's safe to use READ_ONCE() instead of smp_load_acquire() or not.
Therefore, the current code, which (AFAIK) everyone agrees is buggy, was
never actually fixed.  Since random.c has a new maintainer now, I think
it's worth sending this fix for reconsideration.

 drivers/char/random.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)
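
The pairing introduced by this patch is the standard pattern for publishing a
fully initialized object: the writer orders its initialization stores before
the pointer store (release), and the reader orders the pointer load before its
accesses to the pointed-to object (acquire).  As a rough user-space analogue
of that pairing (a minimal sketch with made-up names, not the kernel code;
cmpxchg_release() corresponds to a release compare-and-exchange and
smp_load_acquire() to an acquire load):

#include <stdatomic.h>
#include <stdlib.h>

struct state { int initialized; };

static _Atomic(struct state *) pool;	/* plays the role of crng_node_pool */

/* Writer: fully initialize the object, then publish it with release ordering. */
static void publish(void)
{
	struct state *s = malloc(sizeof(*s));
	struct state *expected = NULL;

	if (!s)
		return;
	s->initialized = 1;
	/* pairs with the acquire load in lookup() */
	if (!atomic_compare_exchange_strong_explicit(&pool, &expected, s,
						     memory_order_release,
						     memory_order_relaxed))
		free(s);	/* lost the race; someone else already published */
}

/* Reader: the acquire load guarantees the object is seen fully initialized. */
static struct state *lookup(void)
{
	return atomic_load_explicit(&pool, memory_order_acquire);
}

int main(void)
{
	publish();
	return (lookup() && lookup()->initialized) ? 0 : 1;
}

Without the acquire (or an equivalent ordering guarantee) on the read side, a
reader on a weakly ordered CPU can observe the new pointer but stale contents
of the object it points to, which is exactly the bug described in the commit
message.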

Comments

Jason A. Donenfeld Dec. 20, 2021, 3:07 p.m. UTC | #1
Hi Eric,

This patch seems fine to me, and I'll apply it in a few days after
sitting on the list for comments, but:

> Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is
> harder to verify that it is correct, so I'd prefer not to use it here.
> [...]
> (https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u),
> and though it's a correct fix, it was derailed by a debate about whether
> it's safe to use READ_ONCE() instead of smp_load_acquire() or not.

But holy smokes... I chuckled at your "please explain in English." :)

Paul - if you'd like to look at this patch and confirm that this
specific patch and usage is fine to be changed into READ_ONCE()
instead of smp_load_acquire(), please pipe up here. And I really do
mean this specific patch and usage, not to be confused with any other
usage elsewhere in the kernel or question about general things, which
doubtlessly involve larger discussions like the one Eric linked to
above. If you're certain this patch here is READ_ONCE()able, I'd
appreciate your saying so with a simple, "it is safe; go for it",
since I'd definitely like the optimization if it's safe. If I don't
hear from you, I'll apply this as-is from Eric, as I'd rather be safe
than sorry.

Jason
Jason A. Donenfeld Dec. 20, 2021, 3:17 p.m. UTC | #2
On Sun, Dec 19, 2021 at 3:52 AM Eric Biggers <ebiggers@kernel.org> wrote:
> +
> +static inline struct crng_state *select_crng(void)
> +{
> +
> +static inline struct crng_state *select_crng(void)
> +{

Usually static inline is avoided in .c files. Any special reason why
we'd particularly need this here? Those functions are pretty small
and I assume will be inlined anyway on most architectures.

I just did a test on x86_64 with GCC 11, and the same file was
produced with 'static' as with 'static inline'. Was there an
arch/config/compiler combo you were concerned about here?

Jason
Eric Biggers Dec. 20, 2021, 3:38 p.m. UTC | #3
On Mon, Dec 20, 2021 at 04:17:59PM +0100, Jason A. Donenfeld wrote:
> On Sun, Dec 19, 2021 at 3:52 AM Eric Biggers <ebiggers@kernel.org> wrote:
> > +
> > +static inline struct crng_state *select_crng(void)
> > +{
> > +
> > +static inline struct crng_state *select_crng(void)
> > +{
> 
> Usually static inline is avoided in .c files. Any special reason why
> we'd particularly need this here? Those functions are pretty small
> and I assume will be inlined anyway on most architectures.
> 
> I just did a test on x86_64 with GCC 11, and the same file was
> produced with 'static' as with 'static inline'. Was there an
> arch/config/compiler combo you were concerned about here?

No special reason; this is just a bad habit.  I'm fine with omitting 'inline'
here.

- Eric
Paul E. McKenney Dec. 20, 2021, 6:11 p.m. UTC | #4
On Mon, Dec 20, 2021 at 04:07:28PM +0100, Jason A. Donenfeld wrote:
> Hi Eric,
> 
> This patch seems fine to me, and I'll apply it in a few days after
> sitting on the list for comments, but:
> 
> > Note: READ_ONCE() could be used instead of smp_load_acquire(), but it is
> > harder to verify that it is correct, so I'd prefer not to use it here.
> > [...]
> > (https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@kernel.org/T/#u),
> > and though it's a correct fix, it was derailed by a debate about whether
> > it's safe to use READ_ONCE() instead of smp_load_acquire() or not.
> 
> But holy smokes... I chuckled at your "please explain in English." :)
> 
> Paul - if you'd like to look at this patch and confirm that this
> specific patch and usage is fine to be changed into READ_ONCE()
> instead of smp_load_acquire(), please pipe up here. And I really do
> mean this specific patch and usage, not to be confused with any other
> usage elsewhere in the kernel or question about general things, which
> doubtlessly involve larger discussions like the one Eric linked to
> above. If you're certain this patch here is READ_ONCE()able, I'd
> appreciate your saying so with a simple, "it is safe; go for it",
> since I'd definitely like the optimization if it's safe. If I don't
> hear from you, I'll apply this as-is from Eric, as I'd rather be safe
> than sorry.

First I would want to see some evidence that READ_ONCE() was really
providing measurable performance benefit.  Such evidence would be
easiest to obtain by running on a weakly ordered system such as ARM,
ARMv8, or PowerPC.

If this does provide a measurable benefit, why not the following?

static inline struct crng_state *select_crng(void)
{
	struct crng_state **pool;
	struct crng_state *pooln;
	int nid = numa_node_id();

	/* pairs with cmpxchg_release() in do_numa_crng_init() */
	pool = rcu_dereference(crng_node_pool);
	if (pool) {
		pooln = rcu_dereference(pool[nid]);
		if (pooln)
			return pooln;
	}

	return &primary_crng;
}

This is in ignorance of the kfree() side of this code.  So another
question is "Suppose that there was a long delay (vCPU preemption, for
example) just before the 'return pooln'.  What prevents a use-after-free
bug?"

Of course, this question applies equally to the smp_load_acquire()
version.

							Thanx, Paul
Jason A. Donenfeld Dec. 20, 2021, 6:16 p.m. UTC | #5
On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> First I would want

It looks like you've answered my question with four other questions,
which seem certainly technically warranted, but also indicates we're
probably not going to get to the nice easy resting place of, "it is
safe; go for it" that I was hoping for. In light of that, it seems
like merging Eric's patch is reasonable.

Jason
Paul E. McKenney Dec. 20, 2021, 6:31 p.m. UTC | #6
On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
> On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > First I would want
> 
> It looks like you've answered my question with four other questions,
> which seem certainly technically warranted, but also indicates we're
> probably not going to get to the nice easy resting place of, "it is
> safe; go for it" that I was hoping for. In light of that, it seems
> like merging Eric's patch is reasonable.

My hope would be that the questions can be quickly answered by the
developers and maintainers.  But yes, hope springs eternal.

							Thanx, Paul
Eric Biggers Dec. 20, 2021, 6:35 p.m. UTC | #7
On Mon, Dec 20, 2021 at 10:31:40AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
> > On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > First I would want
> > 
> > It looks like you've answered my question with four other questions,
> > which seem certainly technically warranted, but also indicates we're
> > probably not going to get to the nice easy resting place of, "it is
> > safe; go for it" that I was hoping for. In light of that, it seems
> > like merging Eric's patch is reasonable.
> 
> My hope would be that the questions can be quickly answered by the
> developers and maintainers.  But yes, hope springs eternal.
> 
> 							Thanx, Paul

I wouldn't expect READ_ONCE() to provide a noticeable performance improvement
here, as it would be lost in the noise of the other work done, especially
chacha20_block().

The data structures in question are never freed, so your other questions are
irrelevant, if I understand correctly.

- Eric
Paul E. McKenney Dec. 20, 2021, 7 p.m. UTC | #8
On Mon, Dec 20, 2021 at 12:35:05PM -0600, Eric Biggers wrote:
> On Mon, Dec 20, 2021 at 10:31:40AM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 20, 2021 at 07:16:48PM +0100, Jason A. Donenfeld wrote:
> > > On Mon, Dec 20, 2021 at 7:11 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > First I would want
> > > 
> > > It looks like you've answered my question with four other questions,
> > > which seem certainly technically warranted, but also indicates we're
> > > probably not going to get to the nice easy resting place of, "it is
> > > safe; go for it" that I was hoping for. In light of that, it seems
> > > like merging Eric's patch is reasonable.
> > 
> > My hope would be that the questions can be quickly answered by the
> > developers and maintainers.  But yes, hope springs eternal.
> > 
> > 							Thanx, Paul
> 
> I wouldn't expect READ_ONCE() to provide a noticeable performance improvement
> here, as it would be lost in the noise of the other work done, especially
> chacha20_block().
> 
> The data structures in question are never freed, so your other questions are
> irrelevant, if I understand correctly.

Very good, and thank you!  You are correct: if the structures are never
freed, there is no use-after-free issue.  And that also explains why I
was not able to find the free path.  ;-)

So the main issue is the race between insertion and lookup.  So yes,
READ_ONCE() suffices.

This assumes that the various crng_node_pool[i] pointers never change
while accessible to readers (and that some sort of synchronization applies
to the values in the pointed-to structure).  If these pointers do change,
then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where
the value returned from this READ_ONCE() is both tested and returned.
(As in assign this value to a temporary.)

But if the various crng_node_pool[i] pointers really are constant
while readers can access them, then the cmpxchg_release() suffices.
The loads from pool[nid] are then data-race free, and because they
are unmarked, the compiler is prohibited from hoisting them out from
within the "if" statement.  The address dependency prohibits the
CPU from reordering them.

So READ_ONCE() should be just fine.  Which answers Jason's question.  ;-)

Looking at _extract_crng(), if this was my code, I would use READ_ONCE()
in the checks, but that might be my misunderstanding of boot-time constraints
or some such.  Without some sort of constraint, I don't see how the code
avoids confusion from reloads of crng->init_time if two CPUs concurrently
see the expiration of CRNG_RESEED_INTERVAL, but I could easily be missing
something that makes this safe.  (And this is irrelevant to this patch.)

You do appear to have ->lock guarding the pointed-to data, so that
is good.

							Thanx, Paul
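
For concreteness, the READ_ONCE() variant under discussion would look roughly
like this (a sketch only, assuming, as Jason confirms in the next message,
that the pool[nid] entries never change once the array has been published; if
they could change, the pool[nid] load would also need to go through
READ_ONCE() into a temporary, as Paul describes above):

static struct crng_state *select_crng(void)
{
	struct crng_state **pool;
	int nid = numa_node_id();

	/* pairs with cmpxchg_release() in do_numa_crng_init() */
	pool = READ_ONCE(crng_node_pool);
	if (pool && pool[nid])
		return pool[nid];

	return &primary_crng;
}

The ordering on the read side then comes from the address dependency: the
loads from pool[nid] cannot be reordered before the READ_ONCE() that produced
the pool pointer.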
Jason A. Donenfeld Dec. 20, 2021, 9:45 p.m. UTC | #9
Hi Paul,

On Mon, Dec 20, 2021 at 8:00 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> This assumes that the various crng_node_pool[i] pointers never change
> while accessible to readers (and that some sort of synchronization applies
> to the values in the pointed-to structure).  If these pointers do change,
> then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where
> the value returned from this READ_ONCE() is both tested and returned.
> (As in assign this value to a temporary.)
>
> But if the various crng_node_pool[i] pointers really are constant
> while readers can access them, then the cmpxchg_release() suffices.
> The loads from pool[nid] are then data-race free, and because they
> are unmarked, the compiler is prohibited from hoisting them out from
> within the "if" statement.  The address dependency prohibits the
> CPU from reordering them.

Right, this is just an initialization-time allocation and assignment,
never updated or freed again after.

> So READ_ONCE() should be just fine.  Which answers Jason's question.  ;-)

Great. So v2 of this patch can use READ_ONCE then. Thanks!

> Looking at _extract_crng(), if this was my code, I would use READ_ONCE()
> in the checks, but that might be my misunderstanding of boot-time constraints
> or some such.  Without some sort of constraint, I don't see how the code
> avoids confusion from reloads of crng->init_time if two CPUs concurrently
> see the expiration of CRNG_RESEED_INTERVAL, but I could easily be missing
> something that makes this safe.  (And this is irrelevant to this patch.)

Indeed init_time seems to race via the crng_reseed path, and
READ_ONCE()ing that seems reasonable. The other setters of it --
initialize_{primary,secondary} -- are in the boot path.

Jason
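
For reference, READ_ONCE()ing init_time in _extract_crng() amounts to
snapshotting the field once and testing only the snapshot, along these lines
(a rough sketch of the idea, not an exact excerpt of the function; the reseed
condition is paraphrased):

	unsigned long init_time = READ_ONCE(crng->init_time);

	if (time_after(jiffies, init_time + CRNG_RESEED_INTERVAL))
		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);

Taking a single snapshot means the field cannot be reloaded and observed with
two different values while the condition is being evaluated.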
Eric Biggers Dec. 20, 2021, 10:10 p.m. UTC | #10
On Mon, Dec 20, 2021 at 10:45:15PM +0100, Jason A. Donenfeld wrote:
> Hi Paul,
> 
> On Mon, Dec 20, 2021 at 8:00 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > This assumes that the various crng_node_pool[i] pointers never change
> > while accessible to readers (and that some sort of synchronization applies
> > to the values in the pointed-to structure).  If these pointers do change,
> > then there also needs to be a READ_ONCE(pool[nid]) in select_crng(), where
> > the value returned from this READ_ONCE() is both tested and returned.
> > (As in assign this value to a temporary.)
> >
> > But if the various crng_node_pool[i] pointers really are constant
> > while readers can access them, then the cmpxchg_release() suffices.
> > The loads from pool[nid] are then data-race free, and because they
> > are unmarked, the compiler is prohibited from hoisting them out from
> > within the "if" statement.  The address dependency prohibits the
> > CPU from reordering them.
> 
> Right, this is just an initialization-time allocation and assignment,
> never updated or freed again after.
> 
> > So READ_ONCE() should be just fine.  Which answers Jason's question.  ;-)
> 
> Great. So v2 of this patch can use READ_ONCE then. Thanks!

Sure, I really don't care anymore.  If people want READ_ONCE() here, I'll use
it.  It seems that the people who really prefer smp_load_acquire() aren't on
this thread (unlike on
https://lore.kernel.org/linux-fsdevel/20200713033330.205104-1-ebiggers@kernel.org/T/#u
for example, where READ_ONCE() was rejected), so I guess that is what people are
going to agree on in this particular case.

- Eric

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..349a6f235c61 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -843,8 +843,8 @@  static void do_numa_crng_init(struct work_struct *work)
 		crng_initialize_secondary(crng);
 		pool[i] = crng;
 	}
-	mb();
-	if (cmpxchg(&crng_node_pool, NULL, pool)) {
+	/* pairs with smp_load_acquire() in select_crng() */
+	if (cmpxchg_release(&crng_node_pool, NULL, pool) != NULL) {
 		for_each_node(i)
 			kfree(pool[i]);
 		kfree(pool);
@@ -857,8 +857,26 @@  static void numa_crng_init(void)
 {
 	schedule_work(&numa_crng_init_work);
 }
+
+static inline struct crng_state *select_crng(void)
+{
+	struct crng_state **pool;
+	int nid = numa_node_id();
+
+	/* pairs with cmpxchg_release() in do_numa_crng_init() */
+	pool = smp_load_acquire(&crng_node_pool);
+	if (pool && pool[nid])
+		return pool[nid];
+
+	return &primary_crng;
+}
 #else
 static void numa_crng_init(void) {}
+
+static inline struct crng_state *select_crng(void)
+{
+	return &primary_crng;
+}
 #endif
 
 /*
@@ -1005,15 +1023,7 @@  static void _extract_crng(struct crng_state *crng,
 
 static void extract_crng(__u8 out[CHACHA_BLOCK_SIZE])
 {
-	struct crng_state *crng = NULL;
-
-#ifdef CONFIG_NUMA
-	if (crng_node_pool)
-		crng = crng_node_pool[numa_node_id()];
-	if (crng == NULL)
-#endif
-		crng = &primary_crng;
-	_extract_crng(crng, out);
+	_extract_crng(select_crng(), out);
 }
 
 /*
@@ -1042,15 +1052,7 @@  static void _crng_backtrack_protect(struct crng_state *crng,
 
 static void crng_backtrack_protect(__u8 tmp[CHACHA_BLOCK_SIZE], int used)
 {
-	struct crng_state *crng = NULL;
-
-#ifdef CONFIG_NUMA
-	if (crng_node_pool)
-		crng = crng_node_pool[numa_node_id()];
-	if (crng == NULL)
-#endif
-		crng = &primary_crng;
-	_crng_backtrack_protect(crng, tmp, used);
+	_crng_backtrack_protect(select_crng(), tmp, used);
 }
 
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)