diff mbox series

[v1,1/1] s390/arch_random: Buffer true random data

Message ID 20220705112712.4433-2-dengler@linux.ibm.com
State New
Headers show
Series [v1,1/1] s390/arch_random: Buffer true random data | expand

Commit Message

Holger Dengler July 5, 2022, 11:27 a.m. UTC
The trng instruction is very expensive and has a constant runtime for
getting 0 to 32 bytes of (conditioned) true random data. Calling trng for
in arch_get_random_seed_long() for each 8 bytes is too time-consuming,
especially if it is called in interrupt context.

This implementation buffers the trng data and deliver parts of it to the
callers. This reduces the costs by factor 4.

Drop the implementation for arch_get_random_seed_int(), because there are
no callers.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
---
 arch/s390/crypto/arch_random.c     | 51 +++++++++++++++++++++++++++++-
 arch/s390/include/asm/archrandom.h | 17 ++++------
 2 files changed, 56 insertions(+), 12 deletions(-)

Comments

Harald Freudenberger July 6, 2022, 4:18 p.m. UTC | #1
On 2022-07-05 18:27, Holger Dengler wrote:
> Hi Jason,
> 
> On 05/07/2022 17:11, Jason A. Donenfeld wrote:
>> Hi Holger,
>> 
>> On Tue, Jul 05, 2022 at 04:58:30PM +0200, Holger Dengler wrote:
>>> It is true, that the performance of the instruction is not really
>>> relevant, but only for calls outside of an interrupt context. I did
>>> some ftrace logging for the s390_random_get_seed_long() calls, and -
>>> as you said - there are a few calls per minute. But there was also
>>> some repeating calls in interrupt context. On systems with a huge
>>> interrupt load, this can cause severe performance impacts. I've no
>> 
>> It'd be interesting to know more about this. The way you get
>> arch_random_get_seed_long() from irq context is:
>> 
>> get_random_{bytes,int,long,u32,u64}()
>>   crng_make_state()
>>     crng_reseed() <-- Rarely
>>       extract_entropy()
>>         arch_get_random_seed_long()
>> 
>> So if an irq user of get_random_xx() is the unlucky one in the minute
>> span who has to call crng_reseed() then, yea, that'll happen. But I
>> wonder about this luck aspect. What scenarios are you seeing where 
>> this
>> happens all the time? Which driver is using random bytes *so* commonly
>> from irq context? Not that, per say, there's anything wrong with that,
>> but it could be eyebrow raising, and might point to de facto solutions
>> that mostly take care of this.
> 
> I saw a few calls in interrupt context during my tracing, but I didn't
> look to see which ones they were. Let me figure that out in the next
> few days and provide more information on that.
> 
>> One such direction might be making a driver that does such a thing do 
>> it
>> a little bit less, somehow. Another direction would be preferring
>> non-irqs to handle crng_reseed(), but not disallowing irqs entirely,
>> with a patch something like the one below. Or maybe there are other
>> ideas.
> 
> Reduce the number of trng in interrupt context is a possibility, but -
> in my opinion - only one single trng instruction call in interrupt
> context in one too much.
> 
> For the moment, I would propose to drop the buffering but also return
> false, if arch_random_get_seed_long() is called in interrupt context.
> 
> diff --git a/arch/s390/include/asm/archrandom.h
> b/arch/s390/include/asm/archrandom.h
> index 2c6e1c6ecbe7..711357bdc464 100644
> --- a/arch/s390/include/asm/archrandom.h
> +++ b/arch/s390/include/asm/archrandom.h
> @@ -32,7 +32,8 @@ static inline bool __must_check
> arch_get_random_int(unsigned int *v)
> 
>  static inline bool __must_check arch_get_random_seed_long(unsigned 
> long *v)
>  {
> -       if (static_branch_likely(&s390_arch_random_available)) {
> +       if (static_branch_likely(&s390_arch_random_available) &&
> +           !in_interrupt()) {
>                 cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
>                 atomic64_add(sizeof(*v), &s390_arch_random_counter);
>                 return true;
> 
> (on-top of your commit, without our buffering patch)
> 
>> 
>> But all this is to say that having some more of the "mundane" details
>> about this might actually help us.
>> 
>> Jason
>> 
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index e3dd1dd3dd22..81df8cdf2a62 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -270,6 +270,9 @@ static bool crng_has_old_seed(void)
>>  	static bool early_boot = true;
>>  	unsigned long interval = CRNG_RESEED_INTERVAL;
>> 
>> +	if (in_hardirq())
>> +		interval += HZ * 10;
>> +
>>  	if (unlikely(READ_ONCE(early_boot))) {
>>  		time64_t uptime = ktime_get_seconds();
>>  		if (uptime >= CRNG_RESEED_INTERVAL / HZ * 2)
>> 

Hi Holger and Jason
I tried to find out what is the reason of the invocations in interrupt 
context.
First I have to admit that there is in fact not much of 
arch_get_random_seed_long()
invocation any more in the recent kernel (5.19-rc5). I see about 100 
invocations
within 10 minutes with an LPAR running some qperf and dd dumps on dasds 
test load.
About half of these invocations is in interrupt context. I 
dump_stack()ed some of
these and I always catch the function
kfence_guarded_alloc()
   prandom_u32_max()
     prandom_u32()
       get_random_u32()
         _get_random_bytes()
           crng_make_state()
             crng_reseed()
               extract_entropy()
                 arch_get_random_seed_long()

However, with so few invocations it should not make any harm when there 
is a
even very expensive trng() invocation in interrupt context.

But I think we should check, if this is really something to backport to 
the older
kernels where arch_get_random_seed_long() is called really frequency.

Harald Freudenberger
Jason A. Donenfeld July 6, 2022, 10:34 p.m. UTC | #2
Hi Christian,

On Wed, Jul 06, 2022 at 08:29:49PM +0200, Christian Borntraeger wrote:
> >> However, with so few invocations it should not make any harm when there
> >> is a
> >> even very expensive trng() invocation in interrupt context.
> >>
> >> But I think we should check, if this is really something to backport to
> >> the older
> >> kernels where arch_get_random_seed_long() is called really frequency.
> > 
> > I backported the current random.c design to old kernels, so the
> > situation there should be the same as in 5.19-rc5.
> > 
> > So if you feel such rare usage is find even in_hardirq(), then I suppose
> > there's nothing more to do here?
> 
> I think up to 190µs in interrupt can result in unwanted latencies. Yes it does not
> happen that often and it is smaller than most timeslices of hypervisors.
> So I would likely turn that question around
> what happens if we return false if in_hardirq is true? Any noticeable
> delays in code that uses random numbers?

I think I already answered this here with mention of the hwrng driver:
https://lore.kernel.org/all/YsSAn2qXqlFkS5sH@zx2c4.com/

Anyway, I would recommend keeping it available in all contexts, so that
s390 isn't a special case to keep in mind, and because Harald said he
couldn't measure an actual problem existing for real. Plus, it's not as
though we're talking about RT kernels or a problem that would affect RT.
But if you're convinced that even the extremely rare case poses a issue,
doing the !in_hardirq() thing won't be the end of the world either and
is partly mitigated by the hwrng driver mentioned earlier. So I think
it's mostly up to you guys on what the tradeoffs are and what's
realistic and such.

Jason
diff mbox series

Patch

diff --git a/arch/s390/crypto/arch_random.c b/arch/s390/crypto/arch_random.c
index 1f2d40993c4d..07abd09ec759 100644
--- a/arch/s390/crypto/arch_random.c
+++ b/arch/s390/crypto/arch_random.c
@@ -2,17 +2,66 @@ 
 /*
  * s390 arch random implementation.
  *
- * Copyright IBM Corp. 2017, 2020
+ * Copyright IBM Corp. 2017, 2022
  * Author(s): Harald Freudenberger
+ *            Holger Dengler <dengler@linux.ibm.com>
+ *
+ * The trng instruction on s390 is very expensive and has a constant runtime
+ * for up to 32 bytes. Each trng call will get 32 bytes of (conditioned) true
+ * random data, which are buffered in a lock-protected array and delivered to
+ * up to four callers. To avoid long running trng calls in the interrupt
+ * context, a refill is skipped there. Also prevent the blocking of callers in
+ * interrupt context by a refill.
  */
 
 #include <linux/kernel.h>
 #include <linux/atomic.h>
 #include <linux/random.h>
+#include <linux/spinlock.h>
 #include <linux/static_key.h>
 #include <asm/cpacf.h>
 
 DEFINE_STATIC_KEY_FALSE(s390_arch_random_available);
 
+static struct {
+	unsigned long data[BITS_TO_LONGS(32 * BITS_PER_BYTE)];
+	int idx;
+} trngbuf;
+static DEFINE_SPINLOCK(trngbuf_lock);
+
 atomic64_t s390_arch_random_counter = ATOMIC64_INIT(0);
 EXPORT_SYMBOL(s390_arch_random_counter);
+
+bool s390_get_random_seed_long(unsigned long *v)
+{
+	unsigned long flags;
+
+	if (in_interrupt()) {
+		if (!spin_trylock_irqsave(&trngbuf_lock, flags))
+			return false;
+	} else {
+		spin_lock_irqsave(&trngbuf_lock, flags);
+	}
+	/* trngbuf is locked */
+
+	if (--trngbuf.idx < 0) {
+		/* buffer empty */
+		if (in_interrupt()) {
+			/* delegate refill to another caller */
+			spin_unlock_irqrestore(&trngbuf_lock, flags);
+			return false;
+		}
+
+		/* refill buffer */
+		cpacf_trng(NULL, 0, (u8 *)trngbuf.data, sizeof(trngbuf.data));
+		trngbuf.idx = ARRAY_SIZE(trngbuf.data) - 1;
+	}
+
+	/* deliver buffered random data */
+	*v = trngbuf.data[trngbuf.idx];
+	spin_unlock_irqrestore(&trngbuf_lock, flags);
+
+	atomic64_add(sizeof(long), &s390_arch_random_counter);
+	return true;
+}
+EXPORT_SYMBOL(s390_get_random_seed_long);
diff --git a/arch/s390/include/asm/archrandom.h b/arch/s390/include/asm/archrandom.h
index 2c6e1c6ecbe7..a1e365de6b33 100644
--- a/arch/s390/include/asm/archrandom.h
+++ b/arch/s390/include/asm/archrandom.h
@@ -2,7 +2,7 @@ 
 /*
  * Kernel interface for the s390 arch_random_* functions
  *
- * Copyright IBM Corp. 2017, 2020
+ * Copyright IBM Corp. 2017, 2022
  *
  * Author: Harald Freudenberger <freude@de.ibm.com>
  *
@@ -20,6 +20,8 @@ 
 DECLARE_STATIC_KEY_FALSE(s390_arch_random_available);
 extern atomic64_t s390_arch_random_counter;
 
+bool s390_get_random_seed_long(unsigned long *v);
+
 static inline bool __must_check arch_get_random_long(unsigned long *v)
 {
 	return false;
@@ -32,21 +34,14 @@  static inline bool __must_check arch_get_random_int(unsigned int *v)
 
 static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
 {
-	if (static_branch_likely(&s390_arch_random_available)) {
-		cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
-		atomic64_add(sizeof(*v), &s390_arch_random_counter);
-		return true;
-	}
+	if (static_branch_likely(&s390_arch_random_available))
+		return s390_get_random_seed_long(v);
+
 	return false;
 }
 
 static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
 {
-	if (static_branch_likely(&s390_arch_random_available)) {
-		cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));
-		atomic64_add(sizeof(*v), &s390_arch_random_counter);
-		return true;
-	}
 	return false;
 }