Message ID | ZovalOTfarFv1SZa@p100 |
---|---|
State | Accepted |
Commit | ab9a244c396aae4aaa34b2399b82fc15ec2df8c1 |
Series | [v2] crypto: xor - fix template benchmarking |

On Mon, Jul 08, 2024 at 02:24:52PM +0200, Helge Deller wrote:
> Commit c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> switched from using jiffies to ktime-based performance benchmarking.
>
> This works nicely on machines which have a fine-grained ktime()
> clocksource as e.g. x86 machines with TSC.
> But other machines, e.g. my 4-way HP PARISC server, don't have such
> fine-grained clocksources, which is why it seems that 800 xor loops
> take zero seconds, which then shows up in the logs as:
>
>  xor: measuring software checksum speed
>     8regs           : -1018167296 MB/sec
>     8regs_prefetch  : -1018167296 MB/sec
>     32regs          : -1018167296 MB/sec
>     32regs_prefetch : -1018167296 MB/sec
>
> Fix this with some small modifications to the existing code to improve
> the algorithm to always produce correct results without introducing
> major delays for architectures with a fine-grained ktime()
> clocksource:
> a) Delay start of the timing until ktime() just advanced. On machines
>    with a fast ktime() this should be just one additional ktime() call.
> b) Count the number of loops. Run at minimum 800 loops and finish
>    earliest when the ktime() counter has progressed.
>
> With that the throughput can now be calculated more accurately under all
> conditions.
>
> Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> Signed-off-by: Helge Deller <deller@gmx.de>
> Tested-by: John David Anglin <dave.anglin@bell.net>
>
> v2:
> - clean up coding style (noticed & suggested by Herbert Xu)
> - rephrased & fixed typo in commit message

Patch applied. Thanks.
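
The negative figure in the quoted log can be reproduced with plain integer arithmetic. The sketch below is a userspace illustration, not kernel code. It assumes `REPS` is 800 (the commit message mentions 800 loops), that `BENCH_SIZE` is 4096 bytes (a value not stated in this thread), that `unsigned int` is 32 bits wide, and that the measured time was 0 ns and got clamped to 1 as in the removed `if (!min) min = 1;` lines. The helper name `speed_from_elapsed_ns` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define REPS        800U   /* assumed: "800 xor loops" from the commit message */
#define BENCH_SIZE  4096   /* assumed: not stated in this thread */

/*
 * Mimic the pre-patch computation: 32-bit unsigned arithmetic whose
 * result is then stored in a signed 32-bit "speed" variable.
 */
static int32_t speed_from_elapsed_ns(uint32_t min_ns)
{
	uint32_t raw;
	int32_t speed;

	if (!min_ns)
		min_ns = 1;	/* the clamp that the patch removes */

	/* 1000 * 800 * 4096 = 3276800000, which is above INT32_MAX */
	raw = (uint32_t)(1000 * REPS * BENCH_SIZE) / min_ns;

	/* reinterpret the bits as signed, as an assignment to int does on common ABIs */
	memcpy(&speed, &raw, sizeof(speed));
	return speed;
}
```

Under these assumptions, `speed_from_elapsed_ns(0)` yields -1018167296, the value shown in the log, while any elapsed time of at least 2 ns keeps the quotient within the positive `int` range.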

diff --git a/crypto/xor.c b/crypto/xor.c
index 8e72e5d5db0d..56aa3169e871 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -83,33 +83,30 @@ static void __init
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	int i, j;
-	ktime_t min, start, diff;
+	unsigned long reps;
+	ktime_t min, start, t0;
 
 	tmpl->next = template_list;
 	template_list = tmpl;
 
 	preempt_disable();
 
-	min = (ktime_t)S64_MAX;
-	for (i = 0; i < 3; i++) {
-		start = ktime_get();
-		for (j = 0; j < REPS; j++) {
-			mb(); /* prevent loop optimization */
-			tmpl->do_2(BENCH_SIZE, b1, b2);
-			mb();
-		}
-		diff = ktime_sub(ktime_get(), start);
-		if (diff < min)
-			min = diff;
-	}
+	reps = 0;
+	t0 = ktime_get();
+	/* delay start until time has advanced */
+	while ((start = ktime_get()) == t0)
+		cpu_relax();
+	do {
+		mb(); /* prevent loop optimization */
+		tmpl->do_2(BENCH_SIZE, b1, b2);
+		mb();
+	} while (reps++ < REPS || (t0 = ktime_get()) == start);
+	min = ktime_sub(t0, start);
 
 	preempt_enable();
 
 	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
-	if (!min)
-		min = 1;
-	speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
+	speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
 	tmpl->speed = speed;
 
 	pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed);
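
The timing scheme in the patch (wait for a clock edge, then run at least a minimum number of loops and stop only once the clock has moved again) can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: `clock_gettime()` stands in for `ktime_get()`, and the names `bench`, `clock_fn`, `coarse_ns`, `xor_work`, `BUF_WORDS` and `mb_per_sec` are all hypothetical. `coarse_ns` simulates a clocksource that only advances in 4 ms steps so the behaviour on a coarse clock can be observed on any machine. The 64-bit throughput helper is a choice made for this sketch and differs from the `int` arithmetic in the patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MIN_REPS  800UL   /* minimum loop count, as in the commit message */
#define BUF_WORDS 512     /* hypothetical buffer size for this sketch */

typedef uint64_t (*clock_fn)(void);	/* hypothetical injectable clock */

struct bench_result {
	unsigned long reps;	/* loop iterations actually executed */
	uint64_t elapsed_ns;	/* non-zero on return */
};

/* Real clock: CLOCK_MONOTONIC in nanoseconds. */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Simulated coarse clock: advances by 4 ms on every 1000th read. */
static uint64_t coarse_ns(void)
{
	static uint64_t calls;

	return (calls++ / 1000) * 4000000ULL;
}

static unsigned long buf_a[BUF_WORDS], buf_b[BUF_WORDS];

/* Stand-in workload: XOR one buffer into another. */
static void xor_work(void)
{
	for (size_t i = 0; i < BUF_WORDS; i++)
		buf_a[i] ^= buf_b[i];
}

static struct bench_result bench(clock_fn now, void (*work)(void))
{
	struct bench_result r = { 0, 0 };
	uint64_t t0, start;

	/* (a) delay the start until the clock has just advanced */
	t0 = now();
	while ((start = now()) == t0)
		;

	/* (b) run at least MIN_REPS loops, then continue until the clock moves */
	do {
		work();
	} while (r.reps++ < MIN_REPS || (t0 = now()) == start);

	r.elapsed_ns = t0 - start;
	return r;
}

/* bytes/ns == GB/s, so multiply by 1000 for MB/s; 64-bit to avoid overflow. */
static uint64_t mb_per_sec(struct bench_result r, uint64_t bytes_per_rep)
{
	return 1000 * (uint64_t)r.reps * bytes_per_rep / r.elapsed_ns;
}
```

Calling `bench(coarse_ns, xor_work)` returns an elapsed time of one full simulated tick and a loop count above `MIN_REPS`, so `mb_per_sec(r, sizeof(buf_a))` divides by a non-zero interval. With `monotonic_ns` the loop typically stops shortly after the minimum count, since a fine-grained clock has usually advanced by then.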