[v5,00/10] Optimize buffer_is_zero

Message ID 20240217003918.52229-1-richard.henderson@linaro.org

Richard Henderson Feb. 17, 2024, 12:39 a.m. UTC
v3: https://patchew.org/QEMU/20240206204809.9859-1-amonakov@ispras.ru/
v4: https://patchew.org/QEMU/20240215081449.848220-1-richard.henderson@linaro.org/

Changes for v5:
  - Move 3 byte sample back inline; document it.
  - Drop AArch64 SVE alternative; neoverse-v2 still recommends simd for memcpy.
  - Use UMAXV for the aarch64 SIMD reduction:
    3 cycles on cortex-a76, 2 cycles on neoverse-n1,
    as compared to UQXTN or CMEQ+SHRN at 4 cycles each.
  - Add benchmark of zeros.
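
For readers unfamiliar with the "3 byte sample" mentioned above, here is a
rough sketch of the idea: test the first, middle, and last byte before
paying for a full scan, since non-zero buffers usually fail that cheap
check.  All names here (sample3_is_zero, full_scan_is_zero, is_zero_sketch)
are hypothetical and the scan is a plain portable loop; this is not the
code in util/bufferiszero.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: cheap pre-check of first, middle and last byte.
 * Caller must guarantee len >= 1. */
static inline bool sample3_is_zero(const unsigned char *p, size_t len)
{
    /* OR the three bytes together so there is one test, not three. */
    return (p[0] | p[len / 2] | p[len - 1]) == 0;
}

/* Hypothetical portable scan: accumulate 8 bytes at a time, then the tail.
 * For simplicity it never exits early; it always reads the whole buffer. */
static bool full_scan_is_zero(const unsigned char *p, size_t len)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));   /* safe unaligned load */
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Hypothetical entry point combining the sample with the full scan. */
static bool is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    if (len == 0) {
        return true;                    /* empty buffer counts as zero */
    }
    if (!sample3_is_zero(p, len)) {
        return false;                   /* rejected without a full scan */
    }
    return full_scan_is_zero(p, len);
}
```

The sample only ever rejects a buffer; an all-zero result still needs the
full scan, so the worst case costs three extra byte loads.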

The benchmark is trivial, and could be improved so that it
prints the name of the acceleration routine instead of its
index in the selection process.  But it is good enough to
see that #0 is faster than #1, etc.
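
As an illustration of the improvement suggested above, a benchmark could
carry a name next to each routine and print that instead of the index.
The following is only a sketch under stated assumptions: struct
accel_entry, bench_one, mb_per_sec and scalar_is_zero are hypothetical
names, timing uses the standard clock(), and "MB" is assumed to mean
2^20 bytes.  It is not the code in tests/bench/bufferiszero-bench.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical signature for a zero-check routine. */
typedef bool (*is_zero_fn)(const void *buf, size_t len);

/* Hypothetical table entry pairing a routine with a printable name. */
struct accel_entry {
    const char *name;
    is_zero_fn fn;
};

/* Stand-in routine so the sketch is self-contained. */
static bool scalar_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Throughput in MB/sec, assuming 1 MB = 1024 * 1024 bytes. */
static double mb_per_sec(double bytes, double seconds)
{
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Run one routine 'iters' times, print its name and throughput, and
 * return how many calls reported an all-zero buffer. */
static unsigned long bench_one(const struct accel_entry *e,
                               const void *buf, size_t len, int iters)
{
    volatile unsigned long hits = 0;    /* keeps the loop from being elided */
    clock_t start = clock();

    for (int i = 0; i < iters; i++) {
        hits += e->fn(buf, len);
    }

    double sec = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (sec <= 0.0) {
        sec = 1e-9;                     /* guard against a zero interval */
    }
    printf("buffer_is_zero %s: %.2f MB/sec\n",
           e->name, mb_per_sec((double)len * iters, sec));
    return hits;
}
```

A caller would build an array of struct accel_entry, one per routine, and
pass each element to bench_one with the same zero-filled buffer.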

A sample set:

Apple M1:
  buffer_is_zero #0: 135416.27 MB/sec
  buffer_is_zero #1: 111771.25 MB/sec

Neoverse N1:
  buffer_is_zero #0: 56489.82 MB/sec
  buffer_is_zero #1: 36347.93 MB/sec

i7-1195G7:
  buffer_is_zero #0: 137327.40 MB/sec
  buffer_is_zero #1: 69159.20 MB/sec
  buffer_is_zero #2: 38319.80 MB/sec


r~


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  tests/bench: Add bufferiszero-bench

 include/qemu/cutils.h            |  32 ++-
 tests/bench/bufferiszero-bench.c |  42 +++
 util/bufferiszero.c              | 449 +++++++++++++++++--------------
 tests/bench/meson.build          |   4 +-
 4 files changed, 319 insertions(+), 208 deletions(-)
 create mode 100644 tests/bench/bufferiszero-bench.c