[v2,00/10] Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()

Message ID 20230202145030.223740842@infradead.org

Message

Peter Zijlstra Feb. 2, 2023, 2:50 p.m. UTC
Hi!

Since Linus hated on cmpxchg_double(), here are a few patches to get rid of it,
as proposed here:

  https://lkml.kernel.org/r/Y2U3WdU61FvYlpUh@hirez.programming.kicks-ass.net


These patches are based on 6.2.0-rc6 + cryptodev-2.6, but also apply to next/master.

Available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git core/wip-u128

New since v1:

 - rebased on Eric's ghash cleanups (hence the cryptodev-2.6 dependency)
 - rebased on Heiko's s390/cpum_sf CDSG patch
 - fixed up a bunch of arch code
 - fixed up the inline asm to use 'u128 *' mem argument so the compiler knows
   how wide the modification is.
 - reworked the percpu thing to use union based type-punning instead of
   _Generic() based casts.
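
To illustrate the last item, here is a minimal user-space sketch of union-based
type-punning between a pair of 64-bit words and a single 128-bit value. It is
not the code from the series: `struct pair`, `union pair_u128` and both helper
functions are hypothetical names, and the sketch assumes a 64-bit GCC or Clang
target that provides `unsigned __int128`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit GCC/Clang target that provides unsigned __int128. */
typedef unsigned __int128 u128;

/* Hypothetical pair of 64-bit words standing in for a double-word slot. */
struct pair {
	uint64_t first;
	uint64_t second;
};

/* Both members alias the same 16 bytes of storage. */
union pair_u128 {
	struct pair p;
	u128 v;
};

static_assert(sizeof(struct pair) == sizeof(u128), "pair must be 128 bits");

/* Reinterpret the two words as one 128-bit value, without pointer casts. */
static inline u128 pair_to_u128(struct pair p)
{
	union pair_u128 u = { .p = p };

	return u.v;
}

/* Inverse conversion: split a 128-bit value back into its two words. */
static inline struct pair u128_to_pair(u128 v)
{
	union pair_u128 u = { .v = v };

	return u.p;
}
```

Reading a union member other than the one last written is permitted in C (it
is not in C++). Which word ends up in the low half of the 128-bit value depends
on the target's endianness, so the sketch relies only on the round trip.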

---
 Documentation/core-api/this_cpu_ops.rst     |   2 -
 arch/arm64/include/asm/atomic_ll_sc.h       |  56 ++++++-----
 arch/arm64/include/asm/atomic_lse.h         |  39 ++++----
 arch/arm64/include/asm/cmpxchg.h            |  48 +++-------
 arch/arm64/include/asm/percpu.h             |  31 ++++--
 arch/s390/include/asm/cmpxchg.h             |  32 ++-----
 arch/s390/include/asm/cpu_mf.h              |   2 +-
 arch/s390/include/asm/percpu.h              |  35 ++++---
 arch/s390/kernel/perf_cpum_sf.c             |  22 ++---
 arch/x86/include/asm/cmpxchg.h              |  25 -----
 arch/x86/include/asm/cmpxchg_32.h           |   2 +-
 arch/x86/include/asm/cmpxchg_64.h           |  54 ++++++++++-
 arch/x86/include/asm/percpu.h               |  97 +++++++++++--------
 drivers/iommu/amd/amd_iommu_types.h         |   9 +-
 drivers/iommu/amd/iommu.c                   |  10 +-
 drivers/iommu/intel/irq_remapping.c         |   8 +-
 include/asm-generic/percpu.h                |  62 ++----------
 include/crypto/b128ops.h                    |  14 +--
 include/linux/atomic/atomic-arch-fallback.h |  95 ++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  |  88 ++++++++++++++---
 include/linux/dmar.h                        | 125 ++++++++++++------------
 include/linux/percpu-defs.h                 |  46 +++------
 include/linux/slub_def.h                    |  12 ++-
 include/linux/types.h                       |   5 +
 include/uapi/linux/types.h                  |   4 +
 lib/crypto/curve25519-hacl64.c              |   2 -
 lib/crypto/poly1305-donna64.c               |   2 -
 mm/slab.h                                   |  45 ++++++++-
 mm/slub.c                                   | 142 +++++++++++++++++-----------
 scripts/atomic/gen-atomic-fallback.sh       |   4 +-
 scripts/atomic/gen-atomic-instrumented.sh   |  21 ++--
 31 files changed, 645 insertions(+), 494 deletions(-)

Comments

David Laight Feb. 2, 2023, 10:45 p.m. UTC | #1
From: Linus Torvalds
> Sent: 02 February 2023 19:39
> 
> On Thu, Feb 2, 2023 at 7:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >  - fixed up the inline asm to use 'u128 *' mem argument so the compiler knows
> >    how wide the modification is.
> >  - reworked the percpu thing to use union based type-punning instead of
> >    _Generic() based casts.
> 
> Looks lovely to me. This removed all my concerns (except for the
> testing one, but all the patches looked nice and clean to me, so
> clearly it must be perfect).

The change is almost certainly for the better.

But did I spot one of the bits using cmpxchg128 just to do an atomic write?
I think it was updating some interrupt info that was at first glance not
dissimilar to that used by MSI-X (it wasn't MSI-X).

If that was a hardware register then it could well require a full bus lock.
Using a write of an SSE (or equivalent) 128-bit register would be an atomic
write without the bus lock problem.
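
The pattern in question, a compare-and-exchange used only to perform a store,
can be sketched in portable C11 as below. This is an illustration and not
kernel code: the function names are hypothetical, and a 64-bit value stands in
for the 128-bit case so that the example builds without target-specific flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: emulate an atomic store with a compare-exchange
 * loop. Every attempt is a read-modify-write, even though the old value
 * is never actually needed by the caller.
 */
static inline void store_via_cmpxchg(_Atomic uint64_t *p, uint64_t new_val)
{
	uint64_t old = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, new_val,
						      memory_order_seq_cst,
						      memory_order_relaxed))
		;
}

/* Plain atomic store for comparison: a single write, no RMW cycle. */
static inline void store_plain(_Atomic uint64_t *p, uint64_t new_val)
{
	atomic_store_explicit(p, new_val, memory_order_release);
}

/* Small usage example: both variants leave the new value in place. */
int store_demo(void)
{
	_Atomic uint64_t slot = 5;

	store_via_cmpxchg(&slot, 42);
	assert(atomic_load(&slot) == 42);

	store_plain(&slot, 7);
	assert(atomic_load(&slot) == 7);

	return 0;
}
```

Both variants produce the same final value; they differ in that the first
issues a read-modify-write where the second issues a single store.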

Also, that is only going to work if the hardware/logic side guarantees to
treat a single write as atomic.
I know there are MSI-X implementations out there where the CPU write
will be split into four 32-bit writes to some internal memory and the
hardware side will also do multiple accesses.
(Pretty much any implementation on an FPGA will behave like that,
not just the one I wrote.)
I didn't see the MSI-X code there, but I do wonder how it safely changes
affinities.

	David
