Message ID | 20200819121318.52158-1-broonie@kernel.org |
---|---|
Headers | show |
Series | arm64: vdso: getcpu() support | expand |
On 8/19/20 6:13 AM, Mark Brown wrote: > Some applications, especially tracing ones, benefit from avoiding the > syscall overhead for getcpu() so it is common for architectures to have > vDSO implementations. Add one for arm64, using TPIDRRO_EL0 to pass a > pointer to per-CPU data rather than just store the immediate value in > order to allow for future extensibility. > > It is questionable if something TPIDRRO_EL0 based is worthwhile at all > on current kernels, since v4.18 we have had support for restartable > sequences which can be used to provide a sched_getcpu() implementation > with generally better performance than the vDSO approach on > architectures which have that[1]. Work is ongoing to implement this for > glibc: > > https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/ > > but is not yet merged and will need similar work for other userspaces. > The main advantages for the vDSO implementation are the node parameter > (though this is a static mapping to CPU number so could be looked up > separately when processing data if it's needed, it shouldn't need to be > in the hot path) and ease of implementation for users. > > This is currently not compatible with KPTI due to the use of TPIDRRO_EL0 > by the KPTI trampoline, this could be addressed by reinitializing that > system register in the return path but I have found it hard to justify > adding that overhead for all users for something that is essentially a > profiling optimization which is likely to get superceeded by a more > modern implementation - if there are other uses for the per-CPU data > then the balance might change here. > > This builds on work done by Kristina Martsenko some time ago but is a > new implementation. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7822b1e24f2df5df98c76f0e94a5416349ff759 > > v3: > - Rebase on v5.9-rc1. > - Drop in progress portions of the series. > v2: > - Rebase on v5.8-rc3. > - Add further cleanup patches & a first draft of multi-page support. > > Mark Brown (5): > arm64: vdso: Provide a define when building the vDSO > arm64: vdso: Add per-CPU data > arm64: vdso: Initialise the per-CPU vDSO data > arm64: vdso: Add getcpu() implementation > selftests: vdso: Support arm64 in getcpu() test > > arch/arm64/include/asm/processor.h | 12 +---- > arch/arm64/include/asm/vdso/datapage.h | 54 +++++++++++++++++++ > arch/arm64/kernel/process.c | 26 ++++++++- > arch/arm64/kernel/vdso.c | 33 +++++++++++- > arch/arm64/kernel/vdso/Makefile | 4 +- > arch/arm64/kernel/vdso/vdso.lds.S | 1 + > arch/arm64/kernel/vdso/vgetcpu.c | 48 +++++++++++++++++ > .../testing/selftests/vDSO/vdso_test_getcpu.c | 10 ++++ > 8 files changed, 172 insertions(+), 16 deletions(-) > create mode 100644 arch/arm64/include/asm/vdso/datapage.h > create mode 100644 arch/arm64/kernel/vdso/vgetcpu.c > Patches look good to me from selftests perspective. My acked by for these patches to go through arm64. Acked-by: Shuah Khan <skhan@linuxfoundation.org> If you would like me to take these through kselftest tree, give me your Acks. I can queue these up for 5.10-rc1 thanks, -- Shuah
On Mon, Aug 31, 2020 at 03:47:17PM -0600, Shuah Khan wrote: > On 8/19/20 6:13 AM, Mark Brown wrote: > > Some applications, especially tracing ones, benefit from avoiding the > > syscall overhead for getcpu() so it is common for architectures to have > > vDSO implementations. Add one for arm64, using TPIDRRO_EL0 to pass a > > pointer to per-CPU data rather than just store the immediate value in > > order to allow for future extensibility. > > > > It is questionable if something TPIDRRO_EL0 based is worthwhile at all > > on current kernels, since v4.18 we have had support for restartable > > sequences which can be used to provide a sched_getcpu() implementation > > with generally better performance than the vDSO approach on > > architectures which have that[1]. Work is ongoing to implement this for > > glibc: > > > > https://lore.kernel.org/lkml/20200527185130.5604-3-mathieu.desnoyers@efficios.com/ > > > > but is not yet merged and will need similar work for other userspaces. > > The main advantages for the vDSO implementation are the node parameter > > (though this is a static mapping to CPU number so could be looked up > > separately when processing data if it's needed, it shouldn't need to be > > in the hot path) and ease of implementation for users. > > > > This is currently not compatible with KPTI due to the use of TPIDRRO_EL0 > > by the KPTI trampoline, this could be addressed by reinitializing that > > system register in the return path but I have found it hard to justify > > adding that overhead for all users for something that is essentially a > > profiling optimization which is likely to get superceeded by a more > > modern implementation - if there are other uses for the per-CPU data > > then the balance might change here. > > > > This builds on work done by Kristina Martsenko some time ago but is a > > new implementation. > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7822b1e24f2df5df98c76f0e94a5416349ff759 > > > > v3: > > - Rebase on v5.9-rc1. > > - Drop in progress portions of the series. > > v2: > > - Rebase on v5.8-rc3. > > - Add further cleanup patches & a first draft of multi-page support. > > > > Mark Brown (5): > > arm64: vdso: Provide a define when building the vDSO > > arm64: vdso: Add per-CPU data > > arm64: vdso: Initialise the per-CPU vDSO data > > arm64: vdso: Add getcpu() implementation > > selftests: vdso: Support arm64 in getcpu() test > > > > arch/arm64/include/asm/processor.h | 12 +---- > > arch/arm64/include/asm/vdso/datapage.h | 54 +++++++++++++++++++ > > arch/arm64/kernel/process.c | 26 ++++++++- > > arch/arm64/kernel/vdso.c | 33 +++++++++++- > > arch/arm64/kernel/vdso/Makefile | 4 +- > > arch/arm64/kernel/vdso/vdso.lds.S | 1 + > > arch/arm64/kernel/vdso/vgetcpu.c | 48 +++++++++++++++++ > > .../testing/selftests/vDSO/vdso_test_getcpu.c | 10 ++++ > > 8 files changed, 172 insertions(+), 16 deletions(-) > > create mode 100644 arch/arm64/include/asm/vdso/datapage.h > > create mode 100644 arch/arm64/kernel/vdso/vgetcpu.c > > > > Patches look good to me from selftests perspective. My acked by > for these patches to go through arm64. > > Acked-by: Shuah Khan <skhan@linuxfoundation.org> > > If you would like me to take these through kselftest tree, give > me your Acks. I can queue these up for 5.10-rc1 Thanks Shuah for the ack. We are still pondering whether the merge these patches as they have some limitations (the per-CPU data structures may not fit in the sole data vDSO page). -- Catalin
On Tue, Sep 01, 2020 at 10:25:52AM +0100, Catalin Marinas wrote: > Thanks Shuah for the ack. We are still pondering whether the merge these > patches as they have some limitations (the per-CPU data structures may > not fit in the sole data vDSO page). They definitely don't fit, I did have some half-written proof of concept patches that I posted that extend this but I was waiting to see if there was any interest in a vDSO getcpu() at all before taking it further. Vincenzo's work on doing the multipage user data that he announced at Plumbers would cover it as well, I hadn't been aware of that.