new file mode 100644
@@ -0,0 +1,203 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Using AutoFDO with the Linux kernel
+===================================
+
+This enables AutoFDO build support for the kernel when using
+the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
+is a type of profile-guided optimization (PGO) used to enhance the
+performance of binary executables. It gathers information about the
+frequency of execution of various code paths within a binary using
+hardware sampling. This data is then used to guide the compiler's
+optimization decisions, resulting in a more efficient binary. AutoFDO
+is a powerful optimization technique, and data indicates that it can
+significantly improve kernel performance. It's especially beneficial
+for workloads affected by front-end stalls.
+
+For AutoFDO builds, unlike non-FDO builds, the user must supply a
+profile. Acquiring an AutoFDO profile can be done in several ways.
+AutoFDO profiles are created by converting hardware sampling using
+the "perf" tool. It is crucial that the workload used to create these
+perf files is representative; they must exhibit runtime
+characteristics similar to the workloads that are intended to be
+optimized. Failure to do so will result in the compiler optimizing
+for the wrong objective.
+
+The AutoFDO profile often encapsulates the program's behavior. If the
+performance-critical codes are architecture-independent, the profile
+can be applied across platforms to achieve performance gains. For
+instance, using the profile generated on Intel architecture to build
+a kernel for AMD architecture can also yield performance improvements.
+
+There are two methods for acquiring a representative profile:
+(1) Sample real workloads using a production environment.
+(2) Generate the profile using a representative load test.
+When enabling the AutoFDO build configuration without providing an
+AutoFDO profile, the compiler only modifies the dwarf information in
+the kernel without impacting runtime performance. It's advisable to
+use a kernel binary built with the same AutoFDO configuration to
+collect the perf profile. While it's possible to use a kernel built
+with different options, it may result in inferior performance.
+
+One can collect profiles using AutoFDO build for the previous kernel.
+AutoFDO employs relative line numbers to match the profiles, offering
+some tolerance for source changes. This mode is commonly used in a
+production environment for profile collection.
+
+In a profile collection based on a load test, the AutoFDO collection
+process consists of the following steps:
+
+#. Initial build: The kernel is built with AutoFDO options
+ without a profile.
+
+#. Profiling: The above kernel is then run with a representative
+ workload to gather execution frequency data. This data is
+ collected using hardware sampling, via perf. AutoFDO is most
+ effective on platforms supporting advanced PMU features like
+ LBR on Intel machines.
+
+#. AutoFDO profile generation: Perf output file is converted to
+ the AutoFDO profile via offline tools.
+
+The support requires a Clang compiler LLVM 17 or later.
+
+Preparation
+===========
+
+Configure the kernel with:
+
+ .. code-block:: make
+
+ CONFIG_AUTOFDO_CLANG=y
+
+Customization
+=============
+
+You can enable or disable AutoFDO build for individual file and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o)
+
+ .. code-block:: make
+
+ AUTOFDO_PROFILE_foo.o := y
+
+- For enabling all files in one directory
+
+ .. code-block:: make
+
+ AUTOFDO_PROFILE := y
+
+- For disabling one file
+
+ .. code-block:: make
+
+ AUTOFDO_PROFILE_foo.o := n
+
+- For disabling all files in one directory
+
+ .. code-block:: make
+
+ AUTOFDO_PROFILE := n
+
+
+Workflow
+========
+
+Here is an example workflow for AutoFDO kernel:
+
+
+
+1) Build the kernel on the HOST machine with LLVM enabled, for example,
+
+ .. code-block:: make
+
+ $ make menuconfig LLVM=1
+
+
+ Turn on AutoFDO build config:
+
+ .. code-block:: make
+
+ CONFIG_AUTOFDO_CLANG=y
+
+ With a configuration that with LLVM enabled, use the following command:
+
+ .. code-block:: sh
+
+ $ scripts/config -e AUTOFDO_CLANG
+
+ After getting the config, build with
+
+ .. code-block:: make
+
+ $ make LLVM=1
+
+2) Install the kernel on the TEST machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+ event period. We suggest using a suitable prime number, like 500009,
+ for this purpose.
+
+ - For Intel platforms:
+
+ .. code-block:: sh
+
+ $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+ - For AMD platforms: For Intel platforms:
+ The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check,
+ For Zen3:
+
+ .. code-block:: sh
+
+ $ cat proc/cpuinfo | grep " brs"
+
+ For Zen4:
+
+ .. code-block:: sh
+
+ $ cat proc/cpuinfo | grep amd_lbr_v2
+
+ The following command generated the perf data file:
+
+ .. code-block:: sh
+
+ $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b \
+ -c <count> -o <perf_file> -- <loadtest>
+
+4) (Optional) Download the raw perf file to the HOST machine.
+
+5) To generate an AutoFDO profile, two offline tools are available:
+ create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
+ of the AutoFDO project and can be found on GitHub
+ (https://github.com/google/autofdo), version v0.30.1 or later.
+ The llvm_profgen tool is included in the LLVM compiler itself. It's
+ important to note that the version of llvm_profgen doesn't need to match
+ the version of Clang. It needs to be the LLVM 19 release of Clang
+ or later, or just from the LLVM trunk.
+
+ .. code-block:: sh
+
+ $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
+
+ or
+ .. code-block:: sh
+
+ $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary -o <profile_file>
+
+ Note that multiple AutoFDO profile files can be merged into one via:
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
+
+
+6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
+ (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
+
+ .. code-block:: sh
+
+ $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file
+
@@ -32,6 +32,7 @@ Documentation/dev-tools/testing-overview.rst
kunit/index
ktap
checkuapi
+ autofdo
.. only:: subproject and html
@@ -3500,6 +3500,13 @@ F: kernel/audit*
F: lib/*audit.c
K: \baudit_[a-z_0-9]\+\b
+AUTOFDO BUILD
+M: Rong Xu <xur@google.com>
+M: Han Shen <shenhan@google.com>
+S: Supported
+F: Documentation/dev-tools/autofdo.rst
+F: scripts/Makefile.autofdo
+
AUXILIARY BUS DRIVER
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
R: Dave Ertman <david.m.ertman@intel.com>
@@ -1024,6 +1024,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
+include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
include $(addprefix $(srctree)/, $(include-y))
@@ -802,6 +802,26 @@ config LTO_CLANG_THIN
If unsure, say Y.
endchoice
+config ARCH_SUPPORTS_AUTOFDO_CLANG
+ bool
+
+config AUTOFDO_CLANG
+ bool "Enable Clang's AutoFDO build (EXPERIMENTAL)"
+ depends on ARCH_SUPPORTS_AUTOFDO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 170000
+ help
+ This option enables Clang’s AutoFDO build. When
+ an AutoFDO profile is specified in variable
+ CLANG_AUTOFDO_PROFILE during the build process,
+ Clang uses the profile to optimize the kernel.
+
+ If no profile is specified, AutoFDO options are
+ still passed to Clang to facilitate the collection
+ of perf data for creating an AutoFDO profile in
+ subsequent builds.
+
+ If unsure, say N.
+
config ARCH_SUPPORTS_CFI_CLANG
bool
help
@@ -122,6 +122,7 @@ config X86
select ARCH_USES_CFI_TRAPS if X86_64 && CFI_CLANG
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
+ select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
select ARCH_USE_MEMTEST
new file mode 100644
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang AutoFDO features.
+
+CFLAGS_AUTOFDO_CLANG := -fdebug-info-for-profiling -mllvm -enable-fs-discriminator=true -mllvm -improved-fs-discriminator=true
+
+ifdef CLANG_AUTOFDO_PROFILE
+CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE)
+endif
+
+ifdef CONFIG_LTO_CLANG
+ifdef CONFIG_LTO_CLANG_THIN
+ifdef CLANG_AUTOFDO_PROFILE
+KBUILD_LDFLAGS += --lto-sample-profile=$(CLANG_AUTOFDO_PROFILE)
+endif
+KBUILD_LDFLAGS += --mllvm=-enable-fs-discriminator=true --mllvm=-improved-fs-discriminator=true -plugin-opt=thinlto
+endif
+endif
+
+export CFLAGS_AUTOFDO_CLANG
@@ -209,6 +209,16 @@ _c_flags += $(if $(patsubst n%,, \
-D__KCSAN_INSTRUMENT_BARRIERS__)
endif
+#
+# Enable Clang's AutoFDO build flags for a file or directory depending on
+# variables AUTOFDO_PROFILE_obj.o and AUTOFDO_PROFILE.
+#
+ifeq ($(CONFIG_AUTOFDO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)y), \
+ $(CFLAGS_AUTOFDO_CLANG))
+endif
+
# $(src) for including checkin headers from generated source files
# $(obj) for including generated headers from checkin source files
ifeq ($(KBUILD_EXTMOD),)
@@ -4488,6 +4488,7 @@ static int validate_ibt(struct objtool_file *file)
!strcmp(sec->name, "__jump_table") ||
!strcmp(sec->name, "__mcount_loc") ||
!strcmp(sec->name, ".kcfi_traps") ||
+ !strcmp(sec->name, ".llvm.call-graph-profile") ||
strstr(sec->name, "__patchable_function_entries"))
continue;