mbox series

[v2,0/6] ubsan: Introduce wrap-around sanitizers

Message ID 20240202101311.it.893-kees@kernel.org
Headers show
Series ubsan: Introduce wrap-around sanitizers | expand

Message

Kees Cook Feb. 2, 2024, 10:16 a.m. UTC
Hi,

v2:
 - improve CC list
 - add reviewed-by tags
 - reword some commit logs
v1: https://lore.kernel.org/all/20240129175033.work.813-kees@kernel.org/

Lay the ground work for gaining instrumentation for signed[1],
unsigned[2], and pointer[3] wrap-around by making all 3 sanitizers
available for testing. Additionally gets x86_64 bootable under the
unsigned sanitizer for the first time.

The compilers will need work before this can be generally useful, as the
signed and pointer sanitizers are effectively a no-op with the kernel's
required use of -fno-strict-overflow. The unsigned sanitizer will also
need adjustment to deal with the many common code patterns that exist
for unsigned wrap-around (e.g. "while (var--)", "-1UL", etc).

-Kees

Link: https://github.com/KSPP/linux/issues/26 [1]
Link: https://github.com/KSPP/linux/issues/27 [2]
Link: https://github.com/KSPP/linux/issues/344 [3]

Kees Cook (6):
  ubsan: Use Clang's -fsanitize-trap=undefined option
  ubsan: Reintroduce signed and unsigned overflow sanitizers
  ubsan: Introduce CONFIG_UBSAN_POINTER_WRAP
  ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL
  ubsan: Split wrapping sanitizer Makefile rules
  ubsan: Get x86_64 booting with unsigned wrap-around sanitizer

 Documentation/dev-tools/ubsan.rst | 28 +++-------
 arch/arm/Kconfig                  |  2 +-
 arch/arm64/Kconfig                |  2 +-
 arch/mips/Kconfig                 |  2 +-
 arch/parisc/Kconfig               |  2 +-
 arch/powerpc/Kconfig              |  2 +-
 arch/riscv/Kconfig                |  2 +-
 arch/s390/Kconfig                 |  2 +-
 arch/x86/Kconfig                  |  2 +-
 arch/x86/kernel/Makefile          |  1 +
 arch/x86/kernel/apic/Makefile     |  1 +
 arch/x86/mm/Makefile              |  1 +
 arch/x86/mm/pat/Makefile          |  1 +
 crypto/Makefile                   |  1 +
 drivers/acpi/Makefile             |  1 +
 include/linux/compiler_types.h    | 19 ++++++-
 kernel/Makefile                   |  1 +
 kernel/locking/Makefile           |  1 +
 kernel/rcu/Makefile               |  1 +
 kernel/sched/Makefile             |  1 +
 lib/Kconfig.ubsan                 | 41 +++++++++-----
 lib/Makefile                      |  1 +
 lib/crypto/Makefile               |  1 +
 lib/crypto/mpi/Makefile           |  1 +
 lib/test_ubsan.c                  | 82 ++++++++++++++++++++++++++++
 lib/ubsan.c                       | 89 +++++++++++++++++++++++++++++++
 lib/ubsan.h                       |  5 ++
 lib/zlib_deflate/Makefile         |  1 +
 lib/zstd/Makefile                 |  2 +
 mm/Makefile                       |  1 +
 net/core/Makefile                 |  1 +
 net/ipv4/Makefile                 |  1 +
 scripts/Makefile.lib              | 11 +++-
 scripts/Makefile.ubsan            | 11 +++-
 34 files changed, 278 insertions(+), 43 deletions(-)

Comments

Kees Cook Feb. 2, 2024, 12:17 p.m. UTC | #1
On Fri, Feb 02, 2024 at 12:01:55PM +0100, Marco Elver wrote:
> On Fri, 2 Feb 2024 at 11:16, Kees Cook <keescook@chromium.org> wrote:
> > [...]
> > +config UBSAN_UNSIGNED_WRAP
> > +       bool "Perform checking for unsigned arithmetic wrap-around"
> > +       depends on $(cc-option,-fsanitize=unsigned-integer-overflow)
> > +       depends on !X86_32 # avoid excessive stack usage on x86-32/clang
> > +       depends on !COMPILE_TEST
> > +       help
> > +         This option enables -fsanitize=unsigned-integer-overflow which checks
> > +         for wrap-around of any arithmetic operations with unsigned integers. This
> > +         currently causes x86 to fail to boot.
> 
> My hypothesis is that these options will quickly be enabled by various
> test and fuzzing setups, to the detriment of kernel developers. While
> the commit message states that these are for experimentation, I do not
> think it is at all clear from the Kconfig options.

I can certainly rephrase it more strongly. I would hope that anyone
enabling the unsigned sanitizer would quickly realize how extremely
noisy it is.

> Unsigned integer wrap-around is relatively common (it is _not_ UB
> after all). While I can appreciate that in some cases wrap around is a
> genuine semantic bug, and that's what we want to find with these
> changes, ultimately marking all semantically valid wrap arounds to
> catch the unmarked ones. Given these patterns are so common, and C
> programmers are used to them, it will take a lot of effort to mark all
> the intentional cases. But I fear that even if we get to that place,
> _unmarked_  but semantically valid unsigned wrap around will keep
> popping up again and again.

I agree -- it's going to be quite a challenge. My short-term goal is to
see how far the sanitizer itself can get with identifying intentional
uses. For example, I found two more extremely common code patterns that
trip it now:

	unsigned int i = ...;
	...
	while (i--) { ... }

This trips the sanitizer at loop exit. :P It seems like churn to
refactor all of these into "for (; i; i--)". The compiler should be able
to identify this by looking for later uses of "i", etc.

The other is negative constants: -1UL, -3ULL, etc. These are all over
the place and very very obviously intentional and should be ignored by
the compiler.

> What is the long-term vision to minimize the additional churn this may
> introduce?

My hope is that we can evolve the coverage over time. Solving it all at
once won't be possible, but I think we can get pretty far with the
signed overflow sanitizer, which runs relatively cleanly already.

If we can't make meaningful progress in unsigned annotations, I think
we'll have to work on gaining type-based operator overloading so we can
grow type-aware arithmetic. That will serve as a much cleaner
annotation. E.g. introduce jiffie_t, which wraps.

> I think the problem reminds me a little of the data race problem,
> although I suspect unsigned integer wraparound is much more common
> than data races (which unlike unsigned wrap around is actually UB) -
> so chasing all intentional unsigned integer wrap arounds and marking
> will take even more effort than marking all intentional data races
> (which we're still slowly, but steadily, making progress towards).
> 
> At the very least, these options should 'depends on EXPERT' or even
> 'depends on BROKEN' while the story is still being worked out.

Perhaps I should hold off on bringing the unsigned sanitizer back? I was
hoping to work in parallel with the signed sanitizer, but maybe this
isn't the right approach?