Message ID | 20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org |
---|---|
Headers | show |
Series | arm64/gcs: Provide support for GCS in userspace | expand |
On Sat, Feb 3, 2024, at 7:25 AM, Mark Brown wrote: > The arm64 Guarded Control Stack (GCS) feature provides support for > hardware protected stacks of return addresses, intended to provide > hardening against return oriented programming (ROP) attacks and to make > it easier to gather call stacks for applications such as profiling. > > When GCS is active a secondary stack called the Guarded Control Stack is > maintained, protected with a memory attribute which means that it can > only be written with specific GCS operations. The current GCS pointer > can not be directly written to by userspace. When a BL is executed the > value stored in LR is also pushed onto the GCS, and when a RET is > executed the top of the GCS is popped and compared to LR with a fault > being raised if the values do not match. GCS operations may only be > performed on GCS pages, a data abort is generated if they are not. > > The combination of hardware enforcement and lack of extra instructions > in the function entry and exit paths should result in something which > has less overhead and is more difficult to attack than a purely software > implementation like clang's shadow stacks. > > This series implements support for use of GCS by userspace, along with > support for use of GCS within KVM guests. It does not enable use of GCS > by either EL1 or EL2, this will be implemented separately. Executables > are started without GCS and must use a prctl() to enable it, it is > expected that this will be done very early in application execution by > the dynamic linker or other startup code. For dynamic linking this will > be done by checking that everything in the executable is marked as GCS > compatible. > > x86 has an equivalent feature called shadow stacks, this series depends > on the x86 patches for generic memory management support for the new > guarded/shadow stack page type and shares APIs as much as possible. As > there has been extensive discussion with the wider community around the > ABI for shadow stacks I have as far as practical kept implementation > decisions close to those for x86, anticipating that review would lead to > similar conclusions in the absence of strong reasoning for divergence. > > The main divergence I am concious of is that x86 allows shadow stack to > be enabled and disabled repeatedly, freeing the shadow stack for the > thread whenever disabled, while this implementation keeps the GCS > allocated after disable but refuses to reenable it. This is to avoid > races with things actively walking the GCS during a disable, we do > anticipate that some systems will wish to disable GCS at runtime but are > not aware of any demand for subsequently reenabling it. > > x86 uses an arch_prctl() to manage enable and disable, since only x86 > and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a > patch set for the equivalent RISC-V Zicfiss feature which I initially > adopted fairly directly but following review feedback has been revised > quite a bit. While discussing the ABI implications of shadow stacks in the context of Zicfiss and musl a few days ago, I had the following idea for how to solve the source compatibility problems with shadow stacks in POSIX.1-2004 and POSIX.1-2017: 1. Introduce a "flexible shadow stack handling" option. For what follows, it doesn't matter if this is system-wide, per-mm, or per-vma. 2. Shadow stack faults on non-shadow stack pages, if flexible shadow stack handling is in effect, cause the affected page to become a shadow stack page. When this happens, the page filled with invalid address tokens. Faults from non-shadow-stack accesses to a shadow-stack page which was created by the previous paragraph will cause the page to revert to non-shadow-stack usage, with or without clearing. Important: a shadow stack operation can only load a valid address from a page if that page has been in continuous shadow stack use since the address was written by another shadow stack operation; the flexibility delays error reporting in cases of stray writes but it never allows for corruption of shadow stack operation. 3. Standards-defined operations which use a user-provided stack (makecontext, sigaltstack, pthread_attr_setstack) use a subrange of the provided stack for shadow stack storage. I propose to use a shadow stack size of 1/32 of the provided stack size, rounded up to a positive integer number of pages, and place the shadow stack allocation at the lowest page-aligned address inside the provided stack region. Since page usage is flexible, no change in page permissions is immediately needed; this merely sets the initial shadow stack pointer for the new context. If the shadow stack grew in the opposite direction to the architectural stack, it would not be necessary to pick a fixed direction. 4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide sufficient space for a minimum-sized shadow stack region and worst case alignment. _Without_ doing this, sigaltstack cannot be used to recover from stack overflows if the shadow stack limit is reached first, and makecontext cannot be supported without memory leaks and unreportable error conditions. Kernel-allocated shadow stacks with a unique VM type are still useful since they allows stray writes to crash at the time the stray write is performed, rather than delaying the crash until the next shadow stack read. The pthread and makecontext changes could be purely libc side, but we would need kernel support for sigaltstack and page usage changes. Luckily, there is no need to support stacks which are simultaneously used from more than one process, so "is this a shadow stack page" can be tracked purely at the vma/pte level without any need to involve the inode. POSIX explicitly allows using mmap to obtain stack memory and does not forbid MAP_SHARED; I consider this sufficiently perverse application behavior that it is not necessary to ensure exclusive use of the underlying pages while a shadow stack pte exists. (Applications that use MAP_SHARED for stacks do not get the full benefit of the shadow stack but they keep POSIX.1-2004 conformance, applications that allocate stacks exclusively in MAP_PRIVATE memory lose no security.) The largest complication of this scheme is likely to be that the shadow stack usage property of a page needs to survive the page being swapped out and back in, which likely means that it must be present in the swap PTE. I am substantially less familiar with GCS and SHSTK than with Zicfiss. It is likely that a syscall or other mechanism is needed to initialize the shadow stack in flexible memory for makecontext. Is there interest on the kernel side on having mechanisms to fully support POSIX.1-2004 with GCS or Zicfiss enabled? -s > We currently maintain the x86 pattern of implicitly allocating a shadow > stack for threads started with shadow stack enabled, there has been some > discussion of removing this support and requiring the use of clone3() > with explicit allocation of shadow stacks instead. I have no strong > feelings either way, implicit allocation is not really consistent with > anything else we do and creates the potential for errors around thread > exit but on the other hand it is existing ABI on x86 and minimises the > changes needed in userspace code. > > There is an open issue with support for CRIU, on x86 this required the > ability to set the GCS mode via ptrace. This series supports > configuring mode bits other than enable/disable via ptrace but it needs > to be confirmed if this is sufficient. > > The series depends on support for shadow stacks in clone3(), that series > includes the addition of ARCH_HAS_USER_SHADOW_STACK. > > > https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@kernel.org > > It also depends on the addition of more waitpid() flags to nolibc: > > > https://lore.kernel.org/r/20231023-nolibc-waitpid-flags-v2-1-b09d096f091f@kernel.org > > You can see a branch with the full set of dependencies against Linus' > tree at: > > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs > > [1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/ > > Signed-off-by: Mark Brown <broonie@kernel.org> > --- > Changes in v8: > - Invalidate signal cap token on stack when consuming. > - Typo and other trivial fixes. > - Don't try to use process_vm_write() on GCS, it intentionally does not > work. > - Fix leak of thread GCSs. > - Rebase onto latest clone3() series. > - Link to v7: > https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org > > Changes in v7: > - Rebase onto v6.7-rc2 via the clone3() patch series. > - Change the token used to cap the stack during signal handling to be > compatible with GCSPOPM. > - Fix flags for new page types. > - Fold in support for clone3(). > - Replace copy_to_user_gcs() with put_user_gcs(). > - Link to v6: > https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org > > Changes in v6: > - Rebase onto v6.6-rc3. > - Add some more gcsb_dsync() barriers following spec clarifications. > - Due to ongoing discussion around clone()/clone3() I've not updated > anything there, the behaviour is the same as on previous versions. > - Link to v5: > https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org > > Changes in v5: > - Don't map any permissions for user GCSs, we always use EL0 accessors > or use a separate mapping of the page. > - Reduce the standard size of the GCS to RLIMIT_STACK/2. > - Enforce a PAGE_SIZE alignment requirement on map_shadow_stack(). > - Clarifications and fixes to documentation. > - More tests. > - Link to v4: > https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org > > Changes in v4: > - Implement flags for map_shadow_stack() allowing the cap and end of > stack marker to be enabled independently or not at all. > - Relax size and alignment requirements for map_shadow_stack(). > - Add more blurb explaining the advantages of hardware enforcement. > - Link to v3: > https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org > > Changes in v3: > - Rebase onto v6.5-rc4. > - Add a GCS barrier on context switch. > - Add a GCS stress test. > - Link to v2: > https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org > > Changes in v2: > - Rebase onto v6.5-rc3. > - Rework prctl() interface to allow each bit to be locked independently. > - map_shadow_stack() now places the cap token based on the size > requested by the caller not the actual space allocated. > - Mode changes other than enable via ptrace are now supported. > - Expand test coverage. > - Various smaller fixes and adjustments. > - Link to v1: > https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org > > --- > Mark Brown (38): > arm64/mm: Restructure arch_validate_flags() for extensibility > prctl: arch-agnostic prctl for shadow stack > mman: Add map_shadow_stack() flags > arm64: Document boot requirements for Guarded Control Stacks > arm64/gcs: Document the ABI for Guarded Control Stacks > arm64/sysreg: Add definitions for architected GCS caps > arm64/gcs: Add manual encodings of GCS instructions > arm64/gcs: Provide put_user_gcs() > arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS) > arm64/mm: Allocate PIE slots for EL0 guarded control stack > mm: Define VM_SHADOW_STACK for arm64 when we support GCS > arm64/mm: Map pages for guarded control stack > KVM: arm64: Manage GCS registers for guests > arm64/gcs: Allow GCS usage at EL0 and EL1 > arm64/idreg: Add overrride for GCS > arm64/hwcap: Add hwcap for GCS > arm64/traps: Handle GCS exceptions > arm64/mm: Handle GCS data aborts > arm64/gcs: Context switch GCS state for EL0 > arm64/gcs: Ensure that new threads have a GCS > arm64/gcs: Implement shadow stack prctl() interface > arm64/mm: Implement map_shadow_stack() > arm64/signal: Set up and restore the GCS context for signal handlers > arm64/signal: Expose GCS state in signal frames > arm64/ptrace: Expose GCS via ptrace and core files > arm64: Add Kconfig for Guarded Control Stack (GCS) > kselftest/arm64: Verify the GCS hwcap > kselftest/arm64: Add GCS as a detected feature in the signal tests > kselftest/arm64: Add framework support for GCS to signal handling tests > kselftest/arm64: Allow signals tests to specify an expected si_code > kselftest/arm64: Always run signals tests with GCS enabled > kselftest/arm64: Add very basic GCS test program > kselftest/arm64: Add a GCS test program built with the system libc > kselftest/arm64: Add test coverage for GCS mode locking > selftests/arm64: Add GCS signal tests > kselftest/arm64: Add a GCS stress test > kselftest/arm64: Enable GCS for the FP stress tests > kselftest: Provide shadow stack enable helpers for arm64 > > Documentation/admin-guide/kernel-parameters.txt | 6 + > Documentation/arch/arm64/booting.rst | 22 + > Documentation/arch/arm64/elf_hwcaps.rst | 3 + > Documentation/arch/arm64/gcs.rst | 233 +++++++ > Documentation/arch/arm64/index.rst | 1 + > Documentation/filesystems/proc.rst | 2 +- > arch/arm64/Kconfig | 20 + > arch/arm64/include/asm/cpufeature.h | 6 + > arch/arm64/include/asm/el2_setup.h | 17 + > arch/arm64/include/asm/esr.h | 28 +- > arch/arm64/include/asm/exception.h | 2 + > arch/arm64/include/asm/gcs.h | 107 +++ > arch/arm64/include/asm/hwcap.h | 1 + > arch/arm64/include/asm/kvm_arm.h | 4 +- > arch/arm64/include/asm/kvm_host.h | 12 + > arch/arm64/include/asm/mman.h | 23 +- > arch/arm64/include/asm/pgtable-prot.h | 14 +- > arch/arm64/include/asm/processor.h | 7 + > arch/arm64/include/asm/sysreg.h | 20 + > arch/arm64/include/asm/uaccess.h | 40 ++ > arch/arm64/include/uapi/asm/hwcap.h | 1 + > arch/arm64/include/uapi/asm/ptrace.h | 8 + > arch/arm64/include/uapi/asm/sigcontext.h | 9 + > arch/arm64/kernel/cpufeature.c | 19 + > arch/arm64/kernel/cpuinfo.c | 1 + > arch/arm64/kernel/entry-common.c | 23 + > arch/arm64/kernel/idreg-override.c | 2 + > arch/arm64/kernel/process.c | 85 +++ > arch/arm64/kernel/ptrace.c | 59 ++ > arch/arm64/kernel/signal.c | 242 ++++++- > arch/arm64/kernel/traps.c | 11 + > arch/arm64/kvm/emulate-nested.c | 4 + > arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 + > arch/arm64/kvm/sys_regs.c | 22 + > arch/arm64/mm/Makefile | 1 + > arch/arm64/mm/fault.c | 79 ++- > arch/arm64/mm/gcs.c | 300 +++++++++ > arch/arm64/mm/mmap.c | 13 +- > arch/arm64/tools/cpucaps | 1 + > arch/x86/include/uapi/asm/mman.h | 3 - > fs/proc/task_mmu.c | 3 + > include/linux/mm.h | 16 +- > include/uapi/asm-generic/mman.h | 4 + > include/uapi/linux/elf.h | 1 + > include/uapi/linux/prctl.h | 22 + > kernel/sys.c | 30 + > tools/testing/selftests/arm64/Makefile | 2 +- > tools/testing/selftests/arm64/abi/hwcap.c | 19 + > tools/testing/selftests/arm64/fp/assembler.h | 15 + > tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 + > tools/testing/selftests/arm64/fp/sve-test.S | 2 + > tools/testing/selftests/arm64/fp/za-test.S | 2 + > tools/testing/selftests/arm64/fp/zt-test.S | 2 + > tools/testing/selftests/arm64/gcs/.gitignore | 5 + > tools/testing/selftests/arm64/gcs/Makefile | 24 + > tools/testing/selftests/arm64/gcs/asm-offsets.h | 0 > tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++ > tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++ > .../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++ > tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++ > tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++ > tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++ > tools/testing/selftests/arm64/signal/.gitignore | 1 + > .../testing/selftests/arm64/signal/test_signals.c | 17 +- > .../testing/selftests/arm64/signal/test_signals.h | 6 + > .../selftests/arm64/signal/test_signals_utils.c | 32 +- > .../selftests/arm64/signal/test_signals_utils.h | 39 ++ > .../arm64/signal/testcases/gcs_exception_fault.c | 62 ++ > .../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++ > .../arm64/signal/testcases/gcs_write_fault.c | 67 ++ > .../selftests/arm64/signal/testcases/testcases.c | 7 + > .../selftests/arm64/signal/testcases/testcases.h | 1 + > tools/testing/selftests/ksft_shstk.h | 37 ++ > 73 files changed, 4241 insertions(+), 40 deletions(-) > --- > base-commit: 50abefbf1bc07f5c4e403fd28f71dcee855100f7 > change-id: 20230303-arm64-gcs-e311ab0d8729 > > Best regards, > -- > Mark Brown <broonie@kernel.org> > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote: > Hi, > > I worked on the x86 kernel shadow stack support. I think it is an > interesting suggestion. Some questions below, and I will think more on > it. > > On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote: > > While discussing the ABI implications of shadow stacks in the context > > of > > Zicfiss and musl a few days ago, I had the following idea for how to > > solve > > the source compatibility problems with shadow stacks in POSIX.1-2004 > > and > > POSIX.1-2017: > > > > 1. Introduce a "flexible shadow stack handling" option. For what > > follows, > > it doesn't matter if this is system-wide, per-mm, or per-vma. > > > > 2. Shadow stack faults on non-shadow stack pages, if flexible shadow > > stack > > handling is in effect, cause the affected page to become a shadow > > stack > > page. When this happens, the page filled with invalid address > > tokens. > > Hmm, could the shadow stack underflow onto the real stack then? Not > sure how bad that is. INCSSP (incrementing the SSP register on x86) > loops are not rare so it seems like something that could happen. Shadow stack underflow should fault on attempt to access non-shadow-stack memory as shadow-stack, no? > > Faults from non-shadow-stack accesses to a shadow-stack page which > > was > > created by the previous paragraph will cause the page to revert to > > non-shadow-stack usage, with or without clearing. > > Won't this prevent catching stack overflows when they happen? An > overflow will just turn the shadow stack into normal stack and only get > detected when the shadow stack unwinds? I don't think that's as big a problem as it sounds like. It might make pinpointing the spot at which things went wrong take a little bit more work, but it should not admit any wrong-execution. > A related question would be how to handle the expanding nature of the > initial stack. I guess the initial stack could be special and have a > separate shadow stack. That seems fine. > > Important: a shadow stack operation can only load a valid address > > from > > a page if that page has been in continuous shadow stack use since > > the > > address was written by another shadow stack operation; the > > flexibility > > delays error reporting in cases of stray writes but it never > > allows for > > corruption of shadow stack operation. > > Shadow stacks currently have automatic guard gaps to try to prevent one > thread from overflowing onto another thread's shadow stack. This would > somewhat opens that up, as the stack guard gaps are usually maintained > by userspace for new threads. It would have to be thought through if > these could still be enforced with checking at additional spots. I would think the existing guard pages would already do that if a thread's shadow stack is contiguous with its own data stack. > > 3. Standards-defined operations which use a user-provided stack > > (makecontext, sigaltstack, pthread_attr_setstack) use a subrange > > of the > > provided stack for shadow stack storage. I propose to use a > > shadow > > stack size of 1/32 of the provided stack size, rounded up to a > > positive > > integer number of pages, and place the shadow stack allocation at > > the > > lowest page-aligned address inside the provided stack region. > > > > Since page usage is flexible, no change in page permissions is > > immediately needed; this merely sets the initial shadow stack > > pointer for > > the new context. > > > > If the shadow stack grew in the opposite direction to the > > architectural > > stack, it would not be necessary to pick a fixed direction. > > > > 4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide > > sufficient > > space for a minimum-sized shadow stack region and worst case > > alignment. > > Do all makecontext() callers ensure the size is greater than this? > > I guess glibc's makecontext() could do this scheme to prevent leaking > without any changes to the kernel. Basically steal a little of the > stack address range and overwrite it with a shadow stack mapping. But > only if the apps leave enough room. If they need to be updated, then > they could be updated to manage their own shadow stacks too I think.
On Tue, 2024-02-20 at 20:14 +0000, Mark Brown wrote: > > Hmm, could the shadow stack underflow onto the real stack then? Not > > sure how bad that is. INCSSP (incrementing the SSP register on x86) > > loops are not rare so it seems like something that could happen. > > Yes, they'd trash any pages of normal stack they touch as they do so > but > otherwise seems similar to overflow. I was thinking in the normal buffer overflow case there is a guard gap at the end of the stack, but in this case the shadow stack is directly adjacent to the regular stack. It's probably a minor point.
On Tue, 2024-02-20 at 18:54 -0500, dalias@libc.org wrote: > On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote: > > On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote: > > > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P > > > wrote: > > > > Hmm, could the shadow stack underflow onto the real stack then? > > > > Not > > > > sure how bad that is. INCSSP (incrementing the SSP register on > > > > x86) > > > > loops are not rare so it seems like something that could > > > > happen. > > > > > > Shadow stack underflow should fault on attempt to access > > > non-shadow-stack memory as shadow-stack, no? > > > > Maybe I'm misunderstanding. I thought the proposal included > > allowing > > shadow stack access to convert normal address ranges to shadow > > stack, > > and normal writes to convert shadow stack to normal. > > As I understood the original discussion of the proposal on IRC, it > was > only one-way (from shadow to normal). Unless I'm missing something, > making it one-way is necessary to catch situations where the shadow > stack would become compromised. The original post here: https://lore.kernel.org/lkml/22a53b78-10d7-4a5a-a01e-b2f3a8c22e94@app.fastmail.com/ ...has: "Shadow stack faults on non-shadow stack pages, if flexible shadow stack handling is in effect, cause the affected page to become a shadow stack page. When this happens, the page filled with invalid address tokens." ...and: "Faults from non-shadow-stack accesses to a shadow-stack page which was created by the previous paragraph will cause the page to revert to non- shadow-stack usage, with or without clearing." I see Stefan has clarified in another response. So I'll go try to figure it out. > > > > > Shadow stacks currently have automatic guard gaps to try to > > > > prevent > > > > one > > > > thread from overflowing onto another thread's shadow stack. > > > > This > > > > would > > > > somewhat opens that up, as the stack guard gaps are usually > > > > maintained > > > > by userspace for new threads. It would have to be thought > > > > through > > > > if > > > > these could still be enforced with checking at additional > > > > spots. > > > > > > I would think the existing guard pages would already do that if a > > > thread's shadow stack is contiguous with its own data stack. > > > > The difference is that the kernel provides the guard gaps, where > > this > > would rely on userspace to do it. It is not a showstopper either. > > > > I think my biggest question on this is how does it change the > > capability for two threads to share a shadow stack. It might > > require > > some special rules around the syscall that writes restore tokens. > > So > > I'm not sure. It probably needs a POC. > > Why would they be sharing a shadow stack? The guard gap was introduced originally based on a suggestion that overflowing a shadow stack onto an adjacent shadow stack could cause corruption that could be used by an attacker to work around the protection. There wasn't any concrete demonstrated attacks or suggestion that all the protection was moot. But when we talk about capabilities for converting memory to shadow stack with simple memory accesses, and syscalls that can write restore token to shadow stacks, it's not immediately clear to me that it wouldn't open up something like that. Like if two restore tokens were written to a shadow stack, or two shadow stacks were adjacent with normal memory between them that later got converted to shadow stack. Those sorts of scenarios, but I won't lean on those specific examples. Sorry for being hand wavy. It's just where I'm at, at this point. > > > > From the musl side, I have always looked at the entirely of > > > shadow > > > stack stuff with very heavy skepticism, and anything that breaks > > > existing interface contracts, introduced places where apps can > > > get > > > auto-killed because a late resource allocation fails, or requires > > > applications to code around the existence of something that > > > should be > > > an implementation detail, is a non-starter. To even consider > > > shadow > > > stack support, it must truely be fully non-breaking. > > > > The manual assembly stack switching and JIT code in the apps needs > > to > > be updated. I don't think there is a way around it. > > Indeed, I'm not talking about programs with JIT/manual stack- > switching > asm, just anything using existing APIs for control of stack -- > pthread_setstack, makecontext, sigaltstack, etc. Then I think WRSS might fit your requirements better than what glibc did. It was considered a reduced security mode that made libc's job much easier and had better compatibility, but the last discussion was to try to do it without WRSS. > > > I agree though that the late allocation failures are not great. > > Mark is > > working on clone3 support which should allow moving the shadow > > stack > > allocation to happen in userspace with the normal stack. Even for > > riscv > > though, doesn't it need to update a new register in stack > > switching? > > If clone is called with signals masked, it's probably not necessary > for the kernel to set the shadow stack register as part of clone3. So you would want a mode of clone3 that basically leaves the shadow stack bits alone? Mark was driving that effort, but it doesn't seem horrible to me on first impression. If it would open up the possibility of musl support. > > > BTW, x86 shadow stack has a mode where the shadow stack is writable > > with a special instruction (WRSS). It enables the SSP to be set > > arbitrarily by writing restore tokens. We discussed this as an > > option > > to make the existing longjmp() and signal stuff work more > > transparently > > for glibc. > > > > > > > > BTW, when I talk about "not supporting" I don't mean the app should > > crash. I mean it should instead run normally, just without shadow > > stack > > enabled. Not sure if that was clear. Since shadow stack is not > > essential for an application to function, it is only security > > hardening > > on top. > > > > Although determining if an application supports shadow stack has > > turned > > out to be difficult in practice. Handling dlopen() is especially > > hard. > > One reasonable thing to do, that might be preferable to > overengineered > solutions, is to disable shadow-stack process-wide if an interface > incompatible with it is used (sigaltstack, pthread_create with an > attribute setup using pthread_attr_setstack, makecontext, etc.), as > well as if an incompatible library is is dlopened. I think it would be an interesting approach to determining compatibility. On x86 there has been cases of binaries getting mismarked as supporting shadow stack. So an automated way of filtering some of those out would be very useful I think. I guess the dynamic linker could determine this based on some list of functions? The dlopen() bit gets complicated though. You need to disable shadow stack for all threads, which presumably the kernel could be coaxed into doing. But those threads might be using shadow stack instructions (INCSSP, RSTORSSP, etc). These are a collection of instructions that allow limited control of the SSP. When shadow stack gets disabled, these suddenly turn into #UD generating instructions. So any other threads executing those instructions when shadow stack got disabled would be in for a nasty surprise. Glibc's permissive mode (that disables shadow stack when dlopen()ing a DSO that doesn't support shadow stack) is quite limited because of this. There was a POC for working around it, but I'll stop there for now, to not spam you with the details. I'm not sure of arm and risc-v details on this specific corner, but for x86. > This is much more > acceptable than continuing to run with shadow stacks managed sloppily > by the kernel and async killing the process on OOM, and is probably > *more compatible* with apps than changing the minimum stack size > requirements out from under them. Yep. > > The place where it's really needed to be able to allocate the shadow > stack synchronously under userspace control, in order to harden > normal > applications that aren't doing funny things, is in pthread_create > without a caller-provided stack. Yea most apps don't do anything too tricky. Mostly shadow stack "just works". But it's no excuse to just crash for the others.
On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote: > doing. But those threads might be using shadow stack instructions > (INCSSP, RSTORSSP, etc). These are a collection of instructions that > allow limited control of the SSP. When shadow stack gets disabled, > these suddenly turn into #UD generating instructions. So any other > threads executing those instructions when shadow stack got disabled > would be in for a nasty surprise. > Glibc's permissive mode (that disables shadow stack when dlopen()ing a > DSO that doesn't support shadow stack) is quite limited because of > this. There was a POC for working around it, but I'll stop there for > now, to not spam you with the details. I'm not sure of arm and risc-v > details on this specific corner, but for x86. We have the same issue with disabling GCS causing GCS instructions to become undefined.
On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote: > On Tue, 2024-02-20 at 18:54 -0500, dalias@libc.org wrote: > > On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote: > > > On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote: > > > > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P > > > > > Shadow stacks currently have automatic guard gaps to try to > > > > > prevent > > > > > one > > > > > thread from overflowing onto another thread's shadow stack. > > > > > This > > > > > would > > > > > somewhat opens that up, as the stack guard gaps are usually > > > > > maintained > > > > > by userspace for new threads. It would have to be thought > > > > > through > > > > > if > > > > > these could still be enforced with checking at additional > > > > > spots. > > > > > > > > I would think the existing guard pages would already do that if a > > > > thread's shadow stack is contiguous with its own data stack. > > > > > > The difference is that the kernel provides the guard gaps, where > > > this > > > would rely on userspace to do it. It is not a showstopper either. > > > > > > I think my biggest question on this is how does it change the > > > capability for two threads to share a shadow stack. It might > > > require > > > some special rules around the syscall that writes restore tokens. > > > So > > > I'm not sure. It probably needs a POC. > > > > Why would they be sharing a shadow stack? > > The guard gap was introduced originally based on a suggestion that > overflowing a shadow stack onto an adjacent shadow stack could cause > corruption that could be used by an attacker to work around the > protection. There wasn't any concrete demonstrated attacks or > suggestion that all the protection was moot. OK, so not sharing, just happening to be adjacent. I was thinking from a standpoint of allocating them as part of the same range as the main stack, just with different protections, where that would never happen; you'd always have intervening non-shadowstack pages. But when they're kernel-allocated, yes, they need their own guard pages. > But when we talk about capabilities for converting memory to shadow > stack with simple memory accesses, and syscalls that can write restore > token to shadow stacks, it's not immediately clear to me that it > wouldn't open up something like that. Like if two restore tokens were > written to a shadow stack, or two shadow stacks were adjacent with > normal memory between them that later got converted to shadow stack. > Those sorts of scenarios, but I won't lean on those specific examples. > Sorry for being hand wavy. It's just where I'm at, at this point. I don't think it's safe to have automatic conversions back and forth, only for normal accesses to convert shadowstack to normal memory (in which case, any subsequent attempt to operate on it as shadow stack indicates a critical bug and should be trapped to terminate the process). > > > > From the musl side, I have always looked at the entirely of > > > > shadow > > > > stack stuff with very heavy skepticism, and anything that breaks > > > > existing interface contracts, introduced places where apps can > > > > get > > > > auto-killed because a late resource allocation fails, or requires > > > > applications to code around the existence of something that > > > > should be > > > > an implementation detail, is a non-starter. To even consider > > > > shadow > > > > stack support, it must truely be fully non-breaking. > > > > > > The manual assembly stack switching and JIT code in the apps needs > > > to > > > be updated. I don't think there is a way around it. > > > > Indeed, I'm not talking about programs with JIT/manual stack- > > switching > > asm, just anything using existing APIs for control of stack -- > > pthread_setstack, makecontext, sigaltstack, etc. > > Then I think WRSS might fit your requirements better than what glibc > did. It was considered a reduced security mode that made libc's job > much easier and had better compatibility, but the last discussion was > to try to do it without WRSS. Where can I read more about this? Some searches I tried didn't turn up much useful information. > > > I agree though that the late allocation failures are not great. > > > Mark is > > > working on clone3 support which should allow moving the shadow > > > stack > > > allocation to happen in userspace with the normal stack. Even for > > > riscv > > > though, doesn't it need to update a new register in stack > > > switching? > > > > If clone is called with signals masked, it's probably not necessary > > for the kernel to set the shadow stack register as part of clone3. > > So you would want a mode of clone3 that basically leaves the shadow > stack bits alone? Mark was driving that effort, but it doesn't seem > horrible to me on first impression. If it would open up the possibility > of musl support. Well I'm not sure. That's what we're trying to figure out. But I don't think modifying it is a hard requirement, since it can be modified from userspace if needed as long as signals are masked. > > One reasonable thing to do, that might be preferable to > > overengineered > > solutions, is to disable shadow-stack process-wide if an interface > > incompatible with it is used (sigaltstack, pthread_create with an > > attribute setup using pthread_attr_setstack, makecontext, etc.), as > > well as if an incompatible library is is dlopened. > > I think it would be an interesting approach to determining > compatibility. On x86 there has been cases of binaries getting > mismarked as supporting shadow stack. So an automated way of filtering > some of those out would be very useful I think. I guess the dynamic > linker could determine this based on some list of functions? I didn't follow this whole mess, but from our side (musl) it does not seem relevant. There are no legacy binaries wrongly marked because we have never supported shadow stacks so far. > The dlopen() bit gets complicated though. You need to disable shadow > stack for all threads, which presumably the kernel could be coaxed into > doing. But those threads might be using shadow stack instructions > (INCSSP, RSTORSSP, etc). These are a collection of instructions that > allow limited control of the SSP. When shadow stack gets disabled, > these suddenly turn into #UD generating instructions. So any other > threads executing those instructions when shadow stack got disabled > would be in for a nasty surprise. This is the kernel's problem if that's happening. It should be trapping these and returning immediately like a NOP if shadow stack has been disabled, not generating SIGILL. > > The place where it's really needed to be able to allocate the shadow > > stack synchronously under userspace control, in order to harden > > normal > > applications that aren't doing funny things, is in pthread_create > > without a caller-provided stack. > > Yea most apps don't do anything too tricky. Mostly shadow stack "just > works". But it's no excuse to just crash for the others. One thing to note here is that, to enable this, we're going to need some way to detect "new enough kernel that shadow stack semantics are all right". If there are kernels that have shadow stack support but with problems that make it unsafe to use (this sounds like the case), we can't turn it on without a way to avoid trying to use it on those. Rich
On Tue, 2024-02-20 at 18:59 -0500, Stefan O'Rear wrote: > > Ideally for riscv only writes would cause conversion, an incssp > underflow > which performs shadow stack reads would be able to fault early. Why can't makecontext() just clobber part of the low address side of the passed in stack with a shadow stack mapping? Like say it just munmap()'s part of the passed stack, and map_shadow_stack() in it's place. Then you could still have the shadow stack->normal conversion process triggered by normal writes. IIUC the concern there is to make sure the caller can reuse it as normal memory when it is done with the ucontext/sigaltstack stuff? So the normal->shadow stack part could be explicit. But the more I think about this, the more I think it is a hack, and a proper fix is to use new interfaces. It also would be difficult to sell, if the faulting conversion stuff is in any way complex.
On Wed, Feb 21, 2024 at 05:36:12PM +0000, Mark Brown wrote: > On Wed, Feb 21, 2024 at 09:58:01AM -0500, dalias@libc.org wrote: > > On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote: > > > On Tue, Feb 20, 2024 at 08:27:37PM -0500, dalias@libc.org wrote: > > > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote: > > > > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that > > > > > allow limited control of the SSP. When shadow stack gets disabled, > > > > > these suddenly turn into #UD generating instructions. So any other > > > > > threads executing those instructions when shadow stack got disabled > > > > > would be in for a nasty surprise. > > > > > This is the kernel's problem if that's happening. It should be > > > > trapping these and returning immediately like a NOP if shadow stack > > > > has been disabled, not generating SIGILL. > > > > I'm not sure that's going to work out well, all it takes is some code > > > that's looking at the shadow stack and expecting something to happen as > > > a result of the instructions it's executing and we run into trouble. A > > > I said NOP but there's no reason it strictly needs to be a NOP. It > > could instead do something reasonable to convey the state of racing > > with shadow stack being disabled. > > This feels like it's getting complicated and I fear it may be an uphill > struggle to get such code merged, at least for arm64. My instinct is > that it's going to be much more robust and generally tractable to let > things run to some suitable synchronisation point and then disable > there, but if we're going to do that then userspace can hopefully > arrange to do the disabling itself through the standard disable > interface anyway. Presumably it'll want to notice things being disabled > at some point anyway? TBH that's been how all the prior proposals for > process wide disable I've seen were done. If it's possible to disable per-thread rather than per-process, some things are easier. Disabling on account of using alt stacks only needs to be done on the threads using those stacks. However, for dlopen purposes you need a way to disable shadow stack for the whole process. Initially this is only needed for the thread that called dlopen, but it needs to have propagated to any thread that synchronizes with completion of the call to dlopen by the time that synchronization occurs, and since that synchronization can happen in lots of different ways that are purely userspace (thanks to futexes being userspace in the uncontended case), I don't see any way to make it work without extremely invasive, high-cost checks. If folks on the kernel side are not going to be amenable to doing the things that are easy for the kernel to make it work without breaking compatibility with existing interfaces, but that are impossible or near-impossible for userspace to do, this seems like a dead-end. And I suspect an operation to "disable shadow stack, but without making threads still in SS-critical sections crash" is going to be necessary.. Rich
On Wed, Feb 21, 2024 at 06:12:30PM +0000, Edgecombe, Rick P wrote: > On Wed, 2024-02-21 at 12:57 -0500, dalias@libc.org wrote: > > > This feels like it's getting complicated and I fear it may be an > > > uphill > > > struggle to get such code merged, at least for arm64. My instinct > > > is > > > that it's going to be much more robust and generally tractable to > > > let > > > things run to some suitable synchronisation point and then disable > > > there, but if we're going to do that then userspace can hopefully > > > arrange to do the disabling itself through the standard disable > > > interface anyway. Presumably it'll want to notice things being > > > disabled > > > at some point anyway? TBH that's been how all the prior proposals > > > for > > > process wide disable I've seen were done. > > > > If it's possible to disable per-thread rather than per-process, some > > things are easier. Disabling on account of using alt stacks only > > needs > > to be done on the threads using those stacks. However, for dlopen > > purposes you need a way to disable shadow stack for the whole > > process. > > Initially this is only needed for the thread that called dlopen, but > > it needs to have propagated to any thread that synchronizes with > > completion of the call to dlopen by the time that synchronization > > occurs, and since that synchronization can happen in lots of > > different > > ways that are purely userspace (thanks to futexes being userspace in > > the uncontended case), I don't see any way to make it work without > > extremely invasive, high-cost checks. > > For glibc's use, we talked about a couple of options. > 1. A mode to start suppressing the #UD's from the shadow stack > instructions > 2. A mode to start suppressing #CPs (the exception that happens when > the shadow stack doesn't match). So the shadow stack instructions > continue to operate normally, but if the shadow stack gets mismatched > due to lack of support, the ret is emulated. It probably is safer (but > still not perfect), but the performance penalty of emulating every RET > after things get screwed up would be a significant down side. There > also needs to be clean handling of shadow stack #PFs. > 3. Per-thread locking that is used around all shadow stack operations > that could be sensitive to disabling. This could be maybe exposed to > apps in case they want to use shadow stack instructions manually. Then > during dlopen() it waits until it can cleanly disable shadow stack for > each thread. In each critical sections there are checks for whether > shadow stack is still enabled. > > 3 is the cleanest and safest I think, and it was thought it might not > need kernel help, due to a scheme Florian had to direct signals to > specific threads. It's my preference at this point. The operations where the shadow stack has to be processed need to be executable from async-signal context, so this imposes a requirement to block all signals around the lock. This makes all longjmps a heavy, multi-syscall operation rather than O(1) userspace operation. I do not think this is an acceptable implementation, especially when there are clearly superior alternatives without that cost or invasiveness. > 1 and 2 are POCed here, if you are interested: > https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/ I'm not clear why 2 (suppression of #CP) is desirable at all. If shadow stack is being disabled, it should just be disabled, with minimal fault handling to paper over any racing operations at the moment it's disabled. Leaving it on with extreme slowness to make it not actually do anything does not seem useful. Is there some way folks have in mind to use option 2 to lazily disable shadow stack once the first SS-incompatible code is executed, when execution is then known not to be in the middle of a SS-critical section, instead of doing it right away? I don't see how this could work, since the SS-incompatible code could be running from a signal handler that interrupted an SS-critical section. > > If folks on the kernel side are not going to be amenable to doing the > > things that are easy for the kernel to make it work without breaking > > compatibility with existing interfaces, but that are impossible or > > near-impossible for userspace to do, this seems like a dead-end. And > > I > > suspect an operation to "disable shadow stack, but without making > > threads still in SS-critical sections crash" is going to be > > necessary.. > > I think we have to work through all the alternative before we can > accuse the kernel of not being amenable. Is there something that you > would like to see out of this conversation that is not happening? No, I was just interpreting "uphill battle". I really do not want to engage in an uphill battle for the sake of making it practical to support something that was never my goal to begin with. If I'm misreading this, or if others are willing to put the effort into that "battle", I'd be happy to be mistaken about "not amenable". Rich
On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote: > The only issue as can be seen above is that the can_call_function test > is failing. The child is getting a GCS Segmentation fault when returning > from fork(). > I tried debugging it with GDB, but I don't see what's wrong since the > address in LR matches the first entry in GCSPR. Here is the > debug session: I'm simply not seeing this in my testing. There's *something* going on somewhere, I had another report of a similarish thing elsewhere, but not in any way that I've ever been able to reproduce. It smells like there might be something missing with the page tables...
On Wed, Feb 21, 2024 at 07:22:21PM +0000, Edgecombe, Rick P wrote: > On Wed, 2024-02-21 at 14:06 -0500, dalias@libc.org wrote: > > It's fine to turn RDSSP into an actual emulated read of the SSP, or > > at > > least an emulated load of zero so that uninitialized data is not left > > in the target register. > We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer > x86-only knowledge). For arm64 we have a separate control GCSCRE0_EL1.nTR for access to GCSPR_EL0 (our SSP equivalent) we can use. > > I have not looked at all the instructions that become #UD but I > > suspect they all have reasonable trivial ways to implement a > > "disabled" version of them that userspace can act upon reasonably. > This would have to be thought through functionally and performance > wise. I'm not opposed if can come up with a fully fleshed out plan. How > serious are you in pursuing musl support, if we had something like > this? Same here, we have to be careful since it's defining ABI in a way that we don't normally provide ABI but if there's a clear case for doing it then...
On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote: > The only issue as can be seen above is that the can_call_function test > is failing. The child is getting a GCS Segmentation fault when returning > from fork(). > I tried debugging it with GDB, but I don't see what's wrong since the > address in LR matches the first entry in GCSPR. Here is the > debug session: I believe based on prior discussions that you're running this using shrinkwrap - can you confirm exactly how please, including things like which firmware configuration you're using? I'm using current git with shrinkwrap run \ --rtvar KERNEL=arch/arm64/boot/Image \ --rtvar ROOTFS=${ROOTFS} \ --rtvar CMDLINE="${CMDLINE}" \ --overlay=arch/v9.4.yaml ns-edk2.yaml and a locally built yocto and everything seems perfectly happy.
Mark Brown <broonie@kernel.org> writes: > On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote: > >> The only issue as can be seen above is that the can_call_function test >> is failing. The child is getting a GCS Segmentation fault when returning >> from fork(). > >> I tried debugging it with GDB, but I don't see what's wrong since the >> address in LR matches the first entry in GCSPR. Here is the >> debug session: > > I believe based on prior discussions that you're running this using > shrinkwrap - can you confirm exactly how please, including things like > which firmware configuration you're using? I'm using current git with > > shrinkwrap run \ > --rtvar KERNEL=arch/arm64/boot/Image \ > --rtvar ROOTFS=${ROOTFS} \ > --rtvar CMDLINE="${CMDLINE}" \ > --overlay=arch/v9.4.yaml ns-edk2.yaml > > and a locally built yocto and everything seems perfectly happy. Yes, this is how I'm running it: CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1" shrinkwrap run \ --rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \ --rtvar=ROOTFS=$HOME/VMs/ubuntu-aarch64.img \ --rtvar=CMDLINE="$CMDLINE" \ ns-edk2.yaml I ran the following to set up the FVP VM: $ shrinkwrap build --overlay=arch/v9.4.yaml ns-edk2.yaml My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is here: https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2 I tried removing "maxcpus=1" from the kernel command line, but it made no difference. I also tried resetting my Shrinkwrap setup and starting from scratch, but it also made no difference: I just pulled from the current main branch and removed Shrinkwrap's build and package directories, and also removed all Docker images and the one container I had. Here are some firmware versions from early boot: NOTICE: Booting Trusted Firmware NOTICE: BL1: v2.10.0 (release):v2.10.0 NOTICE: BL1: Built : 00:07:29, Feb 23 2024 ⋮ NOTICE: BL2: v2.10.0 (release):v2.10.0 NOTICE: BL2: Built : 00:07:29, Feb 23 2024 ⋮ NOTICE: BL31: v2.10.0 (release):v2.10.0 NOTICE: BL31: Built : 00:07:29, Feb 23 2024 ⋮ [ edk2 ] UEFI firmware (version built at 00:06:55 on Feb 23 2024) Press ESCAPE for boot options ...........UEFI Interactive Shell v2.2 EDK II UEFI v2.70 (EDK II, 0x00010000) It looks like our main differences are the kernel config and the distro.
On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote: > Mark Brown <broonie@kernel.org> writes: > My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is > here: > https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2 Does using defconfig make a difference for you? > Here are some firmware versions from early boot: These are (as you'd expect) the same. > It looks like our main differences are the kernel config and the distro. Indeed.
Mark Brown <broonie@kernel.org> writes: > [[PGP Signed Part:Undecided]] > On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote: >> Mark Brown <broonie@kernel.org> writes: > >> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is >> here: > >> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2 > > Does using defconfig make a difference for you? No, I still get the same result with the defconfig.
On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote: > Mark Brown <broonie@kernel.org> writes: > > I believe based on prior discussions that you're running this using > > shrinkwrap - can you confirm exactly how please, including things like > > which firmware configuration you're using? I'm using current git with > > > > shrinkwrap run \ > > --rtvar KERNEL=arch/arm64/boot/Image \ > > --rtvar ROOTFS=${ROOTFS} \ > > --rtvar CMDLINE="${CMDLINE}" \ > > --overlay=arch/v9.4.yaml ns-edk2.yaml > > > > and a locally built yocto and everything seems perfectly happy. > > Yes, this is how I'm running it: > > CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1" > > shrinkwrap run \ > --rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \ I guess that's bitrotted? > My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is > here: ... > https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2 Thanks, it seems to be something in your config that's making a difference - I can see issues with that. Hopefully that'll help me get to the bottom of this quickly. I spent a bunch of time fighting with Ubuntu images to get them running but once I did they didn't seem to make much difference.
Mark Brown <broonie@kernel.org> writes: > [[PGP Signed Part:Undecided]] > On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote: >> Mark Brown <broonie@kernel.org> writes: > >> > I believe based on prior discussions that you're running this using >> > shrinkwrap - can you confirm exactly how please, including things like >> > which firmware configuration you're using? I'm using current git with >> > >> > shrinkwrap run \ >> > --rtvar KERNEL=arch/arm64/boot/Image \ >> > --rtvar ROOTFS=${ROOTFS} \ >> > --rtvar CMDLINE="${CMDLINE}" \ >> > --overlay=arch/v9.4.yaml ns-edk2.yaml >> > >> > and a locally built yocto and everything seems perfectly happy. >> >> Yes, this is how I'm running it: >> >> CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1" >> >> shrinkwrap run \ >> --rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \ > > I guess that's bitrotted? Ah, sorry. When I renamed the Image I messed up the kernel version in the filename, but I did confirm via "uname -r" that I was running the correct version: 6.8.0-rc2-ga551a7d7af93. >> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is >> here: > > ... > >> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2 > > Thanks, it seems to be something in your config that's making a > difference - I can see issues with that. Hopefully that'll help me get > to the bottom of this quickly. I spent a bunch of time fighting with > Ubuntu images to get them running but once I did they didn't seem to > make much difference. In that case, it's interesting that I still run into the problem with the defconfig. One thing I failed to mention and perhaps is relevant considering your result, is that I didn't copy the modules into the disk image, so the FVP was running just with was built into the kernel. That was actually the main reason for me to use a custom config: I didn't want to have to deal with kernel modules, so I created a config that didn't have any.
* Mark Brown <broonie@kernel.org> [2024-02-21 17:36:12 +0000]: > On Wed, Feb 21, 2024 at 09:58:01AM -0500, dalias@libc.org wrote: > > On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote: > > > On Tue, Feb 20, 2024 at 08:27:37PM -0500, dalias@libc.org wrote: > > > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote: > > > > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that > > > > > allow limited control of the SSP. When shadow stack gets disabled, > > > > > these suddenly turn into #UD generating instructions. So any other > > > > > threads executing those instructions when shadow stack got disabled > > > > > would be in for a nasty surprise. > > > > > This is the kernel's problem if that's happening. It should be > > > > trapping these and returning immediately like a NOP if shadow stack > > > > has been disabled, not generating SIGILL. > > > > I'm not sure that's going to work out well, all it takes is some code > > > that's looking at the shadow stack and expecting something to happen as > > > a result of the instructions it's executing and we run into trouble. A > > > I said NOP but there's no reason it strictly needs to be a NOP. It > > could instead do something reasonable to convey the state of racing > > with shadow stack being disabled. > > This feels like it's getting complicated and I fear it may be an uphill > struggle to get such code merged, at least for arm64. My instinct is the aarch64 behaviour is already nop for gcs instructions when gcs is disabled. the isa was designed so async disable is possible. only x86 linux would have to emulate this. > that it's going to be much more robust and generally tractable to let > things run to some suitable synchronisation point and then disable > there, but if we're going to do that then userspace can hopefully > arrange to do the disabling itself through the standard disable > interface anyway. Presumably it'll want to notice things being disabled > at some point anyway? TBH that's been how all the prior proposals for > process wide disable I've seen were done.
On Sat, Mar 02, 2024 at 03:57:02PM +0100, Szabolcs Nagy wrote: > * Mark Brown <broonie@kernel.org> [2024-02-21 17:36:12 +0000]: > > > I said NOP but there's no reason it strictly needs to be a NOP. It > > > could instead do something reasonable to convey the state of racing > > > with shadow stack being disabled. > > This feels like it's getting complicated and I fear it may be an uphill > > struggle to get such code merged, at least for arm64. My instinct is > the aarch64 behaviour is already nop > for gcs instructions when gcs is disabled. > the isa was designed so async disable is > possible. Yeah, we'd need to handle GCSPR_EL0 somehow (currently it's inaccessible when GCS is disabled) and userspace would need to take care it's not doing something that could get stuck if for example a pop didn't actually *do* anything.
The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling. When GCS is active a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer can not be directly written to by userspace. When a BL is executed the value stored in LR is also pushed onto the GCS, and when a RET is executed the top of the GCS is popped and compared to LR with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages, a data abort is generated if they are not. The combination of hardware enforcement and lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks. This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2, this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it, it is expected that this will be done very early in application execution by the dynamic linker or other startup code. For dynamic linking this will be done by checking that everything in the executable is marked as GCS compatible. x86 has an equivalent feature called shadow stacks, this series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible. As there has been extensive discussion with the wider community around the ABI for shadow stacks I have as far as practical kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence. The main divergence I am concious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable, we do anticipate that some systems will wish to disable GCS at runtime but are not aware of any demand for subsequently reenabling it. x86 uses an arch_prctl() to manage enable and disable, since only x86 and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V Zicfiss feature which I initially adopted fairly directly but following review feedback has been revised quite a bit. We currently maintain the x86 pattern of implicitly allocating a shadow stack for threads started with shadow stack enabled, there has been some discussion of removing this support and requiring the use of clone3() with explicit allocation of shadow stacks instead. I have no strong feelings either way, implicit allocation is not really consistent with anything else we do and creates the potential for errors around thread exit but on the other hand it is existing ABI on x86 and minimises the changes needed in userspace code. There is an open issue with support for CRIU, on x86 this required the ability to set the GCS mode via ptrace. This series supports configuring mode bits other than enable/disable via ptrace but it needs to be confirmed if this is sufficient. The series depends on support for shadow stacks in clone3(), that series includes the addition of ARCH_HAS_USER_SHADOW_STACK. https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@kernel.org It also depends on the addition of more waitpid() flags to nolibc: https://lore.kernel.org/r/20231023-nolibc-waitpid-flags-v2-1-b09d096f091f@kernel.org You can see a branch with the full set of dependencies against Linus' tree at: https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs [1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/ Signed-off-by: Mark Brown <broonie@kernel.org> --- Changes in v8: - Invalidate signal cap token on stack when consuming. - Typo and other trivial fixes. - Don't try to use process_vm_write() on GCS, it intentionally does not work. - Fix leak of thread GCSs. - Rebase onto latest clone3() series. - Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org Changes in v7: - Rebase onto v6.7-rc2 via the clone3() patch series. - Change the token used to cap the stack during signal handling to be compatible with GCSPOPM. - Fix flags for new page types. - Fold in support for clone3(). - Replace copy_to_user_gcs() with put_user_gcs(). - Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org Changes in v6: - Rebase onto v6.6-rc3. - Add some more gcsb_dsync() barriers following spec clarifications. - Due to ongoing discussion around clone()/clone3() I've not updated anything there, the behaviour is the same as on previous versions. - Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org Changes in v5: - Don't map any permissions for user GCSs, we always use EL0 accessors or use a separate mapping of the page. - Reduce the standard size of the GCS to RLIMIT_STACK/2. - Enforce a PAGE_SIZE alignment requirement on map_shadow_stack(). - Clarifications and fixes to documentation. - More tests. - Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org Changes in v4: - Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all. - Relax size and alignment requirements for map_shadow_stack(). - Add more blurb explaining the advantages of hardware enforcement. - Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org Changes in v3: - Rebase onto v6.5-rc4. - Add a GCS barrier on context switch. - Add a GCS stress test. - Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org Changes in v2: - Rebase onto v6.5-rc3. - Rework prctl() interface to allow each bit to be locked independently. - map_shadow_stack() now places the cap token based on the size requested by the caller not the actual space allocated. - Mode changes other than enable via ptrace are now supported. - Expand test coverage. - Various smaller fixes and adjustments. - Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org --- Mark Brown (38): arm64/mm: Restructure arch_validate_flags() for extensibility prctl: arch-agnostic prctl for shadow stack mman: Add map_shadow_stack() flags arm64: Document boot requirements for Guarded Control Stacks arm64/gcs: Document the ABI for Guarded Control Stacks arm64/sysreg: Add definitions for architected GCS caps arm64/gcs: Add manual encodings of GCS instructions arm64/gcs: Provide put_user_gcs() arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS) arm64/mm: Allocate PIE slots for EL0 guarded control stack mm: Define VM_SHADOW_STACK for arm64 when we support GCS arm64/mm: Map pages for guarded control stack KVM: arm64: Manage GCS registers for guests arm64/gcs: Allow GCS usage at EL0 and EL1 arm64/idreg: Add overrride for GCS arm64/hwcap: Add hwcap for GCS arm64/traps: Handle GCS exceptions arm64/mm: Handle GCS data aborts arm64/gcs: Context switch GCS state for EL0 arm64/gcs: Ensure that new threads have a GCS arm64/gcs: Implement shadow stack prctl() interface arm64/mm: Implement map_shadow_stack() arm64/signal: Set up and restore the GCS context for signal handlers arm64/signal: Expose GCS state in signal frames arm64/ptrace: Expose GCS via ptrace and core files arm64: Add Kconfig for Guarded Control Stack (GCS) kselftest/arm64: Verify the GCS hwcap kselftest/arm64: Add GCS as a detected feature in the signal tests kselftest/arm64: Add framework support for GCS to signal handling tests kselftest/arm64: Allow signals tests to specify an expected si_code kselftest/arm64: Always run signals tests with GCS enabled kselftest/arm64: Add very basic GCS test program kselftest/arm64: Add a GCS test program built with the system libc kselftest/arm64: Add test coverage for GCS mode locking selftests/arm64: Add GCS signal tests kselftest/arm64: Add a GCS stress test kselftest/arm64: Enable GCS for the FP stress tests kselftest: Provide shadow stack enable helpers for arm64 Documentation/admin-guide/kernel-parameters.txt | 6 + Documentation/arch/arm64/booting.rst | 22 + Documentation/arch/arm64/elf_hwcaps.rst | 3 + Documentation/arch/arm64/gcs.rst | 233 +++++++ Documentation/arch/arm64/index.rst | 1 + Documentation/filesystems/proc.rst | 2 +- arch/arm64/Kconfig | 20 + arch/arm64/include/asm/cpufeature.h | 6 + arch/arm64/include/asm/el2_setup.h | 17 + arch/arm64/include/asm/esr.h | 28 +- arch/arm64/include/asm/exception.h | 2 + arch/arm64/include/asm/gcs.h | 107 +++ arch/arm64/include/asm/hwcap.h | 1 + arch/arm64/include/asm/kvm_arm.h | 4 +- arch/arm64/include/asm/kvm_host.h | 12 + arch/arm64/include/asm/mman.h | 23 +- arch/arm64/include/asm/pgtable-prot.h | 14 +- arch/arm64/include/asm/processor.h | 7 + arch/arm64/include/asm/sysreg.h | 20 + arch/arm64/include/asm/uaccess.h | 40 ++ arch/arm64/include/uapi/asm/hwcap.h | 1 + arch/arm64/include/uapi/asm/ptrace.h | 8 + arch/arm64/include/uapi/asm/sigcontext.h | 9 + arch/arm64/kernel/cpufeature.c | 19 + arch/arm64/kernel/cpuinfo.c | 1 + arch/arm64/kernel/entry-common.c | 23 + arch/arm64/kernel/idreg-override.c | 2 + arch/arm64/kernel/process.c | 85 +++ arch/arm64/kernel/ptrace.c | 59 ++ arch/arm64/kernel/signal.c | 242 ++++++- arch/arm64/kernel/traps.c | 11 + arch/arm64/kvm/emulate-nested.c | 4 + arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 + arch/arm64/kvm/sys_regs.c | 22 + arch/arm64/mm/Makefile | 1 + arch/arm64/mm/fault.c | 79 ++- arch/arm64/mm/gcs.c | 300 +++++++++ arch/arm64/mm/mmap.c | 13 +- arch/arm64/tools/cpucaps | 1 + arch/x86/include/uapi/asm/mman.h | 3 - fs/proc/task_mmu.c | 3 + include/linux/mm.h | 16 +- include/uapi/asm-generic/mman.h | 4 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 22 + kernel/sys.c | 30 + tools/testing/selftests/arm64/Makefile | 2 +- tools/testing/selftests/arm64/abi/hwcap.c | 19 + tools/testing/selftests/arm64/fp/assembler.h | 15 + tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 + tools/testing/selftests/arm64/fp/sve-test.S | 2 + tools/testing/selftests/arm64/fp/za-test.S | 2 + tools/testing/selftests/arm64/fp/zt-test.S | 2 + tools/testing/selftests/arm64/gcs/.gitignore | 5 + tools/testing/selftests/arm64/gcs/Makefile | 24 + tools/testing/selftests/arm64/gcs/asm-offsets.h | 0 tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++ tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++ .../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++ tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++ tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++ tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++ tools/testing/selftests/arm64/signal/.gitignore | 1 + .../testing/selftests/arm64/signal/test_signals.c | 17 +- .../testing/selftests/arm64/signal/test_signals.h | 6 + .../selftests/arm64/signal/test_signals_utils.c | 32 +- .../selftests/arm64/signal/test_signals_utils.h | 39 ++ .../arm64/signal/testcases/gcs_exception_fault.c | 62 ++ .../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++ .../arm64/signal/testcases/gcs_write_fault.c | 67 ++ .../selftests/arm64/signal/testcases/testcases.c | 7 + .../selftests/arm64/signal/testcases/testcases.h | 1 + tools/testing/selftests/ksft_shstk.h | 37 ++ 73 files changed, 4241 insertions(+), 40 deletions(-) --- base-commit: 50abefbf1bc07f5c4e403fd28f71dcee855100f7 change-id: 20230303-arm64-gcs-e311ab0d8729 Best regards,