A Faster Path to Memory Safety: CHERI, Memory Tagging, and Control Flow Integrity

MalBot · March 12, 2024, 5:35pm

Recently, the White House published a technical paper asking organizations to develop roadmaps for implementing memory safety in their software applications. The goal is to eliminate a broad class of software defects that account for up to 70 percent of all vulnerabilities, according to researchers at Microsoft and Google.

When the topic of memory safety comes up, people most often think of memory-safe programming languages like Rust and Swift. But rewriting applications in a new programming language is an arduous undertaking. That doesn’t mean we shouldn’t try; it’s just going to take a while.

Because so much existing code is written in memory-unsafe languages like C and C++, another approach is to add software mitigations that protect against certain types of attacks. One example is LLVM’s Control Flow Integrity (CFI) mechanisms, which have been integrated into Android and enabled by default. LLVM’s CFI adds checks around indirect calls, jump tables, and type casting operations, both at compile time and at runtime. This addresses a useful subset of memory safety issues and is certainly worth enabling where possible, but it covers only part of the problem space.
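
To make the idea of a check around an indirect call concrete, here is a minimal sketch in Python. It is a toy model only, not how LLVM implements CFI: the names register, checked_call, and CFIViolation are hypothetical, and the signature strings stand in for the type information a compiler would derive on its own.

```python
from typing import Callable, Dict, Set


class CFIViolation(Exception):
    """Raised when an indirect call targets a function of the wrong type."""


# Hypothetical table mapping a type signature to the functions that have it.
# A real compiler would derive this from the program's types at build time.
_valid_targets: Dict[str, Set[Callable]] = {}


def register(signature: str):
    """Decorator standing in for the build-time step that records function types."""
    def wrap(fn: Callable) -> Callable:
        _valid_targets.setdefault(signature, set()).add(fn)
        return fn
    return wrap


def checked_call(signature: str, target: Callable, *args):
    """Stand-in for the runtime check placed in front of an indirect call."""
    if target not in _valid_targets.get(signature, set()):
        raise CFIViolation(f"{target!r} is not a valid {signature} target")
    return target(*args)


@register("int(int)")
def double(x: int) -> int:
    return 2 * x


@register("void(str)")
def write_log(message: str) -> None:
    pass
```

In this model, checked_call("int(int)", double, 21) goes through, while handing write_log to the same call site raises CFIViolation, which is roughly the situation an attacker creates by corrupting a function pointer.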

In addition to software-based mechanisms, there are hardware-based options for mitigating some memory safety risks, and these should be relatively fast to adopt:

Control flow integrity, such as that provided by Intel’s Control-flow Enforcement Technology (CET).
Memory tagging, such as Arm’s Memory Tagging Extension (MTE), where the CPU raises a fault if the tag carried in a pointer does not match the tag of the memory location it accesses. Intel has also announced that memory tagging is on its roadmap.
Capability Hardware Enhanced RISC Instructions (CHERI), a research project that adds new memory-safe features to existing chip architectures.

These hardware-based memory safety mitigations are summarized below.

Control-flow Enforcement Technology (CET)

Intel’s Control-flow Enforcement Technology is a hardware enhancement included in Intel processors starting with the Tiger Lake generation in 2020.

It has two main components:

Shadow Stack, which ensures that the saved return address is not modified on the stack between a CALL instruction and its corresponding RET instruction
Indirect Branch Tracking, which verifies the target of indirect branches so that indirect jumps cannot land at arbitrary locations
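
The two components can be sketched as a toy model in Python. This illustrates the checks only and is not Intel’s implementation: ToyCpu and ControlFlowFault are hypothetical names, the addresses are made up, and the set of landing pads stands in for locations that begin with an ENDBR-style marker instruction.

```python
class ControlFlowFault(Exception):
    """Stands in for the control-protection fault the CPU would raise."""


class ToyCpu:
    def __init__(self, landing_pads):
        self.stack = []                         # ordinary stack, writable by buggy code
        self.shadow_stack = []                  # protected copy of return addresses
        self.landing_pads = set(landing_pads)   # valid indirect-branch targets

    def call(self, return_address: int) -> None:
        # A call pushes the return address to both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        # A return compares the two copies before using the address.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlFlowFault("return address was modified")
        return address

    def indirect_jump(self, target: int) -> int:
        # An indirect branch must land on a marked location.
        if target not in self.landing_pads:
            raise ControlFlowFault("target is not a landing pad")
        return target
```

Overwriting the top entry of cpu.stack between call and ret, as a stack buffer overflow would, makes ret raise ControlFlowFault because the shadow copy no longer matches.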

Both of these features require operating system and development toolchain changes for them to be enabled, as well as recompiling software with the new toolchain updates, but they’re designed to not require much effort from application developers.

Memory Tagging Extension (MTE)

Rather than focusing on control-flow operations, Arm’s Memory Tagging Extension targets memory accesses. Arm added it to the Armv8.5 specification released in 2019, and the first chips implementing it in hardware were announced in 2021.

MTE focuses on the memory accessed via pointers to attempt to detect incorrect operations, such as reading or writing outside the bounds of an expected region, or using a stale pointer to access memory that has been freed and reallocated.

MTE works by storing a “key” (a small tag) for memory accesses in the top byte of each pointer and enabling Top Byte Ignore, a feature of the Armv8-A architecture that ignores this upper byte during address translation. When regions of memory are allocated or otherwise set aside, they are tagged to match the “key” stored in the pointers intended to access them. Later, when reads and writes occur, the key embedded in each pointer is checked against the tag of the memory region to determine whether to allow the operation.
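
The tag check described above can be sketched as a toy model in Python. It is a simplification, not Arm’s implementation: ToyTaggedHeap and TagCheckFault are hypothetical names, the bump allocator is deliberately trivial, and the model assumes 4-bit tags held in pointer bits 59:56 with memory tagged in 16-byte granules.

```python
import random

GRANULE = 16                      # memory is tagged in 16-byte granules
TAG_SHIFT = 56                    # the 4-bit tag sits in pointer bits 59:56
ADDRESS_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Stands in for the fault raised on a pointer/memory tag mismatch."""


class ToyTaggedHeap:
    def __init__(self, size: int, seed: int = 0):
        self.memory = bytearray(size)
        self.tags = [0] * (size // GRANULE)   # one tag per granule, 0 = untagged
        self._rng = random.Random(seed)
        self._next_free = 0                   # trivial bump allocator

    def _retag(self, address: int, size: int) -> int:
        first = address // GRANULE
        count = -(-size // GRANULE)           # ceiling division
        old = self.tags[first]
        # Pick a nonzero tag that differs from the region's previous tag.
        new = self._rng.choice([t for t in range(1, 16) if t != old])
        for granule in range(first, first + count):
            self.tags[granule] = new
        return new

    def malloc(self, size: int) -> int:
        address = self._next_free
        self._next_free += -(-size // GRANULE) * GRANULE
        tag = self._retag(address, size)
        return (tag << TAG_SHIFT) | address   # the pointer carries the tag

    def free(self, pointer: int, size: int) -> None:
        # Retagging on free makes stale pointers stop matching.
        self._retag(pointer & ADDRESS_MASK, size)

    def load(self, pointer: int) -> int:
        address = pointer & ADDRESS_MASK      # top byte ignored for addressing
        tag = (pointer >> TAG_SHIFT) & 0xF
        if self.tags[address // GRANULE] != tag:
            raise TagCheckFault(hex(address))
        return self.memory[address]
```

After p = heap.malloc(16), loading from p succeeds, loading from p + 16 faults because the neighbouring granule carries a different tag, and loading from p after heap.free(p, 16) faults because the region was retagged. With only 16 possible tag values, two unrelated regions can end up with the same tag, so detection in general is probabilistic; the model avoids that only for the freed region itself.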

Like CET, enabling MTE also requires operating system and development toolchain changes, as well as recompiling software with the new toolchain updates. MTE was designed to not require developers to change their source code at all, but due to how it operates, there’s some overhead added. Some memory access patterns can slow things down more than necessary, so Arm has created some recommendations for optimizing memory access while keeping in mind how MTE works in order to limit that additional overhead.

Capability Hardware Enhanced RISC Instructions (CHERI)

Both CET and MTE mitigate individual parts of the problem, but CHERI is a very promising research project which is attempting to address these types of memory safety issues in a more comprehensive way. Unfortunately, it is not available in real production devices yet.
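
As a rough illustration of why a capability model is more comprehensive, the sketch below treats a pointer as a capability that carries its own bounds, permissions, and validity flag, all checked on every access. This is a toy model based on the general description of CHERI capabilities, not the CHERI instruction set: Capability, CapabilityFault, restrict, load, and store are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


class CapabilityFault(Exception):
    """Stands in for the hardware exception raised on an invalid access."""


@dataclass(frozen=True)
class Capability:
    base: int
    length: int
    can_load: bool = True
    can_store: bool = True
    valid: bool = True            # models the hardware validity tag

    def restrict(self, offset: int, length: int,
                 can_store: Optional[bool] = None) -> "Capability":
        """Derive a narrower capability; bounds and permissions only shrink."""
        if offset < 0 or length < 0 or offset + length > self.length:
            raise CapabilityFault("derived bounds exceed the parent's bounds")
        store_ok = self.can_store if can_store is None else (self.can_store and can_store)
        return Capability(self.base + offset, length, self.can_load, store_ok, self.valid)


def _check(cap: Capability, offset: int, permitted: bool) -> None:
    if not cap.valid:
        raise CapabilityFault("capability is not valid")
    if not permitted:
        raise CapabilityFault("permission denied")
    if not 0 <= offset < cap.length:
        raise CapabilityFault("access is out of bounds")


def load(memory: bytearray, cap: Capability, offset: int) -> int:
    _check(cap, offset, cap.can_load)
    return memory[cap.base + offset]


def store(memory: bytearray, cap: Capability, offset: int, value: int) -> None:
    _check(cap, offset, cap.can_store)
    memory[cap.base + offset] = value
```

A capability derived with whole.restrict(8, 4, can_store=False) can read only those four bytes, cannot write to them, and cannot be widened back to the parent’s bounds, so an out-of-bounds access fails deterministically rather than depending on a tag comparison.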

For now, there are implementations of CHERI for Arm and RISC-V that have been tested in FPGAs and QEMU, but there isn’t an x86 version yet beyond what’s described as an “architectural sketch” of how CHERI might be integrated into the x86-64 architecture. A MIPS implementation of CHERI had previously been tested in FPGAs as well, but it appears to have been removed as the focus shifted to the RISC-V implementation.

Additionally, a set of limited-edition prototype development boards for the Morello project, which implements the Armv8 flavor of CHERI, has been built and distributed to stakeholders such as Google, Microsoft, and other interested parties.

Although CHERI is not available in consumer devices yet, David Chisnall, who works on the project at the University of Cambridge, argues that CHERI-equipped CPUs will be the fastest route to memory safety in our collective trusted computing bases:

“There are around 13 billion lines of open source C and C++, which end up in various TCBs [trusted computing bases]. This number gets even bigger when you include proprietary code.

I did a back-of-the-envelope calculation a few years ago that suggested that, if we all stopped writing C/C++ code now and every software engineer focused on rewriting legacy code in safe languages (and on the assumption that everything can be written in safe languages) then it would take 5-10 [years] to replace everything and we’d likely see a lot of logic bugs because we’d be replacing old well-tested code with new code that would need different algorithms and data structures to fit with allowable idioms in safe languages. If we didn’t do the rewriting thing and just stopped writing code in C/C++, then at normal code replacement rates, our TCBs would be entirely safe in around 50 years. If we don’t all agree to stop writing C/C++, it’s at least 100 years.

In contrast, if the major CPU vendors shipped CHERI CPUs in five years, most machines (and all high-value ones) would have memory safety within 15 years of today, without needing programmers to change their behaviour.”

For now, CHERI isn’t ready for adoption, but it’s definitely a technology that we’ll be paying attention to going forward.

How Eclypsium can help with hardware-based memory safety

At Eclypsium, we believe that eliminating the source of vulnerabilities will be the most effective way to improve security posture in the long term. Ideally, organizations will proactively deal with the root cause of security issues while also reactively mitigating the risk of insecure software and hardware. On the way to that goal, however, mitigating technologies can be enabled to add layers of protection. Understanding whether those are present and configured correctly is a key part of assessing the risks in a particular environment. Read our recent blog post on a sophisticated iPhone exploit to learn why transparency in firmware and hardware is needed for comprehensive security.

As chip manufacturers begin to offer hardware-based memory safety features, Eclypsium will give organizations visibility into the posture of their computing environments with respect to these features. Specifically, they will be able to use Eclypsium to see whether hardware-based memory safety functionality is available and (in some cases) whether it is properly configured. In addition, for workstations, the Linux Vendor Firmware Service (LVFS) is working on a Host Security ID Specification that will help consumers easily understand the security of the hardware and firmware of specific product models, including memory safety features.
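
For a rough sense of what checking availability can look like on a single Linux machine, the sketch below scans text in the format of /proc/cpuinfo for CPU feature flags. It is not how the Eclypsium platform or LVFS works. The function name memory_safety_features is hypothetical, and the flag names it looks for (ibt, user_shstk or shstk, and mte) are assumptions about how recent Linux kernels report these features; they can differ between kernel versions.

```python
from typing import Dict


def memory_safety_features(cpuinfo_text: str) -> Dict[str, bool]:
    """Report which hardware memory-safety flags appear in cpuinfo-style text."""
    tokens = set()
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        # x86 kernels list CPU flags under "flags", arm64 kernels under "Features".
        if separator and key.strip().lower() in ("flags", "features"):
            tokens.update(value.split())
    return {
        # Assumed flag names; check the kernel documentation for your version.
        "cet_shadow_stack": "user_shstk" in tokens or "shstk" in tokens,
        "cet_ibt": "ibt" in tokens,
        "arm_mte": "mte" in tokens,
    }
```

On a Linux system the input would be the contents of /proc/cpuinfo. A flag being listed shows only that the kernel reports the capability, not that it is enabled or correctly configured for any given process, which is the harder part of the assessment.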

The Eclypsium platform shows users whether security features on their systems are available and configured correctly.

The post A Faster Path to Memory Safety: CHERI, Memory Tagging, and Control Flow Integrity appeared first on Eclypsium | Supply Chain Security for the Modern Enterprise.