Notes on Build Hardening

I thought I’d comment on a paper about “build safety” in consumer products, describing how software is built to harden it against hackers trying to exploit bugs.

What is build safety?

Modern languages (Java, C#, Go, Rust, JavaScript, Python, etc.) are inherently “safe”, meaning they don’t have “buffer-overflows” or related problems.

However, C/C++ is “unsafe”, and is the most popular language for building stuff that interacts with the network. In other cases, while the language itself may be safe, it’ll use underlying infrastructure (“libraries”) written in C/C++. When we are talking about hardening builds, making them safe or secure, we are talking about C/C++.

In the last two decades, we’ve improved both hardware and operating systems around C/C++ in order to impose safety on it from the outside. We do this with options when the software is built (compiled and linked), and with settings when the software is run.

That’s what the paper above looks at: how consumer devices are built using these options, thereby measuring the security of those devices.

In particular, we are talking about the Linux operating system here and the GNU compiler gcc. Consumer products almost always use Linux these days, though a few also use embedded Windows or QNX. They are almost always built using gcc, though some are built using a compatible compiler known as clang (part of the LLVM project).

How software is built

Software is first compiled then linked. Compiling means translating the human-readable source code into machine code. Linking means combining multiple compiled files into a single executable.

Consider a program hello.c. We might compile it using the following command:

gcc -o hello hello.c

This command takes the file hello.c, compiles it, then outputs (via -o) an executable named hello.
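For reference, a minimal hello.c looks something like this:

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}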

We can set additional compilation options on the command-line here. For example, to enable stack guards, we’d compile with a command that looks like the following:

gcc -o hello -fstack-protector hello.c

In the following sections, we are going to look at specific options and what they do.

Stack guards

A running program has various kinds of memory, optimized for different use cases. One chunk of memory is known as the stack. This is the scratch pad for functions. When a function in the code is called, the stack grows with the additional scratchpad needs of that function, then shrinks back when the function exits. As functions call other functions, which call other functions, the stack keeps growing larger and larger. When they return, it then shrinks back again.

The scratch pad for each function is known as the stack frame. Among the things stored in the stack frame is the return address, where the function was called from so that when it exits, the caller of the function can continue executing where it left off.

The way stack guards work is to stick a carefully constructed value, known as a canary, between the buffers in a stack frame and the control data (like the return address). Right before the function exits, it’ll check this canary in order to validate it hasn’t been corrupted. If corruption is detected, the program exits, or crashes, to prevent worse things from happening.

This solves the most commonly exploited vulnerability in C/C++ code, the stack buffer-overflow. This is the bug described in that famous paper Smashing the Stack for Fun and Profit from the 1990s.

To enable this stack protection, code is compiled with the option -fstack-protector. This looks for functions that have typical buffers, inserting the guard/canary on the stack when the function is entered, and then verifying the value hasn’t been overwritten before exit.

For performance reasons, not all functions are instrumented this way, only those that appear to have character buffers. By default, only functions with buffers of 8 bytes or more are instrumented. You can change this threshold by adding the option --param ssp-buffer-size=n, where n is the number of bytes. You can include other kinds of arrays in the check by using -fstack-protector-strong instead, or instrument all functions with -fstack-protector-all.

Since this feature was added, many vulnerabilities have been found that evade the default settings. More recently, -fstack-protector-strong was added to gcc, significantly increasing the number of protected functions. The setting -fstack-protector-all is still avoided due to its performance cost, as even trivial functions that can’t possibly overflow get instrumented.
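As a sketch of the kind of function that gets instrumented (a made-up example, not from the paper), consider:

#include <string.h>

void greet(const char *name)
{
    char buf[64];       /* character buffer of 8+ bytes: gets a canary */
    strcpy(buf, name);  /* no length check: a long name overflows buf */
}

Compiled with -fstack-protector, the canary sits between buf and the return address; if strcpy() writes past the end of the buffer, glibc typically aborts with a “*** stack smashing detected ***” message instead of letting the corrupted return address be used.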

Heap guards

The other major dynamic memory structure is known as the heap (or the malloc region). When a function returns, everything in its scratchpad memory on the stack will be lost. If something needs to stay around longer than this, then it must be allocated from the heap rather than the stack.

Just as there are stack buffer-overflows, there can be heap overflows, and the same solution of using canaries can guard against them.

The heap has two additional problems. The first is use-after-free, when memory on the heap is freed (marked as no longer in use), and then used anyway. The other is double-free, where the code attempts to free the memory twice. These problems don’t exist on the stack, because things are either added to or removed from the top of the stack, as in a stack of dishes. The heap looks more like the game Jenga, where things can be removed from the middle.
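A minimal sketch of both bugs (a made-up example):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(64);
    free(p);               /* memory handed back to the heap */
    strcpy(p, "oops");     /* use-after-free: p still points there */
    free(p);               /* double-free: freeing the same chunk again */
    return 0;
}

A modern glibc will usually detect the second free() and abort the program; smaller libraries may silently corrupt the heap instead.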

Whereas stack guards change the code generated by the compiler, heap guards don’t. Instead, the heap is implemented in library functions, and that’s where the guards live.

The most common library added by a linker is known as glibc, the standard GNU C library. However, this library is about 1.8 megabytes in size. Many of the home devices in the paper above may only have 4 megabytes of total flash storage, so this is too large. Instead, most of these home devices use an alternate library, something like uClibc or musl, which is only about 0.6 megabytes in size. In addition, regardless of the standard library used for other features, a program may still replace the heap implementation with a custom one, such as jemalloc.

Even when using a library that supports heap guards, the checks may not be enabled in the software. With glibc, a program can still turn off checking internally (using mallopt), or checking can be controlled externally, before running a program, by setting the environment variable MALLOC_CHECK_.
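For example, to run a program under glibc with heap checking forced on, printing a diagnostic and aborting on corruption (myprog is a placeholder name; note that recent glibc versions additionally require preloading a malloc debugging library for this variable to take effect):

MALLOC_CHECK_=3 ./myprog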

The above paper didn’t evaluate heap guards. I assume this is because it can be so hard to check.

ASLR

When a buffer-overflow is exploited, a hacker will overwrite values pointing to specific locations in memory. That’s because locations, the layout of memory, are predictable. It’s a detail that programmers don’t know when they write the code, but something hackers can reverse-engineer when trying to figure out how to exploit the code.

Obviously a useful mitigation step would be to randomize the layout of memory, so nothing is in a predictable location. This is known as address space layout randomization or ASLR.

The word layout comes from the fact that when a program runs, it’ll consist of several segments of memory. The basic list of segments is:

  • the executable code
  • static values (like strings)
  • global variables
  • libraries
  • heap (growable)
  • stack (growable)
  • mmap()/VirtualAlloc() (random location)

Historically, the first few segments are laid out sequentially, starting from address zero. Remember that user-mode programs have virtual memory, so what’s located starting at 0 for one program is different from another.

As mentioned above, the heap and the stack need to be able to grow as functions are called and data is allocated from the heap. The way this is done is to place the heap after all the fixed-size segments, so that it can grow upward. Then, the stack is placed at the top of memory, and grows downward (as functions are called, stack frames are added at the bottom).

Sometimes a program may request memory chunks outside the heap/stack directly from the operating system, such as using the mmap() system call on Linux, or the VirtualAlloc() system call on Windows. This will usually be placed somewhere in the middle between the heap and stack.

With ASLR, all these locations are randomized, and can appear anywhere in memory. Instead of growing contiguously, the heap has to sometimes jump around things already allocated in its way, which is a fairly easy problem to solve, since the heap isn’t really contiguous anyway (as chunks are allocated and freed from the middle). However, the stack has a problem. It must grow contiguously, and if there is something in its way, the program has little choice but to exit (i.e. crash). Usually, that’s not a problem, because the stack rarely grows very large. If it does grow too big, it’s usually because of a bug that requires the program to crash anyway.
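You can observe the randomization with a small program that prints an address from each segment; run it twice and, with ASLR active (and the code built position independent, as discussed next), the values should change between runs:

#include <stdio.h>
#include <stdlib.h>

int global;                             /* global variable segment */

int main(void)
{
    int local;                          /* lives on the stack */
    char *p = malloc(16);               /* lives on the heap */
    printf("code:   %p\n", (void *)main);
    printf("global: %p\n", (void *)&global);
    printf("stack:  %p\n", (void *)&local);
    printf("heap:   %p\n", (void *)p);
    free(p);
    return 0;
}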

ASLR for code

The problem for executable code is that for ASLR to work, it must be made position independent. Historically, when code called a function, it would jump to the fixed location in memory where that function was known to be located; it was thus dependent on its position in memory.

To fix this, code can be changed to jump to relative positions instead, where the code jumps at an offset from wherever it was jumping from.

To enable this on the compiler, the flag -fPIE (position independent executable) is used. Or, if building just a library and not a full executable program, the flag -fPIC (position independent code) is used.

Then, when linking a program composed of compiled files and libraries, the flag -pie is used. In other words, use -pie -fPIE when compiling executables, and -fPIC when compiling for libraries.
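Concretely, the commands look something like this (hello.c and foo.c are placeholder names):

gcc -fPIE -pie -o hello hello.c
gcc -fPIC -shared -o libfoo.so foo.c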

When compiled this way, exploits will no longer be able to jump directly into known locations for code.

ASLR for libraries

The above paper glossed over details about ASLR, probably just looking at whether an executable program was compiled to be position independent. However, code links to shared libraries that may or may not likewise be position independent, regardless of the settings for the main executable.

I’m not sure it matters for the current paper, as most programs had position independence disabled, but in the future, a comprehensive study will need to look at libraries as a separate case.

ASLR for other segments

The above paper equated ASLR with randomized locations for code, but ASLR also applies to the heap and stack. The randomization status of these segments is independent of whatever was configured for the main executable.

As far as I can tell, modern Linux systems will randomize these locations regardless of build settings. Thus, for build settings, it’s just code randomization that needs to be worried about. But when running the software, care must be taken that the operating system will behave correctly. A lot of devices, especially old ones, use older versions of Linux that may not have this randomization enabled, or use custom kernels where it has been turned off.

RELRO

Modern systems have dynamic/shared libraries. Most of the code of a typical program consists of standard libraries. As mentioned above, the GNU standard library glibc is about 1.8 megabytes in size. Linking that into every one of hundreds of programs means gigabytes of disk space may be needed to store all the executables. It’s better to have a single file on disk, libc.so, that all programs can share.

The problem is that every program will load libraries into random locations. Therefore, code cannot jump to functions in the library at either a fixed or a relative address. To solve this, position independent code jumps to an entry in a table (the global offset table), relative to its own position. That table contains the actual locations of the real functions. When a library is loaded, the table is filled in with the correct values.

The problem is that hacker exploits can also write to that table. Therefore, you want to make that table read-only after it’s been filled in. That’s done with the “relro” flag, meaning “relocation read-only”. An additional flag, “now”, must be set to force this to happen at program startup, rather than lazily as functions are first called.

When passed to the linker, these flags would be “-z relro -z now”. However, we usually call the linker directly from the compiler, and pass the flags through. This is done in gcc by doing “-Wl,-z,relro -Wl,-z,now”.
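You can then verify the result with readelf: a GNU_RELRO entry in the program headers means relro took effect, and a BIND_NOW (or NOW) flag in the dynamic section means startup binding is forced. For example (hello is a placeholder):

gcc -o hello -Wl,-z,relro -Wl,-z,now hello.c
readelf -l hello | grep RELRO
readelf -d hello | grep NOW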

Non-executable stack

Exploiting a stack buffer overflow has three steps:
  • figure out where the stack is located (mitigated by ASLR)
  • overwrite the stack frame control structure (mitigated by stack guards)
  • execute code in the buffer
We can mitigate the third step by preventing code from executing from stack buffers. The stack contains data, not code, so this shouldn’t be a problem. In much the same way that we can mark memory regions as read-only, we can mark them no-execute. This should be the default, of course, but as the paper above points out, there are some cases where code is actually placed on the stack.

This option can be set with -Wl,-z,noexecstack, when compiling both the executable and the libraries. This is the default, so you shouldn’t need to do anything special. However, as the paper points out, there are things that get in the way of this if you aren’t careful. The setting is more what you’d call “guidelines” than actual “rules”: despite setting this flag, building the software may still result in an executable stack.

So, you may want to verify it after building the software, such as with “readelf -l programname”, which will tell you how the stack has been configured.
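Specifically, look at the GNU_STACK entry in the program headers: flags of RW mean the stack is data-only, while RWE means it is executable. For example:

readelf -l hello | grep -A1 GNU_STACK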

Non-executable heap

The above paper focused on the executable stack, but there is also the question of an executable heap. It likewise contains data and not code, so it should be marked no-execute. Like the heap guards mentioned above, this isn’t a build setting but a feature of the library. The default library for Linux, glibc, marks the heap no-execute. However, it appears that other standard libraries or alternative heap implementations may mark the heap as executable.

FORTIFY_SOURCE

The paper above doesn’t discuss this hardening step, but it’s an important one.

One reason for so many buffer-overflow bugs is that the standard functions that copy buffers have no ability to verify whether they’ve gone past the end of a buffer. A common recommendation is to replace those inherently unsafe functions with safer alternatives that include length checks. For example, the notoriously unsafe function strcpy() can be replaced with strlcpy(), which adds a length check.

Instead of editing the code, the GNU compiler can do this automatically. This is done with the build option -O2 -D_FORTIFY_SOURCE=2.

This is only a partial solution. The compiler can’t always figure out the size of the buffer being copied into, and thus will leave the code untouched. Only when the compiler can figure things out does it make the change.
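A sketch of a case it can handle (a made-up example): here the destination is a fixed-size array, so the compiler knows the limit and substitutes a checked copy.

#include <string.h>

void copy_name(const char *src)
{
    char dst[16];      /* size known at compile time */
    strcpy(dst, src);  /* fortified into a length-checked copy */
}

Built with -O2 -D_FORTIFY_SOURCE=2, an overflowing copy makes glibc abort at runtime with a “*** buffer overflow detected ***” message. If dst were instead a pointer of unknown size, the call would be left alone.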

It’s fairly easy to detect if code has been compiled with this flag, but the above paper didn’t look much into it. That’s probably for the same reason it didn’t look into heap checks: it requires the huge glibc library. These devices use the smaller libraries, which don’t support this feature.

Format string bugs

Because this post is about build hardening, I want to mention format-string bugs. This is a common bug in old code that can be caught by adding warnings for it in the build options, namely:
 -Wformat -Wformat-security -Werror=format-security
It’s hard to check if code has been built with these options, however. Instead of simple programs like readelf that can verify many of the issues above, this would take static analysis tools that read the executable code and reverse engineer what’s going on.
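For reference, the classic form of the bug (a made-up example) is passing untrusted input directly as the format string:

#include <stdio.h>

void log_message(const char *user_input)
{
    printf(user_input);        /* bug: %x, %n, etc. in the input get interpreted */
    printf("%s", user_input);  /* safe: the input is treated purely as data */
}

With the warning flags above, the first call is flagged, and -Werror=format-security turns it into a build failure.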

Warnings and static analysis

When building code, the compiler will generate warnings about confusing constructs, possible bugs, or bad style. The compiler has a default set of warnings. Robust code is compiled with -Wall, meaning “all” warnings, though it actually doesn’t enable all of them. Paranoid code uses -Wextra to include warnings not covered by -Wall. There is also the -pedantic or -Wpedantic flag, which warns about strict C compatibility issues.

All of these warnings can be converted into errors, which prevents the software from building, using the -Werror flag. As shown above, this can also be used with individual error names to make only some warnings into errors.
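A typical paranoid build command then looks something like:

gcc -Wall -Wextra -Werror -o hello hello.c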

Optimization level

Compilers can optimize code, looking for common patterns, to make it go faster. You can set your desired optimization level.

Some things, namely the FORTIFY_SOURCE feature, don’t work without optimization enabled. That’s why in the above example, -O2 is specified, to set optimization level 2.

Higher levels aren’t recommended. For one thing, they don’t make code faster on modern, complex processors. The advanced processors in your desktop or mobile phone do extensive optimization themselves; what they want is small code that fits within cache. The -O3 level makes code bigger, which is good for older, simpler processors, but bad for modern, advanced processors.

In addition, the aggressive settings of -O3 have led to security problems around “undefined behavior”. Even -O2 is a little suspect, with some guides suggesting the proper level is -O1. However, some optimizations actually make code more secure than no optimization at all, so for the security paranoid, -O1 should be considered the minimum.

What about sanitizers?

You can compile code with address sanitizers that do more comprehensive buffer-overflow checks, as well as undefined behavior sanitizers for things like integer overflows.

While these are good for testing and debugging, they’ve so far proven too difficult to get working in production code. They may become viable in the future.
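For testing builds, enabling them looks like this (the -g flag adds debug symbols so reports are readable):

gcc -fsanitize=address -g -o hello hello.c
gcc -fsanitize=undefined -g -o hello hello.c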

Summary

If you are building code using gcc on Linux, here are the options/flags you should use:
-Wall -Wformat -Wformat-security -Werror=format-security -fstack-protector -pie -fPIE -D_FORTIFY_SOURCE=2 -O2 -Wl,-z,relro -Wl,-z,now -Wl,-z,noexecstack
If you are more paranoid, these options would be:
-Wall -Wformat -Wformat-security -Wstack-protector -Werror -pedantic -fstack-protector-all --param ssp-buffer-size=1 -pie -fPIE -D_FORTIFY_SOURCE=2 -O2 -Wl,-z,relro -Wl,-z,now -Wl,-z,noexecstack
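Putting it together, a full build command with the first set of flags would look something like this (hello is a placeholder):

gcc -Wall -Wformat -Wformat-security -Werror=format-security -fstack-protector -pie -fPIE -D_FORTIFY_SOURCE=2 -O2 -Wl,-z,relro -Wl,-z,now -Wl,-z,noexecstack -o hello hello.c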
