How STACKLEAK improves Linux kernel security

MalBot · October 11, 2018, 1:32pm

STACKLEAK is a Linux kernel security feature initially developed by Grsecurity/PaX. I'm working on introducing STACKLEAK into the Linux kernel mainline. This article describes the inner workings of this security feature and why the vanilla kernel needs it.

In short, STACKLEAK is needed because it mitigates several types of Linux kernel vulnerabilities, by:

Reducing the information that can be revealed to an attacker by kernel stack leak bugs,
Blocking some uninitialized stack variable attacks,
Detecting kernel stack overflow during Stack Clash attack against Linux Kernel.

This security feature fits the mission of the Kernel Self Protection Project (KSPP): security is more than just fixing bugs. Fixing absolutely all bugs is impossible, which is why the Linux kernel has to fail safely in case of an error or vulnerability exploitation. More details about KSPP are available on its wiki.

STACKLEAK was initially developed by the PaX Team, going as PAX_MEMORY_STACKLEAK in the Grsecurity/PaX patch. But this patch is no longer freely available to the Linux kernel community. So I took its last public version for the 4.9 kernel (April 2017) and got to work. The plan has been as follows:

First extract STACKLEAK from the Grsecurity/PaX patch.
Then carefully study the code and create a new patch.
Send the result to the Linux kernel mailing list (LKML), get feedback, make improvements, and repeat until the code is accepted into the mainline.

As of October 9, 2018, the 15th version of the STACKLEAK patch series has been submitted. It contains the common code and x86_64/x86_32 support. The arm64 support developed by Laura Abbott from Red Hat has already been merged into mainline kernel v4.19.

Security features

Most importantly, STACKLEAK erases the kernel stack at the end of syscalls. This reduces the information that can be revealed through some kernel stack leak bugs. An example of such an information leak is shown in Figure 1.

Figure 1. Kernel stack leak exploitation, pre-STACKLEAK

However, these leaks become useless for the attacker if the used part of the kernel stack is filled by some fixed value at the end of a syscall (Figure 2).

Figure 2. Kernel stack leak exploitation, post-STACKLEAK

Hence, STACKLEAK blocks exploitation of some uninitialized kernel stack variable vulnerabilities, such as CVE-2010-2963 and CVE-2017-17712. For a description of exploitation of vulnerability CVE-2010-2963, refer to the article by Kees Cook.

Figure 3 illustrates an attack on an uninitialized kernel stack variable.

Figure 3. Uninitialized kernel stack variable exploitation, pre-STACKLEAK

STACKLEAK mitigates this type of attack because at the end of a syscall, it fills the kernel stack with a value that points to an unused hole in the virtual memory map (Figure 4).

Figure 4. Uninitialized kernel stack variable exploitation, post-STACKLEAK

There is an important limitation: STACKLEAK does not help against similar attacks performed during a single syscall.

Runtime detection of kernel stack depth overflow

In the mainline kernel, STACKLEAK would be effective against kernel stack depth overflow only in combination with CONFIG_THREAD_INFO_IN_TASK and CONFIG_VMAP_STACK (both introduced by Andy Lutomirski).

The simplest type of stack depth overflow exploit is shown in Figure 5.

Figure 5. Stack depth overflow exploitation: mitigation with CONFIG_THREAD_INFO_IN_TASK

Overwriting the thread_info structure at the bottom of the kernel stack allows an attacker to escalate privileges on the system. However, CONFIG_THREAD_INFO_IN_TASK moves thread_info out of the thread stack and therefore mitigates such an attack.

There is a more complex variant of the attack: make the kernel stack grow beyond the end of the kernel's preallocated stack space and overwrite security-sensitive data in a neighboring memory region (Figure 6). More technical details are available in:

"The Stack is Back" by Jon Oberheide
"Exploiting Recursion in the Linux Kernel" by Jann Horn

Figure 6. Stack depth overflow exploitation: a more complicated version

CONFIG_VMAP_STACK protects against such attacks by placing a special guard page next to the kernel stack (Figure 7). If accessed, the guard page triggers an exception.

Figure 7. Stack depth overflow exploitation: mitigation with guard pages

Finally, the most interesting version of a stack depth overflow attack is a Stack Clash (Figure 8). Gael Delalleau published this idea in 2005. It was later revisited by the Qualys Research Team in 2017. In essence, it is possible to jump over a guard page and overwrite data from a neighboring memory region using Variable Length Arrays (VLA).

Figure 8. Stack Clash attack

STACKLEAK mitigates Stack Clash attacks against the kernel stack. More information about STACKLEAK and Stack Clash is available on the grsecurity blog.

To prevent a Stack Clash in the kernel stack, a stack depth overflow check is performed before each alloca() call. This is the code from v14 of the patch series:

void __used stackleak_check_alloca(unsigned long size)
{
unsigned long sp = (unsigned long)&sp;
struct stack_info stack_info = {0};
unsigned long visit_mask = 0;
unsigned long stack_left;

BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));

stack_left = sp - (unsigned long)stack_info.begin;

if (size >= stack_left) {
/*
* Kernel stack depth overflow is detected, let's report that.
* If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
* If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
* the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
* panic() in a similar situation, so let's do the same if that
* option is on. Otherwise just use BUG() and hope for the best.
*/
#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
panic("alloca() over the kernel stack boundary\n");
#else
BUG();
#endif
}
}

However, this functionality was excluded from the 15th version of the STACKLEAK patch series. The main reason is that Linus Torvalds has forbidden use of BUG_ON() in kernel hardening patches. Moreover, during discussion of the 9th version, the maintainers decided to remove all VLAs from the mainline kernel. There are 15 kernel developers participating in that work, which will be finished soon.

Performance impact

Cursory performance testing was performed on x86_64 hardware: Intel Core i7-4770, 16 GB RAM.

Test 1, looking good: compiling the Linux kernel on one CPU core.

# time make
Result on 4.18:
real 12m14.124s
user 11m17.565s
sys 1m6.943s
Result on 4 .18+stackleak:
real 12m20.335s (+0.85%)
user 11m23.283s
sys 1m8.221s

Test 2, not so hot:

# hackbench -s 4096 -l 2000 -g 15 -f 25 –P
Average on 4.18: 9.08 s
Average on 4.18+stackleak: 9.47 s (+4.3%)

In summary: the performance penalty varies for different workloads. Test STACKLEAK on your expected workload before deploying it in production.

Inner workings

STACKLEAK consists of:

The code that erases the kernel stack at the end of syscalls,
The GCC plugin for kernel compile-time instrumentation.

Erasing the kernel stack is performed in the stackleak_erase() function. This function runs before returning from a syscall to userspace and writes STACKLEAK_POISON (-0xBEEF) to the used part of the thread stack (Figure 10). For speed, stackleak_erase() uses the lowest_stack variable as a starting point (Figure 9). This variable is regularly updated in stackleak_track_stack() during system calls.

Figure 9. Erasing the kernel stack with stackleak_erase()

Figure 10. Erasing the kernel stack with stackleak_erase(), continued

Kernel compile-time instrumentation is handled by the STACKLEAK GCC plugin. GCC plugins are compiler-loadable modules that can be project-specific. They register new compilation passes via the GCC Pass Manager and provide the callbacks for these passes.

So the STACKLEAK GCC plugin inserts the aforementioned stackleak_track_stack() calls for the functions with a large stack frame. It also inserts the stackleak_check_alloca() call before alloca and the stackleak_track_stack() call after it.

As I already mentioned, inserting stackleak_check_alloca() was dropped in the 15th version of the STACKLEAK patch series.

The way to the mainline

The path of STACKLEAK to the Linux kernel mainline is very long and complicated (Figure 11).

Figure 11. The way to the mainline

In April 2017, the authors of grsecurity made their patches commercial. In May 2017, I decided to work on upstreaming STACKLEAK. It was the beginning of a very long story. My employer Positive Technologies allows me to spend a part of my working time on this task, although I mainly spend my free time on it.

As of October 9, 2018, the 15th version of the STACKLEAK patch series is contained in the linux-next branch. It fits Linus' requirements and is ready for the merge window of the 4.20/5.0 kernel release.

Conclusion

STACKLEAK is a very useful Linux kernel self-protection feature that mitigates several types of vulnerabilities. Moreover, the PaX Team has made it rather fast and technically beautiful. Considering the substantial work done in this direction, upstreaming STACKLEAK would benefit Linux users with high information security requirements and also focus the attention of the Linux developer community on kernel self-protection.

Author: Alexander Popov, Positive Technologies

Article Link: http://blog.ptsecurity.com/2018/10/how-stackleak-improves-linux-kernel.html