An overview of macOS kernel debugging

MalBot · May 7, 2019, 11:15am

The terms macOS kernel, Darwin kernel and XNU are used interchangeably throughout the posts. References are provided for XNU 4903.221.2 from macOS 10.14.1, the latest available sources at the time of writing.

What is a kernel debugger?

Debugging is the process of searching and correcting software issues that may cause a program to misbehave. Faults include wrong results, program freezes or crashes, and sometimes even security vulnerabilities. To examine running applications, operating systems provide userland debuggers mechanisms like ptrace or exception ports; but when working at kernel/driver/OS level, more powerful capabilities are required.

Modern operating systems like macOS or iOS consist of millions of lines of code, through which the kernel orchestrates the execution of hundreds of threads manipulating thousands of critical data structures. This complexity facilitates the introduction of likewise complex programming errors, which at minimum can cause the machine to stop or reboot. Even when kernel sources are available, tracing the root causes of such bugs is often very difficult, especially without knowing exactly which code has been executed or the state of registers and memory; similarly, the analysis of kernel rootkits and exploits of security vulnerabilities requires an accurate study of the behaviour of the machine.

For these reasons, operating systems often implement a kernel debugger, usually composed of a simple agent running inside the kernel, which receives and executes debugging commands, and a complete debugger running on a remote machine, which sends commands to the kernel and displays the results. The debugging stub internal to the kernel generally has the tasks of:

reading and writing registers;
reading and writing memory;
single-stepping through the code;
catching CPU interrupts.

With these capabilities, it also becomes possible to:

pause the kernel execution at specific virtual addresses, by patching the code with INT3 instructions and then waiting for type-3 interrupts to occur;
introspect kernel structures by parsing the kernel header and reading memory.

The next sections describe in detail how kernel debugging is implemented by XNU.

Debugging the macOS kernel

As described in the kernel’s README, XNU supports remote (two-machine) debugging by implementing the Kernel Debugging Protocol (KDP). Apple’s documentation about the topic is outdated and no longer being updated, but luckily detailed guides [1][2][3] on how to set up recent macOS kernels for remote debugging are available on the Internet; summarising, it is required to switch to one of the debug builds of the kernel (released as part of the Kernel Debug Kit, or KDK), rebuild the kernel extension (kext) caches and set the debug boot-arg in the NVRAM to the appropriate values. After this, LLDB (or any other debugger supporting KDP) can attach to the kernel. Conveniently, it is also possible to debug a virtual machine instead of a second Mac [4][5][6].

Mentioned for completeness, at least two other methods for kernel debugging have been supported at some point for several XNU releases. The archived Apple’s docs suggest to use ddb over a serial line when debugging via KDP is not possible or problematic (e.g., before the network hardware is initialised), but support for this feature seems to have been dropped after XNU 1699.26.8 as all related files were removed in the next release. Other documents, like the README of the kernel debug kit for macOS 10.7.3 build 11D50, allude to the possibility of using /dev/kmem for limited self-debugging:

‘Live (single-machine) kernel debugging was introduced in Mac OS X Leopard. This allows limited introspection of the kernel on a currently-running system. This works using the normal kernel and the symbols in this Kernel Debug Kit by specifying kmem=1 in your boot-args; the DEBUG kernel is not required.’

This method still works in recent macOS builds provided that System Integrity Protection (SIP) is disabled [7][8], but newer KDKs do not mention it anymore, and a note from the archived Apple’s docs says that support for /dev/kmem will be removed entirely in the future.

The Kernel Debugging Protocol

As already introduced, to make remote debugging possible XNU implements the Kernel Debugging Protocol, a client–server protocol over UDP that allows a debugger to send commands to the kernel and receive back results and exceptions notifications. The current revision of the protocol is the 12th, around since macOS 10.6 and XNU 1456.1.26.

Like in typical communication protocols, KDP packets are composed of a common header (containing, among others: the request type; a flag for distinguishing between requests and replies; and a sequence number) and specialised bodies for the different types of requests, like KDP_READMEM64 and KDP_WRITEMEM64, KDP_READREGS and KDP_WRITEREGS, KDP_BREAKPOINT_SET and KDP_BREAKPOINT_REMOVE. As stated in most debug kits’ README, communications between the kernel and the external debugger may occur either via FireWire or Ethernet (with Thunderbolt adapters in case no such ports are available); Wi-Fi is not supported. The kernel listens for KDP connections only when:

it is a DEVELOPMENT or DEBUG build and the debug boot-arg has been set to DB_HALT, in which case the kernel stops after the initial startup waiting for a debugger to attach [9][10];
it is being run on a hypervisor, the debug boot-arg has been set to DB_NMI and a non-maskable interrupt (NMI) is triggered [11][12];
the debug boot-arg has been set to any value (even invalid ones) and a panic occurs [13][14].

As might be expected, XNU assumes at most one KDP client is attached to it at any given time. With an initial KDP_CONNECT request, the debugger informs the kernel on which UDP port should notifications be sent back when exceptions occur. The interested reader can have an in depth look at the full KDP implementation starting from osfmk/kdp/kdp_protocol.h and osfmk/kdp/kdp_udp.c.

Detailed account of kernel-debugger interactions over KDP

For the even more curious, this section documents thoroughly what happens when LLDB attaches to XNU via KDP; reading is not required to follow the rest of the post. References are provided for LLDB 8.0.0.

Assuming that the kernel has been properly set up for debugging and the debug boot-arg has been set to DB_HALT, at some point during the XNU startup an IOKernelDebugger object will call kdp_register_send_receive() [15]. This routine, after parsing the debug boot-arg, executes kdp_call() [16][17] to generate an EXC_BREAKPOINT trap [18], which in turn triggers the execution of trap_from_kernel() [19], kernel_trap() [20] and kdp_i386_trap() [21][22][23]. This last function calls handle_debugger_trap() [24][25] and eventually kdp_raise_exception() [26][27] to start kdp_debugger_loop() [28][29]. Since no debugger is connected (yet), the kernel stops at kdp_connection_wait() [30][31], printing the string ‘Waiting for remote debugger connection.’ [32] and then waiting to receive a KDP_REATTACH request followed by a KDP_CONNECT [33].

In LLDB, the kdp-remote plug-in handles the logic for connecting to a remote KDP server. When the kdp-remote command is executed by the user, LLDB initiates the connection to the specified target by executing ProcessKDP::DoConnectRemote() [34], which sends in sequence the two initial requests KDP_REATTACH [35][36] and KDP_CONNECT [37][38].

Upon receiving the two requests, kdp_connection_wait() terminates [39][40] and kdp_handler() is entered [41][42]. Here, requests from the client are received [43], processed using a dispatch table [44][45] and responded [46] in a loop until either a KDP_RESUMECPUS or a KDP_DISCONNECT request is received [47][48].

Completed the initial handshake, LLDB then sends three more requests (KDP_VERSION [49][50], KDP_HOSTINFO [51][52] and KDP_KERNELVERSION [53][54]) to extract information about the debuggee. If the kernel version string (an example is ‘Darwin Kernel Version 16.0.0: Mon Aug 29 17:56:21 PDT 2016; root:xnu-3789.1.32~3/DEVELOPMENT_X86_64; UUID=3EC0A137-B163-3D46-A23B-BCC07B747D72; stext=0xffffff800e000000’) is recognised as coming from a Darwin kernel [55][56], then the darwin-kernel dynamic loader plug-in is loaded. At this point, the connection to the remote target is established and the attach phase is completed [57][58] by eventually instanciating the said plug-in [59][60], which tries to locate the kernel load address [61][62] and the kernel image [63][64]. Finally, the Darwin kernel module is loaded [65][66][67][68], which first searches the local file system for an on-disk file copy of the kernel using its UUID [69][70] and then eventually loads all kernel extensions [71][72].

After attaching, LLDB waits for commands from the user, which will be translated into KDP requests and sent to the kernel:

commands register read and register write generate KDP_READREGS [73] and KDP_WRITEREGS [74] requests;
commands memory read and memory write generate KDP_READMEM [75] and KDP_WRITEMEM [76] requests (respectively KDP_READMEM64 and KDP_WRITEMEM64 for 64-bit targets);
commands breakpoint set and breakpoint delete generate KDP_BREAKPOINT_SET and KDP_BREAKPOINT_REMOVE [77] requests (respectively KDP_BREAKPOINT_SET64 and KDP_BREAKPOINT_REMOVE64 for 64-bit targets);
commands continue and step both generate KDP_RESUMECPUS [78] requests; in case of single-stepping, the TRACE bit of the RFLAGS register is set [79][80][81] with a KDP_WRITEREGS request before resuming, which later causes a type-1 interrupt to be raised by the CPU after the next instruction is executed.

Upon receiving a KDP_RESUMECPUS request, kdp_handler() and kdp_debugger_loop() terminate [82][83][84] and the machine resumes its execution. When the CPU hits a breakpoint a trap is generated, and starting from trap_from_kernel() a new call to kdp_debugger_loop() is made (as discussed above). Since this time the debugger is connected, a KDP_EXCEPTION notification is generated [85][86] to inform the debugger about the event. After this, kdp_handler() [87] is executed again and the kernel is ready to receive new commands.

The Kernel Debug Kit

For some macOS releases, Apple also publishes the related Kernel Debug Kits, containing:

the RELEASE, KASAN (only occasionally), DEVELOPMENT and DEBUG builds of the kernel, the last two compiled with ‘additional assertions and error checking’;
symbols and debugging information in DWARF format, for each of the kernel builds and some Apple kexts included in macOS;
the lldbmacros, a set of additional LLDB commands for Darwin kernels.

KDKs are incredibly valuable for kernel debugging, but unfortunately they are not made available for all XNU builds and are often published weeks or months after them. By searching the Apple Developer Portal for the non-beta builds of macOS 10.14 as an example, at the time of writing the article, the KDKs released on the same day as the respective macOS release are only three (18A391, 18C54 and 18E226) out of nine builds; one KDK was released two weeks late (18B75); and no KDK was released for the other five builds (18B2107, 18B3094, 18D42, 18D43, 18D109). From a post on the Apple Developer Forums it appears that nowadays ‘the correct way to request a new KDK is to file a bug asking for it.’

lldbmacros

Starting with Apple’s adoption of LLVM with Xcode 3.2, GDB was eventually replaced by LLDB as the debugger of choice for macOS and its kernel. Analogously to the old kgmacros for GDB, Apple has been releasing since at least macOS 10.8 and XNU 2050.7.9 the so-called lldbmacros, a set of Python scripts for extending LLDB’s capabilities with helpful commands and macros for kernel debugging. Examples of these commands are allproc (for printing procinfo for each process structure), pmap_walk (to perform a page-table walk for virtual addresses) and showallkmods (for a summary of all loaded kexts).

Limitations of the available tools

The combination of KDP and LLDB, alongside with the notable introspection possibilities offered by lldbmacros, make for a great kernel debugger; still, at present time this approach also has a few annoyances and drawbacks, here summarised.

First, as already noted, the KDP stub in the kernel is activated only after setting the debug boot-arg in the non-volatile RAM, but such operation requires to disable SIP. Secondly, the whole debugging procedure has many side effects: the modification of global variables (like kdp_flag); the value of the kernel base address written at a fixed memory location; the altering of kernel code with 0xCC software breakpoints [88][89] (watchpoints are not supported). All these (and others) can be easily detected by drivers, rootkits and exploits by reading NVRAM or global variables or with code checksums. Thirdly, the remote debugger cannot stop the execution of the kernel once it has been resumed: the only way to return to the debugger is to wait for breakpoints to occur (or to generate traps by, for example, injecting an NMI with dtrace from inside the debuggee). Fourthly, debugging can obviously start only after the initialisation of the KDP agent in the kernel, which happens relatively late in the startup phase and makes early debugging impossible. Finally, being part of the Kernel Debug Kits, lldbmacros are unfortunately only available for a few macOS releases.

Wrapping up

With this post, we tried to document accurately how macOS kernel debugging works, in the hope of creating an up-to-date reference on the topic. In the next post, we will present our solution for a better macOS debugging experience, also intended to overcome the limitations of the current approach.

Article Link: An overview of macOS kernel debugging - Quarkslab's blog