Written by: Nino Isakovic, Chuong Dong
Overview
This blog post analyzes a control flow obfuscation technique employed by recent LummaC2 (LUMMAC.V2) stealer samples. In addition to the traditional control flow flattening technique used in older versions, the malware now leverages customized control flow indirection to manipulate its execution. This technique defeats the control flow recovery of common binary analysis tools such as IDA Pro and Ghidra, significantly hindering not only the reverse engineering process but also the automation tooling designed to capture execution artifacts and generate detections.
To provide insights to Google and Mandiant security teams, we developed an automated method for removing this protection layer through symbolic backward slicing. By leveraging the recovered control flow, we are able to rebuild and deobfuscate the samples into a format readily consumable by any static binary analysis platform.
Protection Components
Overview
An obfuscating compiler, which we will also informally refer to as an "obfuscator," is a transformation tool designed to enhance the security of software binaries by making them more resilient to binary analysis. It operates by transforming a given binary into a protected representation, thereby increasing the difficulty for the code to be analyzed or tampered with. These transformations are typically applied on a per-function basis, with the user selecting which functions to protect.
Obfuscating compilers are distinct from packers, although they may incorporate packing techniques as part of their functionality. They fall under the broader classification of software protections, such as OLLVM, VMProtect, and Code Virtualizer, which provide comprehensive code transformation and protection mechanisms beyond simple packing. Notably, the original code of protected components is never exposed in its unprotected form at any point during the runtime of a protected binary. It is also common for obfuscating compilers to mix the original compiler-generated code with obfuscator-introduced code. As a result, analysts generally need a comprehensive deobfuscator to analyze the binary.
The obfuscator employed by LummaC2 applies a multitude of transformations consistent with standard obfuscating compiler technology. This post focuses only on the newly introduced control flow protection scheme that we uncovered.
Our analysis strongly suggests that the authors of the obfuscator have intimate knowledge of the LummaC2 stealer. Certain parts of the protection, as described in the upcoming sections, are specialized to handle specific components of the LummaC2 stealer.
Dispatcher Blocks
The obfuscator transforms the control flow of a protected function into one guided by "dispatcher blocks," each consisting of a subset of the original instructions that constituted the unprotected function and new instructions introduced by the obfuscator. Each dispatcher block ends with an indirect jump that branches to a dynamically resolved destination stored in a register or memory address. This turns the original linear control flow into a disjointed series of scattered blocks. Each block is isolated, containing only the runtime logic necessary to transfer execution to its immediate successor block.
<div>
<div>
<img alt="Dispatcher blocks overview" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig1.max-1000x1000.png" />
<p>Figure 1: Dispatcher blocks overview</p>
</div>
</div>
We refer to all instructions generated by the obfuscator as "dispatcher instructions" to differentiate them from "original instructions." Dispatcher blocks used by the obfuscator can be categorized into two main types: unconditional and conditional dispatchers.
- Unconditional dispatcher: This dispatcher type protects the majority of instructions in an obfuscated function. It consists of dispatcher instructions that fetch encoded offsets from a lookup table in the .data section and perform ADD and XOR operations on them to calculate the next destination to transfer execution to.
- Conditional dispatcher: This dispatcher type protects either individual conditional jump instructions (e.g., jne or ja) or basic blocks that end with a conditional jump. Instead of a single encoded offset, the conditional dispatcher fetches one of two possible encoded offsets depending on the result of the condition being tested.
<div>
<div>
<img alt="Dispatcher block types" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig2.max-1000x1000.png" />
<p>Figure 2: Dispatcher block types</p>
</div>
</div>
Conditional and unconditional dispatcher blocks are further categorized based on the distinct characteristics and layout of dispatcher instructions.
- Register-based dispatcher: All calculations from dispatcher instructions operate solely on registers and always constitute the remaining instructions of the basic block.
- Memory-based dispatcher: Dispatcher instructions operate on both registers and stack values for calculating the final jump destination and are also always the remaining instructions within the basic block.
- Mixed-order dispatcher: A variant of register-based and memory-based dispatchers. The order and positions of dispatcher instructions in this layout are intertwined among original instructions that they are protecting instead of being placed at the end of the block.
<div>
<div>
<img alt="Obfuscating compiler dispatcher layouts" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig3.max-1000x1000.png" />
<p>Figure 3: Obfuscating compiler dispatcher layouts</p>
</div>
</div>
Dispatcher blocks can also exist standalone where they do not protect any original code. In such cases, they act as a single step responsible for continuing the control flow.
Register-based Dispatcher Layout
Using the LummaC2 sample with MD5 hash 205e45e123aea66d444feaba9a846748 from the Google Threat Intelligence collection as a case study, we find that 1,981 of the 2,009 dispatcher blocks processed are register-based, making it the most common dispatcher layout. This layout is applied to both conditional and unconditional dispatcher types in any protected function.
00416630 mov eax, off_457C8C ; Retrieve CONSTANT1 from .data section
00416635 mov ecx, 22A7266Eh ; Populate CONSTANT2
0041663A xor ecx, dword_457C94 ; XOR CONSTANT2 with CONSTANT3
; from the .data section
00416640 add eax, ecx ; ADD CONSTANT1 with the result
00416642 inc eax ; Increment the result
00416643 jmp eax ; Jump to the result
Figure 4: Register-based instruction dispatcher
By analyzing dispatcher blocks of this layout, we can derive some key characteristics of the protection. These blocks typically include mov instructions that either fetch a value from the malware's .data section or populate a register with a constant. Next, xor and add (or lea) instructions, followed by an inc instruction, perform arithmetic operations on the retrieved values. Finally, the dispatcher block ends with a jmp instruction that branches to the dynamically calculated value stored in a register.
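To make the arithmetic concrete, the following Python sketch reproduces the register-based decoding sequence from Figure 4 with 32-bit wraparound. The helper name `decode_register_dispatch` is hypothetical, and apart from the immediate 0x22A7266E taken from Figure 4, the constant values in the example are made up for illustration rather than read from the sample's .data section.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide in this sample


def decode_register_dispatch(constant1: int, constant2: int, constant3: int) -> int:
    """Compute the jmp target of a Figure 4-style dispatcher (hypothetical helper).

    constant1: value loaded from the .data section (mov eax, off_...)
    constant2: immediate loaded into the key register (mov ecx, imm32)
    constant3: .data value XORed into the key (xor ecx, dword_...)
    """
    key = (constant2 ^ constant3) & MASK32   # xor ecx, dword_...
    target = (constant1 + key) & MASK32      # add eax, ecx
    return (target + 1) & MASK32             # inc eax; the result feeds jmp eax


# Hypothetical .data values; only 0x22A7266E comes from Figure 4.
print(hex(decode_register_dispatch(0x00401000, 0x22A7266E, 0x22A6766E)))  # 0x416001
```

The masking after each step mirrors how the CPU discards carries beyond 32 bits, which matters whenever the encoded constants are chosen to overflow.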
This final indirect jump obfuscates the function's original control flow. It breaks the control flow recovery algorithms of tools such as IDA Pro, which cannot resolve the jump destination statically, hindering both disassembly and decompilation.
<div>
<div>
<img alt="IDA Pro's disassembly and decompiler views of a protected subroutine" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig5.max-1000x1000.png" />
<p>Figure 5: IDA Pro's disassembly and decompiler views of a protected subroutine</p>
</div>
</div>
By identifying the common patterns within these dispatcher instructions, it's possible to differentiate them from the function's core instructions, which is crucial for lifting the protection and deobfuscating the function.
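As a rough illustration of pattern-based identification, the sketch below checks whether the tail of a block matches the Figure 4 shape, using regular expressions over disassembly strings formatted like the Triton output in Figure 19. The register list, the patterns, and `match_register_dispatcher` are hypothetical and deliberately naive: they miss the memory-based and mixed-order layouts, which is one reason a slicing-based approach is more robust.

```python
import re

# Registers the dispatcher may use; an assumption based on the figures.
REG = r"(e[a-d]x|e[sd]i|ebp)"
MEM = r"dword ptr \[0x[0-9a-f]+\]"

# Figure 4 shape: r1 <- [mem]; r2 <- imm; r2 ^= [mem]; r1 += r2; r1 += 1; jmp r1
PATTERN = [
    re.compile(rf"mov (?P<r1>{REG}), {MEM}$"),
    re.compile(rf"mov (?P<r2>{REG}), 0x[0-9a-f]+$"),
    re.compile(rf"xor (?P<r2>{REG}), {MEM}$"),
    re.compile(rf"add (?P<r1>{REG}), (?P<r2>{REG})$"),
    re.compile(rf"inc (?P<r1>{REG})$"),
    re.compile(rf"jmp (?P<r1>{REG})$"),
]


def match_register_dispatcher(block: list) -> bool:
    """Return True if the last six instructions of `block` match the layout,
    using one register for the target (r1) and a different one for the key (r2)."""
    if len(block) < len(PATTERN):
        return False
    bound = {}
    for regex, text in zip(PATTERN, block[-len(PATTERN):]):
        m = regex.match(text)
        if m is None:
            return False
        for name, reg in m.groupdict().items():
            if bound.setdefault(name, reg) != reg:
                return False  # the same role must keep the same register
    return bound["r1"] != bound["r2"]


tail = ["mov eax, dword ptr [0x457c1c]", "mov ecx, 0xa15bd01f",
        "xor ecx, dword ptr [0x457c24]", "add eax, ecx", "inc eax", "jmp eax"]
print(match_register_dispatcher(tail))  # True
```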
Another observation is that the obfuscator duplicates original instructions when injecting its dispatcher instructions. We assume the obfuscator avoids relocating original instruction blocks when injecting the dispatcher code and instead copies those instructions to a new block at the destination.
0041665A push 0FFFFFFF6h ; Duplicated instruction
0041665C call ds:GetStdHandle ; Duplicated instruction
00416662 call sub_41A4A0 ; Duplicated instruction
00416667 push 0FFFFFFF6h ; Original instruction. Last dispatcher
; block will jump here
00416669 call ds:GetStdHandle ; Original instruction of next block
0041666F call sub_41A4A0 ; Original instruction of next block
00416674 mov ecx, off_457CB0 ; Next dispatcher instructions
0041667A mov edx, 9148854h
0041667F xor edx, dword_457CB4
00416685 add ecx, edx
00416687 inc ecx
00416688 jmp ecx
Figure 6: Duplicated instructions between two dispatcher blocks
Memory-based Dispatcher Layout
Memory-based dispatcher blocks appear significantly less frequently, as there are only 28 dispatchers of this type in the 2,009 blocks processed. Unlike the register-based layout, this layout relies on both registers and stack values for calculating and jumping to the destination. An example of this layout is shown in Figure 7, where the add dispatcher instruction adds a value stored on the stack to a register.
0044AA3A mov edi, [esi+50h] ; esi = esp in previous instruction
0044AA3D cmp edi, [esi+98h]
0044AA43 setb bl
0044AA46 mov edi, off_46C030[ebx*4]
0044AA4D add edi, [esi+9Ch] ; Dispatcher instruction. Adding a stack
; value to edi (jump destination)
0044AA53 mov ebx, [esi+0A0h]
0044AA59 jmp edi ; Jumping to edi
Figure 7: Dispatcher utilizing stack values to calculate the indirect jump's destination
In a smaller number of cases, we encounter dispatcher blocks of this layout ending with a jmp instruction that does not branch to a register value. Instead, it uses a value stored on the stack to determine the jump target.
0041CCB4 mov eax, [esi+5Ch]
0041CCB7 mov [eax], edi
0041CCB9 jmp dword ptr [esi+14h] ; Dispatcher jump to a stack value
Figure 8: Dispatcher with memory-based indirect jump
Mixed-order Dispatcher Layout
The mixed-order dispatcher layout is a variant of the register-based and memory-based layouts. In the case study sample, 12 memory-based and 28 register-based dispatcher blocks fall into this mixed-order category.
Most dispatcher instructions are placed after an original instruction or a sequence of original instructions. However, this can vary: parts of the dispatcher block can also be split up and randomly intertwined with the original instructions. This unpredictable placement adds another layer of complexity to the deobfuscation process.
Dispatcher instructions:
0041E847 mov eax, 0F5A88CDAh ; Dispatcher instruction
0041E84C xor eax, dword_459880 ; Dispatcher instruction
0041E852 mov ecx, off_459878 ; Dispatcher instruction
0041E858 add eax, ecx ; Dispatcher instruction
0041E85A inc eax ; Dispatcher instruction
Original instructions:
0041E85B mov ebx, [esi+48h]
0041E85E mov ecx, [ebp+10h]
0041E861 mov [ebx], ecx
0041E863 mov edi, [esi+2Ch]
0041E866 mov ecx, [ebp+0Ch]
0041E869 mov [edi], ecx
0041E86B mov edi, [esi+0Ch]
0041E86E mov ecx, [esi+20h]
0041E871 mov dword ptr [edi], 0
0041E877 mov dword ptr [ecx], 0
0041E87D xorps xmm0, xmm0
0041E880 movups xmmword ptr [edx+4], xmm0
0041E884 movups xmmword ptr [edx+14h], xmm0
0041E888 movups xmmword ptr [edx+24h], xmm0
0041E88C mov dword ptr [edx+38h], 0
0041E893 mov dword ptr [edx+34h], 0
0041E89A mov dword ptr [edx], 3Ch
0041E8A0 mov dword ptr [edx+8], 0FFFFFFFFh
0041E8A7 mov dword ptr [edx+14h], 0FFFFFFFFh
0041E8AE mov dword ptr [edx+30h], 0FFFFFFFFh
0041E8B5 jmp eax ; Indirect jump
Figure 9: Mixed-order dispatcher example
Conditional Dispatcher
Conditional dispatchers deserve extra attention as they introduce more logic than unconditional ones. It is also important to note that not all conditional branches are obfuscated. We identified 379 such instances within the case study sample that remain in their original state. These appear in tight loops and heavy string-processing routines and were likely left out of the protection scheme because of the severe performance degradation the dispatchers would induce.
The structure of conditional dispatcher blocks varies slightly from that of unconditional dispatchers. Given that the intent is to protect conditional logic, there will always be two possible outcomes:
- The branch taken when the condition is satisfied
- The fallthrough branch taken when the condition is not satisfied
The obfuscator employs a table of paired entries for each conditional branch, indexed by the result of the condition, which is either false or true (0 or 1). Each index corresponds to one of the two branches that can be taken.
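The table lookup can be modeled as below. This sketch follows the instruction order of Figure 10 (setcc into an index, a table fetch, then the same XOR/ADD/INC decoding used by unconditional dispatchers); the function name and all table contents are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def conditional_dispatch(table: list, condition: bool,
                         key_imm: int, key_data: int) -> int:
    """Model of a Figure 10-style conditional dispatcher (hypothetical helper).

    table: the paired encoded targets (off_... in the figures); index 1 is
           used when the setcc condition holds, index 0 when it does not.
    """
    index = 1 if condition else 0          # setnz al -> 0 or 1
    encoded = table[index]                 # mov eax, off_...[eax*4]
    key = (key_imm ^ key_data) & MASK32    # mov ecx, imm; xor ecx, dword_...
    return (encoded + key + 1) & MASK32    # add eax, ecx; inc eax


# Hypothetical table; key_imm ^ key_data == 0 keeps the arithmetic easy to follow.
pair = [0x00416100, 0x00416200]
print(hex(conditional_dispatch(pair, True, 0xC09E0A35, 0xC09E0A35)))   # 0x416201
print(hex(conditional_dispatch(pair, False, 0xC09E0A35, 0xC09E0A35)))  # 0x416101
```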
Conditional dispatchers fall into three distinct categories.
- Standard conditional logic
  - The obfuscator accounts for all common conditional jump conditions.
  - The condition code is evaluated using one of the following instructions: test <reg>, <reg> or cmp <reg>, <imm>.
  - A setcc instruction then captures the original conditional jump logic. That is to say, every original conditional jump instruction is reflected as its setcc counterpart (e.g., a jnz becomes a setnz).
- Loop logic
  - Non-infinite loops require conditional logic as a means of exiting the loop body. The obfuscator implements this using three distinct dispatcher blocks linked with an arbitrary subset of dispatcher blocks that represent the loop body:
    - Initialization block: initializes the default branch target via an "exit condition" flag that is always set to false (so that execution is transferred to the start of the loop body).
    - Update block: updates the exit condition flag based on the processing of either the initialization block or logic stemming from the loop body.
    - Exit-check block: checks whether the exit condition flag is set to exit the loop or to transfer execution back to the loop body.
- Syscall logic
  - This category is specific to a LummaC2 component that invokes Windows syscalls and disguises how the resulting NTSTATUS code is verified. It is effectively a conditional dispatcher that implements the NT_SUCCESS macro.
  - The instruction sequence not eax followed by shr eax, 0x1F determines the success of a syscall by inverting the bits of the returned NTSTATUS and isolating its sign bit. A value of 1 indicates a successful syscall, while 0 indicates a failed syscall.
Standard Conditional Dispatcher Type
Continuing with the case study sample, we find the standard conditional dispatcher type occurring 987 times out of the 1,063 conditional dispatchers.
Figure 10 and Figure 11 illustrate this type, where the conditional value is tested against zero and against a non-zero constant, respectively. The first figure shows the conditional value being compared to 0 using a test instruction. The second shows the conditional value being evaluated against the non-zero constant 0x5A4D using a cmp instruction.
0041656E call sub_41C610 ; subroutine call at 0x41C610
00416573 mov esi, eax ; save the return value (eax) into esi
00416575 xor eax, eax ; clear out the index
00416577 test esi, esi ; evaluate the result
00416579 setnz al ; Set al if conditional value is not zero
0041657C mov eax, off_457CF4[eax*4] ; fetch appropriate encoded branch target
00416583 mov ecx, 0C09E0A35h ; start the decoding sequence
00416588 xor ecx, dword_457CFC
0041658E add eax, ecx
00416590 inc eax
00416591 jmp eax ; transfer execution to the decoded
; branch value
Figure 10: Conditional dispatcher with the conditional value being compared to 0
0044DD15 movzx ecx, word ptr [edi] ; fetch the 16-bit value to evaluate
0044DD18 xor edx, edx ; clear out the index
0044DD1A cmp ecx, 5A4Dh ; compare to the 0x5A4D constant
0044DD20 setnz dl ; set the index to the result
0044DD23 mov ecx, off_46F304[edx*4] ; fetch appropriate encoded branch target
0044DD2A mov edx, 9EC9743Dh ; start the decoding sequence
0044DD2F xor edx, dword_46F30C
0044DD35 add ecx, edx
0044DD37 inc ecx
0044DD38 jmp ecx ; transfer execution to the decoded
; branch value
Figure 11: Conditional dispatcher with the conditional value being compared to a non-zero constant
Loop Conditional Dispatcher Type
Figure 12, Figure 13 and Figure 14 provide an illustration of a loop conditional dispatcher type, which occurs 42 times within the sample. It is always a collection of linked dispatcher blocks that include the loop initialization sequence, the loop body (an arbitrary collection of dispatcher blocks specific to the loop logic), an update condition block, and finally a check-exit condition block.
The initialization block sets the stage for a loop by establishing an "exit condition" flag and initializing it to false, ensuring the loop body executes at least once. The update block then modifies this flag based on the results of the initialization block or the loop body's logic. Finally, the exit-check block examines the flag's state to determine whether to continue iterating or exit the loop.
0044CD55 mov dword_470A30, ebx
0044CD5B mov edi, [ebp-34h]
0044CD5E xchg ax, ax
0044CD60 mov eax, off_46CB3C
0044CD65 mov ecx, 74F906B5h
0044CD6A xor ecx, dword_46CB44
0044CD70 add eax, ecx
0044CD72 inc eax
0044CD73 mov dword ptr [ebp-30h], 0
0044CD7A mov dword ptr [ebp-18h], 0 ; conditional flag, initially 0 to
; reflect transfer to the loop body
; not the loop exit
0044CD81 mov dword ptr [ebp-28h], 0
0044CD88 mov dword ptr [ebp-40h], 0
0044CD8F jmp eax
Figure 12: A loop initialization block
0044C108 mov ecx, [ebp-5Ch]
0044C10B mov eax, [ecx+1]
0044C10E add eax, ecx
0044C110 add eax, 5
0044C113 mov [ebp-18h], eax ; instructions that update the
; conditional flag
0044C116 mov eax, off_46CFE4
0044C11B mov ecx, 681DADB7h
0044C120 xor ecx, dword_46CFEC
0044C126 add eax, ecx
0044C128 inc eax
0044C129 nop dword ptr [eax+00000000h]
0044C130 mov ecx, [ebp-18h]
0044C133 mov [ebp-28h], ecx
0044C136 jmp eax
Figure 13: A loop update block
0044C2AD xor eax, eax
0044C2AF mov edx, [ebp-18h] ; evaluate the conditional flag
0044C2B2 test edx, edx
0044C2B4 setnz al
0044C2B7 mov ecx, 27DC8BC9h
0044C2BC xor ecx, dword_46D248
0044C2C2 mov eax, off_46D240[eax*4] ; fetch the target
0044C2C9 add eax, ecx
0044C2CB inc eax
0044C2CC mov [ebp-28h], edx
0044C2CF mov ebx, [ebp-20h]
0044C2D2 jmp eax ; Jump back to a loop body block
; or exit the loop
Figure 14: An exit-check block
Syscall Conditional Dispatcher Type
Dispatchers of this type are used for checking the return values of LummaC2-specific function calls that perform a syscall. They appear only 34 times in the case study sample. In these functions, LummaC2 decrypts the shellcode in Figure 15 and executes it in memory to make a particular syscall.
mov eax, <syscall ID>
mov edx, win32u.Wow64SystemServiceCall
call edx
ret <imm16>
Figure 15: Shellcode to call Windows system call
In other cases, the malware makes direct calls to Windows Native APIs instead of utilizing the shellcode in Figure 15.
The conditional dispatcher for this type implements the NT_SUCCESS macro by checking whether the returned NTSTATUS code is successful. It does so by checking the sign bit of the inverted NTSTATUS code and capturing it as the branch target index, which will be either 0 or 1. Because NT_SUCCESS treats any NTSTATUS value with a clear sign bit (that is, a non-negative value such as STATUS_SUCCESS, which is 0) as successful, a successful syscall results in the true branch (index 1) being taken, and a failed syscall results in the false branch (index 0) being taken.
00424D95 call sub_44EDA0 ; wrapper function to perform a syscall
00424D9A add esp, 0Ch
00424D9D not eax ; invert all bits of the NTSTATUS return value
00424D9F shr eax, 1Fh ; isolate the sign bit to capture the
; result and in turn, the index to
; the corresponding branch
00424DA2 mov eax, off_45DC9C[eax*4] ; fetch the corresponding branch target
00424DA9 mov ecx, 31637ACh
00424DAE xor ecx, dword_45DCA4
00424DB4 add eax, ecx
00424DB6 inc eax
00424DB7 jmp eax
Figure 16: Conditional dispatcher to check syscall return values
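The equivalence between the not/shr sequence and NT_SUCCESS can be checked on a few representative status codes. NT_SUCCESS is defined in the Windows headers as the signed comparison `((NTSTATUS)(Status)) >= 0`; the Python functions below are hypothetical helpers that model both sides on 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def nt_success(status: int) -> bool:
    """NT_SUCCESS(Status) is ((NTSTATUS)(Status)) >= 0, i.e. the sign bit is clear."""
    signed = status - (1 << 32) if status & 0x80000000 else status
    return signed >= 0


def dispatcher_index(status: int) -> int:
    """Model Figure 16: not eax; shr eax, 1Fh."""
    inverted = ~status & MASK32  # not eax (bitwise inversion, not arithmetic negation)
    return inverted >> 31        # shr eax, 1Fh keeps only the inverted sign bit


for code in (0x00000000,   # STATUS_SUCCESS
             0x00000103,   # STATUS_PENDING (success category)
             0x80000005,   # STATUS_BUFFER_OVERFLOW (warning)
             0xC0000005):  # STATUS_ACCESS_VIOLATION (error)
    print(f"{code:#010x} NT_SUCCESS={nt_success(code)} index={dispatcher_index(code)}")
```

Note that non-zero codes in the success category, such as STATUS_PENDING, also select index 1.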
Obfuscated Function Recovery
Original Instruction Recovery
Recovering the original control flow of a protected function requires us to differentiate between the obfuscator's injected dispatcher instructions and the function's original instructions. To solve this, we use symbolic backward slicing, a program analysis technique that identifies the instructions that influence a specific register or memory address at a given point within a simulated execution on an intermediate representation. In this context, we employ backward slicing to do the following:
- Isolate the dispatcher instructions from the original instructions
- Determine which explicit instructions calculate the final indirect transfer of control
In our deobfuscator design, we leverage the Triton symbolic execution engine to conduct the core of the recovery. Triton implements backward slicing APIs that we can use directly. When executing the program, Triton maintains a set of symbolic expressions that represent the values of registers and memory addresses. These expressions are stored as Abstract Syntax Trees (ASTs), where each node represents an operation whose operands result from the execution flow. Triton refers to this step as "processing": simulating the effects that a sequence of emulated instructions has on registers and memory, and reflecting the result as ASTs.
This is a powerful abstraction that allows us to reason about the deobfuscation at an AST level and ignore the verbose disassembly produced by the obfuscator.
To distinguish dispatcher instructions, we'll focus on the destination of the final indirect jump in a dispatcher block. By looking up this destination in the constructed ASTs after all dispatcher instructions are processed, we can extract its corresponding symbolic expressions.
Figure 17 shows the AST of the destination register eax at an indirect jump. This AST represents all symbolic expressions from the result of the symbolic processing of the corresponding instructions that influence the value of the destination register before the indirect jump is executed.
<div>
<div>
<img alt="ASTs of the destination register after the indirect jump instruction is processed" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig17.max-1000x1000.png" />
<p>Figure 17: ASTs of the destination register after the indirect jump instruction is processed</p>
</div>
</div>
Using Triton's APIs, we can extract a subset (or slice) of the processed expressions that collectively contribute to the final destination address of an indirect jump. For each expression in the slice, we can map it back to the specific dispatcher instruction that generates it. This mapping is possible because Triton maintains the association between instructions and the symbolic expressions they produce during its execution.
A snippet of the code used to perform backward slicing to distinguish dispatcher instructions from the original ones is shown in Figure 18.
from triton import Instruction, OPCODE, OPERAND

# `context` is a TritonContext and `pc` is the current program counter
# Retrieve the bytes of the instruction at the current program counter
instructionBytes = context.getConcreteMemoryAreaValue(pc, 16)
# Create a Triton Instruction object from the retrieved bytes
instruction = Instruction(pc, instructionBytes)
# Process the instruction using the Triton context
context.processing(instruction)
# Scan for dispatcher jump instruction
if instruction.getType() == OPCODE.X86.JMP:
    # Extract the operand of the JMP instruction
    jmpOperand = instruction.getOperands()[0]
    # Process JMP instructions with register operand only
    if jmpOperand.getType() == OPERAND.REG:
        # Get symbolic expression of destination register
        destRegExpression = context.getSymbolicRegisters()[jmpOperand.getId()]
        # Backward slice on the destination register
        slicing = context.sliceExpressions(destRegExpression)
        # Iterating through the slices
        for _, sliceInstr in sorted(slicing.items()):
            # Print out the disassembled instruction of each slice
            sliceInstrDisassembly = sliceInstr.getDisassembly()
            print('\t[Slice]', sliceInstrDisassembly)
Figure 18: Triton code to perform backward slicing to recover all dispatcher instructions
Here, we continuously execute instructions until a jmp instruction is encountered. If the instruction's operand is a register, we retrieve its symbolic expression and perform a backward slice to recover all instructions that influenced its value. Because Triton also associates each symbolic expression with the disassembly of the instruction that produced it, we can extract the exact dispatcher instructions that make up the slice, not merely their AST representation.
Once the complete backward slice for the destination has been retrieved, we can confidently distinguish the dispatcher instructions from the original instructions within the function. This distinction holds regardless of the placement or order of the dispatcher instructions within a protected block, since the backward slice includes only those instructions that directly influence the final value.
Backward slicing output:
...
[Processing] 0x416530: lea eax, [esp + 8]
[Processing] 0x416534: push eax
[Processing] 0x416535: call dword ptr [0x454a18]
[Processing] 0x41653b: mov eax, esp
[Processing] 0x41653d: push eax
[Processing] 0x41653e: call dword ptr [0x454a14]
[Processing] 0x416544: mov eax, dword ptr [0x457c1c]
[Processing] 0x416549: mov ecx, 0xa15bd01f
[Processing] 0x41654e: xor ecx, dword ptr [0x457c24]
[Processing] 0x416554: add eax, ecx
[Processing] 0x416556: inc eax
[Processing] 0x416557: jmp eax
[Slice] 0x416544: mov eax, dword ptr [0x457c1c]
[Slice] 0x416549: mov ecx, 0xa15bd01f
[Slice] 0x41654e: xor ecx, dword ptr [0x457c24]
[Slice] 0x416554: add eax, ecx
[Slice] 0x416556: inc eax
...
Figure 19: Output for the code in Figure 18 to distinguish dispatcher instructions
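Triton performs this slicing over its symbolic expressions. The same idea can be approximated without Triton by a classic def-use backward slice over the Figure 19 trace, which is what the sketch below does. The def/use sets are hand-written assumptions (for example, each call is modeled as clobbering eax, ecx, and edx, and flags are ignored because the jmp does not read them), so this is a simplified stand-in rather than the Triton-based implementation.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    addr: int
    text: str
    defs: frozenset  # locations written by the instruction
    uses: frozenset  # locations read by the instruction


def ins(addr, text, defs=(), uses=()):
    return Insn(addr, text, frozenset(defs), frozenset(uses))


def backward_slice(trace, criterion):
    """Return the instructions in `trace` that influence the locations in `criterion`."""
    live = set(criterion)
    sliced = []
    for insn in reversed(trace):
        if insn.defs & live:    # writes a location we still care about
            sliced.append(insn)
            live -= insn.defs   # that definition is now accounted for...
            live |= insn.uses   # ...but its inputs become relevant
    return list(reversed(sliced))


# Trace from Figure 19; "stack" stands in for the memory written by push.
TRACE = [
    ins(0x416530, "lea eax, [esp + 8]", {"eax"}, {"esp"}),
    ins(0x416534, "push eax", {"esp", "stack"}, {"eax", "esp"}),
    ins(0x416535, "call dword ptr [0x454a18]", {"eax", "ecx", "edx"}, {"esp", "stack"}),
    ins(0x41653B, "mov eax, esp", {"eax"}, {"esp"}),
    ins(0x41653D, "push eax", {"esp", "stack"}, {"eax", "esp"}),
    ins(0x41653E, "call dword ptr [0x454a14]", {"eax", "ecx", "edx"}, {"esp", "stack"}),
    ins(0x416544, "mov eax, dword ptr [0x457c1c]", {"eax"}, {"[0x457c1c]"}),
    ins(0x416549, "mov ecx, 0xa15bd01f", {"ecx"}),
    ins(0x41654E, "xor ecx, dword ptr [0x457c24]", {"ecx"}, {"ecx", "[0x457c24]"}),
    ins(0x416554, "add eax, ecx", {"eax"}, {"eax", "ecx"}),
    ins(0x416556, "inc eax", {"eax"}, {"eax"}),
    ins(0x416557, "jmp eax", (), {"eax"}),
]

jmp = TRACE[-1]
for insn in backward_slice(TRACE[:-1], jmp.uses):
    print(f"[Slice] {insn.addr:#x}: {insn.text}")
```

Because the slice is driven purely by data dependencies, interleaving unrelated original instructions (as in the mixed-order layout) does not change the result.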
Control Flow Recovery
In addition to recovering all original instructions of the function, we must also recover the original control flow. As instructions are processed, Triton lets us determine the concrete destination value of the final indirect jump in each dispatcher block. With this, we can trace the program's execution flow and reconstruct the order in which dispatcher blocks are executed.
To explore all possible execution paths within the function, we employ a depth-first search (DFS) traversal algorithm.
We begin by exploring a single path, following the control flow dictated by the obfuscator's indirect jumps. This continues until the path reaches a termination point, such as a ret instruction or a program-ending API call (e.g., ExitProcess).
In our deobfuscator design, we default to treating every protected conditional jump as a jnz instruction by forcing the index register to 1 on the main execution path being processed. When encountering a protected conditional jump, we assume the condition is met and continue exploring the path that follows the jump. However, we don't discard the alternative path. It is stored in a queue-like data structure, which allows us to revisit it later once we've exhausted all possibilities on the current path.
By systematically exploring all paths using DFS and handling conditional jumps strategically, we can reconstruct the original control flow that has been obfuscated with the compiler's indirect jumps.
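The traversal can be sketched on a toy control flow graph in which each dispatcher has already been resolved to its targets. This is a simplified model, not the deobfuscator itself: a real implementation would also need to snapshot and restore the Triton state for each deferred path. The graph encoding and the `explore` function are hypothetical; deferred paths are kept in a Python list used as a LIFO worklist.

```python
def explore(blocks: dict, entry: str) -> list:
    """Depth-first walk over resolved dispatcher blocks (hypothetical model).

    blocks maps a block name to one of:
      ("ret",)                  termination point (ret, ExitProcess, ...)
      ("jmp", target)           unconditional dispatcher
      ("cond", index1, index0)  conditional dispatcher: table[1], table[0]
    Returns the blocks in the order they are first visited.
    """
    order, visited = [], set()
    pending = [entry]                   # deferred alternative paths
    while pending:
        block = pending.pop()
        while block is not None and block not in visited:
            visited.add(block)
            order.append(block)
            kind, *targets = blocks[block]
            if kind == "ret":
                block = None            # path ends here
            elif kind == "jmp":
                block = targets[0]
            else:
                index1, index0 = targets
                pending.append(index0)  # remember the other path for later
                block = index1          # force the index register to 1
    return order


cfg = {
    "A": ("cond", "B", "C"),
    "B": ("jmp", "D"),
    "C": ("jmp", "D"),
    "D": ("ret",),
}
print(explore(cfg, "A"))  # ['A', 'B', 'D', 'C']
```

A path also stops when it reaches an already visited block, which is what keeps loops (a block jumping back to an earlier one) from being explored forever.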
Deobfuscation: Rebuilding Original Function
With the original instructions and execution paths identified, we can deobfuscate the sample by rebuilding the functions we have processed. Our goal is to ensure the deobfuscated functions are restored to their original state, preserving their original semantics and removing all traces of the obfuscator.
Instruction Rewriting
When rebuilding, we can overwrite the original protected function with the deobfuscated instructions. Since a deobfuscated function always has fewer instructions than its obfuscated counterpart, there is guaranteed space to accommodate the rebuilt function. The remaining space can be padded with standard compiler padding bytes such as 0xCC (int3).
The rewriting process involves writing instructions back from the function's entry point in the order they are processed and executed during the Triton analysis, excluding all dispatcher instructions. Here, we will address two specific cases involving indirect jumps originally added by the obfuscator.
The first case involves processing an unconditional dispatcher block. For this case, if the jump target has not been written yet, we simply skip it and continue writing instructions sequentially. If the jump target has already been written, we replace the indirect jump with a direct one to branch back to that target.
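The rule for unconditional dispatchers can be sketched as follows. In a depth-first layout, a target that has not been written yet is the block laid out immediately afterward, so the sketch drops the jump in that case and emits a direct jmp otherwise. Block names, instruction strings, and the `emit_unconditional` helper are hypothetical.

```python
def emit_unconditional(layout: list, bodies: dict, successor: dict) -> list:
    """Write blocks in layout order, replacing unconditional dispatchers.

    layout: block names in the order they were processed (depth-first)
    bodies: block name -> original (non-dispatcher) instructions
    successor: block name -> target of its indirect jump, or None at a ret
    """
    out = []
    for i, name in enumerate(layout):
        out.append(f"{name}:")
        out.extend(bodies[name])
        target = successor.get(name)
        next_block = layout[i + 1] if i + 1 < len(layout) else None
        if target is None or target == next_block:
            continue                 # ret, or fall through to the next written block
        out.append(f"jmp {target}")  # target already written: branch back directly
    return out


listing = emit_unconditional(
    ["loc_A", "loc_B"],
    {"loc_A": ["push 0FFFFFFF6h", "call ds:GetStdHandle"],
     "loc_B": ["call sub_41A4A0"]},
    {"loc_A": "loc_B", "loc_B": "loc_A"},
)
print("\n".join(listing))
```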
The second case, handling the jump instruction of a conditional dispatcher block, is a bit more convoluted. Before tackling it, we must determine the original conditional jump type (e.g., jz, jnz, jl) based on the preceding setcc dispatcher instruction.
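Because every setcc mnemonic shares its condition code with the corresponding jcc mnemonic, recovering the jump type is mostly a string mapping. The sketch below also returns the negated condition, which would be needed for the opposite-type option described next; the alias table and the `jcc_pair` helper are hypothetical.

```python
# Canonical x86 condition codes paired with their logical negation.
NEGATION = {"o": "no", "b": "ae", "z": "nz", "be": "a",
            "s": "ns", "p": "np", "l": "ge", "le": "g"}
NEGATION.update({v: k for k, v in NEGATION.items()})

# Alternate spellings that disassemblers commonly emit.
ALIASES = {"e": "z", "ne": "nz", "c": "b", "nc": "ae", "nae": "b", "nb": "ae",
           "na": "be", "nbe": "a", "nge": "l", "nl": "ge", "ng": "le",
           "nle": "g", "pe": "p", "po": "np"}


def jcc_pair(setcc: str) -> tuple:
    """Map a setcc mnemonic to (jcc, negated jcc), normalizing aliases."""
    if not setcc.startswith("set"):
        raise ValueError(f"not a setcc mnemonic: {setcc}")
    cc = ALIASES.get(setcc[3:], setcc[3:])
    if cc not in NEGATION:
        raise ValueError(f"unknown condition code: {setcc}")
    return "j" + cc, "j" + NEGATION[cc]


print(jcc_pair("setnz"))  # ('jnz', 'jz')
```

The first element targets the index 1 destination (condition true), and the second is the fallthrough condition.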
Since the indirect jump can target one of two destinations depending on a condition, we must replace it with two instructions. The first instruction is a conditional jump to the first destination using the correct conditional jump type.
The second instruction can be either:
- A conditional jump with the opposite type as the first, targeting the second destination.
- A direct jump to the second destination. We chose this option to keep our deobfuscator implementation simple.
0041652B call sub_4455F0 ; original instruction
00416530 movzx eax, al ; eax = al = return value
00416533 test eax, eax ; set flags
00416535 jnz loc_416540 ; replacing indirect jmp with jnz for the first path
0041653B jmp loc_416554 ; insert a jmp for the second path
Figure 20: Replacing an indirect conditional jump with a jnz-jmp instruction pair
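At the byte level, the Figure 20 replacement uses the rel32 forms of both jumps: 0F 85 for jnz (6 bytes) and E9 for jmp (5 bytes), each with a signed displacement relative to the end of the instruction. The helper below is a hypothetical sketch of that encoding; the addresses in the example are taken from Figure 20.

```python
import struct


def encode_jcc_pair(addr: int, first_target: int, second_target: int,
                    jcc_opcode: int = 0x85) -> bytes:
    """Encode `jcc rel32` followed by `jmp rel32` starting at `addr`.

    jcc_opcode is the second byte of the two-byte 0F 8x form (0x85 = jnz/jne).
    Displacements are signed 32-bit values relative to the next instruction.
    """
    jcc_len, jmp_len = 6, 5
    jcc_rel = first_target - (addr + jcc_len)
    jmp_rel = second_target - (addr + jcc_len + jmp_len)
    return (bytes([0x0F, jcc_opcode]) + struct.pack("<i", jcc_rel)
            + b"\xE9" + struct.pack("<i", jmp_rel))


# Figure 20: jnz loc_416540 at 0x416535, then jmp loc_416554 at 0x41653B
print(encode_jcc_pair(0x416535, 0x416540, 0x416554).hex(" "))
# 0f 85 05 00 00 00 e9 14 00 00 00
```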
Offset Relocation
The final step, relocation, addresses a remnant from our rebuilding process. As we remove dispatcher instructions and duplicated instructions, the rewritten instructions will occupy different locations from where they were in the original function. This displacement throws off the offsets of jump, call, and other memory-referencing instructions that are not position-independent, as they now need to refer to memory locations from their new addresses.
In our current implementation, we address this by parsing all of the memory-referencing instructions and calculating their correct offsets after deobfuscation. This involves tracking both the original and relocated addresses of each instruction. With this information, we can calculate the adjusted offset to reach the target memory reference and encode the correct bytes for each instruction.
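For the common rel32 branch forms, the adjustment boils down to recomputing the displacement from the instruction's new address. The sketch below handles only call rel32 (E8), jmp rel32 (E9), and jcc rel32 (0F 80 to 0F 8F); the `relocate_rel32` helper and its address map are hypothetical, and other memory-referencing encodings would need their own handling.

```python
import struct


def relocate_rel32(insn: bytes, old_addr: int, new_addr: int,
                   addr_map: dict) -> bytes:
    """Re-encode a rel32 call/jmp/jcc moved from old_addr to new_addr.

    addr_map maps original instruction addresses inside the rebuilt function
    to their new addresses; targets not in the map (e.g., other subroutines)
    keep their original address.
    """
    if insn[0] in (0xE8, 0xE9):                        # call rel32 / jmp rel32
        opcode_len = 1
    elif insn[0] == 0x0F and 0x80 <= insn[1] <= 0x8F:  # jcc rel32
        opcode_len = 2
    else:
        raise ValueError("unsupported encoding for this sketch")
    size = opcode_len + 4
    old_rel = struct.unpack_from("<i", insn, opcode_len)[0]
    target = old_addr + size + old_rel                 # absolute target before the move
    new_target = addr_map.get(target, target)
    new_rel = new_target - (new_addr + size)
    return insn[:opcode_len] + struct.pack("<i", new_rel)


# call sub_41A4A0 from Figure 6 (at 0x41666F), moved to a hypothetical 0x416540
original = bytes.fromhex("e82c3e0000")
print(relocate_rel32(original, 0x41666F, 0x416540, {}).hex())  # e85b3f0000
```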
Final Result
By employing techniques described in this blog post, we have successfully developed a deobfuscation tool for this version of LummaC2. In the following figures, we see the result of our deobfuscator lifting the protection from two protected functions in the case study sample.
<div>
<div>
<img alt="Disassembly view of the subroutine at the binary's entrypoint before deobfuscation" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig21.max-1000x1000.png" />
<p>Figure 21: Disassembly view of the subroutine at the binary's entrypoint before deobfuscation</p>
</div>
</div>
<div>
<div>
<img alt="Decompiler view of the subroutine at the binary's entrypoint after deobfuscation" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig22.max-1000x1000.png" />
<p>Figure 22: Decompiler view of the subroutine at the binary's entrypoint after deobfuscation</p>
</div>
</div>
<div>
<div>
<img alt="Disassembly view of the subroutine at address 0x41EE50 before deobfuscation" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig23.max-1000x1000.png" />
<p>Figure 23: Disassembly view of the subroutine at address 0x41EE50 before deobfuscation</p>
</div>
</div>
<div>
<div>
<img alt="Decompiler view of the subroutine at address 0x41EE50 after deobfuscation" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig24.max-1000x1000.png" />
<p>Figure 24: Decompiler view of the subroutine at address 0x41EE50 after deobfuscation</p>
</div>
</div>
As shown in these figures, the original instructions are now readily apparent, free from the clutter of dispatcher blocks added by the obfuscator. The control flow, once obscured by indirect jumps, is now clearly visible and can be recovered and decompiled using IDA Pro. After deobfuscating all protected functions, we can now analyze the original program to comprehend its capabilities and behaviors.
Conclusion
In this blog post, we have explored the inner workings of LummaC2's obfuscation technique using indirect jumps to manipulate control flow. By leveraging backward slicing and symbolic execution, we have been able to consistently identify the original instructions and eliminate dispatcher instructions added by the obfuscator. Furthermore, we have discussed strategies for deobfuscation, including rebuilding the original function from the recovered control flow and addressing relocation challenges.
While this blog post focuses on deobfuscating LummaC2 protected subroutines, the power of backward slicing as a binary analysis technique extends well beyond this specific case. We hope our exploration of deobfuscating LummaC2 through the use of backward slicing has provided valuable insights to fellow analysts tackling similar challenges in the ever-evolving realm of reverse engineering and malware analysis.
Indicators of Compromise
A Google Threat Intelligence Collection featuring indicators of compromise (IOCs) related to the activity described in this post is now available.
Host-Based IOCs
| MD5 | Associated Malware Family |
| --- | --- |
| d01e27462252c573f66a14bb03c09dd2 | LUMMAC.V2 |
| 5099026603c86efbcf943449cd6df54a | LUMMAC.V2 |
| 205e45e123aea66d444feaba9a846748 | LUMMAC.V2 |
Article Link: https://cloud.google.com/blog/topics/threat-intelligence/lummac2-obfuscation-through-indirect-control-flow/