LummaC2: Obfuscation Through Indirect Control Flow

Written by: Nino Isakovic, Chuong Dong

Overview

This blog post analyzes a control flow obfuscation technique employed by recent LummaC2 (LUMMAC.V2) stealer samples. In addition to the traditional control flow flattening used in older versions, the malware now leverages customized control flow indirection to manipulate its execution. This technique defeats the control flow recovery of binary analysis tools such as IDA Pro and Ghidra, significantly hindering not only the reverse engineering process but also automation tooling designed to capture execution artifacts and generate detections.

To provide insights to Google and Mandiant security teams, we developed an automated method for removing this protection layer through symbolic backward slicing. By leveraging the recovered control flow, we are able to rebuild and deobfuscate the samples into a format readily consumable by any static binary analysis platform.

Protection Components

Overview

An obfuscating compiler, which we will also informally refer to as an "obfuscator," is a transformation tool designed to enhance the security of software binaries by making them more resilient to binary analysis. It operates by transforming a given binary into a protected representation, thereby making the code more difficult to analyze or tamper with. These transformations are typically applied on a per-function basis, where the user selects the specific functions to transform.

Obfuscating compilers are distinct from packers, although they may incorporate packing techniques as part of their functionality. They belong to the broader class of software protections, such as OLLVM, VMProtect, and Code Virtualizer, which provide comprehensive code transformation and protection mechanisms beyond simple packing. Notably, the code of protected components is never exposed in its original, unprotected form at any point during the runtime of a protected binary. It is also common for obfuscating compilers to mix the original compiler-generated code with obfuscator-introduced code, which generally forces an analyst to build a comprehensive deobfuscator before the binary can be analyzed.

The obfuscator employed by LummaC2 applies a multitude of transformations consistent with standard obfuscating compiler technology. This post focuses only on the newly introduced control flow protection scheme that we uncovered.

Our analysis strongly suggests that the authors of the obfuscator have intimate knowledge of the LummaC2 stealer. Certain parts of the protection, as described in the upcoming sections, are specialized to handle specific components of the LummaC2 stealer.

Dispatcher Blocks

The obfuscator transforms the control flow of a protected function into one guided by "dispatcher blocks," each consisting of a subset of the function's original instructions together with new instructions introduced by the obfuscator. Each dispatcher block ends with an indirect jump that branches to a dynamically resolved destination stored in a register or memory address. The result turns the original linear control flow into a disjointed series of scattered blocks. Each block is isolated and contains only the runtime logic necessary to transfer execution to its immediate successor.

<div>
  <div>




  
  
    
    <img alt="Dispatcher blocks overview" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig1.max-1000x1000.png" />
    
    
  
    <p>Figure 1: Dispatcher blocks overview</p>
  



  </div>
</div>

We refer to all instructions generated by the obfuscator as "dispatcher instructions" to differentiate them from "original instructions." Dispatcher blocks used by the obfuscator can be categorized into two main types: unconditional and conditional dispatchers.

    • Unconditional dispatcher: This dispatcher type protects the majority of instructions in an obfuscated function. It consists of dispatcher instructions that fetch encoded offsets from a lookup table in the .data section and perform ADD and XOR operations on them to calculate the next destination to transfer execution to.

    • Conditional dispatcher: This dispatcher type protects either individual conditional jump instructions (e.g., jne or ja) or basic blocks that end with a conditional jump. Instead of a single encoded offset to calculate and transfer execution to, the conditional dispatcher fetches one of two possible encoded offsets depending on the outcome of the tested condition.

<div>
  <div>




  
  
    
    <img alt="Dispatcher block types" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig2.max-1000x1000.png" />
    
    
  
    <p>Figure 2: Dispatcher block types</p>
  



  </div>
</div>

Conditional and unconditional dispatcher blocks are further categorized based on the distinct characteristics and layout of dispatcher instructions.

  • Register-based dispatcher: All dispatcher instruction calculations operate solely on registers, and the dispatcher instructions always make up the tail of the basic block.
  • Memory-based dispatcher: Dispatcher instructions operate on both registers and stack values to calculate the final jump destination, and they also always make up the tail of the basic block.
  • Mixed-order dispatcher: A variant of register-based and memory-based dispatchers. The order and positions of dispatcher instructions in this layout are intertwined among original instructions that they are protecting instead of being placed at the end of the block.
<div>
  <div>




  
  
    
    <img alt="Obfuscating compiler dispatcher layouts" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig3.max-1000x1000.png" />
    
    
  
    <p>Figure 3: Obfuscating compiler dispatcher layouts</p>
  



  </div>
</div>

Dispatcher blocks can also exist standalone where they do not protect any original code. In such cases, they act as a single step responsible for continuing the control flow. 

Register-based Dispatcher Layout

Using the LummaC2 sample with MD5 hash 205e45e123aea66d444feaba9a846748 from the Google Threat Intelligence collection as a case study, we found that of the 2,009 dispatcher blocks processed, 1,981 are register-based, making this the most common dispatcher layout. This layout is applied to both conditional and unconditional dispatcher types in any protected function.

00416630 mov     eax, off_457C8C      ; Retrieve CONSTANT1 from .data section
00416635 mov     ecx, 22A7266Eh       ; Populate CONSTANT2
0041663A xor     ecx, dword_457C94    ; XOR CONSTANT2 with CONSTANT3 
                                      ; from the .data section
00416640 add     eax, ecx             ; ADD CONSTANT1 with the result
00416642 inc     eax                  ; Increment the result
00416643 jmp     eax                  ; Jump to the result

Figure 4: Register-based instruction dispatcher
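The arithmetic in Figure 4 is simple enough to reproduce outside the sample. The following Python sketch mirrors it, assuming the dwords at off_457C8C and dword_457C94 have already been read from the binary; the values in the example call are placeholders chosen for illustration, not data from the sample.

```python
MASK32 = 0xFFFFFFFF


def decode_unconditional(const1: int, const2: int, const3: int) -> int:
    """Reproduce the register-based dispatcher arithmetic from Figure 4.

    const1: dword loaded from the .data section (off_457C8C in Figure 4)
    const2: immediate loaded into the second register (0x22A7266E in Figure 4)
    const3: dword XORed with const2 (dword_457C94 in Figure 4)
    """
    decoded = const2 ^ const3                 # xor ecx, dword_457C94
    target = (const1 + decoded) & MASK32      # add eax, ecx (32-bit wraparound)
    return (target + 1) & MASK32              # inc eax; the result feeds jmp eax


# Placeholder inputs for illustration only (not values from the sample)
print(hex(decode_unconditional(0x00400000, 0x22A7266E, 0x22A7266E ^ 0x00016643)))
```

The masking matters because the obfuscator relies on 32-bit overflow; Python integers are unbounded, so every step is truncated explicitly.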

By analyzing dispatcher blocks of this layout, we can derive some key characteristics of the protection. These blocks typically begin with mov instructions that fetch a value from the malware's .data section or load a constant into a register. Next, arithmetic instructions (xor, add or lea, and inc) combine the retrieved values. Finally, the dispatcher block ends with a jmp instruction that branches to the dynamically calculated value stored in a register.

This final indirect jump obfuscates the function's original control flow. It breaks the control flow recovery algorithms of tools like IDA Pro, which cannot statically resolve the jump destination, hindering both disassembly and decompilation.

<div>
  <div>




  
  
    
    <img alt="IDA Pro's disassembly and decompiler views of a protected subroutine" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig5.max-1000x1000.png" />
    
    
  
    <p>Figure 5: IDA Pro's disassembly and decompiler views of a protected subroutine</p>
  



  </div>
</div>

By identifying the common patterns within these dispatcher instructions, it's possible to differentiate them from the function's core instructions, which is crucial for lifting the protection and deobfuscating the function.

Another observation is that the obfuscator duplicates original instructions when injecting its dispatcher instructions. We assume the obfuscator avoids relocating original instruction blocks when injecting dispatcher code and instead copies those instructions into a new block at the destination.

0041665A push    0FFFFFFF6h            ; Duplicated instruction
0041665C call    ds:GetStdHandle       ; Duplicated instruction
00416662 call    sub_41A4A0            ; Duplicated instruction
00416667 push    0FFFFFFF6h            ; Original instruction. Last dispatcher
                                       ; block will jump here
00416669 call    ds:GetStdHandle       ; Original instruction of next block
0041666F call    sub_41A4A0            ; Original instruction of next block
00416674 mov     ecx, off_457CB0       ; Next dispatcher instructions
0041667A mov     edx, 9148854h
0041667F xor     edx, dword_457CB4
00416685 add     ecx, edx
00416687 inc     ecx
00416688 jmp     ecx

Figure 6: Duplicated instructions between two dispatcher blocks

Memory-based Dispatcher Layout

Memory-based dispatcher blocks appear significantly less frequently, as there are only 28 dispatchers of this type in the 2,009 blocks processed. Unlike the register-based layout, this layout relies on both registers and stack values for calculating and jumping to the destination. An example of this layout is shown in Figure 7, where the add dispatcher instruction adds a value stored on the stack to a register.

0044AA3A mov     edi, [esi+50h]             ; esi = esp in previous instruction
0044AA3D cmp     edi, [esi+98h]             
0044AA43 setb    bl
0044AA46 mov     edi, off_46C030[ebx*4]
0044AA4D add     edi, [esi+9Ch]             ; Dispatcher instruction. Adding a stack 
                                            ; value to edi (jump destination)
0044AA53 mov     ebx, [esi+0A0h]
0044AA59 jmp     edi                        ; Jumping to edi 

Figure 7: Dispatcher utilizing stack values to calculate the indirect jump's destination

In a smaller number of cases, we encounter dispatcher blocks of this layout ending with a jmp instruction that does not branch to a register value. Instead, it utilizes a value stored on the stack to determine the jump target.

0041CCB4 mov     eax, [esi+5Ch]
0041CCB7 mov     [eax], edi
0041CCB9 jmp     dword ptr [esi+14h]        ; Dispatcher jump to a stack value

Figure 8: Dispatcher with memory-based indirect jump

Mixed-order Dispatcher Layout

The mixed-order dispatcher layout is a variant of the register-based and memory-based dispatcher layouts. In the case study sample, 12 memory-based and 28 register-based dispatcher blocks fall into this mixed-order category.

Most dispatcher instructions are placed after an original instruction or a sequence of original instructions. However, this varies: parts of a dispatcher block can also be split up and randomly intertwined with the original instructions. This unpredictable placement adds another layer of complexity to the deobfuscation process.

Dispatcher instructions:
  0041E847 mov     eax, 0F5A88CDAh                   ; Dispatcher instruction
  0041E84C xor     eax, dword_459880                 ; Dispatcher instruction
  0041E852 mov     ecx, off_459878                   ; Dispatcher instruction
  0041E858 add     eax, ecx                          ; Dispatcher instruction
  0041E85A inc     eax                               ; Dispatcher instruction

Original instructions:
0041E85B mov ebx, [esi+48h]
0041E85E mov ecx, [ebp+10h]
0041E861 mov [ebx], ecx
0041E863 mov edi, [esi+2Ch]
0041E866 mov ecx, [ebp+0Ch]
0041E869 mov [edi], ecx
0041E86B mov edi, [esi+0Ch]
0041E86E mov ecx, [esi+20h]
0041E871 mov dword ptr [edi], 0
0041E877 mov dword ptr [ecx], 0
0041E87D xorps xmm0, xmm0
0041E880 movups xmmword ptr [edx+4], xmm0
0041E884 movups xmmword ptr [edx+14h], xmm0
0041E888 movups xmmword ptr [edx+24h], xmm0
0041E88C mov dword ptr [edx+38h], 0
0041E893 mov dword ptr [edx+34h], 0
0041E89A mov dword ptr [edx], 3Ch
0041E8A0 mov dword ptr [edx+8], 0FFFFFFFFh
0041E8A7 mov dword ptr [edx+14h], 0FFFFFFFFh
0041E8AE mov dword ptr [edx+30h], 0FFFFFFFFh

0041E8B5 jmp eax ; Indirect jump

Figure 9: Mixed-order dispatcher example
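Once dispatcher instructions have been identified (for example, through the backward slicing described later in this post), telling a tail-placed layout from a mixed-order one comes down to whether the dispatcher instructions form a contiguous run at the end of the block. The following Python sketch encodes that check under a deliberately strict definition of our own; it is an illustration, not the classifier used to produce the counts above.

```python
from typing import Sequence


def is_mixed_order(is_dispatcher: Sequence[bool]) -> bool:
    """Return True if dispatcher instructions do not form a contiguous tail.

    is_dispatcher[i] is True when instruction i of a basic block belongs to
    the dispatcher (e.g., it is in the backward slice of the final indirect
    jump, or it is that jump itself).
    """
    if not any(is_dispatcher):
        raise ValueError("block contains no dispatcher instructions")
    first = list(is_dispatcher).index(True)
    # Tail-placed layouts put every dispatcher instruction after the last
    # original instruction; any original instruction after the first
    # dispatcher instruction makes the block mixed-order.
    return not all(is_dispatcher[first:])


# Figure 4: every instruction belongs to the dispatcher
print(is_mixed_order([True] * 6))
# Figure 9: 5 dispatcher instructions, 20 original instructions, then jmp eax
print(is_mixed_order([True] * 5 + [False] * 20 + [True]))
```

Under this strict definition, a block such as Figure 7, where an unrelated mov ebx sits between the add and the final jmp, would also be reported as mixed-order, so a practical classifier may need some tolerance.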

Conditional Dispatcher

Conditional dispatchers deserve extra attention, as they introduce more logic than unconditional ones. Note that not all conditional branches are obfuscated: we identified 379 instances within the case study sample that remain in their original state. These occur in tight loops and heavy string-processing routines and were likely excluded from the protection scheme because of the severe performance degradation the protection would induce.

The structure of conditional dispatcher blocks exhibits a slight variation from that of unconditional dispatchers. Given that the intent is to protect conditional logic, there will always be two possible outcomes:

  • The branch that satisfies the condition being taken

  • The fallthrough branch that does not satisfy the condition being taken

The obfuscator uses a table of paired entries for each conditional branch, indexed by the result of the condition, which is either false or true (0 or 1). Each index corresponds to one of the two branches that can be taken.

Conditional dispatchers fall into three distinct categories.

  1. Standard conditional logic
    • The obfuscator accounts for all common conditional jump conditions
    • The condition code is evaluated using one of the following instructions:
      • test <reg>, <reg>
      • cmp <reg>, <imm>
    • setcc is then used to capture the original conditional jump logic. That is to say, every original conditional jump instruction is reflected as its setcc counterpart (e.g., a jnz becomes a setnz)
  2. Loop logic
    • Non-infinite loops require conditional logic as a means of exiting the loop body. The obfuscator implements this using three distinct dispatcher blocks linked with an arbitrary subset of dispatcher blocks that represent the loop body
      • Initialization block
        • Initializes the default branch target via an "exit condition" flag that is always set to false (so that execution is transferred to the start of the loop body)
      • Update block
        • Updates the exit condition flag based on the processing of either the initialization block or logic stemming from the loop body
      • Exit-check block
        • Checks the exit condition flag and either exits the loop or transfers execution back to the loop body
  3. Syscall logic
    • This category is specific to a LummaC2 component that invokes Windows syscalls and disguises how the resulting NTSTATUS code is verified. This is effectively a conditional dispatcher that implements the NT_SUCCESS macro.
    • The following instruction sequence determines whether a syscall succeeded by bitwise-inverting the returned NTSTATUS and extracting its sign bit. A value of 1 indicates a successful syscall, while 0 indicates a failed syscall.
      • not eax
      • shr eax, 0x1F

Standard Conditional Dispatcher Type

Continuing with the case study sample, we find that the standard conditional dispatcher type accounts for 987 of the 1,063 conditional dispatchers.

Figure 10 and Figure 11 illustrate this type, where the conditional value is tested against zero and against a non-zero constant, respectively. In Figure 10, the conditional value is compared to 0 using a test instruction; in Figure 11, it is evaluated against the non-zero constant 0x5A4D using a cmp instruction.

0041656E call    sub_41C610                 ; subroutine call at 0x41C610
00416573 mov     esi, eax                   ; save the return value (eax) into esi
00416575 xor     eax, eax                   ; clear out the index
00416577 test    esi, esi                   ; evaluate the result
00416579 setnz   al                         ; Set al if conditional value is not zero
0041657C mov     eax, off_457CF4[eax*4]     ; fetch appropriate encoded branch target
00416583 mov     ecx, 0C09E0A35h            ; start the decoding sequence
00416588 xor     ecx, dword_457CFC
0041658E add     eax, ecx
00416590 inc     eax
00416591 jmp     eax                        ; transfer execution to the decoded
                                            ; branch value

Figure 10: Conditional dispatcher with the conditional value being compared to 0

0044DD15 movzx   ecx, word ptr [edi]        ; fetch the 16-bit value to evaluate
0044DD18 xor     edx, edx                   ; clear out the index
0044DD1A cmp     ecx, 5A4Dh                 ; compare to the 0x5A4D constant
0044DD20 setnz   dl                         ; set the index to the result 
0044DD23 mov     ecx, off_46F304[edx*4]     ; fetch appropriate encoded branch target
0044DD2A mov     edx, 9EC9743Dh             ; start the decoding sequence
0044DD2F xor     edx, dword_46F30C
0044DD35 add     ecx, edx
0044DD37 inc     ecx
0044DD38 jmp     ecx                        ; transfer execution to the decoded
                                            ; branch value 

Figure 11: Conditional dispatcher with the conditional value being compared to a non-zero constant
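The conditional variant differs only in how the encoded value is chosen: the setcc result indexes a two-entry table before the same xor/add/inc decoding is applied. This Python sketch models Figure 11; the table entries and the value standing in for dword_46F30C are hypothetical placeholders, since the actual .data contents are not shown here.

```python
from typing import Sequence

MASK32 = 0xFFFFFFFF


def figure11_index(word_value: int) -> int:
    # cmp ecx, 5A4Dh ; setnz dl -> index 1 when the 16-bit value is not 0x5A4D
    return int(word_value != 0x5A4D)


def resolve_conditional(table: Sequence[int], index: int,
                        const2: int, const3: int) -> int:
    """Decode one of the two branch targets of a conditional dispatcher.

    table:  the two encoded entries (off_46F304[0] and off_46F304[1])
    index:  0 or 1, as written by the setcc instruction
    const2: immediate loaded into edx (0x9EC9743D in Figure 11)
    const3: dword XORed with it (dword_46F30C in Figure 11)
    """
    encoded = table[index]                              # mov ecx, off_46F304[edx*4]
    return (encoded + (const2 ^ const3) + 1) & MASK32   # xor / add / inc


# Hypothetical table contents and dword_46F30C value
table = (0x00416500, 0x00416600)
const2 = 0x9EC9743D
const3 = 0x9EC9743D ^ 0x00000100
for word in (0x5A4D, 0x0000):
    print(hex(resolve_conditional(table, figure11_index(word), const2, const3)))
```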

Loop Conditional Dispatcher Type

Figure 12, Figure 13, and Figure 14 illustrate the loop conditional dispatcher type, which occurs 42 times within the sample. It is always a collection of linked dispatcher blocks that include the loop initialization block, the loop body (an arbitrary collection of dispatcher blocks specific to the loop logic), an update block, and finally an exit-check block.

The initialization block sets the stage for a loop by establishing an "exit condition" flag and initializing it to false, ensuring the loop body executes at least once. The update block then modifies this flag based on the results of the initialization block or the loop body's logic. Finally, the exit-check block examines the flag's state to determine whether to continue iterating or exit the loop.

0044CD55 mov     dword_470A30, ebx
0044CD5B mov     edi, [ebp-34h]
0044CD5E xchg    ax, ax
0044CD60 mov     eax, off_46CB3C
0044CD65 mov     ecx, 74F906B5h
0044CD6A xor     ecx, dword_46CB44
0044CD70 add     eax, ecx
0044CD72 inc     eax
0044CD73 mov     dword ptr [ebp-30h], 0
0044CD7A mov     dword ptr [ebp-18h], 0     ; conditional flag, initially 0 to
                                            ; reflect transfer to the loop body
                                            ; not the loop exit
0044CD81 mov     dword ptr [ebp-28h], 0
0044CD88 mov     dword ptr [ebp-40h], 0
0044CD8F jmp     eax

Figure 12: A loop initialization block

0044C108 mov     ecx, [ebp-5Ch]
0044C10B mov     eax, [ecx+1]
0044C10E add     eax, ecx
0044C110 add     eax, 5
0044C113 mov     [ebp-18h], eax             ; instructions that update the
                                            ; conditional flag
0044C116 mov     eax, off_46CFE4
0044C11B mov     ecx, 681DADB7h
0044C120 xor     ecx, dword_46CFEC
0044C126 add     eax, ecx
0044C128 inc     eax
0044C129 nop     dword ptr [eax+00000000h]
0044C130 mov     ecx, [ebp-18h]
0044C133 mov     [ebp-28h], ecx
0044C136 jmp     eax

Figure 13: A loop update block

0044C2AD xor     eax, eax
0044C2AF mov     edx, [ebp-18h]             ; evaluate the conditional flag
0044C2B2 test    edx, edx
0044C2B4 setnz   al
0044C2B7 mov     ecx, 27DC8BC9h
0044C2BC xor     ecx, dword_46D248
0044C2C2 mov     eax, off_46D240[eax*4]     ; fetch the target                          
0044C2C9 add     eax, ecx
0044C2CB inc     eax
0044C2CC mov     [ebp-28h], edx
0044C2CF mov     ebx, [ebp-20h]
0044C2D2 jmp     eax                        ; Jump back to a loop body block
                                            ; or exit the loop

Figure 14: An exit-check block

Syscall Conditional Dispatcher Type

Dispatchers of this type are used for checking the return values of LummaC2-specific function calls that perform a syscall. They appear only 34 times in the case study sample. In these functions, LummaC2 decrypts the shellcode in Figure 15 and executes it in memory to make a particular syscall.

mov eax, <syscall ID> 
mov edx, win32u.Wow64SystemServiceCall
call edx
ret <imm16>

Figure 15: Shellcode to call Windows system call

In other cases, the malware makes direct calls to Windows Native APIs instead of utilizing the shellcode in Figure 15.

The conditional dispatcher for this type implements the NT_SUCCESS macro by checking whether the returned NTSTATUS code indicates success. It does so by inverting the NTSTATUS code, extracting the resulting sign bit, and using it as the branch target index, which is either 0 or 1. Because NT_SUCCESS treats any NTSTATUS value whose sign bit is clear (0x00000000 through 0x7FFFFFFF) as success, a successful syscall results in the true branch (index 1) being taken, and a failed syscall, whose NTSTATUS has the sign bit set, results in the false branch (index 0) being taken.

00424D95 call    sub_44EDA0              ; wrapper function to perform a syscall
00424D9A add     esp, 0Ch
00424D9D not     eax                     ; invert all bits of the NTSTATUS return value
00424D9F shr     eax, 1Fh                ; isolate the inverted sign bit, which becomes
                                         ; the index of the corresponding branch
00424DA2 mov     eax, off_45DC9C[eax*4]  ; fetch the according branch target
00424DA9 mov     ecx, 31637ACh
00424DAE xor     ecx, dword_45DCA4
00424DB4 add     eax, ecx
00424DB6 inc     eax
00424DB7 jmp     eax

Figure 16: Conditional dispatcher to check syscall return values
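The following Python sketch checks that the not/shr sequence in Figure 16 agrees with NT_SUCCESS for a few representative status codes (STATUS_SUCCESS, STATUS_PENDING, STATUS_BUFFER_OVERFLOW, and STATUS_ACCESS_DENIED). The helper names are our own; only the two-instruction sequence comes from the sample.

```python
MASK32 = 0xFFFFFFFF


def dispatcher_index(status: int) -> int:
    """Emulate `not eax; shr eax, 1Fh` on a 32-bit NTSTATUS value."""
    return ((~status) & MASK32) >> 31


def nt_success(status: int) -> bool:
    """NT_SUCCESS(Status): the value is non-negative as a signed 32-bit integer."""
    signed = status - (1 << 32) if status & 0x80000000 else status
    return signed >= 0


for status in (0x00000000,   # STATUS_SUCCESS
               0x00000103,   # STATUS_PENDING (success class)
               0x80000005,   # STATUS_BUFFER_OVERFLOW (warning)
               0xC0000022):  # STATUS_ACCESS_DENIED (error)
    print(hex(status), dispatcher_index(status), nt_success(status))
```

Note that warning-class codes such as STATUS_BUFFER_OVERFLOW also select index 0, exactly as NT_SUCCESS would report them as failures.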

Obfuscated Function Recovery

Original Instruction Recovery 

Recovering the original control flow of a protected function requires us to differentiate between the obfuscator's injected dispatcher instructions and the function's original instructions. To solve this, we use symbolic backward slicing, a program analysis technique that identifies the instructions that influence a specific register or memory address at a given point in a simulated execution over an intermediate representation. In this context, we employ backward slicing to do the following:

  • Isolate the dispatcher instructions from the original instructions

  • Determine exactly which instructions compute the destination of the final indirect control transfer

In our deobfuscator design, we leverage the Triton symbolic execution engine to conduct the core of the recovery. Triton implements backward slicing APIs that we can use directly. When executing the program, Triton maintains a set of symbolic expressions that represent the values of registers and memory addresses. These expressions are stored as an Abstract Syntax Tree (AST), where each node represents an operation whose operands result from the execution flow. Triton refers to this step as "processing": it simulates the effects that each emulated instruction has on registers and memory and reflects the accumulated result as an AST.

This is a powerful abstraction that allows us to reason about the deobfuscation at an AST level and ignore the verbose disassembly produced by the obfuscator. 

To distinguish dispatcher instructions, we'll focus on the destination of the final indirect jump in a dispatcher block. By looking up this destination in the constructed ASTs after all dispatcher instructions are processed, we can extract its corresponding symbolic expressions. 

Figure 17 shows the AST of the destination register eax at an indirect jump. This AST combines the symbolic expressions produced by every processed instruction that influences the value of the destination register before the indirect jump is executed.

<div>
  <div>




  
  
    
    <img alt="ASTs of the destination register after the indirect jump instruction is processed" src="https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig17.max-1000x1000.png" />
    
    
  
    <p>Figure 17: ASTs of the destination register after the indirect jump instruction is processed</p>
  



  </div>
</div>

Using Triton's APIs, we can extract a subset (or slice) of the processed expressions that collectively contribute to the final destination address of an indirect jump. For each expression in the slice, we can map it back to the specific dispatcher instruction that generates it. This mapping is possible because Triton maintains the association between instructions and the symbolic expressions they produce during its execution.

A snippet of the code used to perform backward slicing to distinguish dispatcher instructions from the original ones is shown in Figure 18.

from triton import Instruction, OPCODE, OPERAND

# context is a TritonContext(ARCH.X86) into which the sample's sections have
# already been mapped; pc is the address where processing starts.
def process_until_dispatcher_jump(context, pc):
    while True:
        # Retrieve the bytes of the instruction at the current program counter
        instructionBytes = context.getConcreteMemoryAreaValue(pc, 16)
        # Create a Triton Instruction object from the retrieved bytes
        instruction = Instruction(pc, instructionBytes)
        # Process (symbolically execute) the instruction using the Triton context
        context.processing(instruction)
        print('[Processing]', instruction)
        # Scan for the dispatcher jump instruction
        if instruction.getType() == OPCODE.X86.JMP:
            # Extract the operand of the JMP instruction
            jmpOperand = instruction.getOperands()[0]
            # Process JMP instructions with a register operand only
            if jmpOperand.getType() == OPERAND.REG:
                # Get the symbolic expression of the destination register
                destRegExpression = context.getSymbolicRegisters()[jmpOperand.getId()]
                # Backward slice on the destination register
                slicing = context.sliceExpressions(destRegExpression)
                # Iterate through the slice in expression ID order
                for _, sliceExpr in sorted(slicing.items()):
                    # Print the disassembly of the instruction behind each expression
                    print('\t[Slice]', sliceExpr.getDisassembly())
                return
        # Continue at the concrete next address (stepping over API calls,
        # which the full deobfuscator must handle, is omitted here)
        pc = context.getConcreteRegisterValue(context.registers.eip)

Figure 18: Triton code to perform backward slicing to recover all dispatcher instructions

Here, we process instructions one at a time until a jmp instruction is encountered. If the jump's operand is a register, we retrieve its symbolic expression and perform a backward slice to recover all instructions that influenced its value. Because Triton also records the disassembly of the instruction that produced each symbolic expression, we can extract the exact dispatcher instructions that make up the slice, and not merely their AST representation.

Once the complete backward slice for the destination has been retrieved, we can confidently distinguish the dispatcher instructions from the original instructions within the function. This distinction holds regardless of the placement or order of the dispatcher instructions within a protected block, since the backward slice only includes instructions that influence the final destination value.
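To illustrate why placement does not matter, here is a toy backward slice over register def/use sets in plain Python. It is not how Triton computes slices (Triton derives them from the dependencies between its symbolic expressions); the def/use sets below are written by hand for the instructions in Figure 19, sub-registers are ignored, and call instructions are assumed to clobber eax, ecx, edx, and esp.

```python
from typing import List, NamedTuple, Set


class Insn(NamedTuple):
    address: int
    text: str
    defs: Set[str]   # locations written (registers, or "mem:<addr>")
    uses: Set[str]   # locations read


def backward_slice(block: List[Insn]) -> List[Insn]:
    """Return the instructions that feed the operand of the block's final jump."""
    live = set(block[-1].uses)        # locations whose producers are still needed
    result = []
    for insn in reversed(block[:-1]):
        if insn.defs & live:          # this instruction produces a live value
            result.append(insn)
            live = (live - insn.defs) | insn.uses
    return list(reversed(result))


CLOBBER = {"eax", "ecx", "edx", "esp"}    # assumed effect of a call
block = [
    Insn(0x416530, "lea eax, [esp + 8]", {"eax"}, {"esp"}),
    Insn(0x416534, "push eax", {"esp", "stack"}, {"eax", "esp"}),
    Insn(0x416535, "call dword ptr [0x454a18]", CLOBBER, {"esp", "stack"}),
    Insn(0x41653B, "mov eax, esp", {"eax"}, {"esp"}),
    Insn(0x41653D, "push eax", {"esp", "stack"}, {"eax", "esp"}),
    Insn(0x41653E, "call dword ptr [0x454a14]", CLOBBER, {"esp", "stack"}),
    Insn(0x416544, "mov eax, dword ptr [0x457c1c]", {"eax"}, {"mem:0x457c1c"}),
    Insn(0x416549, "mov ecx, 0xa15bd01f", {"ecx"}, set()),
    Insn(0x41654E, "xor ecx, dword ptr [0x457c24]", {"ecx"}, {"ecx", "mem:0x457c24"}),
    Insn(0x416554, "add eax, ecx", {"eax"}, {"eax", "ecx"}),
    Insn(0x416556, "inc eax", {"eax"}, {"eax"}),
    Insn(0x416557, "jmp eax", set(), {"eax"}),
]
for insn in backward_slice(block):
    print(f"[Slice] {insn.address:#x}: {insn.text}")
```

Inserting original instructions such as mov ebx, [esi+48h] anywhere in the list leaves the result unchanged, because they never define a location that is still live.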

Backward slicing output:
...
[Processing] 0x416530: lea eax, [esp + 8]
[Processing] 0x416534: push eax
[Processing] 0x416535: call dword ptr [0x454a18]
[Processing] 0x41653b: mov eax, esp
[Processing] 0x41653d: push eax
[Processing] 0x41653e: call dword ptr [0x454a14]
[Processing] 0x416544: mov eax, dword ptr [0x457c1c]
[Processing] 0x416549: mov ecx, 0xa15bd01f
[Processing] 0x41654e: xor ecx, dword ptr [0x457c24]
[Processing] 0x416554: add eax, ecx
[Processing] 0x416556: inc eax
[Processing] 0x416557: jmp eax
	[Slice] 0x416544: mov eax, dword ptr [0x457c1c]
	[Slice] 0x416549: mov ecx, 0xa15bd01f
	[Slice] 0x41654e: xor ecx, dword ptr [0x457c24]
	[Slice] 0x416554: add eax, ecx
	[Slice] 0x416556: inc eax
...

Figure 19: Output for the code in Figure 18 to distinguish dispatcher instructions

Control Flow Recovery

In addition to recovering all original instructions of the function, we must also recover the original control flow. Because Triton tracks concrete values alongside symbolic expressions while processing instructions, we can determine the concrete destination of each dispatcher block's final indirect jump. With this, we can trace the program's execution flow and reconstruct the order in which dispatcher blocks are executed.

To explore all possible execution paths within the function, we employ a depth-first search (DFS) traversal algorithm. 

We begin by exploring a single path, following the control flow dictated by the obfuscator's indirect jumps. This continues until the path reaches a termination point, such as a ret instruction or a program-ending API call (e.g., ExitProcess).

In our deobfuscator design, we default to treating every protected conditional jump as a jnz by forcing the index register to 1 on the execution path currently being processed. When we encounter a protected conditional jump, we assume the condition is met and continue exploring the path that follows the jump. However, we do not discard the alternative path: it is stored in a queue-like worklist so that we can revisit it once the current path has been exhausted.
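The traversal can be sketched independently of Triton. In this Python sketch, a precomputed successor map with hypothetical addresses stands in for the jump targets that symbolic execution resolves; conditional blocks list their index-1 target first, matching the forced-jnz convention above. The worklist here is a LIFO stack, which is what yields depth-first order; this is our illustration rather than the deobfuscator's actual data structure.

```python
from typing import Dict, List


def explore(entry: int, successors: Dict[int, List[int]]) -> List[int]:
    """Depth-first traversal over dispatcher blocks.

    successors[block] holds the resolved targets of the block's final jump:
    [] for terminating blocks (ret, ExitProcess), [target] for unconditional
    dispatchers, and [index1_target, index0_target] for conditional ones.
    """
    order: List[int] = []
    visited = set()
    worklist = [entry]                 # LIFO stack of paths still to explore
    while worklist:
        block = worklist.pop()
        if block in visited:
            continue
        visited.add(block)
        order.append(block)
        # Push the index-0 (alternative) path first so that the index-1 path,
        # the one forced on the main execution path, is explored next.
        for target in reversed(successors[block]):
            if target not in visited:
                worklist.append(target)
    return order


# Hypothetical function: entry -> conditional block -> loop body / exit
successors = {
    0x100: [0x110],           # unconditional dispatcher
    0x110: [0x120, 0x130],    # conditional: index 1 -> 0x120, index 0 -> 0x130
    0x120: [0x110],           # loop body jumps back
    0x130: [],                # ends with ret
}
print([hex(b) for b in explore(0x100, successors)])
```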

By systematically exploring all paths using DFS and handling conditional jumps in this way, we can reconstruct the original control flow that the obfuscator concealed behind its indirect jumps.

Deobfuscation: Rebuilding Original Function

With the original instructions and execution paths identified, we can deobfuscate the sample by rebuilding the functions we have processed. Our goal is to ensure the deobfuscated functions are restored to their original state, preserving their original semantics and removing all traces of the obfuscator.

Instruction Rewriting

When rebuilding, we can overwrite the original protected function with the deobfuscated instructions. Since a deobfuscated function always has fewer instructions than its obfuscated counterpart, there is guaranteed space to accommodate the rebuilt function. The remaining space can be padded with standard compiler padding bytes such as 0xCC (int3).

The rewriting process involves writing instructions back from the function's entry point in the order they are processed and executed during the Triton analysis, excluding all dispatcher instructions. Here, we will address two specific cases involving indirect jumps originally added by the obfuscator.

The first case involves processing an unconditional dispatcher block. For this case, if the jump target has not been written yet, we simply skip it and continue writing instructions sequentially. If the jump target has already been written, we replace the indirect jump with a direct one to branch back to that target.

The second case for handling the jump instruction of a conditional dispatcher block is a bit more convoluted. Before tackling this, we must determine the original conditional jump type (e.g., jz, jnz, jl) based on the preceding setcc dispatcher instruction.
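Because the obfuscator replaces each conditional jump with its setcc counterpart, recovering the jump type is a lookup over the 16 x86 condition codes. The following Python sketch shows one way to do it; the alias table covers alternate mnemonics that disassemblers may emit (for example, setne versus setnz). Since setcc writes 1 when the condition holds, the recovered jcc should branch to the index-1 target.

```python
# Each setcc mnemonic and the conditional jump that tests the same condition
SETCC_TO_JCC = {
    "seto": "jo",   "setno": "jno",
    "setb": "jb",   "setae": "jae",
    "setz": "jz",   "setnz": "jnz",
    "setbe": "jbe", "seta": "ja",
    "sets": "js",   "setns": "jns",
    "setp": "jp",   "setnp": "jnp",
    "setl": "jl",   "setge": "jge",
    "setle": "jle", "setg": "jg",
}

# Alternate mnemonics for the same condition codes
ALIASES = {
    "setc": "setb", "setnae": "setb", "setnc": "setae", "setnb": "setae",
    "sete": "setz", "setne": "setnz", "setna": "setbe", "setnbe": "seta",
    "setpe": "setp", "setpo": "setnp", "setnge": "setl", "setnl": "setge",
    "setng": "setle", "setnle": "setg",
}


def jcc_for_setcc(mnemonic: str) -> str:
    """Map a setcc mnemonic (any alias, any case) to its conditional jump."""
    m = mnemonic.lower()
    return SETCC_TO_JCC[ALIASES.get(m, m)]


print(jcc_for_setcc("setnz"), jcc_for_setcc("setb"))
```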

Since the indirect jump can target one of the two destinations given a condition, we must replace it with two instructions. The first instruction is a conditional jump to the first destination using the correct conditional jump type.

The second instruction can be either:

  • A conditional jump with the opposite type as the first, targeting the second destination.

  • A direct jump to the second destination. We chose this option because it simplifies our deobfuscator implementation.
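The two rewriting rules can be combined into a small linearizer. This Python sketch assumes a hypothetical Block record holding each block's original instructions (dispatcher instructions already removed), its resolved targets, and its recovered jump type, and it emits IDA-style loc_ labels instead of encoded bytes. It relies on blocks being visited in depth-first order, where an unconditional block's unwritten target is always the next block emitted; that is what makes it safe to drop the jump in that case.

```python
from typing import Dict, List, NamedTuple, Optional


class Block(NamedTuple):
    insns: List[str]            # original instructions, dispatcher code removed
    targets: List[int]          # [] (ret), [target], or [index1, index0]
    jcc: Optional[str] = None   # recovered jump type for conditional blocks


def rewrite(order: List[int], blocks: Dict[int, Block]) -> List[str]:
    out: List[str] = []
    written = set()
    for addr in order:                  # blocks in the order they were explored
        block = blocks[addr]
        out.append(f"loc_{addr:X}:")
        out.extend(block.insns)
        written.add(addr)
        if len(block.targets) == 1:
            target = block.targets[0]
            # Unconditional: fall through into the next block, or jump back
            # if the target has already been written.
            if target in written:
                out.append(f"jmp loc_{target:X}")
        elif len(block.targets) == 2:
            taken, fallthrough = block.targets
            out.append(f"{block.jcc} loc_{taken:X}")   # condition true (index 1)
            out.append(f"jmp loc_{fallthrough:X}")      # condition false (index 0)
    return out


# Hypothetical blocks in DFS order
blocks = {
    0x100: Block(["push ebp", "mov ebp, esp"], [0x110]),
    0x110: Block(["call sub_4455F0", "movzx eax, al", "test eax, eax"],
                 [0x120, 0x130], "jnz"),
    0x120: Block(["inc ecx"], [0x110]),
    0x130: Block(["pop ebp", "ret"], []),
}
order = [0x100, 0x110, 0x120, 0x130]
print("\n".join(rewrite(order, blocks)))
```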

0041652B call    sub_4455F0         ; original instruction
00416530 movzx   eax, al            ; eax = al = return value
00416533 test    eax, eax           ; set flags
00416535 jnz     loc_416540         ; replacing indirect jmp with jnz for the first path
0041653B jmp     loc_416554         ; insert a jmp for the second path

Figure 20: Replacing an indirect conditional jump with a jnz-jmp instruction pair
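As an illustration, the jnz/jmp pair in Figure 20 can be encoded with the near (rel32) form of both jumps. The conditional jump is `0F 80+cc rel32` (6 bytes) and the unconditional jump is `E9 rel32` (5 bytes), with each displacement relative to the end of its own instruction. The helper below is hypothetical and always uses the 32-bit forms. That matches the 6-byte jnz at 0x00416535 in the figure, but ignores the shorter rel8 encodings.

```python
import struct

# Jcc condition codes (the low nibble of the 0F 8x opcode), keyed by mnemonic suffix.
JCC_CC = {
    "o": 0x0, "no": 0x1, "b": 0x2, "c": 0x2, "nae": 0x2, "ae": 0x3, "nb": 0x3, "nc": 0x3,
    "e": 0x4, "z": 0x4, "ne": 0x5, "nz": 0x5, "be": 0x6, "na": 0x6, "a": 0x7, "nbe": 0x7,
    "s": 0x8, "ns": 0x9, "p": 0xA, "pe": 0xA, "np": 0xB, "po": 0xB,
    "l": 0xC, "nge": 0xC, "ge": 0xD, "nl": 0xD, "le": 0xE, "ng": 0xE, "g": 0xF, "nle": 0xF,
}


def encode_jcc_jmp_pair(jcc: str, addr: int, taken: int, other: int) -> bytes:
    """Encode 'jcc taken' at addr followed by 'jmp other', both as near rel32 jumps."""
    cc = JCC_CC[jcc.lower()[1:]]          # strip the leading 'j'
    jcc_len, jmp_len = 6, 5
    rel_jcc = taken - (addr + jcc_len)    # relative to the end of the jcc
    jmp_addr = addr + jcc_len
    rel_jmp = other - (jmp_addr + jmp_len)  # relative to the end of the jmp
    return (bytes([0x0F, 0x80 | cc]) + struct.pack("<i", rel_jcc)
            + b"\xE9" + struct.pack("<i", rel_jmp))
```

Called with the addresses from Figure 20, `encode_jcc_jmp_pair("jnz", 0x416535, 0x416540, 0x416554)` produces `0F 85 05 00 00 00 E9 14 00 00 00`.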

Offset Relocation

The final step, relocation, addresses a side effect of our rebuilding process. Because we remove dispatcher instructions and duplicated instructions, the rewritten instructions occupy different addresses than they did in the original function. This displacement breaks the offsets encoded in jump, call, and other instructions that reference memory relative to their own address, since those offsets must now be computed from the instructions' new addresses.

In our current implementation, we address this by parsing all of the memory-referencing instructions and recalculating their offsets after deobfuscation. This involves tracking both the original and the relocated address of each instruction. With this information, we can calculate the adjusted offset to the target memory reference and re-encode each instruction with the correct displacement.
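A minimal sketch of this fix-up for relative branches and calls is shown below. It assumes 32-bit code where the rel32 displacement occupies the last four bytes of the instruction, as it does for `E8`, `E9`, and `0F 8x`. It also assumes that `addr_map` (hypothetical) maps each original instruction address to its relocated address. Targets missing from the map, such as calls to other functions, are assumed not to have moved. Short rel8 jumps, whose displacement may no longer fit after relocation, are not handled.

```python
import struct
from typing import Dict


def relocate_rel32(insn: bytes, orig_target: int, new_addr: int,
                   addr_map: Dict[int, int]) -> bytes:
    """Re-encode a rel32 jump/call placed at new_addr so it still reaches its target.

    insn        -- encoded instruction whose last 4 bytes are the rel32 displacement
    orig_target -- branch target in the original (obfuscated) function
    new_addr    -- address where the instruction now lives after rebuilding
    addr_map    -- original address -> relocated address (hypothetical input)
    """
    # Targets inside the rebuilt function moved; anything else keeps its address.
    new_target = addr_map.get(orig_target, orig_target)
    # The displacement is relative to the address of the next instruction.
    rel = new_target - (new_addr + len(insn))
    return insn[:-4] + struct.pack("<i", rel)
```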

Final Result

Using the techniques described in this blog post, we have successfully developed a deobfuscation tool for this version of LummaC2. The following figures show the result of our deobfuscator lifting the protection from two protected functions in the case study sample.

Figure 21: Disassembly view of the subroutine at the binary's entrypoint before deobfuscation

Figure 22: Decompiler view of the subroutine at the binary's entrypoint after deobfuscation

Figure 23: Disassembly view of the subroutine at address 0x41EE50 before deobfuscation

Figure 24: Decompiler view of the subroutine at address 0x41EE50 after deobfuscation

As shown in these figures, the original instructions are now readily apparent, free from the clutter of dispatcher blocks added by the obfuscator. The control flow, once obscured by indirect jumps, is now clearly visible and can be recovered and decompiled using IDA Pro. After deobfuscating all protected functions, we can now analyze the original program to comprehend its capabilities and behaviors.

Conclusion

In this blog post, we have explored the inner workings of LummaC2's obfuscation technique using indirect jumps to manipulate control flow. By leveraging backward slicing and symbolic execution, we have been able to consistently identify the original instructions and eliminate dispatcher instructions added by the obfuscator. Furthermore, we have discussed strategies for deobfuscation, including rebuilding the original function from the recovered control flow and addressing relocation challenges.

While this blog post focuses on deobfuscating LummaC2 protected subroutines, the power of backward slicing as a binary analysis technique extends well beyond this specific case. We hope our exploration of deobfuscating LummaC2 through the use of backward slicing has provided valuable insights to fellow analysts tackling similar challenges in the ever-evolving realm of reverse engineering and malware analysis.

Indicators of Compromise

A Google Threat Intelligence Collection featuring indicators of compromise (IOCs) related to the activity described in this post is now available.

Host-Based IOCs

MD5                                 Associated Malware Family
d01e27462252c573f66a14bb03c09dd2    LUMMAC.V2
5099026603c86efbcf943449cd6df54a    LUMMAC.V2
205e45e123aea66d444feaba9a846748    LUMMAC.V2

Article Link: https://cloud.google.com/blog/topics/threat-intelligence/lummac2-obfuscation-through-indirect-control-flow/