Code obfuscation is one of the cornerstones of malware. The harder code is to analyze, the longer attackers can fly below the radar and hide the full capabilities of their creations.
Code obfuscation techniques are very old and take many forms, including source code modifications, opcode manipulations, packer layers, and virtual machines.
Obfuscation is common in native code, script languages, .NET IL, and Java byte code.
As a defender, it’s important to be able to recognize these types of tricks, and have tools that are capable of dealing with them. Understanding the capabilities of the medium is paramount to determine what is junk, what is code, and what may simply be a tool error in data display.
On the attacker's side, developing a code obfuscation has certain prerequisites. The attacker needs tooling and documentation that allow them to craft and debug the complex code flow.
For binary implementations such as native code or IL, this involves specs of the target file format, documentation of the opcode instruction set, disassemblers, assemblers, and a capable debugger.
One code format that has not seen common obfuscation is the Visual Basic 6 P-Code byte stream. This is a proprietary opcode set in a complex file format, with limited tooling available to work with it.
In the course of exploring this instruction set, certain questions arose:
- Can VB6 P-Code be obfuscated at the byte stream layer?
- Has this occurred in samples in the wild?
- What would this look like?
- Do we have tooling capable of handling it?
Before we continue, we will briefly discuss the VB6 P-Code format and the tools available for working with it.
VB6 P-Code is a proprietary, variable-length binary instruction set that is interpreted by the VB6 Virtual Machine.
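To make "variable length" concrete, here is a minimal sketch of a linear-sweep decoder over a toy instruction set. The opcode values, mnemonics, and argument sizes in `TOY_OPCODES` are invented for illustration and are not real VB6 P-Code encodings; the point is only that the decoder must know each opcode's argument length to find where the next instruction starts.

```python
# Toy opcode table: opcode byte -> (mnemonic, number of argument bytes).
# HYPOTHETICAL values for illustration only; these are NOT real VB6 opcodes.
TOY_OPCODES = {
    0x01: ("PUSH_I2", 2),  # push a 16-bit literal
    0x02: ("ADD", 0),
    0x03: ("JMP", 2),      # 16-bit offset within the routine
    0x04: ("NOT", 0),
    0x05: ("RET", 0),
}


def disassemble(code: bytes, start: int = 0):
    """Decode instructions one after another, beginning at `start`.

    Returns a list of (offset, mnemonic, argument_bytes) tuples.
    """
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op not in TOY_OPCODES:
            # Unknown byte: record it and resynchronize on the next byte.
            out.append((pc, "INVALID", bytes([op])))
            pc += 1
            continue
        name, argc = TOY_OPCODES[op]
        args = code[pc + 1 : pc + 1 + argc]
        if len(args) < argc:
            # Arguments run past the end of the buffer.
            out.append((pc, "TRUNCATED", code[pc:]))
            break
        out.append((pc, name, args))
        pc += 1 + argc  # the opcode decides how far to advance
    return out


if __name__ == "__main__":
    sample = bytes([0x01, 0x05, 0x00, 0x01, 0x02, 0x00, 0x02, 0x05])
    for offset, name, args in disassemble(sample):
        print(f"{offset:04X}  {name:8s} {args.hex(' ')}")
```

Decoding the same bytes from a different starting offset, for example `disassemble(sample, 1)`, yields a different instruction stream. That property is what makes tricks such as jumping into argument bytes possible in any variable-length encoding.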
In terms of documentation, Microsoft has never published details of the VB6 file format or opcode instruction set. The opcode handler names were gathered by reversers from the debug symbols leaked with only a handful of runtimes.
At one time there was a reversing community, vb-decompiler.theautomaters.com, dedicated to the VB6 file format and P-Code instruction set. Mirrors of this message board are still available today [1].
On the topic of tooling, the main disassemblers are Semi-Vbdecompiler and the WKTVBDE P-Code debugger.
Of these, only Semi-Vbdecompiler shows you the full argument byte stream; the rest display only the opcode byte. While several private P-Code debuggers exist, WKTVBDE is the only public tool with debugging capabilities at the P-Code level.
Opcode meanings are still largely undocumented. Beyond intuition from their names, you would really have to compile your own programs from source, disassemble them, disassemble the opcode handlers, and debug both the native runtime and the P-Code to get a firm grasp of what's going on.
As you can see, a great deal of information is required to make sense of P-Code disassembly, and it is still a pretty dark art for most reversers.
Do VB6 obfuscators exist?
While doing research for this series of blog posts, we started with an initial sample set of 25,000 P-Code binaries, which we analyzed using various metrics.
Techniques that VB6 malware uses to obfuscate its intent include:
- junk code insertion at source level
- inclusion of large bodies of open source code to bulk up binary
- randomized internal object and method names
  - most commonly done at the pre-compilation stage
  - some tools work post-compilation
- all manner of encoded strings and data hiding
- native code blobs launched with various tricks such as
To date, we have not yet documented P-Code level manipulations in the wild.
Due to the complexity of the vector, P-Code obfuscations could easily have gone undetected so far, which makes this an interesting area to research. Hunting for samples will continue.
Can VB P-Code even be obfuscated and what would that look like?
In the course of research, this was a natural question to arise. We also wanted to make sure we had tooling which could handle it.
Consider the following
The default P-Code compilation is as follows:
An obfuscated sample may look like the following:
From the above we see multiple opcode obfuscation tricks commonly seen in native code.
It has been verified that this code runs fine and does not cause any problems with the runtime. This mutated file has been made available on VirusTotal so that vendors can test the capabilities of their tooling [2].
To single out some of the tricks:
Jump over junk:
Jumping into argument bytes:
At runtime what executes is:
Do nothing sequences:
Invalid sequences which may trigger fatal errors in disassembly tools:
The easiest markers of P-Code obfuscation are:
- jumps into the middle of other instructions
- unmatched For/Next opcode counts
- invalid/undefined opcodes
- unnatural opcode sequences not produced by the compiler
- errors in argument resolution from randomized data
Some junk sequences, such as Not Not, can show up normally depending on how a routine was coded.
This level of detection will require a competent, error-free disassembly engine that is aware of the full structures within the VB6 file format.
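Three of the markers listed above (jumps into the middle of other instructions, unmatched For/Next counts, and invalid opcodes) can be checked mechanically once instruction boundaries are known. The following is a rough sketch of that idea, not our actual tooling. It again uses an invented toy encoding: the opcode values, the `FOR`/`NEXT` mnemonics, and the little-endian absolute jump offset are all assumptions for illustration and do not reflect the real VB6 instruction set.

```python
import struct

# HYPOTHETICAL toy encoding: opcode byte -> (mnemonic, argument byte count).
# These are NOT real VB6 P-Code opcodes.
TOY_OPCODES = {
    0x01: ("PUSH_I2", 2),
    0x02: ("ADD", 0),
    0x03: ("JMP", 2),   # assumed: little-endian 16-bit absolute offset
    0x04: ("NOT", 0),
    0x05: ("RET", 0),
    0x06: ("FOR", 0),
    0x07: ("NEXT", 0),
}


def scan(code: bytes):
    """Linear sweep over one routine; returns a list of marker strings."""
    boundaries = set()  # offsets at which the sweep started an instruction
    jumps = []          # (source offset, target offset)
    findings = []
    for_count = next_count = 0
    pc = 0
    while pc < len(code):
        op = code[pc]
        boundaries.add(pc)
        if op not in TOY_OPCODES:
            findings.append(f"invalid opcode 0x{op:02X} at offset {pc}")
            pc += 1
            continue
        name, argc = TOY_OPCODES[op]
        args = code[pc + 1 : pc + 1 + argc]
        if len(args) < argc:
            findings.append(f"truncated {name} at offset {pc}")
            break
        if name == "JMP":
            jumps.append((pc, struct.unpack("<H", args)[0]))
        elif name == "FOR":
            for_count += 1
        elif name == "NEXT":
            next_count += 1
        pc += 1 + argc

    # Check jump targets only after every boundary has been collected.
    for src, dst in jumps:
        if dst >= len(code):
            findings.append(f"jump at {src} leaves the routine (target {dst})")
        elif dst not in boundaries:
            findings.append(
                f"jump at {src} lands inside an instruction (target {dst})"
            )
    if for_count != next_count:
        findings.append(
            f"unmatched FOR/NEXT counts: {for_count} vs {next_count}"
        )
    return findings


if __name__ == "__main__":
    # JMP 4 targets the first argument byte of the PUSH_I2 at offset 3.
    mutated = bytes([0x03, 0x04, 0x00, 0x01, 0x05, 0x00, 0x05])
    print(scan(mutated))
```

A plain linear sweep is used here for brevity. Junk bytes can desynchronize it, so a real engine would more likely follow control flow from the routine entry point. Note that the sketch deliberately does not flag `NOT NOT`, since that sequence can come from legitimate compiler output.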
Code obfuscation is a fact of life for malware analysts. The more common and well documented the file format, the more likely it is that obfuscation tools are widespread in the wild.
This reasoning is likely why complex formats such as Java had many public obfuscators early on.
This research shows that VB6 P-Code obfuscation is equally possible and gives us the opportunity to make sure our tools can handle it before they are needed in a time-constrained incident response.
The techniques explored here also give us the insight to hunt for advanced threats that may already have been using this technique and have flown under the radar for years.
We encourage researchers to examine the mutated sample [2] and make sure that their frameworks can handle it without error.
[1] vb-decompiler.theautomaters.com mirror
[2] Mutated P-Code sample SHA256 and VirusTotal link